We provide datasets from various application domains (all real data). Each dataset has 3 (unlabeled) subsets: a development set, a validation set, and a final evaluation set. During the development period, you can get immediate feedback on the Leaderboard and in My Lab by submitting valid data representations of the validation set. Turn in your representation of the final evaluation set when you are ready for final testing.
In phase 1, no labels were available.
Dataset | Domain | Feat. num. | Sparsity (%) | Development num. | Transfer num. | Validation num. | Final Eval. num. | Data (text) | Data (Matlab) |
---|---|---|---|---|---|---|---|---|---|
AVICENNA | Arabic manuscripts | 120 | 0.00 | 150205 | 50000 | 4096 | 4096 | 16 MB | 14 MB |
HARRY | Human action recognition | 5000 | 98.12 | 69652 | 20000 | 4096 | 4096 | 13 MB | 15 MB |
RITA | Object recognition | 7200 | 1.19 | 111808 | 24000 | 4096 | 4096 | 1026 MB | 762 MB |
SYLVESTER | Ecology | 100 | 0.00 | 572820 | 100000 | 4096 | 4096 | 81 MB | 69 MB |
TERRY | Text recognition | 47236 | 99.84 | 217034 | 40000 | 4096 | 4096 | 73 MB | 56 MB |
ULE (toy data) | Handwritten digits | 784 | 80.85 | 26808 | 10000 | 4096 | 4096 | 7 MB | 13 MB |
We also provide a toy dataset called ULE (Unsupervised Learning Example dataset). This dataset is NOT part of the challenge; it is provided for practice purposes. We used it to produce example submissions (see the Instructions) with our Matlab sample code and example learning curves (see the Evaluation page). For ULE, you get all the data labels. For all other datasets, the data come without labels in phase 1, and you will get only the transfer labels in phase 2.
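The "Sparsity (%)" column in the table is not defined on this page. A plausible reading, given that the dense RITA data is listed at 1.19 and the bag-of-words style TERRY data at 99.84, is the percentage of matrix entries equal to zero. The following Python sketch illustrates that reading on a made-up matrix; the function name `sparsity_percent` and the toy values are hypothetical and are not part of the challenge code or data.

```python
def sparsity_percent(rows):
    """Return the percentage of entries that are exactly zero.

    Assumption: "sparsity" means the share of zero-valued entries in the
    samples-by-features matrix. The challenge page does not state this.
    """
    total = sum(len(row) for row in rows)
    if total == 0:
        raise ValueError("matrix has no entries")
    zeros = sum(1 for row in rows for value in row if value == 0)
    return 100.0 * zeros / total


# Toy matrix (NOT challenge data): 3 samples x 4 features, 9 of 12 entries are zero.
toy = [
    [0, 0, 5, 0],
    [1, 0, 0, 0],
    [0, 0, 0, 2],
]

print(f"{sparsity_percent(toy):.2f}")  # 75.00
```

Under this reading, a dataset with a high value such as TERRY is usually best kept in a sparse format rather than as a dense array, while a dataset with a value of 0.00 such as AVICENNA gains nothing from sparse storage.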