Unsupervised and Transfer Learning Challenge

Instructions

Matlab users

Download the sample code and modify main.m as suits you.

General instructions (all participants)

STEP 1: Get the data from the Data table, from the column "Data (text)", or from "Data (Matlab)" if you are a Matlab user.

STEP 2: Preprocess the data into a new representation: Using an unsupervised (or transfer learning) algorithm you have developed, prepare preprocessed data for either or both of the validation set (valid) and the final evaluation set (final), for all the challenge datasets. For a data matrix ("valid" or "final") of dimension (p, n), with p=4096 examples and n features, a preprocessed data matrix is a (p, n') matrix containing a new representation with n' features. You may group entries on different datasets under the same experiment name of your choice. See the submission format.
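To illustrate the shape contract of Step 2, here is a minimal Python sketch of a (p, n) to (p, n') mapping. It is not the challenge's sample code: the function name preprocess and the chosen transform (dropping constant features and standardizing the rest) are assumptions made only for illustration, and any unsupervised method that keeps the p rows in order would fit the same contract.

```python
import statistics

def preprocess(X):
    """Map a (p, n) list-of-rows matrix to a (p, n') matrix.

    Illustrative transform only: constant columns are dropped and the
    remaining columns are standardized to zero mean and unit variance.
    """
    kept = []
    for col in zip(*X):  # iterate over the n feature columns
        mean = statistics.fmean(col)
        std = statistics.pstdev(col)
        if std > 0:  # a constant feature carries no information
            kept.append([(v - mean) / std for v in col])
    if not kept:
        return [[] for _ in X]
    # Transpose back so that each row is again one example.
    return [list(row) for row in zip(*kept)]

# Tiny stand-in for a real data matrix: p=3 examples, n=3 features.
X = [[1.0, 5.0, 2.0],
     [3.0, 5.0, 4.0],
     [5.0, 5.0, 6.0]]
Z = preprocess(X)  # the constant middle feature is dropped, so n'=2
```

The number of rows is unchanged and their order is preserved, which matters because each row of the submitted table must correspond to the same example as in the original set.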

STEP 2bis: Create kernel matrices: As an alternative to Step 2, prepare similarity/kernel matrices for either or both of the validation set (valid) and the final evaluation set (final). For p examples, a similarity/kernel matrix is a (p, p) symmetric positive semi-definite matrix (all eigenvalues positive or zero), indicating how similar pairs of examples are.
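As one way to obtain a valid matrix for Step 2bis, the sketch below builds a linear kernel K = XX' from a preprocessed matrix X. A matrix of this form is symmetric and positive semi-definite by construction. The helper names linear_kernel and is_symmetric are hypothetical, and the linear kernel is only one possible choice of similarity.

```python
def linear_kernel(X):
    """Return the (p, p) Gram matrix K = X X' of a list-of-rows matrix X."""
    return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in X]
            for row_i in X]

def is_symmetric(K, tol=1e-9):
    """Check that K is square and that K[i][j] equals K[j][i] within tol."""
    p = len(K)
    if any(len(row) != p for row in K):
        return False
    return all(abs(K[i][j] - K[j][i]) <= tol
               for i in range(p) for j in range(i + 1, p))

# Tiny stand-in for a preprocessed matrix: p=2 examples, n'=3 features.
X = [[1.0, 2.0, 0.0],
     [0.0, 1.0, 1.0]]
K = linear_kernel(X)
symmetric = is_symmetric(K)
```

Symmetry is only a necessary condition. For a similarity matrix that is not built as a product XX', the eigenvalues would also have to be checked with a numerical library.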

STEP 3: Submit your entries via the Submit page:

Development entries: During the development period, you may submit data representations (or similarity/kernel matrices) on the validation sets as many times as you want. Your submissions may optionally contain preprocessed data on the final evaluation set. However, only the results on validation data will be displayed on the Leaderboard and in My Lab. To facilitate your work, you can practice using the toy example ULE, for which you have all the labels. But we urge you to try the submission system to make sure it works for you. All submissions on different datasets made under the same experiment name are grouped in the Leaderboard page.
Complete final experiment: For each phase (phase 1 and phase 2), your last COMPLETE experiment will count towards the final ranking. A complete experiment consists of submissions of final evaluation set preprocessed data for ALL 5 datasets of the challenge, under the same experiment name (submitting results on validation data for the final submission is optional).

Submission Format

Submitted files must be bundled in a .zip archive containing either or both of the following text files:

dataname_expname_valid.prepro
dataname_expname_final.prepro

where dataname is one of the dataset names, expname is an experiment name of your choice, and "valid" or "final" indicates the evaluation subset. Each file should contain a space-delimited numeric table with p rows, corresponding to the p=4096 examples in either the "valid" or "final" set, and n' columns corresponding to features/variables, OR a (p, p) symmetric table corresponding to a similarity matrix between examples. Similarity matrices should be positive semi-definite (valid kernel matrices). Do not include categorical variables (except binary variables). Encode categorical variables with several binary variables.
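The file format described above can be sketched in Python as follows. The helper names prepro_filename and write_prepro are hypothetical, and the dataset name "ule" and experiment name "myexp" are placeholders; check the Data table for the exact spelling of the dataset names.

```python
import os
import tempfile

def prepro_filename(dataname, expname, subset):
    """Build a file name of the form dataname_expname_subset.prepro."""
    if subset not in ("valid", "final"):
        raise ValueError("subset must be 'valid' or 'final'")
    return f"{dataname}_{expname}_{subset}.prepro"

def write_prepro(path, rows):
    """Write one example per line, with values separated by single spaces."""
    with open(path, "w", newline="\n") as f:
        for row in rows:
            f.write(" ".join(str(v) for v in row) + "\n")

# Tiny stand-in for a real table (a real file has p=4096 rows).
rows = [[0, 12, 999],
        [7, 0, 3]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, prepro_filename("ule", "myexp", "valid"))
    write_prepro(path, rows)
    with open(path) as f:
        written = f.read()
```

The same writer works for a (p, p) similarity matrix, since that is also a space-delimited numeric table.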

SIZE LIMITATIONS: We cannot accept submissions of arbitrary size, for reasons of transmission bandwidth, memory, and data processing speed. Archives must not be larger than 50 MB. This limit was chosen so that you can submit a preprocessed data representation of any size, provided that you follow these guidelines:

Submit either XX' or X, whichever is smaller: If your preprocessed data matrix X has more features than examples (more columns than rows), submit XX' (the product of the matrix by its transpose) instead of X. We will detect this automatically, and the evaluation results will be identical to those you would get by submitting X, but less data will be transmitted.
Quantize your data: We suggest you quantize your values to the range 0 to 999 and write them in integer format. High precision usually makes no difference in performance.
Make a separate submission for validation data and for final evaluation data unless your data representations are small.
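The first two guidelines above can be sketched in Python as two independent helpers. Both function names are hypothetical. The guidelines do not say how to quantize, so the global min-max scaling used here is an assumption, as is applying it to a feature matrix X rather than to a kernel matrix (rounding a kernel matrix might affect its positive semi-definiteness).

```python
def smaller_submission(X):
    """Return X X' if X has more columns than rows, otherwise X itself."""
    p, n = len(X), len(X[0])
    if n > p:
        return [[sum(a * b for a, b in zip(row_i, row_j)) for row_j in X]
                for row_i in X]
    return X

def quantize(X, levels=1000):
    """Rescale all values to integers in [0, levels - 1].

    Assumption: one global min-max scaling over the whole matrix.
    """
    lo = min(min(row) for row in X)
    hi = max(max(row) for row in X)
    if hi == lo:  # constant matrix: avoid dividing by zero
        return [[0 for _ in row] for row in X]
    scale = (levels - 1) / (hi - lo)
    return [[int(round((v - lo) * scale)) for v in row] for row in X]

# p=2 examples, n'=3 features: more columns than rows.
X = [[0.0, 0.5, 2.0],
     [-1.0, 0.25, 3.0]]
K = smaller_submission(X)  # a (2, 2) matrix instead of (2, 3)
Q = quantize(X)            # integers between 0 and 999
```

With p=4096 rows, the saving from submitting XX' only appears once n' exceeds 4096 columns.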

To create valid archives, use:

zip dataname_expname.zip dataname_expname_valid.prepro dataname_expname_final.prepro
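If the zip command is not available, an equivalent archive can be built with Python's standard zipfile module. This is a sketch under stated assumptions: the function name make_archive is hypothetical, "ule" and "myexp" are placeholder names, and the 50 MB limit is interpreted conservatively as 50,000,000 bytes.

```python
import os
import tempfile
import zipfile

MAX_BYTES = 50 * 10**6  # assumption: 50 MB read as 50,000,000 bytes

def make_archive(directory, dataname, expname):
    """Zip whichever of the valid/final .prepro files exist in directory."""
    archive = os.path.join(directory, f"{dataname}_{expname}.zip")
    added = 0
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for subset in ("valid", "final"):
            name = f"{dataname}_{expname}_{subset}.prepro"
            path = os.path.join(directory, name)
            if os.path.exists(path):
                zf.write(path, arcname=name)  # store without directory prefix
                added += 1
    if added == 0:
        raise FileNotFoundError("no .prepro file found for this experiment")
    if os.path.getsize(archive) > MAX_BYTES:
        raise ValueError("archive exceeds the 50 MB limit")
    return archive

# Demonstration in a temporary directory with a validation file only.
with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "ule_myexp_valid.prepro"), "w") as f:
        f.write("0 12 999\n7 0 3\n")
    archive_path = make_archive(tmp, "ule", "myexp")
    archive_name = os.path.basename(archive_path)
    with zipfile.ZipFile(archive_path) as zf:
        members = zf.namelist()
```

Storing each file under its bare name keeps the archive layout the same as the one produced by the zip command above when run from the directory containing the files.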

You can download a sample submission for the toy example ULE to familiarize yourself with the data format. The results were generated with the Matlab sample code. Results for each pair {dataname, expname} are submitted separately.