Causality Workbench: Challenges in Machine Learning

Unsupervised and Transfer Learning Challenge

Instructions


Matlab users

Download the sample code and modify main.m to suit your needs.

General instructions (all participants)

STEP 1: Get the data from the "Data (text)" column of the Data table, or from the "Data (Matlab)" column if you are a Matlab user.
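For non-Matlab users, the text data can be read in a single call. The sketch below assumes each text file is a dense, whitespace-delimited numeric table with one example per row. Check the dataset description, because any other layout would need a different reader. The helper name `load_matrix` is hypothetical.

```python
import io

import numpy as np


def load_matrix(source) -> np.ndarray:
    """Load a whitespace-delimited numeric table into a 2-D float array.

    `source` may be a file path or a file-like object. The dense,
    one-example-per-row layout is an assumption, not a documented format.
    """
    # loadtxt splits on any whitespace by default; ndmin=2 keeps a
    # single-row file two-dimensional.
    return np.loadtxt(source, dtype=float, ndmin=2)


if __name__ == "__main__":
    sample = io.StringIO("0 1.5 2\n3 0 4\n")  # tiny stand-in for a data file
    X = load_matrix(sample)
    print(X.shape)  # (2, 3)
```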

STEP 2: Preprocess the data into a new representation. Using an unsupervised (or transfer learning) algorithm you have developed, prepare preprocessed data for the validation set (valid), the final evaluation set (final), or both, for all the challenge datasets. For a data matrix ("valid" or "final") of dimension (p, n), with p=4096 examples and n features, a preprocessed data matrix is a (p, n') matrix containing a new representation with n' features. You may group entries for different datasets under a single experiment name of your choice. See the Submission Format section below.
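As an illustration of the required shapes only, the sketch below maps a (p, n) matrix to a (p, n') representation using principal component analysis. PCA stands in for "an unsupervised algorithm you have developed" and is not the challenge's reference method. The function name and the choice of n' are hypothetical.

```python
import numpy as np


def pca_representation(X: np.ndarray, n_components: int) -> np.ndarray:
    """Project a (p, n) data matrix onto its top principal components.

    Returns a (p, n') matrix with n' = n_components.
    """
    if not 1 <= n_components <= min(X.shape):
        raise ValueError("n_components must be between 1 and min(p, n)")
    Xc = X - X.mean(axis=0)  # center each feature (column)
    # Rows of Vt are the principal directions, ordered by decreasing singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T  # (p, n') principal-component scores


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X_valid = rng.normal(size=(100, 20))  # toy stand-in for a (4096, n) matrix
    Z = pca_representation(X_valid, n_components=5)
    print(Z.shape)  # (100, 5)
```

The number of rows is unchanged, so each row of the output still corresponds to the same example, as the submission format requires.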

STEP 2bis: Create kernel matrices. As an alternative to Step 2, prepare similarity/kernel matrices for the validation set (valid), the final evaluation set (final), or both. For p examples, a similarity/kernel matrix is a (p, p) symmetric positive semi-definite matrix (all eigenvalues nonnegative) whose entries indicate how similar pairs of examples are.
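The sketch below builds a linear kernel (a Gram matrix of inner products), which is positive semi-definite by construction, and checks the property through its eigenvalues. It also shows one common repair for an indefinite similarity matrix: clipping negative eigenvalues to zero. The linear kernel, the tolerance, and all helper names are illustrative assumptions. The challenge only requires that the submitted matrix be symmetric and PSD.

```python
import numpy as np


def linear_kernel(Z: np.ndarray) -> np.ndarray:
    """(p, p) matrix of inner products between rows of Z; PSD by construction."""
    return Z @ Z.T


def is_psd(K: np.ndarray, tol: float = 1e-8) -> bool:
    """True if K is symmetric and no eigenvalue is below -tol (scaled by K's magnitude)."""
    if not np.allclose(K, K.T):
        return False
    scale = max(1.0, float(np.abs(K).max()))
    return float(np.linalg.eigvalsh(K).min()) >= -tol * scale


def clip_to_psd(K: np.ndarray) -> np.ndarray:
    """Symmetrize K and set its negative eigenvalues to zero."""
    S = (K + K.T) / 2
    w, V = np.linalg.eigh(S)
    K_psd = (V * np.clip(w, 0.0, None)) @ V.T  # V diag(max(w, 0)) V^T
    return (K_psd + K_psd.T) / 2  # remove floating-point asymmetry


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    K = linear_kernel(rng.normal(size=(50, 5)))
    print(K.shape, is_psd(K))  # (50, 50) True
    M = np.array([[1.0, 2.0], [2.0, 1.0]])  # eigenvalues 3 and -1: not PSD
    print(is_psd(M), is_psd(clip_to_psd(M)))  # False True
```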

STEP 3: Submit your entries via the Submit page.

Submission Format

Submitted files must be bundled in a .zip archive containing either or both of the following text files:

dataname_expname_valid.prepro
dataname_expname_final.prepro

where dataname is one of the dataset names, expname is an experiment name of your choice, and "valid" or "final" indicates the evaluation subset. Each file should contain a numeric, space-delimited table with p rows, corresponding to the p=4096 examples in the "valid" or "final" set, and either n' columns corresponding to features/variables OR p columns forming a (p, p) symmetric similarity matrix between examples. Similarity matrices should be positive semi-definite (valid kernel matrices). Do not include categorical variables (except binary variables); encode each categorical variable with several binary variables.
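The sketch below shows two parts of this format. First, it encodes a categorical variable as several binary (0/1) columns, one per category. Second, it writes a numeric, space-delimited table with one row per example. The helper names, the example column, and the `%.6g` number format are assumptions. The format limits each value to six significant digits, which keeps files small, but pick whatever precision your representation needs.

```python
import os
import tempfile

import numpy as np


def one_hot(codes) -> np.ndarray:
    """Encode one categorical column as binary columns, one per distinct category (sorted)."""
    categories, idx = np.unique(np.asarray(codes), return_inverse=True)
    B = np.zeros((len(idx), len(categories)), dtype=int)
    B[np.arange(len(idx)), idx] = 1  # a single 1 per row, in the column of its category
    return B


def write_prepro(path: str, M: np.ndarray) -> None:
    """Write a numeric, space-delimited table: one row per example, one column per feature."""
    np.savetxt(path, M, fmt="%.6g", delimiter=" ")


if __name__ == "__main__":
    Z = np.array([[0.5, 1.0], [2.0, -1.0], [0.0, 3.0]])  # toy continuous features
    colour = ["red", "blue", "red"]  # hypothetical categorical variable
    M = np.hstack([Z, one_hot(colour)])  # shape (3, 2 + 2)
    path = os.path.join(tempfile.mkdtemp(), "dataname_expname_valid.prepro")
    write_prepro(path, M)
    print(open(path).read())
```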

SIZE LIMITATIONS: We cannot accept submissions of arbitrary size because of limits on transmission bandwidth, memory, and data-processing speed. Archives must not be larger than 50 MB. This limit was chosen to accommodate a preprocessed data representation of any size, provided that you follow these guidelines:

To create valid archives, use:

zip dataname_expname.zip dataname_expname_valid.prepro dataname_expname_final.prepro 
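Without the `zip` command-line tool, Python's standard `zipfile` module can build the same archive. The sketch below bundles the valid and/or final files of one {dataname, expname} pair and checks the result against the size limit. It reads "50 MB" conservatively as 50,000,000 bytes, which is an assumption, since the page does not say whether decimal or binary megabytes are meant. The function name and the example names `ule` and `exp1` are hypothetical.

```python
import os
import tempfile
import zipfile

# Conservative reading of "50 MB" (decimal megabytes); an assumption.
MAX_ARCHIVE_BYTES = 50 * 1000 * 1000


def make_archive(dataname: str, expname: str, directory: str = ".") -> str:
    """Zip the valid and/or final .prepro files of one {dataname, expname} pair
    and check the archive against the size limit."""
    names = [f"{dataname}_{expname}_{subset}.prepro" for subset in ("valid", "final")]
    present = [n for n in names if os.path.exists(os.path.join(directory, n))]
    if not present:
        raise FileNotFoundError(f"no .prepro files for {dataname}_{expname}")
    archive = os.path.join(directory, f"{dataname}_{expname}.zip")
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for name in present:
            # Store each file at the archive root, as the zip command above does.
            zf.write(os.path.join(directory, name), arcname=name)
    size = os.path.getsize(archive)
    if size > MAX_ARCHIVE_BYTES:
        raise ValueError(f"{archive} is {size} bytes, over the 50 MB limit")
    return archive


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    with open(os.path.join(workdir, "ule_exp1_valid.prepro"), "w") as f:
        f.write("0.1 0.2\n0.3 0.4\n")
    print(make_archive("ule", "exp1", workdir))
```

Building one archive per call matches the rule that each {dataname, expname} pair is submitted separately.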

You can download a sample submission for the toy example ULE to familiarize yourself with the data format. The results were generated with the Matlab sample code. Results for each pair {dataname, expname} are submitted separately.