Causality Workbench | Challenges in Machine Learning

Causality Challenge #1: Causation and Prediction

How to format and ship results

Results File Formats

The results on each dataset should be formatted in text files according to the following table. If you are a Matlab user, you may find some of the sample code routines useful for formatting the data. You can view an example of each format from the filename column.

Filename: [dataname]_feat.ulist
Development: Optional
Final entries: Compulsory (unless [dataname]_feat.slist is given)
Description: Unsorted list of N features used to make predictions.
File format: A space-delimited, unsorted list of the variables/features used to predict the target variable. The features are numbered starting from 1, in the column order of the data tables.

Filename: [dataname]_feat.slist
Development: Optional
Final entries: Compulsory (unless [dataname]_feat.ulist is given)
Description: Sorted list of N features used to make predictions.
File format: A space-delimited, sorted list of variables/features, with the features most likely to be predictive first. The features are numbered starting from 1, in the column order of the data tables. The list must contain no repetitions, but it may contain only a subset of all features. As explained under [dataname]_train.predict, the list may be used to define nested subsets of n predictive features.

Filename: [dataname]_train.predict
Development: Optional
Final entries: Compulsory
Description: Target prediction result table for training examples.
File format: Target prediction values* for all M samples of the data tables. There are two possible formats: (1) a single column of M prediction values, obtained with all the features of [dataname]_feat.ulist or [dataname]_feat.slist, or (2) a space-delimited table of predictions for a varying number of predictive features, with M lines and C columns. The second format may be used only in conjunction with a valid [dataname]_feat.slist file. Each column should contain the prediction values obtained with only the first n features of [dataname]_feat.slist, where n varies by powers of 2: 1, 2, 4, 8, ... If the total number of features N in [dataname]_feat.slist is not a power of 2, the last column should correspond to using all N features. Hence C = log2(N)+1 if N is a power of 2, and C = floor(log2(N))+2 otherwise.

Filename: [dataname]_test.predict
Development: Compulsory
Final entries: Compulsory
Description: Target prediction result table for test examples.
File format: Same as for [dataname]_train.predict.

* If the targets are binary {+1, -1} values (two-class classification), the prediction values may be either binary {+1, -1} values or discriminant values, positive or negative (zero will be interpreted as a small positive value).
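
The nested-subset rule and the space-delimited layouts described above can be sketched in Python as follows. This is only an illustration, not the challenge's sample code (which is Matlab). The function names are hypothetical, and writing the feature list on a single line terminated by a newline is an assumption, since the format description only says "space-delimited".

```python
def nested_subset_sizes(n_features):
    """Return the subset sizes 1, 2, 4, 8, ... up to n_features.

    If n_features is not a power of 2, it is appended as the last size,
    so the last column always uses all N features.
    """
    if n_features < 1:
        raise ValueError("need at least one feature")
    sizes = []
    size = 1
    while size <= n_features:
        sizes.append(size)
        size *= 2
    if sizes[-1] != n_features:
        sizes.append(n_features)
    return sizes


def format_feature_list(features):
    """Format 1-based feature indices as one space-delimited line.

    Assumption: the whole list goes on a single line.
    """
    if len(set(features)) != len(features):
        raise ValueError("feature list must not contain repetitions")
    if any(f < 1 for f in features):
        raise ValueError("features are numbered starting from 1")
    return " ".join(str(f) for f in features) + "\n"


def format_predictions(rows):
    """Format a prediction table: one line per sample, one column per subset."""
    return "".join(" ".join(str(v) for v in row) + "\n" for row in rows)


# Example with a made-up ranking of N = 5 features.
ranked = [7, 3, 12, 1, 5]
sizes = nested_subset_sizes(len(ranked))         # 4 columns: sizes 1, 2, 4, 5
subsets = [ranked[:n] for n in sizes]            # features used for each column
slist_text = format_feature_list(ranked)         # contents of [dataname]_feat.slist
```

Each column of the prediction table would then hold the predictions of a model trained on the corresponding entry of `subsets`. The text returned by `format_predictions` is what would go into [dataname]_train.predict or [dataname]_test.predict.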

When you submit your results, you get immediate feedback on the Result page: a color code indicates in which quartile your performance lies. The scoring measures are explained on the Evaluation page.
IMPORTANT: To see your results in the "Overall" Result table, a single submission must include results for all the tasks whose data names differ only by their number, e.g. for REGED, you must enter results for REGED0, REGED1, and REGED2 in the same submission. Only the entries appearing in that table will be considered for the final ranking and the challenge prizes.
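
Before submitting, it can help to check that no task of a family is missing. The Python sketch below is a hypothetical helper, not part of the challenge tools. It assumes that each result filename starts with the data name (letters followed by a task number) and an underscore, and that the caller supplies the expected task numbers for each family.

```python
import re
from collections import defaultdict

# Assumed filename shape: <letters><number>_<rest>, e.g. REGED0_test.predict
_NAME_PATTERN = re.compile(r"^([A-Za-z]+)(\d+)_")


def group_by_family(filenames):
    """Map each family name (e.g. 'REGED') to the set of task numbers found."""
    families = defaultdict(set)
    for name in filenames:
        match = _NAME_PATTERN.match(name)
        if match:
            families[match.group(1).upper()].add(int(match.group(2)))
    return dict(families)


def missing_tasks(filenames, expected):
    """Return {family: sorted missing task numbers} for incomplete families.

    `expected` maps a family name to the task numbers it should contain,
    e.g. {"REGED": [0, 1, 2]}.
    """
    found = group_by_family(filenames)
    missing = {}
    for family, numbers in expected.items():
        absent = set(numbers) - found.get(family.upper(), set())
        if absent:
            missing[family] = sorted(absent)
    return missing


# Example: REGED2 has been forgotten.
files = ["REGED0_test.predict", "REGED0_feat.slist", "REGED1_test.predict"]
report = missing_tasks(files, {"REGED": [0, 1, 2]})
```

The check only looks at data names. It does not verify that each task has every compulsory file.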

Results Archive Format

Submitted files must be in either a .zip or a .tar.gz archive. You can download the example zip archive to familiarise yourself with the archive structure and contents (the results were generated with the sample code). Submitted files must use exactly the same filenames as in the example archive. If you use tar.gz archives, please do not include any leading directory names for the files. To create a valid archive, use one of the following:

zip results.zip *.predict *.slist [or *.ulist]
tar cvf results.tar *.predict *.slist [or *.ulist]; gzip results.tar
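
If the result files are produced by a script, the archive can also be built programmatically. The following is a minimal Python sketch using the standard `zipfile` and `tarfile` modules. The function name `build_flat_archive` is hypothetical. Storing files without directory names in zip archives as well is an assumption, since the instructions state the requirement only for tar.gz.

```python
import os
import tarfile
import zipfile


def build_flat_archive(archive_path, result_files):
    """Write result_files into a .zip or .tar.gz archive without directories.

    Each file is stored under its base name only, so the archive contains
    no leading directory names. Returns the list of stored names.
    """
    names = [os.path.basename(path) for path in result_files]
    if len(set(names)) != len(names):
        raise ValueError("two files would get the same name in the archive")

    if archive_path.endswith(".zip"):
        with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as zf:
            for path, name in zip(result_files, names):
                zf.write(path, arcname=name)
    elif archive_path.endswith(".tar.gz"):
        with tarfile.open(archive_path, "w:gz") as tf:
            for path, name in zip(result_files, names):
                tf.add(path, arcname=name)
    else:
        raise ValueError("archive must end in .zip or .tar.gz")
    return names
```

A call such as `build_flat_archive("results.zip", paths)` takes a list of paths to the *.predict and *.slist (or *.ulist) files. The returned names can be compared against those in the example archive.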