Causality Workbench | Challenges in Machine Learning

Unsupervised and Transfer Learning Challenge

This challenge addresses a question of fundamental and practical interest in machine learning: the assessment of data representations produced by unsupervised learning procedures, for use in supervised learning tasks. It also addresses the evaluation of transfer learning methods capable of producing data representations useful across many similar supervised learning tasks, after training on supervised data from only one of them.

The challenge is over, but the platform is open for post-challenge submissions. See the results.

Transfer Learning Tutorial (15 min video) | Challenge result paper



Registered: 658
Entrants: 89
Total jobs: 6933 (Phase 1), 1073 (Phase 2), 1317 (post-challenge)
Final experiments: 41 (Phase 1), 13 (Phase 2), 9 (post-challenge)


Classification problems arise in many application domains, including pattern recognition (classification of images or videos, speech recognition), medical diagnosis, marketing (customer categorization), and text categorization (spam filtering). The category identifiers are referred to as "labels". Predictive models capable of classifying new instances (correctly predicting their labels) usually require "training" (parameter adjustment) on large amounts of labeled data (pairs of instances and associated labels). Unfortunately, little labeled training data may be available, due to the cost or burden of manual annotation. Recent research has focused on exploiting the vast amounts of unlabeled data available at low cost, through space transformations, dimensionality reduction, hierarchical feature representations ("deep learning"), and kernel learning. However, these advances tend to be ignored by practitioners, who continue to use a handful of popular algorithms such as PCA, ICA, k-means, and hierarchical clustering. The goal of this challenge is to evaluate unsupervised and transfer learning algorithms free of inventor bias, to help identify and popularize algorithms that have advanced the state of the art.
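To make the baseline concrete, here is a minimal sketch of the kind of pipeline the paragraph describes: an unsupervised representation (plain PCA, one of the popular algorithms named above) learned without labels, then scored by training a simple linear classifier on very few labeled examples. The synthetic dataset, the dimensions, and the train-set size are all illustrative assumptions, not the challenge's actual data or protocol; scikit-learn is used for convenience.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.model_selection import train_test_split

# Synthetic stand-in for a challenge dataset: 3 hidden classes in 50 raw
# features (labels are held out and used only for the final evaluation).
rng = np.random.default_rng(0)
n_per_class, n_features, n_classes = 200, 50, 3
centers = rng.normal(scale=3.0, size=(n_classes, n_features))
X = np.vstack([c + rng.normal(size=(n_per_class, n_features)) for c in centers])
y = np.repeat(np.arange(n_classes), n_per_class)

# Unsupervised step: learn a low-dimensional representation without labels.
pca = PCA(n_components=10).fit(X)
Z = pca.transform(X)

# Supervised evaluation with very few labeled examples (5 per class here),
# using a simple linear discriminant, in the spirit of the challenge.
Z_train, Z_test, y_train, y_test = train_test_split(
    Z, y, train_size=15, stratify=y, random_state=0)
clf = LinearDiscriminantAnalysis().fit(Z_train, y_train)
acc = clf.score(Z_test, y_test)
print(f"few-shot accuracy on the PCA representation: {acc:.3f}")
```

A stronger unsupervised or transfer method would be slotted in place of the `PCA` step; the evaluation half of the pipeline stays unchanged, which is what makes the comparison fair across methods.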

Five datasets from various domains are made available. Participants submit online transformed data representations (or similarity/kernel matrices) for a validation set and a final evaluation set, in a prescribed format. These representations (or similarity/kernel matrices) are evaluated by the organizers on supervised learning tasks unknown to the participants. Results on the validation set are displayed on the leaderboard to provide immediate feedback; results on the final evaluation set are revealed only at the end of the challenge. To emphasize the capability of the learning systems to develop useful abstractions, the supervised learning tasks used to evaluate them use very few labeled training examples, and the classifier is a simple linear discriminant. The challenge will proceed in 2 phases:
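The evaluation protocol above can be sketched as a small scoring routine: given a submitted representation, repeatedly draw a tiny labeled training set, fit a linear discriminant, and average the test AUC. The function name, the repeat-and-average scheme, and all parameter values are assumptions for illustration; they are not the organizers' exact procedure.

```python
import numpy as np
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.metrics import roc_auc_score

def few_shot_score(Z, y, n_labeled=10, n_repeats=20, seed=0):
    """Average AUC of a linear discriminant trained on tiny labeled subsets.

    Z: (n_samples, d) submitted representation; y: binary labels.
    Hypothetical helper illustrating the few-labels evaluation idea.
    """
    rng = np.random.default_rng(seed)
    scores = []
    for _ in range(n_repeats):
        # Balanced tiny training set; everything else is test data.
        pos = rng.choice(np.flatnonzero(y == 1), n_labeled // 2, replace=False)
        neg = rng.choice(np.flatnonzero(y == 0), n_labeled // 2, replace=False)
        train = np.concatenate([pos, neg])
        test = np.setdiff1d(np.arange(len(y)), train)
        clf = LinearDiscriminantAnalysis().fit(Z[train], y[train])
        scores.append(roc_auc_score(y[test], clf.decision_function(Z[test])))
    return float(np.mean(scores))

# Toy check on a linearly separable two-class representation:
rng = np.random.default_rng(1)
Z = np.vstack([rng.normal(-1, 1, (100, 5)), rng.normal(1, 1, (100, 5))])
y = np.repeat([0, 1], 100)
auc = few_shot_score(Z, y)
print(f"mean AUC over repeats: {auc:.3f}")
```

Averaging over many small random training sets reduces the variance that a single 10-example draw would introduce, so the score reflects the quality of the representation rather than the luck of the label sample.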

Competition Rules