Causality Workbench                                                             Challenges in Machine Learning

Active Learning Challenge

Frequently Asked Questions


What is the goal of the challenge?
The goal is to predict the unknown values of a target variable (label) given the values of other predictor variables. A large number of unlabeled samples are given. One may query some "truth values" of labels for a "virtual cash" fee. The challenge is to predict all the labels as well as possible with as few purchased labels as possible.

Are there prizes and travel grants?
Yes, there will be both cash prizes and travel grants. See the Rules.

Will there be a workshop and proceedings?
Yes, there will be two workshops, one at AISTATS, May 16, 2010, in Sardinia, Italy and one at WCCI 2010 in July 2010. The WCCI proceedings will be published by the IEEE and the AISTATS proceedings by JMLR W&CP. NOTE: The WCCI paper deadline is January 31, 2010, at the end of the development period, before the final tests.

Since the WCCI deadline is before the end of the challenge, how will you judge the papers?
The papers will be judged as regular conference papers based on relevance to the workshop topics, novelty/originality, usefulness, and sanity. We encourage challenge participants to incorporate their results on the development datasets.

Are we obliged to attend the workshop(s) or publish our method(s) to participate in the challenge?

Can I attend the workshop(s) if I do not participate in the challenge?
Yes. You can even submit papers for presentation on the topics of the workshops, including: active learning, learning from unlabeled data, or experimental design.

Tasks of the challenge

Can I just ignore the active learning problem and purchase all the training labels of the final test sets to participate?
Yes, you can. However, you could easily do better than that by making several incremental purchases, even by sampling the examples at random, since we are using the area under the learning curve as the scoring metric. See the Evaluation page.
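As a rough illustration, the Python sketch below shuffles the training sample numbers and splits them into batches of doubling size, so that each purchase adds a new point to the learning curve instead of a single point at the very end. The function name, the doubling factor, and the first batch size are assumptions made for this example; they are not recommendations from the organizers.

```python
import random


def random_purchase_batches(candidate_ids, first_batch=1, growth=2, seed=0):
    """Split a shuffled pool of sample numbers into batches of growing size.

    All parameters are hypothetical: the challenge does not prescribe a
    purchase schedule. Each returned batch is one label query.
    """
    rng = random.Random(seed)
    pool = list(candidate_ids)
    rng.shuffle(pool)  # random sampling, no active learning strategy

    batches = []
    start, size = 0, first_batch
    while start < len(pool):
        batches.append(pool[start:start + size])
        start += size
        size *= growth  # assumed geometric growth of the batch size
    return batches


if __name__ == "__main__":
    for batch in random_purchase_batches(range(1, 101)):
        print(len(batch), "labels in this query")
```

Predictions would be submitted after each batch, so that every purchase contributes a point to the learning curve.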

Is causality needed to solve the problems of the challenge?

So, why is this part of the Causality Workbench?
The challenge uses the Virtual Lab of the Causality Workbench. We organized this challenge because of the importance of the problem (many applications have large volumes of unlabeled data) and because it is a stepping stone towards understanding how to design experiments well in causal discovery. In many designed experiments, the first step is to sample from a study population before applying interventions.

How do you define active learning?
There are several definitions; see the Tutorial. For this challenge, we consider the problem of "pool-based active learning", in which a large unlabeled dataset is available and the problem is to label all the instances as automatically as possible, i.e. by requesting as little human intervention to label instances as possible.

Is active learning different from query learning?
For this challenge they are synonymous. Please contact us if you have interesting suggestions to differentiate the two concepts.

Is active learning different from experimental design?
From our point of view active learning is a form of iterative experimental design involving the learning machine. However:
  • "Classical experimental design" (from statistics textbooks) does not involve a learning machine in the process of designing experiments.
  • Most of machine learning concentrates on situations in which no interventions on the data generating system are made; the active part of learning is limited to choosing samples appropriately. In contrast, much of "classical experimental design" is devoted to performing interventions on the system under study.

Why can't we make "de novo" queries?
"De novo" queries are queries of labels for examples (predictor variable vectors) that are not part of the dataset and are generated artificially. We like to think of this problem as an intervention on the system under study in which some variable values are imposed by an external agent. This problem will be treated in an upcoming challenge since it is very different in nature.

Why are there no multiclass and regression tasks?
It was difficult to provide a good unified criterion for all problems. Moreover multiclass and regression problems are harder. We preferred testing active learning methods with one criterion consistent across all tasks and let the participants focus on solving the active learning problem rather than dealing with multiple difficulties.


Are the datasets using real data?
Yes, all of them, except for the toy problem ALEX, which we supply only for demonstration purposes.

How do we get the training labels?
You must purchase them with virtual cash; see the Instructions. Note that, to facilitate algorithm development, we give you direct access to all the labels of the development datasets. Read the "Algorithm Development" section of the Instructions. But for the final datasets, buying labels with virtual cash will be the ONLY option.

How many times can I buy labels?
As many times as you want until you run out of virtual cash and before the end of the challenge.

Can domain knowledge be used to facilitate solving the tasks of the challenge?
We purposely did not disclose the identity of the features, which may be available in real applications. We provide information on the datasets to make things more concrete and motivate participation, but we do not expect the participants to tailor their algorithms to the specifics of the domain.

Can we learn from the unlabeled data or perform "transduction"?

Are the distributions of the training and test datasets identical?
Yes. The samples are randomly ordered. The first half is reserved for training and the second half for testing.

Why are you not disclosing the fraction of positive examples in final datasets?
In most practical situations, this statistic is not available.

Can we assume that the positive class is the most depleted one?
Yes. There are fewer positive labels than negative labels.

The "Sample num" in the "My Lab" table does not always match the number of labels I asked for, why?
You may have asked for "forbidden" labels from the test set (the second half of the dataset) or for labels you already asked for. You are not charged for duplicate queries or for labels not delivered.
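To anticipate which labels of a query will actually be delivered, a query can be checked locally before it is sent. The Python sketch below is only an illustration: the function name is hypothetical, and it assumes sample numbers start at 1 and that the purchasable training half is the first n // 2 samples.

```python
def split_query(requested, n_samples, already_bought):
    """Separate a label query into billable and undelivered sample numbers.

    Assumptions (not confirmed by the challenge documentation): sample
    numbers are 1-based and the training half is the first n_samples // 2.
    """
    n_train = n_samples // 2
    seen = set(already_bought)
    billable, rejected = [], []
    for idx in requested:
        if 1 <= idx <= n_train and idx not in seen:
            billable.append(idx)
            seen.add(idx)  # a repeat inside the same query is a duplicate
        else:
            # Test-set ("forbidden") label, out of range, or duplicate.
            rejected.append(idx)
    return billable, rejected


if __name__ == "__main__":
    billable, rejected = split_query([1, 2, 2, 7, 9], n_samples=10,
                                     already_bought=[1])
    print("billable:", billable)
    print("not delivered:", rejected)
```

The length of the billable list is what should be compared with the "Sample num" value.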

Couldn't people cheat by getting the labels from friends or by entering first under a fake name?
During the development period, this is irrelevant, since you are allowed to restart your experiments as many times as desired and it is possible to ask for all the labels at once. In fact, we are even making the development dataset labels available to you for download; see the section "Algorithm Development" in the Instructions.
But, for the final tests, it is explicitly forbidden by the rules of the challenge to exchange labels between teams or gain access to labels by making a fake registration. Before downloading data, the team leader will have to agree with this rule and vouch for all team members. We implemented a method to detect cheaters, which we do not disclose. If we suspect a team to be cheating, we will ask that team to collaborate with us to clear the doubt. This may include performing additional tests. Teams convicted of cheating may be asked to resign.

Will the final datasets resemble the development datasets?
They will be from the same domains of application as the development datasets, but may differ in data representation and difficulty. However, the problems to be solved will not be significantly harder than the development problems.

Will the data split be the same in the final datasets?
The number of examples may be different but half of them will be reserved for training and half for testing as for the development datasets.


Why do you use the AUC to compute the score?
In many active learning problems, one class is significantly more depleted than the other, and the problem is more one of finding the best candidates of the positive class (a ranking problem) than one of classifying.

Since you are using the AUC to compute the score do we still need to adjust the bias on the prediction values?
No, you do not need to.
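The reason is that the AUC depends only on how the prediction values order the examples, not on their offset or scale. The Python sketch below illustrates this with a simple pairwise AUC; it is not the challenge's scoring code, and it assumes that positive examples are labeled 1 and that any other label is negative.

```python
def auc(labels, scores):
    """Area under the ROC curve computed by pairwise comparison.

    Counts the fraction of (positive, negative) pairs in which the positive
    example gets the higher score; ties count for one half.
    Assumption: label 1 is positive, any other label is negative.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y != 1]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    labels = [1, 0, 1, 0, 0]
    scores = [0.9, 0.8, 0.4, 0.3, 0.1]
    shifted = [s - 10.0 for s in scores]       # different bias
    rescaled = [3.0 * s + 7.0 for s in scores]  # different scale and bias
    print(auc(labels, scores), auc(labels, shifted), auc(labels, rescaled))
```

All three calls return the same value because shifting or positively rescaling the scores leaves the ranking of the examples unchanged.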

If I make a single submission, will you use the ALC or the AUC to score my submission?
The global score (normalized ALC) is used to rank ALL experiments (even when a single submission is made). If you make a single submission, we extrapolate your learning curve as explained on the Evaluation page.

Can you give a numerical example on how the global score is computed?
The Matlab code for the global score is provided; see the function alc.m in the sample code. For instance, assume that, on the ALEX toy example, your first 2 submissions get AUC scores of 0.6366 and 0.6844 after purchasing a total of 1 label (the seed) and 2 labels (the seed and another example), respectively. You will get global scores of 0.2732 and 0.3649, respectively. Here is how:
  1. The maximum number of examples you can purchase (your initial budget) is 5000 ECU (Experimental Cash Units). This determines the x-axis position of the last point on the learning curve: xmax=log2(5000)=12.29.
  2. The ALCs (Areas under the Learning Curve) of the two reference learning curves are Arand=0.5*xmax=6.14 and Amax=1*xmax=12.29.
  3. The global score is computed according to:
    global_score = (ALC-Arand)/(Amax-Arand)
  4. For the first point, since we extrapolate learning curves with a horizontal line, we have ALC=0.6366*xmax. Hence, global_score=0.2732.
  5. For the second point, we interpolate linearly between points (then extrapolate horizontally). Since we added only 1 example, deltax=log2(2)-log2(1)=1. Therefore ALC=(0.6366+0.6844)/2*deltax+0.6844*(xmax-deltax). We get ALC=8.386, and therefore global_score=0.3649.
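The steps above can be reproduced with a short Python sketch. It is an illustration only and is not the alc.m function from the sample code; the function and argument names are hypothetical, and it assumes the first submission is made with 1 label, so the curve starts at x=log2(1)=0.

```python
import math


def global_score(budget, purchased, aucs):
    """Normalized area under the learning curve on a log2 x axis.

    budget    -- maximum number of labels that can be purchased (e.g. 5000)
    purchased -- cumulative number of labels at each submission, increasing
    aucs      -- AUC obtained at each submission
    Assumption: the first submission is made with 1 label (x = 0).
    """
    xmax = math.log2(budget)
    xs = [math.log2(n) for n in purchased]

    # Linear interpolation between consecutive submissions (trapezoids).
    alc = sum((aucs[i] + aucs[i + 1]) / 2 * (xs[i + 1] - xs[i])
              for i in range(len(xs) - 1))
    # Horizontal extrapolation from the last submission up to xmax.
    alc += aucs[-1] * (xmax - xs[-1])

    a_rand = 0.5 * xmax  # reference curve with AUC = 0.5 everywhere
    a_max = 1.0 * xmax   # reference curve with AUC = 1 everywhere
    return (alc - a_rand) / (a_max - a_rand)


if __name__ == "__main__":
    print(round(global_score(5000, [1], [0.6366]), 4))
    print(round(global_score(5000, [1, 2], [0.6366, 0.6844]), 4))
```

With the ALEX numbers used above, the two calls give the global scores of 0.2732 and 0.3649 quoted in the example.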

Can I do better than random for my initial submission?
Yes. We provide you with one "seed" label. You may use it for instance to rank all the samples according to distance to the seed.
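A minimal Python sketch of this idea follows. It assumes that the seed is an example of the positive class, that the samples are numeric feature vectors, and that Euclidean distance is a sensible measure; none of this is prescribed by the challenge, and the function name is hypothetical.

```python
import math


def scores_from_seed(samples, seed_index):
    """Score every sample by its closeness to the seed example.

    Returns one prediction value per sample; larger means closer to the
    seed. Assumption: the seed is a positive example, so closer samples
    are ranked as more likely positive.
    """
    seed = samples[seed_index]
    # Negative Euclidean distance: the seed itself gets the top score.
    return [-math.dist(x, seed) for x in samples]


if __name__ == "__main__":
    samples = [(0.0, 0.0), (1.0, 0.0), (3.0, 4.0)]
    scores = scores_from_seed(samples, seed_index=0)
    ranking = sorted(range(len(samples)), key=lambda i: scores[i],
                     reverse=True)
    print(scores)
    print(ranking)
```

Because the AUC only uses the ordering of the prediction values, the negative distances can be submitted directly without rescaling.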

Can I make submissions with mixed methods?
Mixed submissions containing results of different methods on the various datasets are permitted.

I do not see my results in the "Leaderboard" table or in "My Lab", what's wrong?
Make sure your query complies with the Instructions.

Why is my initial budget the number of samples minus one?
You get enough Experimental Cash Units (ECU) to buy all the labels. Initially, we give you a seed label, this is why we subtract 1 ECU right away. The seed label is given in the Data table.

One of my experiments does not appear anymore in the list on the upload page, why?
Experiments close when you have spent all your budget or reached an AUC score of 1.

Will the results on development test sets count for the final ranking?
No. However, you may report these results in your paper submitted to the workshop.

Do I have to submit results on all "final" datasets?
No. During the final test period, you may submit results on any subset of the final datasets. However, you may win much bigger prizes (exponentially scaled) if you submit results on multiple datasets. See the Rules.

Is there a limit to the number of submissions?
No. During the development period, you can run as many experiments as you want on any dataset. During the test period, you will have only one chance to design a successful set of queries. No restart is possible.

Can I submit prediction results several times without asking for new labels?
Yes. This may help you during the development period, but during final testing you will not get any performance feedback.


Can I use a robot to make submissions?
Robot submissions are not explicitly forbidden. However, we require that the total number of submissions per 24 hours from the same origin does not exceed 15 (this should allow you to complete a full experiment in one day). Please be courteous; otherwise we run the risk of overloading the server and would then need to take more drastic measures.

Can I use an alias or a funky email not to reveal my identity?
We require participants to identify themselves by their real name when they register, and you must always provide a valid email so we can communicate with you. But your name will remain confidential, unless you agree to reveal it. Your email will always remain confidential. You may select an alias for your Workbench ID to hide your identity in the result tables and remain anonymous during the challenge.

Do I need to let you know what my method is?
Disclosing information about your method is optional during the development period. However, to participate in the final ranking, you will have to fill out a fact sheet about your method(s). We encourage the participants not only to fill out the fact sheets, but also to write a paper with more details. A best paper award will distinguish entries with particularly original methods, methods with definite advantages (other than best performance) and good experimental design.

Will the organizers enter the competition?
The prize winners may not be challenge organizers. The challenge organizers will enter development submissions from time to time, under the name "Reference". Reference entries are shown for information only and are not part of the competition.

Can a participant give an arbitrary hard time to the organizers?
In case of dispute about prize attribution or possible exclusion from the competition, the participants agree not to take any legal action against the organizers, IEEE, or data donors. Decisions can be appealed by submitting a letter to Philip HINGSTON and will be resolved by the committee of co-chairs of the WCCI 2010 conference.


Is there code I can use to perform the challenge tasks?
We provide the following tools written in Matlab (R):
  • Sample code creating queries and performing active learning with a basic strategy.
  • A lean version of the GLOP package, which is used in the back-end of the Virtual Lab to process queries, is provided with the sample code. We provide a toy problem called ALEX (Active Learning EXample) to illustrate how GLOP works for the Active Learning Challenge and the other development tasks.
  • The CLOP package, which includes many machine learning algorithms that were successful in past challenges.

Who can I ask for more help?
For all other questions, email

Last updated February 4, 2010.