Active Learning Challenge
- Goal of the challenge: Given a data matrix of samples represented as feature vectors (samples in rows and features in columns), predict an unknown target variable (label). Initially a single example is labeled (the seed). The participants must predict all the labels as accurately as possible, while requesting to see as few labels as possible. See Evaluation for details.
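The task above can be sketched as a simple uncertainty-sampling loop. The synthetic dataset, the logistic regression model, and the batch size of 10 below are illustrative assumptions, not part of the challenge specification; participants are free to use any querying strategy.

```python
# A minimal sketch of the task: start from a single labeled seed,
# iteratively query informative labels, then predict ALL labels.
# Dataset, model, and batch size are illustrative assumptions only.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))                    # samples in rows, features in columns
y = (X @ rng.normal(size=d) > 0).astype(int)   # hidden target labels

labeled = [0]                                  # a single labeled seed example
for _ in range(5):                             # a few query rounds
    unlabeled = [i for i in range(n) if i not in labeled]
    if len(set(y[labeled])) < 2:
        # Only one class observed so far: query at random.
        queries = list(rng.choice(unlabeled, size=10, replace=False))
    else:
        # Uncertainty sampling: query the samples closest to the boundary.
        clf = LogisticRegression().fit(X[labeled], y[labeled])
        margin = np.abs(clf.predict_proba(X)[:, 1] - 0.5)
        queries = sorted(unlabeled, key=lambda i: margin[i])[:10]
    labeled += [int(i) for i in queries]       # "see" these 10 labels

# Final predictions for ALL samples, labeled and unlabeled alike.
clf = LogisticRegression().fit(X[labeled], y[labeled])
predictions = clf.predict(X)
accuracy = (predictions == y).mean()
```

The score a participant receives depends on how quickly such a loop reaches good predictions as labels are purchased, which is why predictions over all samples matter at every round.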
- Prizes: We have been raising cash prizes from generous donors, see the Credits page. Only results on the final test datasets will count towards winning prizes.
The top ranking team on each final test dataset will be eligible to win a cash prize of USD 100. To encourage the participants to enter the challenge on more than one dataset, a team that wins on N final test datasets will earn USD 100*2^(N-1) (if you win on 1, 2, 3, 4, 5, 6 datasets, you will earn USD 100, 200, 400, 800, 1600, 3200, respectively). Any sponsor money not used as prizes will be used as travel grants to help deserving participants attend the workshop.
- Dissemination: The challenge is part of the competition program of the AISTATS conference, Sardinia, Italy, May 16, 2010, and of the World Congress on Computational Intelligence (WCCI 2010), Barcelona, Spain, July 18-23, 2010. There are two publication opportunities: in JMLR W&CP and in the IEEE proceedings of WCCI 2010.
- Schedule:
|Dec. 1, 2009
||Start of the development period. Development datasets made available.
|Feb. 3, 2010
||Begin final testing. Final datasets made available.
|Feb. 7, 2010
||WCCI 2010 papers due.
|Mar. 3, 2010
||End of the challenge at midnight (0 h Mar. 4, server time -- the time indicated on the Submit page). Submissions closed.
|Mar. 8, 2010
||All teams must turn in fact sheets. The fact sheets will be used as abstracts for the workshop with AISTATS. JMLR W&CP reviewers and the participants are given access to the provisional ranking and the fact sheets. Start of the post-challenge verifications.
|Mar. 15, 2010
||End of the post-challenge verifications. Release of the official ranking. Notification of paper acceptance.
|May 2, 2010
||Camera-ready copies of all papers due.
|May 16, 2010
||Workshop with AISTATS 2010, Sardinia, Italy.
|July 19-23, 2010
||Workshop at WCCI 2010, Barcelona, Spain.
- Challenge protocol: For each dataset, the participants are allotted a budget of "virtual cash" allowing them to "purchase" all the training data labels at the price of 1 ECU (experimental cash unit) per label. They can place queries to the server by providing a list of samples for which they desire to purchase the label. Upon receipt of the labels, their virtual-cash account is debited. The participants are free to choose the number of queries and the number of samples per query. An experiment terminates when the whole budget is spent or the challenge deadline is reached. To monitor progress, the participants are asked to provide predictions for all the labels every time they place a query, including the known and unknown labels of the training examples and the labels of the test examples.
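The accounting in this protocol can be sketched as follows. The `choose_query` and `predict_all` callbacks and the in-memory `oracle_labels` list are hypothetical stand-ins for a participant's strategy and for the real challenge server (which is reached through the web site, not a Python API); only the bookkeeping mirrors the rules above.

```python
def run_experiment(budget, oracle_labels, choose_query, predict_all):
    """Simulate one experiment under the virtual-cash protocol:
    1 ECU per purchased label, free choice of query sizes, and a
    prediction for ALL labels logged after every query."""
    purchased = {}                                # sample index -> revealed label
    prediction_log = []
    while budget > 0:
        query = choose_query(purchased, budget)   # participant picks samples
        if not query:
            break                                 # participant stops early
        query = query[:budget]                    # cannot overspend the budget
        for i in query:
            purchased[i] = oracle_labels[i]       # server reveals the labels...
        budget -= len(query)                      # ...and debits 1 ECU each
        prediction_log.append(predict_all(purchased))
    return purchased, prediction_log, budget

# Toy usage: 6 samples, a budget of 4 ECU, queries of 2 samples each.
labels = [0, 1, 1, 0, 1, 0]
next_two = lambda purchased, budget: \
    [i for i in range(len(labels)) if i not in purchased][:2]
default_zero = lambda purchased: [purchased.get(i, 0) for i in range(len(labels))]
purchased, log, remaining = run_experiment(4, labels, next_two, default_zero)
```

After two queries of two samples each, the 4-ECU budget is exhausted and the experiment terminates, with one full prediction vector logged per query.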
- Conditions of participation: Anybody who complies with the rules of the challenge
is welcome to participate. The participants are not required to attend the
workshops where the results will be discussed, and the workshops are open to non-challenge participants.
- Anonymity: All entrants must identify themselves to the organizers. However, only your "Workbench id" will be displayed in result tables, and you may choose a pseudonym to hide your identity from the rest of the world. Your emails will remain confidential.
- Team verification: Towards the end of the development phase, the participants will register as teams. Each participant will be allowed to enter only as part of a single team. The teams will be checked, and the organizers reserve the right to merge teams that appear too closely affiliated. The team leaders will be responsible for ensuring that their teams respect the rules of the challenge.
- Data: Datasets from various domains and of varying difficulty are made available for practice during the development period. The final test datasets will be released at the start of the final test period (Feb. 3, 2010). The data are available for download from the Data page.
- Submission method: The method of submission is via the form on the Submit page. To be ranked, submissions must comply with
the Instructions. Robot submissions are permitted. If the system gets overloaded, the organizers reserve the right to limit the number of submissions per day per participant. If you encounter problems with the submission process, please contact the Challenge Webmaster.
- Ranking: The method of scoring is posted on the Evaluation page. If the scoring method changes, the participants will be notified by email by the organizers.
- During the development period, the scores will be posted in the Leaderboard table.
The participants will be allowed to perform multiple experiments on the same dataset, each time starting over with a fresh budget sufficient to purchase all the labels.
- During the final test period, no results will be displayed until the challenge is over. Only one experiment per dataset will be allowed. Separate rankings will be performed for the various datasets.
- Reproducibility: Everything is allowed that is not explicitly forbidden.
We forbid acquiring labels under fake names, registering multiple times, or exchanging labels with other participants. Participation is not conditioned on delivering your code or publishing your methods. However, we will ask the top-ranking participants to cooperate voluntarily in reproducing their results. This may include filling out a fact sheet about their methods, participating in post-challenge tests, and sending us their code, including the source code. The outcome of our attempt to reproduce your results will be published and will add credibility to your results.