Register with our Google group causalitychallenge to stay informed!
DATA FORMAT: We provide a sample data donation. The data format is explained in the README file.
USE OF DATA: By submitting data, the data donors agree to place their data in the public domain and grant unrestricted use of it, subject to being given proper credit. The organizers intend to run the best ranking methods of track 2 on donated data to infer potentially new causal relationships. The results will be made available to the data donors so that they can make scientific discoveries and write their papers. The data donors are allowed to withhold the variable names and the truth values of the causal relationships until their paper is published.
SUBMISSION METHOD: Email your data, your contact information, and a brief description to causality@chalearn.org before Friday, May 17, 2013.
DATA PROVIDED: See the data page.
SUBMISSIONS: Submissions for track 2 are handled by the Kaggle website, both for predictions and software. We provide a sample csv file containing prediction results. We also provide sample Matlab code and sample Python code. The prediction results should be formatted in the following way: each line represents a variable pair. The pair identifier is followed by a comma then the prediction:
```
valid1, 0.23
valid2, -0.001
...
valid2642, 2.8
test1, -29
test2, 1.4
...
test7892, 100
```

Large positive predictions indicate confidence in A->B; large negative predictions indicate confidence in A<-B. Values near zero mean that neither causal relationship can be detected with confidence (there may be a dependency that can be explained by a common cause, or no dependency at all).
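As an illustration, a minimal Python sketch that writes predictions in this format (the pair identifiers and scores below are made up for the example; the real lists of valid/test pairs come with the challenge data):

```python
import csv
import random

# Hypothetical pair identifiers -- the real ones are provided on the data page.
pair_ids = [f"valid{i}" for i in range(1, 4)] + [f"test{i}" for i in range(1, 4)]

random.seed(0)
with open("sample_predictions.csv", "w", newline="") as f:
    writer = csv.writer(f)
    for pair_id in pair_ids:
        # Any real number is allowed: large positive means A->B,
        # large negative means A<-B, near zero means neither.
        score = random.uniform(-1.0, 1.0)
        writer.writerow([pair_id, f"{score:.4f}"])
```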
For the purpose of this challenge, two variables A and B are causally related if:
B = f (A, noise) or A = f (B, noise).
In the former case, A is a cause of B; in the latter case, B is a cause of A. All other factors are lumped into the "noise". We provide samples of joint observations of A and B, not organized in a time series. We exclude feedback loops and consider only 4 types of relationships:
| Relationship | Meaning | Class |
|---|---|---|
| A->B | A causes B | Positive class |
| B->A | B causes A | Negative class |
| A - B | A and B are consequences of a common cause | Null class |
| A \| B | A and B are independent | Null class |
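As a toy illustration, samples from each of the four classes might be simulated as follows (the functional forms and noise levels here are arbitrary choices for the example, not the challenge's actual data generators):

```python
import random

random.seed(42)
n = 1000

# Positive class, A->B: B is a (noisy) function of A.
A = [random.gauss(0, 1) for _ in range(n)]
B = [a ** 2 + 0.1 * random.gauss(0, 1) for a in A]

# Negative class, B->A: obtained by simply swapping the roles of A and B.

# Null class, A - B: both variables driven by a hidden common cause Z.
Z = [random.gauss(0, 1) for _ in range(n)]
A_conf = [z + 0.1 * random.gauss(0, 1) for z in Z]
B_conf = [-z + 0.1 * random.gauss(0, 1) for z in Z]

# Null class, A | B: independent draws.
A_ind = [random.gauss(0, 1) for _ in range(n)]
B_ind = [random.gauss(0, 1) for _ in range(n)]
```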
We bring the problem back to a classification problem: for each pair of variables {A, B}, you must answer the question: is A a cause of B? (Or, since the problem is symmetrical in A and B, is B a cause of A?)
We expect the participants to produce a score between -Inf and +Inf, large positive values indicating that A is a cause of B with certainty, large negative values indicating that B is a cause of A with certainty. Middle range scores (near zero) indicate that neither A causes B nor B causes A.
For each pair of variables, we have a ternary truth value indicating whether A is a cause of B (+1), B is a cause of A (-1), or neither (0). We use the scores provided by the participants as a ranking criterion and evaluate their entries with two Area Under the ROC curve (AUC) scores:
Let Yhat be your predicted score in [-Inf, +Inf] and Y the target values in {-1, 0, +1}.
We define Y1=Y; Y1(Y==0)=-1; and Y2=Y; Y2(Y==0)=+1;
Score = 0.5*(AUC(Yhat, Y1)+AUC(Yhat, Y2));
The first score AUC(Yhat, Y1) measures the success at correctly detecting that A->B rather than [A<-B, A-B, or A|B].
The second score AUC(Yhat, Y2) measures the success at correctly detecting that A<-B rather than [A->B, A-B, or A|B].
Since the problem is symmetric, we average the two scores.
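Our reading of this scoring rule can be sketched in plain Python, using the pairwise (Mann-Whitney) formulation of the AUC for clarity rather than speed (this is an illustrative sketch, not the organizers' evaluation code):

```python
def auc(scores, labels):
    """Area under the ROC curve via the pairwise (Mann-Whitney)
    formulation; tied scores count for 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == +1]
    neg = [s for s, y in zip(scores, labels) if y == -1]
    total = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                total += 1.0
            elif p == q:
                total += 0.5
    return total / (len(pos) * len(neg))

def bidirectional_score(yhat, y):
    """Average of the two AUCs: null pairs are counted as
    negatives in Y1 and as positives in Y2."""
    y1 = [-1 if t == 0 else t for t in y]
    y2 = [+1 if t == 0 else t for t in y]
    return 0.5 * (auc(yhat, y1) + auc(yhat, y2))

# Toy example: one A->B pair, one A<-B pair, two null pairs,
# ranked perfectly by the predictions.
print(bidirectional_score([2.0, -1.5, 0.1, -0.05], [+1, -1, 0, 0]))  # 1.0
```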
The organizers will also compute various other scores for analysis purposes, but these will not be used to rank the participants.
Consider a classification problem for which the labels are binary (+-1). Consider a model returning a numerical prediction score, larger values indicating higher confidence in positive class membership. The results of classification, obtained by thresholding the prediction score, may be represented in a confusion matrix, where tp (true positives), fn (false negatives), tn (true negatives), and fp (false positives) count the examples falling into each possible outcome:
| | Prediction: Class +1 | Prediction: Class -1 |
|---|---|---|
| Truth: Class +1 | tp | fn |
| Truth: Class -1 | fp | tn |
We define the sensitivity (also called true positive rate or hit rate) and the specificity (true negative rate) as:
Sensitivity = tp/pos
Specificity = tn/neg
where pos=tp+fn is the total number of positive examples and neg=tn+fp the total number of negative examples.
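In code, these definitions are a direct transcription (the counts in the example are made up):

```python
def sensitivity_specificity(tp, fn, tn, fp):
    """Sensitivity = tp/pos and specificity = tn/neg,
    with pos = tp + fn and neg = tn + fp."""
    pos = tp + fn
    neg = tn + fp
    return tp / pos, tn / neg

# Example: 40 true positives, 10 false negatives,
# 45 true negatives, 5 false positives.
sens, spec = sensitivity_specificity(40, 10, 45, 5)
print(sens, spec)  # 0.8 0.9
```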
The prediction results are evaluated with the Area Under the ROC Curve (AUC). It corresponds to the area under the curve obtained by plotting sensitivity against specificity while varying a threshold on the prediction values to determine the classification result. The AUC is related to the area under the lift curve and to the Gini index used in marketing (Gini = 2 AUC - 1). The AUC is calculated using the trapezoid method. When binary scores are supplied for the classification instead of discriminant values, the curve reduces to the three points {(0,1), (tn/(tn+fp), tp/(tp+fn)), (1,0)} and the AUC is simply the balanced accuracy (BAC).
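This binary-score special case can be checked numerically: applying the trapezoid rule to the three points above, the area reduces to (sensitivity + specificity)/2, i.e. the balanced accuracy (a small verification sketch with made-up counts, not the challenge's evaluation code):

```python
def auc_binary(tp, fn, tn, fp):
    """AUC of a thresholded (binary) predictor: trapezoid area under
    the curve (0,1) -> (specificity, sensitivity) -> (1,0) in the
    (specificity, sensitivity) plane."""
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    pts = [(0.0, 1.0), (spec, sens), (1.0, 0.0)]
    area = 0.0
    for (x0, y0), (x1, y1) in zip(pts, pts[1:]):
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area

def bac(tp, fn, tn, fp):
    """Balanced accuracy: average of sensitivity and specificity."""
    return 0.5 * (tp / (tp + fn) + tn / (tn + fp))

print(auc_binary(40, 10, 45, 5))  # ~0.85, equal to bac(40, 10, 45, 5)
```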