MARTIP studies the probe method on MARTI

MARTIP uses the artificially generated dataset MARTI to study the probe method. We assume that the MARTI data came from a real, but unknown, generative process. To the 1024 variables of MARTI we add 4096 "probes". These are artificially generated variables, including randomly generated variables that are completely independent of the target, and consequences of subsets of the original variables (including the target) and of other probes. Importantly, no probe is a cause of the target. Ideally, the probes should be drawn from the (unknown) distribution of non-causes of the target. Instead, we use a method that generates probes by permuting the values of some of the real variables, while enforcing some causal dependencies.
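To make the permutation idea concrete, here is a minimal sketch in Python with NumPy. It is not the procedure used to build MARTIP: it covers only the simplest case, in which a probe is an independently shuffled copy of a real variable (so it keeps that variable's marginal distribution but is independent of the target), and it does not enforce any causal dependencies. The function name make_permutation_probes and the toy data are hypothetical.

```python
import numpy as np


def make_permutation_probes(X, n_probes, seed=0):
    """Return probes built by shuffling randomly chosen columns of X.

    Hypothetical helper, not the MARTIP generator: each probe is an
    independent random permutation of the values of one real variable.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_vars = X.shape
    # Pick, with replacement, which real variable each probe is derived from.
    source_cols = rng.integers(0, n_vars, size=n_probes)
    probes = np.empty((n_samples, n_probes), dtype=X.dtype)
    for j, col in enumerate(source_cols):
        # Shuffling the rows breaks any dependence on the target.
        probes[:, j] = rng.permutation(X[:, col])
    return probes, source_cols


# Toy data standing in for the real variables (assumption: 500 x 8).
toy_rng = np.random.default_rng(1)
X = toy_rng.normal(size=(500, 8))
probes, source_cols = make_permutation_probes(X, n_probes=32)
```

Each column of probes contains exactly the same values as its source column, only in a different order.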

Assume that we want to uncover the causes of the target variable (lung cancer) and that we use a causal discovery algorithm for that purpose. The fraction of probes selected as candidate causes is an indication of the fraction of false positives. Because in this case we actually know the true generative model of the data, we can analyze how useful the probe method is, despite the ad hoc way in which the probes are generated.
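As an illustration, the fraction of probes selected can be computed from the list of column indices that an algorithm returns as candidate causes. The sketch below assumes zero-based column indices, with the 1024 original variables in columns 0 to 1023 and the 4096 probes in columns 1024 to 5119. The function name probe_selection_rate and the example selection are hypothetical.

```python
def probe_selection_rate(selected, n_original=1024, n_probes=4096):
    """Fraction of the probes that appear among the selected columns.

    Assumes zero-based indices, with probes occupying columns
    n_original .. n_original + n_probes - 1 (hypothetical convention).
    """
    unique_selected = set(selected)
    n_selected_probes = sum(
        1 for i in unique_selected if n_original <= i < n_original + n_probes
    )
    return n_selected_probes / n_probes


# Hypothetical output of a causal discovery algorithm:
# three original variables and three probes.
selected = [3, 17, 250, 1024, 2000, 5119]
rate = probe_selection_rate(selected)
```

Here three of the 4096 probes are selected, so rate equals 3/4096.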

The data include the same 500 training examples as MARTI (in the same order). All original variables come first and the probes are appended as extra columns. No test data are provided.
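Given this layout, the original variables and the probes can be separated by column position. The sketch below assumes the data have already been loaded into a NumPy array with one row per example and 1024 + 4096 = 5120 columns. The loading step is omitted because the file names are not stated here, and a placeholder array of zeros stands in for the real data. The function name split_columns is hypothetical.

```python
import numpy as np

N_ORIGINAL = 1024  # original MARTI variables, stored first
N_PROBES = 4096    # probes, appended as extra columns


def split_columns(data):
    """Split a (n_examples, 5120) array into original variables and probes."""
    if data.shape[1] != N_ORIGINAL + N_PROBES:
        raise ValueError("expected %d columns, got %d"
                         % (N_ORIGINAL + N_PROBES, data.shape[1]))
    return data[:, :N_ORIGINAL], data[:, N_ORIGINAL:]


# Placeholder with the stated dimensions (500 training examples).
data = np.zeros((500, N_ORIGINAL + N_PROBES))
original, probes = split_columns(data)
```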

Download the data in text format [7.8 MB].
Download the data in Matlab format [8 MB].