REGEDP studies the probe method on REGED
REGEDP uses the artificially generated dataset REGED to study the probe method. We assume that the REGED
data came from a real, but unknown, generative process. To the 999
variables of REGED we add 3996 "probes". These are artificially generated
variables of two kinds: random variables completely independent of
the target, and variables that are consequences of subsets of the original variables (which may include
the target) and of other probes. Importantly, no probe is a cause of the
target. Ideally, the probes should be drawn from the (unknown) distribution
of non-causes of the target. Instead, we generate probes by permuting
the values of some of the real variables while enforcing
some causal dependencies.
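The permutation procedure is not spelled out here. The sketch below is one plausible scheme, assumed for illustration and not necessarily the one used to build REGEDP: an independent probe is a shuffled copy of a real variable, and a "consequence" probe reorders a real variable's values so that they follow the ranks of a noisy function of its parent columns. Both keep the marginal distribution of the real variable, and neither can be a cause of the target. The functions `independent_probe` and `consequence_probe` are hypothetical.

```python
import random

def independent_probe(values, rng):
    """Shuffle a real variable's values (hypothetical helper).

    The marginal distribution is kept, but any dependence on the target
    or on other variables is destroyed.
    """
    probe = list(values)
    rng.shuffle(probe)
    return probe

def consequence_probe(values, parents, noise_scale, rng):
    """Reorder a real variable's values to follow its parents (hypothetical helper).

    Each example gets a score equal to the sum of its parent values plus
    Gaussian noise; the real values are then reassigned so that larger
    scores receive larger values. The probe is a child of its parents and
    keeps the marginal distribution of `values`.
    """
    n = len(values)
    scores = [sum(col[i] for col in parents) + rng.gauss(0.0, noise_scale)
              for i in range(n)]
    # Example indices ordered from lowest to highest score.
    order = sorted(range(n), key=lambda i: scores[i])
    sorted_values = sorted(values)
    probe = [None] * n
    for rank, i in enumerate(order):
        probe[i] = sorted_values[rank]
    return probe
```

For example, `consequence_probe(real_var, [target], 0.1, random.Random(0))` with a binary `target` tends to assign the largest values of `real_var` to the examples where the target is 1, so the probe depends on the target as an effect, never as a cause.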
Suppose we want to uncover the causes of the target variable (lung cancer)
using a causal discovery algorithm. The fraction of
probes selected as candidate causes then indicates the fraction of false
positives. Because in this case we know the true data-generating model, we
can analyze how useful the probe method is, despite the ad hoc way in which
the probes are generated.
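As a hedged illustration of that analysis, the sketch below assumes that real non-causes are selected at roughly the same rate as probes, a common form of the probe estimator rather than a procedure stated here, and compares the resulting estimate of false positives among the real variables with the true count, which is available only because the generative model is known. The function `probe_fp_check` and the toy numbers in the usage note are hypothetical; for REGEDP one would pass `n_real=999` and `n_probes=3996`.

```python
def probe_fp_check(selected, n_real, n_probes, true_causes):
    """Compare a probe-based estimate of false positives with the true count.

    selected:    0-based column indices chosen as candidate causes
    n_real:      number of original variables (columns 0 .. n_real-1)
    n_probes:    number of probe columns appended after the original ones
    true_causes: set of real-variable indices that truly cause the target
    """
    selected_real = [j for j in selected if j < n_real]
    n_selected_probes = sum(1 for j in selected if j >= n_real)
    # Every probe is a non-cause, so each selected probe is a false positive.
    probe_fp_rate = n_selected_probes / n_probes
    # Assumption: real non-causes are selected at the same rate as probes.
    # Scaling by n_real (the number of real non-causes is unknown in practice)
    # slightly overestimates, because the true causes are counted too.
    estimated_fp = probe_fp_rate * n_real
    # Ground truth: selected real variables that are not true causes.
    actual_fp = sum(1 for j in selected_real if j not in true_causes)
    return probe_fp_rate, estimated_fp, actual_fp
```

With a toy setting of 10 real variables, 40 probes, true causes `{0, 1}` and selection `[0, 1, 2, 5, 12, 30]`, two probes are selected (rate 0.05), the estimate is 0.5 false positives, and the actual count is 2; a gap of that kind is exactly what REGEDP lets one measure.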
The data include the same 500 training examples as REGED (in the same order).
All original variables come first, and the probes are appended as extra columns.
No test data are provided.
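Because the original variables come first and the probes follow, separating them amounts to slicing columns. A minimal sketch, assuming the text file stores one example per line with whitespace-separated numeric values (the exact file layout and where the target is stored are not stated here); `parse_rows` and `split_columns` are hypothetical helpers.

```python
N_REAL = 999     # original REGED variables come first
N_PROBES = 3996  # probe columns are appended after them

def parse_rows(lines):
    """Parse whitespace-separated numeric rows, one example per line (assumed layout)."""
    return [[float(tok) for tok in line.split()] for line in lines if line.strip()]

def split_columns(rows, n_real=N_REAL, n_probes=N_PROBES):
    """Split each example into its original-variable block and its probe block."""
    for k, row in enumerate(rows):
        if len(row) != n_real + n_probes:
            raise ValueError(f"row {k} has {len(row)} columns, expected {n_real + n_probes}")
    original = [row[:n_real] for row in rows]
    probes = [row[n_real:] for row in rows]
    return original, probes
```

Usage would look like `with open(path) as f: original, probes = split_columns(parse_rows(f))`, where `path` is a placeholder for the downloaded file; for REGEDP one would expect 500 rows.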
Download the data in text format [7.8 MB].
Download the data in Matlab format [8 MB].