|
CINA is an econometrics dataset
CINA (Census Is Not Adult) is derived from census data
(the UCI machine-learning repository Adult database). The data consists
of census records for a number of individuals. The causal discovery task
is to uncover the socio-economic factors affecting high income (the target
value indicates whether the income exceeds 50K). The 14 original attributes
(features), including age, workclass, education, marital
status, occupation, and native country, have been coded to eliminate
categorical variables. Distractor features (artificially generated variables,
which are not causes of the target) were added. In training data, some of
these distractors are effects (consequences) of the target and/or of other
real variables. Some are unrelated to the target or other real variables.
Hence, some of the distractors may be correlated with the target in training
data, although they do not cause it. The unmanipulated test data are distributed
like the training data. Hence both causes and consequences of the target may
be predictive in the unmanipulated test data. In contrast, in the manipulated
test data, all the distractors are "manipulated" by an "external agent" (i.e.,
set to given values, unaffected by the dynamics of the system)
and therefore cannot be relied upon to predict the target.
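
The difference between unmanipulated and manipulated test data can be illustrated with a small simulation. This is only a sketch on synthetic binary variables, not the CINA data or its generation procedure: the names `simulate` and `agreement`, the probabilities, and the three-variable structure (one cause, one target, one distractor that is an effect of the target) are all assumptions made for illustration.

```python
import random


def simulate(n, manipulated, seed=0):
    """Draw n synthetic records as (cause, distractor, target) tuples.

    Hypothetical toy model, not the CINA generator:
    cause -> target -> distractor (an effect of the target).
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        cause = rng.random() < 0.5
        # The target depends on its cause.
        target = rng.random() < (0.8 if cause else 0.2)
        if manipulated:
            # An "external agent" sets the distractor, ignoring the target.
            distractor = rng.random() < 0.5
        else:
            # Natural regime: the distractor is a consequence of the target.
            distractor = rng.random() < (0.9 if target else 0.1)
        rows.append((cause, distractor, target))
    return rows


def agreement(rows, col):
    """Fraction of records where column `col` equals the target."""
    return sum(r[col] == r[2] for r in rows) / len(rows)


unmanipulated = simulate(20000, manipulated=False)
manipulated = simulate(20000, manipulated=True, seed=1)

for name, rows in [("unmanipulated", unmanipulated), ("manipulated", manipulated)]:
    print(name,
          "cause agreement:", round(agreement(rows, 0), 3),
          "distractor agreement:", round(agreement(rows, 1), 3))
```

Under these assumed probabilities, the distractor agrees with the target more often than the true cause does in the unmanipulated data, but drops to chance level once it is manipulated, while the cause stays equally predictive in both regimes.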
Download the data.
|