CINAR: Raw data for CINA, an econometrics dataset

CINAR (Census Is Not Adult Raw) is derived from census data (the Adult database in the UCI machine-learning repository). The data consists of census records for a number of individuals. The causal discovery task is to uncover the socio-economic factors affecting high income (the target value indicates whether the income exceeds 50K). The 14 original attributes (features), including age, workclass, education, marital status, occupation, and native country, were coded in the CINA dataset to eliminate categorical variables; here we provide the raw data instead. CINA also contains distractor features (artificially generated variables that are not causes of the target); these are not included here.
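
To make "coded to eliminate categorical variables" concrete, here is a minimal sketch of one common approach: replacing each categorical attribute with 0/1 indicator columns. This is an assumption for illustration only; the description above does not say which coding scheme CINA actually used. The function name `indicator_code` and the three toy records are hypothetical and are not taken from the dataset.

```python
# Hypothetical toy records in the spirit of the raw census attributes.
# These rows are invented for illustration and are NOT taken from CINAR.
records = [
    {"age": 39, "workclass": "State-gov", "education": "Bachelors"},
    {"age": 50, "workclass": "Private", "education": "HS-grad"},
    {"age": 28, "workclass": "Private", "education": "Bachelors"},
]


def indicator_code(rows, categorical):
    """Replace each categorical column with one 0/1 column per observed value.

    Assumed scheme (plain indicator coding); CINA's actual coding may differ.
    """
    # Collect the sorted set of values observed for each categorical column.
    levels = {col: sorted({row[col] for row in rows}) for col in categorical}
    coded = []
    for row in rows:
        # Keep non-categorical (e.g. numeric) columns unchanged.
        out = {k: v for k, v in row.items() if k not in categorical}
        # Add one binary indicator per (column, value) pair.
        for col in categorical:
            for level in levels[col]:
                out[f"{col}={level}"] = 1 if row[col] == level else 0
        coded.append(out)
    return coded


coded = indicator_code(records, ["workclass", "education"])
print(coded[0])
```

After such a coding step every feature is numeric, which is the property the CINA version has and the raw data provided here does not.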

Download the data.