HIVA is a chemoinformatics dataset

The task of HIVA is to predict which compounds are active against HIV, the virus that causes AIDS. The original data has 3 classes (active, moderately active, and inactive). We reduced it to a two-class classification problem (active vs. inactive). We represented the data as 1617 sparse binary input variables. The variables represent properties of the molecule inferred from its molecular structure. The problem is therefore to relate structure to activity (a QSAR, or quantitative structure-activity relationship, problem) in order to screen new compounds before actually testing them (an HTS, or high-throughput screening, problem).
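As a rough illustration of the two preprocessing ideas above (collapsing three classes into two, and storing a compound as a sparse binary vector), here is a minimal Python sketch. It is not the code used to build HIVA. The function names, the label strings, the +1/-1 encoding, and the `moderate_as_active` switch are all assumptions made for illustration; the description does not say which side the "moderately active" class was merged into.

```python
from typing import Iterable, List

N_FEATURES = 1617  # number of binary input variables stated in the description


def binarize_label(label: str, moderate_as_active: bool = True) -> int:
    """Map a three-class label to +1 (active) or -1 (inactive).

    Assumption: the handling of "moderately active" is left as a
    parameter because the dataset description does not specify it.
    """
    if label == "active":
        return 1
    if label == "moderately active":
        return 1 if moderate_as_active else -1
    if label == "inactive":
        return -1
    raise ValueError(f"unknown label: {label!r}")


def to_dense(active_indices: Iterable[int],
             n_features: int = N_FEATURES) -> List[int]:
    """Expand a sparse list of 'on' feature indices into a 0/1 vector."""
    row = [0] * n_features
    for i in active_indices:
        if not 0 <= i < n_features:
            raise IndexError(f"feature index out of range: {i}")
        row[i] = 1
    return row


if __name__ == "__main__":
    # Hypothetical compound with three structural properties switched on.
    x = to_dense([0, 5, 1616])
    y = binarize_label("moderately active")
    print(len(x), sum(x), y)
```

Keeping only the indices of the nonzero variables is what makes a sparse binary representation compact: most of the 1617 entries for any one compound are expected to be zero.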
The original data were made available by the National Cancer Institute (USA). The 3D molecular structure was obtained with the CORINA software, and the features were derived with the ChemTK software.
The HIVA dataset was used previously in the Performance Prediction challenge, the Model Selection game, and the Agnostic Learning vs. Prior Knowledge (ALvsPK) challenge. A variant of the HIVA dataset called SIDO was used in the Causation and Prediction challenge and the Pot-Luck challenge.
This dataset is used in the Active Learning Challenge by the Causality Workbench.