Causality Workbench - Challenges in Machine Learning - Causality
SIDO: A pharmacology dataset

Contact: Isabelle Guyon - Submitted: 2008-09-12 02:53 - Views : 2076

This is one of the datasets of the first causality challenge: causation and prediction. The goal of the challenge was to make predictions under manipulations. SIDO (SImple Drug Operation mechanisms) contains descriptors of molecules, which have...

PROMO: Simple causal effects in time series

Contact: Jean-Philippe Pellet - Submitted: 2011-01-26 17:59 - Views : 4124

The PROMO dataset poses the task of identifying which promotions affect sales. Artificial data on 1000 promotion variables and 100 product sales is provided. The goal is to predict a 1000x100 boolean influence matrix, indicating for each (i,j)...
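
As a rough illustration of the expected output shape only, the sketch below builds a 1000x100 boolean matrix in plain Python. It assumes that entry (i,j) means "promotion i is predicted to affect the sales of product j"; the truncated description does not confirm this reading, and the helper names `empty_influence_matrix` and `mark_influences` are hypothetical, not part of the challenge kit.

```python
# Sizes taken from the dataset description: 1000 promotions, 100 products.
N_PROMOTIONS = 1000
N_PRODUCTS = 100


def empty_influence_matrix(n_promotions=N_PROMOTIONS, n_products=N_PRODUCTS):
    """Return an all-False matrix with one row per promotion.

    Assumption: matrix[i][j] is True when promotion i is predicted
    to influence the sales of product j.
    """
    return [[False] * n_products for _ in range(n_promotions)]


def mark_influences(matrix, pairs):
    """Set matrix[i][j] to True for every (i, j) in pairs and return the matrix."""
    for i, j in pairs:
        matrix[i][j] = True
    return matrix


# Hypothetical predictions: promotion 0 affects product 3,
# and promotion 999 affects product 99.
influence = mark_influences(empty_influence_matrix(), [(0, 3), (999, 99)])
```

Each row is built as a separate list so that setting one entry does not change any other row.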

CYTO: Causal Protein-Signaling Networks in human T cells

Contact: Karen Sachs - Submitted: 2008-11-14 15:50 - Views : 4425

This dataset consists of roughly 700 to 900 single-cell recordings of the abundance of 11 phosphoproteins and phospholipids (PKC, PKA, P38, Jnk (pjnk), Raf (praf), Mek (pmek), Erk (p44/42), Akt (pakts473), PLC-gamma (plcg), PIP2, PIP3) under various...

TIED: Target Information Equivalent Dataset

Contact: Alexander Statnikov - Submitted: 2008-09-12 20:24 - Views : 2689

TIED dataset (2008), by Alexander Statnikov and Constantin Aliferis. Introduction: TIED stands for Target Information Equivalent Dataset. It is an artificial simulated dataset constructed to illustrate that there may be many minimal sets of...

SIGNET: Abscisic Acid Signaling Network

Contact: Jerry Jenkins - Submitted: 2008-11-25 20:56 - Views : 7316

The objective is to determine the set of boolean rules that describe the interactions of the nodes within this plant signaling network. The dataset includes 300 separate boolean pseudodynamic simulations of the true rules, using an asynchronous...

CINA: A marketing dataset

Contact: Isabelle Guyon - Submitted: 2008-09-12 02:37 - Views : 2736

CINA (Census Is Not Adult) is derived from census data (the UCI machine-learning repository Adult database). The data consists of census records for a number of individuals. The causal discovery task is to uncover the socio-economic factors...

  • Authors: Causality workbench team
  • Key facts: Number of variables: 132 (demographic data) + one binary target variable. Number of examples: training 16033 + 3 test sets of 10000 examples corresponding...
  • Keywords: probe.method, marketing

REGED: A genomics dataset

Contact: Isabelle Guyon - Submitted: 2008-09-12 02:55 - Views : 2003

This is one of the datasets of the first causality challenge: causation and prediction. The goal of the challenge was to make predictions under manipulations. REGED (REsimulated Gene Expression Dataset) monitors the expression of genes, which...

  • Authors: Causality workbench team
  • Key facts: Number of variables: 999 (gene expression coefficients) + one binary target variable (health status). Number of examples: training 500 + 3 test sets of 20000...
  • Keywords: bayesian.network, genomics

MARTI: Measurement Artifacts

Contact: Isabelle Guyon - Submitted: 2008-09-12 06:02 - Views : 2084

This is one of the datasets of the first causality challenge: causation and prediction. The goal of the challenge was to make predictions under manipulations. MARTI (Measurement ARTIfact) is obtained from the same data generative process as...

WebLogs: Causal discovery in web logs

Contact: Cristian Grozea - Submitted: 2008-12-07 02:36 - Views : 1724

From real data, the anonymized logs of a web server, determine the causal structure: which pages link or lead to visits of other pages. The ground truth is beyond doubt, from the referrer information, but this information will be kept for an...

  • Authors: Cristian Grozea
  • Key facts: Number of variables: 20 (daily hits for web pages); Number of instances: 512 (training set). Ascii format for input; Matlab format allowed for output.
  • Keywords: web_logs, probabilistic

CauseEffectPairs: Distinguishing between cause and effect

Contact: Dominik Janzing - Submitted: 2010-05-04 13:53 - Views : 2635

The data set consists of 8 N x 2 matrices, each representing a cause-effect pair, and the task is to identify which variable is the cause and which one is the effect. The origin of the data is hidden from the participants but known to the organizers....

STEMMATOLOGY: Computer-assisted stemmatology

Contact: Teemu Roos - Submitted: 2008-10-29 09:18 - Views : 2026

Stemmatology (a.k.a. stemmatics) studies relations among different variants of a document that have been gradually built from an original text by copying and modifying earlier versions. The aim of such study is to reconstruct the family tree (causal...

MIDS: MIxed Dynamic Systems

Contact: Denver Dash - Submitted: 2008-11-23 06:12 - Views : 2601

Summary: This data represents a 9-variable (labeled X1...X9) dynamic system with several dynamic processes acting on qualitatively different time scales from one another. The goal is to learn a causal model of the system with the training data, and...

NOISE: Causal Directions in Noisy Environment

Contact: Guido Nolte - Submitted: 2009-10-05 21:17 - Views : 1974

This challenge has two parts: a simulation and real data. Simulation: Data are simulated as a superposition of bivariate unidirectional interaction plus additive mixed and non-white noise. The simulations were done with AR-models with...

  • Authors: G. Nolte
  • Key facts: Simulated Data: 1000 examples of bivariate time series with 6000 time points each. Real Data: EEG data of 10 subjects measured at rest with eyes closed....
  • Keywords: Time series, mixed noise, bivariate, EEG
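
To make "bivariate unidirectional interaction" concrete, here is a minimal sketch of a two-variable autoregressive process in which x drives y but not the reverse. It is not the organizers' generator: the model order (1), the coefficients `a`, `b`, `c`, the white Gaussian innovations, and the function name are all assumptions chosen for illustration, and the additive mixed, non-white noise mentioned in the description is deliberately left out.

```python
import random


def simulate_unidirectional_ar(n_points=6000, a=0.5, b=0.5, c=0.4, seed=0):
    """Simulate a bivariate AR(1) process where x influences y only.

    x[t] = a * x[t-1] + noise
    y[t] = b * y[t-1] + c * x[t-1] + noise

    All coefficients are hypothetical; c controls the strength of the
    x -> y coupling, and c = 0 removes the interaction entirely.
    """
    rng = random.Random(seed)  # local generator so runs are reproducible
    x = [0.0] * n_points
    y = [0.0] * n_points
    for t in range(1, n_points):
        x[t] = a * x[t - 1] + rng.gauss(0.0, 1.0)
        y[t] = b * y[t - 1] + c * x[t - 1] + rng.gauss(0.0, 1.0)
    return x, y


# One example series with 6000 time points, matching the length quoted above.
x, y = simulate_unidirectional_ar()
```

Because y never enters the update for x, setting `c=0` leaves x unchanged for a given seed and alters only y, which is a quick way to check that the coupling runs in one direction.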

SECOM: SEmi COnductor Manufacturing process control data

Contact: Michael McCann - Submitted: 2008-11-19 18:55 - Views : 3285

Abstract: A complex modern semi-conductor manufacturing process is normally under consistent surveillance via the monitoring of signals/variables collected from sensors and/or process measurement points. However, not all of these signals are equally...

SETFI: Manufacturing data: Semiconductor Tool Fault Isolation

Contact: Eugene Tuv - Submitted: 2008-11-24 23:46 - Views : 3262

During the semiconductor fabrication process, each wafer goes through a product-specific sequence of operations (hundreds) in batches, called lots. Every lot goes through each operation in the sequence. At each operation a lot could go through only one of...

WearableAccelerometersDataset: Wearable Computing: Classification of Body Postures and Movements (PUC-Rio) Data Set

Contact: Ugulino - Submitted: 2013-07-30 03:58 - Views : 989

During the last 5 years, research on Human Activity Recognition (HAR) has reported on systems showing good overall recognition performance. As a consequence, HAR has been considered as a potential technology for e-health systems. Here, we propose a...