Causality Workbench - Challenges in Machine Learning

SIGNET

Abscisic Acid Signaling Network

Contact: Jerry Jenkins - Submitted: 2008-11-25 20:56

Abstract:

The objective is to determine the set of Boolean rules that describe the interactions of the nodes within this plant signaling network. The dataset includes 300 separate Boolean pseudodynamic simulations of the true rules, using an asynchronous update scheme. Each of the 300 simulations begins with a randomly generated initial condition, in order to ensure sampling of all of the steady states of the system. There are a total of 43 nodes in this dataset, with 5 nodes being constants.
Each simulation consists of a matrix of 0's and 1's, with 21 rows and 43 columns. The first row is the randomly generated initial condition for the particular simulation, and the next 20 rows are the output from the Boolean pseudodynamics simulation. Each of the 43 columns represents the transient response of a particular node. The node names are listed at the top of the data file. A line of asterisks separates the simulations from one another. An example set of data is included below:
***************************
1011101110101101101101001010001011000011001
1100001110111101101101111111011001011101011
1100011110111110101101100011010001110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
1100001110111110101101100011000011110101010
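
The layout described above (a header with node names, then blocks of 0/1 rows separated by lines of asterisks) could be read along the following lines. This is only a sketch in Python, not part of the provided Matlab code; the function name parse_simulations and the choice to simply skip the header line are assumptions, since the exact header format is not shown here.

```python
def parse_simulations(text, n_nodes=43):
    """Split the data file text into a list of simulations (lists of 0/1 rows)."""
    simulations, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if line and set(line) == {"*"}:
            # A line of asterisks closes the previous simulation, if any.
            if current:
                simulations.append(current)
            current = []
        elif len(line) == n_nodes and set(line) <= {"0", "1"}:
            # One time step: one character per node.
            current.append([int(ch) for ch in line])
        # Anything else (e.g. the node-name header) is ignored in this sketch.
    if current:
        simulations.append(current)
    return simulations


# Toy input with 5 nodes instead of 43, only to show the call.
sample = """A B C D E
*****
10110
11000
*****
00101
00111
"""
sims = parse_simulations(sample, n_nodes=5)
print(len(sims), sims[0])
```

For the real file one would call parse_simulations on the file contents with the default n_nodes=43 and expect 300 simulations of 21 rows each.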


Suggested task: Uncover the 43 boolean rules x_i = f(x_1, x_2, ... x_43) of the Boolean Network.

We suggest reporting results in disjunctive normal form (DNF), see, e.g., http://en.wikipedia.org/wiki/Disjunctive_normal_form, denoting the Boolean operators as "or", "and", and "not" and using regular parentheses.
Example:
ABI = (pH and not PA and not ROS) or (ABA and Ca)
One way to obtain these DNF formulae is to generate truth tables, then use a program like Minilog http://en.wikipedia.org/wiki/Minilog to generate the formula.
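
As an illustration of the truth-table route, here is a small Python sketch that enumerates a truth table and writes one "and" term per satisfying row. It produces the canonical (unminimized) DNF, so it is not a substitute for a minimizer such as Minilog; the function name canonical_dnf and the use of "0" for a never-true rule are assumptions made for this example.

```python
from itertools import product


def canonical_dnf(func, names):
    """Return an unminimized DNF string for a Boolean function of the named inputs."""
    terms = []
    for values in product([False, True], repeat=len(names)):
        if func(*values):
            # One conjunction per truth-table row where the function is true.
            literals = [n if v else "not " + n for n, v in zip(names, values)]
            terms.append("(" + " and ".join(literals) + ")")
    return " or ".join(terms) if terms else "0"


# Rule "Actin = Ca or not RAC" from the list of true rules.
dnf = canonical_dnf(lambda Ca, RAC: Ca or not RAC, ["Ca", "RAC"])
print("Actin = " + dnf)
```

The result is logically equivalent to the original rule but longer; a minimization step would be needed to recover the compact form.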

We now provide the true Boolean rules for self-evaluation:

NO = NIA12 and NOS
PLC = ABA and Ca
CAIM = ( ROS or not ERA1 or not ABH1 ) and not DEPOLAR
GPA = ( S1P or not GCR ) and AGB
ATRBOH = PH and OST and ROP2 and not ABI
HATPase = not ROS and not PH and not Ca
MALATE = PEPC and not ABA and not AnionEM
RAC = not ABA and not ABI
Actin = Ca or not RAC
ROS = ABA and PA and PH
ABI = PH and not PA and not ROS
KAP = ( not PH and not Ca ) and DEPOLAR
Ca = ( CAIM or CIS ) and not CaATPase
CIS = ( cGMP and cADPR ) or ( IP3 and IP6 )
AnionEM = ( ( Ca or PH ) and not ABI ) or ( Ca and PH )
KOUT = ( PH or not ROS or not NO ) and DEPOLAR
DEPOLAR = KEV or AnionEM or not HATPase or not KOUT or Ca
CLOSURE = ( KOUT or KAP ) and AnionEM and Actin and not MALATE
ABA = 1
ABH1 = 1
AGB = 1
ERA1 = 1
GCR = 1
ADPRc = NO
CaATPase = Ca
cADPR = ADPRc
cGMP = GC
GC = NO
InsPK = ABA
IP3 = PLC
IP6 = InsPK
KEV = Ca
NIA12 = RCN
NOS = Ca
OST = ABA
PA = PLD
PEPC = not ABA
PH = ABA
PLD = GPA
RCN = ABA
ROP2 = PA
S1P = SPHK
SPHK = ABA

For evaluation, we suggest that, for each true generative rule, you generate the truth table and compute the prediction error rate by comparing the predictions of the proposed rule with the target values. Then average the error rates over all rules. This measure does not respect the "natural" distribution of states, but this may be a feature rather than a bug because causal models should be robust against changes in distribution.

We provide some Matlab code to score the results and, optionally, to generate new data (see http://www.causality.inf.ethz.ch/data/@signet.zip):

==> Usage for scoring:

s=read_rules(signet, 'your_submission_file.txt');
err=compare_rules(s);

Here is how it works:
- for each rule "zozo = some_boolean_expr(some_variables)"
* pool together the variables in the true rule for zozo and the proposed rule
* create input vectors for all possible assignments of values to these variables
* apply the true rule to each input vector to get the target variables T
* apply the proposed rule to get the predicted Y
* compute the error rate (fraction of disagreements between Y and T)
- average the error rates over all rules.
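
The scoring steps above can be sketched in Python as follows. This is not the provided compare_rules function, only an illustration under the assumption that rules are written with "and", "or", "not", parentheses and 0/1 constants, which happens to be valid Python expression syntax. The names rule_error and mean_error are hypothetical, and eval should only be used on rule files you trust.

```python
import re
from itertools import product

KEYWORDS = {"and", "or", "not"}


def variables(expr):
    """Names used in a rule expression, excluding the Boolean operators."""
    return {tok for tok in re.findall(r"[A-Za-z_]\w*", expr) if tok not in KEYWORDS}


def evaluate(expr, env):
    # The rule syntax is also valid Python, so eval is used here (trusted input only).
    return bool(eval(expr, {"__builtins__": {}}, env))


def rule_error(true_expr, proposed_expr):
    """Fraction of assignments of the pooled variables on which the two rules disagree."""
    names = sorted(variables(true_expr) | variables(proposed_expr))
    rows = list(product([False, True], repeat=len(names)))
    errors = 0
    for row in rows:
        env = dict(zip(names, row))
        errors += evaluate(true_expr, env) != evaluate(proposed_expr, env)
    return errors / len(rows)


def mean_error(true_rules, proposed_rules):
    """Average the per-rule error rates over all true rules."""
    return sum(rule_error(true_rules[k], proposed_rules[k]) for k in true_rules) / len(true_rules)


true_rules = {"PH": "ABA", "ABI": "PH and not PA and not ROS"}
proposed = {"PH": "1", "ABI": "PH and not PA"}
print(rule_error("ABA", "1"), mean_error(true_rules, proposed))
```

Note that scoring the proposal "PH = 1" against the true rule "PH = ABA" gives an error rate of 0.5 under this scheme, because ABA is enumerated over both values even though it is constant in the data.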

==> Usage for generating data:

dat=gene(signet, num, v_ini);

v_ini = initial state (43 binary values)
num = number of time steps
Returns a data matrix.
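
For readers without Matlab, the asynchronous update scheme (every node updated once per time step, in a freshly drawn random order) might be sketched in Python as below. This is not the gene function. The name simulate is hypothetical, and the sketch assumes that a node updated later in the order sees values already updated in the same time step, and that every node, including constants, has a rule.

```python
import random


def simulate(rules, initial, num, seed=None):
    """Return num + 1 rows of 0/1 values: the initial state, then one row per time step."""
    rng = random.Random(seed)
    names = list(rules)
    state = dict(initial)
    rows = [[int(state[n]) for n in names]]
    for _ in range(num):
        order = names[:]
        rng.shuffle(order)  # a new random permutation at every time step
        for node in order:
            # Assumption: the rule sees values already updated in this step.
            state[node] = bool(eval(rules[node], {"__builtins__": {}}, state))
        rows.append([int(state[n]) for n in names])
    return rows


# Four rules taken from the list of true rules, started from the all-zero state.
rules = {"ABA": "1", "PH": "ABA", "RCN": "ABA", "NIA12": "RCN"}
initial = {name: False for name in rules}
rows = simulate(rules, initial, num=3, seed=0)
for row in rows:
    print("".join(str(v) for v in row))
```

With num=20 and a 43-entry initial state, the output has the same shape as one simulation in the dataset (21 rows by 43 columns).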


Submit results to causality [at] clopinet [dot] com

Comments / Questions / Answers

#1 Mehreen Saeed 2008-09-18 09:06:43 -

Are the variables in the boolean rule time dependent? So do time steps have to be taken into account when formulating a rule, for example
ABI (at time step t) = (pH at time step (t-1)) or (pH at time step (t-2))

#2 Jerry Jenkins 2008-09-22 20:12:14 In reply to message #1

The variables are only dependent on the values from the previous time step { (t-1) in your notation}. The rule that you have presented would not occur in this network.

#3 Isabelle Guyon 2008-09-25 00:41:34 In reply to message #1

A few more details (from the PLOS paper):

The simulations are made using asynchronous updates. Every node is updated exactly once during each unit time interval, according to a given order. This order is a permutation of the N nodes in the network, chosen randomly out of a uniform distribution over the set of all N! possible permutations. A new update order is selected at each time step.

#4 Isabelle Guyon 2008-09-25 20:08:13 -

Erratum: The original file containing the 300 simulations was truncated. It has now been replaced. Please rely on the data file found on this web site for training your models: http://www.causality.inf.ethz.ch/data/SIGNET.zip, not the one posted on the UCI repository. We will correct this other entry as soon as possible.

#5 Mehreen Saeed 2008-10-28 08:15:19 -

Is there a routine available for converting a Boolean expression to regular DNF?

#6 Isabelle Guyon 2008-10-28 16:34:04 In reply to message #5

One way to do it is to generate the truth table and then use a program like Minilog http://en.wikipedia.org/wiki/Minilog to generate the formula.

#7 Mehreen Saeed 2008-11-14 09:40:20 -

Some rules cannot be inferred from the simulation data. For example the rule involving ABA:

MALATE = PEPC and not ABA and not AnionEM

The value of the ABA variable in the simulation data is always one, so the value of MALATE is always zero, which makes this variable look like a constant.

Maybe it would help to have more simulation data provided for rule extraction.

#8 Isabelle Guyon 2008-11-17 19:03:03 In reply to message #7

Please provide the results for the original dataset and explain the problem. You may then also generate new data and provide, for instance, an evolution of performance as a function of data set size. There is a function to generate data from the Matlab code provided. See the updated dataset description.

#9 Mehreen Saeed 2008-11-21 08:22:38 -

A change should be made to the evaluation system for rules that involve constants. For example consider the following two rules given on the website:
ABA = 1
PH = ABA
Our method detects the rule PH = 1, which is correct. However, the evaluation of this rule generates both possible values of ABA, giving us 50% accuracy (whereas it should be 100%). The same holds for a number of rules of the system.

A change in the actual rules should be made, replacing the two constants ABA and AGB with constant (1) value. The rules with other constants shouldn't necessarily be changed as their values appear as zero or one in the initial simulation vector.
