### Comments / Questions / Answers

Contact: Jerry Jenkins - Submitted: 2008-11-25 20:56 - Views: 8559
### Abstract:

**Authors:** Jerry W. Jenkins, Abhishek Soni

**Key facts:** Simulated data with a Boolean network modeling a biological signaling network.

Time series of 21 time steps. Initial step randomly drawn.

Number of variables: 43.

Number of entries: 300 separate dynamic simulations.

Variable types: binary.

Missing data: No.

During simulation, 38 of the 43 nodes are allowed to vary, with 5 nodes held constant throughout the simulation.

**Keywords:** boolean.network, signaling.network, time.series

- Li S, Assmann SM, Albert R (2006) Predicting essential components of signal transduction networks: a dynamic model of guard cell abscisic acid signaling. PLoS Biology 4: p. 1732-1748

The objective is to determine the set of Boolean rules that describe the interactions of the nodes within this plant signaling network. The dataset includes 300 separate Boolean pseudodynamic simulations of the true rules, using an asynchronous update scheme. Each of the 300 simulations begins with a randomly generated initial condition, in order to ensure sampling of all of the steady states of the system. There are a total of 43 nodes in this dataset, 5 of which are held constant.

The results for 300 separate simulations are included in the dataset. Each simulation consists of a matrix of 0's and 1's, with 21 rows and 43 columns. The first row is the randomly generated initial condition for the particular simulation, and the next 20 rows are the output from the Boolean pseudodynamics simulation. Each of the 43 columns represents the transient response of a particular node. The node names are listed at the top of the data file. A line of asterisks separates the simulations from one another. An example set of data is included below:

***************************

1011101110101101101101001010001011000011001

1100001110111101101101111111011001011101011

1100011110111110101101100011010001110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010

1100001110111110101101100011000011110101010
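The layout shown above can be read with a few lines of code. The following is a minimal Python sketch, not part of the provided Matlab package. It assumes that rows are written as contiguous strings of 43 digits, as in the example, and that any other line (blank lines, the header of node names, whose exact format is not shown here) can simply be skipped. The function name `parse_simulations` is hypothetical.

```python
import re

def parse_simulations(text, n_nodes=43):
    """Split file text into simulations; each simulation is a list of 0/1 rows."""
    simulations, current = [], []
    for raw in text.splitlines():
        line = raw.strip()
        if re.fullmatch(r"\*+", line):
            # A line of asterisks starts a new simulation.
            if current:
                simulations.append(current)
            current = []
        elif len(line) == n_nodes and re.fullmatch(r"[01]+", line):
            current.append([int(c) for c in line])
        # Assumption: anything else (blank lines, node-name header) is ignored.
    if current:
        simulations.append(current)
    return simulations

example = """
***************************
1011101110101101101101001010001011000011001
1100001110111101101101111111011001011101011
1100011110111110101101100011010001110101010
"""
sims = parse_simulations(example)
print(len(sims), len(sims[0]), len(sims[0][0]))
```

On the full data file, each parsed simulation should contain 21 rows of 43 values if the assumptions about the layout hold.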

Suggested task: Uncover the 43 boolean rules x_i = f(x_1, x_2, ... x_43) of the Boolean Network.

We suggest reporting results in disjunctive normal form (DNF), see, e.g., http://en.wikipedia.org/wiki/Disjunctive_normal_form, denoting the Boolean operators as "or", "and", and "not" and using regular parentheses.

Example:

ABI = (pH and not PA and not ROS) or (ABA and Ca)

One way to obtain these DNF formulae is to generate truth tables, then use a program like Minilog http://en.wikipedia.org/wiki/Minilog to generate the formula.
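As an illustration of the truth-table route, here is a minimal Python sketch. It is not Minilog and performs no minimization: it emits the canonical DNF with one conjunction per truth-table row that evaluates to 1, which a minimizer could then shorten. The function names are hypothetical, and the sketch assumes rules are written with the "and", "or", "not" operators suggested above, so that they can be evaluated directly as Python expressions.

```python
from itertools import product

def truth_table(rule, variables):
    """Evaluate a rule written with and/or/not for every input assignment."""
    rows = []
    for values in product([0, 1], repeat=len(variables)):
        env = dict(zip(variables, values))
        # eval is used for brevity; only apply it to trusted rule strings.
        out = int(bool(eval(rule, {"__builtins__": {}}, env)))
        rows.append((values, out))
    return rows

def canonical_dnf(rule, variables):
    """Unminimized DNF: one conjunction per input row whose output is 1."""
    terms = []
    for values, out in truth_table(rule, variables):
        if out:
            literals = [v if bit else "not " + v
                        for v, bit in zip(variables, values)]
            terms.append("(" + " and ".join(literals) + ")")
    return " or ".join(terms) if terms else "0"

dnf = canonical_dnf("(S1P or not GCR) and AGB", ["S1P", "GCR", "AGB"])
print(dnf)
```

The resulting formula is logically equivalent to the input rule but usually longer than necessary; reducing it is the job of a logic minimizer.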

We now provide the true Boolean rules for self-evaluation:

NO = NIA12 and NOS

PLC = ABA and Ca

CAIM = ( ROS or not ERA1 or not ABH1 ) and not DEPOLAR

GPA = ( S1P or not GCR ) and AGB

ATRBOH = PH and OST and ROP2 and not ABI

HATPase = not ROS and not PH and not Ca

MALATE = PEPC and not ABA and not AnionEM

RAC = not ABA and not ABI

Actin = Ca or not RAC

ROS = ABA and PA and PH

ABI = PH and not PA and not ROS

KAP = ( not PH and not Ca ) and DEPOLAR

Ca = ( CAIM or CIS ) and not CaATPase

CIS = ( cGMP and cADPR ) or ( IP3 and IP6 )

AnionEM = ( ( Ca or PH ) and not ABI ) or ( Ca and PH )

KOUT = ( PH or not ROS or not NO ) and DEPOLAR

DEPOLAR = KEV or AnionEM or not HATPase or not KOUT or Ca

CLOSURE = ( KOUT or KAP ) and AnionEM and Actin and not MALATE

ABA = 1

ABH1 = 1

AGB = 1

ERA1 = 1

GCR = 1

ADPRc = NO

CaATPase = Ca

cADPR = ADPRc

cGMP = GC

GC = NO

InsPK = ABA

IP3 = PLC

IP6 = InsPK

KEV = Ca

NIA12 = RCN

NOS = Ca

OST = ABA

PA = PLD

PEPC = not ABA

PH = ABA

PLD = GPA

RCN = ABA

ROP2 = PA

S1P = SPHK

SPHK = ABA

For evaluation, we suggest that, for each true generative rule, you generate the truth table and compute the prediction error rate by comparing the predictions made by the corresponding rule of the proposed model to the target values. Then average the error rates over all rules. This measure does not respect the "natural" distribution of states, but this may be a feature rather than a bug because, for causal models, one wants to be robust against changes in distribution.

We provide some Matlab code to score the results and, optionally, to generate new data (see http://www.causality.inf.ethz.ch/data/@signet.zip):

==> Usage for scoring:

s=read_rules(signet, 'your_submission_file.txt');

err=compare_rules(s);

Here is how it works:

- for each rule "zozo = some_boolean_expr(some_variables)"

* pool together the variables in the true rule for zozo and the proposed rule

* create input vectors for all possible assignments of values to these variables

* apply the true rule to each input vector to get the target variables T

* apply the proposed rule to get the predicted Y

* Compute the error rate (fraction of disagreements between Y and T)

- average the error rates over all rules.
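The scoring steps listed above can be sketched in Python as follows. This is an illustrative re-implementation under stated assumptions, not the Matlab `compare_rules` code: rules are assumed to be strings using "and", "or", "not" and the constants 0 and 1, and the helper names (`variables_of`, `rule_error`, `average_error`) are hypothetical.

```python
import re
from itertools import product

KEYWORDS = {"and", "or", "not"}

def variables_of(rule):
    """Variable names in a rule, excluding the Boolean operators."""
    return {tok for tok in re.findall(r"[A-Za-z_]\w*", rule)
            if tok not in KEYWORDS}

def rule_error(true_rule, proposed_rule):
    """Fraction of input assignments on which the two rules disagree."""
    # Pool the variables of both rules and enumerate all assignments.
    names = sorted(variables_of(true_rule) | variables_of(proposed_rule))
    disagreements = total = 0
    for values in product([0, 1], repeat=len(names)):
        env = dict(zip(names, values))
        t = bool(eval(true_rule, {"__builtins__": {}}, env))      # target T
        y = bool(eval(proposed_rule, {"__builtins__": {}}, env))  # predicted Y
        disagreements += (t != y)
        total += 1
    return disagreements / total

def average_error(true_rules, proposed_rules):
    """Mean error rate over all nodes that have a true rule."""
    errors = [rule_error(true_rules[n], proposed_rules[n]) for n in true_rules]
    return sum(errors) / len(errors)

true_rules = {"PH": "ABA", "PLC": "ABA and Ca"}
proposed = {"PH": "1", "PLC": "ABA and Ca"}
print(rule_error(true_rules["PH"], proposed["PH"]))
print(average_error(true_rules, proposed))
```

Because every variable is enumerated over both values, a proposed rule `PH = 1` scores an error of 0.5 against the true rule `PH = ABA`, even though ABA is held at 1 in the simulations.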

==> Usage for generating data:

dat=gene(signet, num, v_ini);

v_ini = initial state (43 binary values)

num = number of time steps

Returns a data matrix.
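For readers without Matlab, the generation procedure can be approximated with the Python sketch below. It is not the provided `gene` function. It assumes the asynchronous scheme in which every non-constant node is updated exactly once per time step in a freshly drawn random order, and that each update reads the most recent values of its inputs; it also assumes the returned matrix holds the initial state followed by `num` updated states. Only a small subset of the rules is used, purely for illustration.

```python
import random

def async_step(state, rules, rng):
    """One time step: update each node that has a rule once, in random order."""
    new_state = dict(state)
    order = list(rules)
    rng.shuffle(order)  # a new update order is drawn at every time step
    for node in order:
        # Assumption: an update sees values already changed in this step.
        value = eval(rules[node], {"__builtins__": {}}, new_state)
        new_state[node] = int(bool(value))
    return new_state

def generate(rules, v_ini, num, seed=0):
    """Return a matrix (list of rows): the initial state plus num updates."""
    rng = random.Random(seed)
    names = list(v_ini)
    states = [dict(v_ini)]
    for _ in range(num):
        states.append(async_step(states[-1], rules, rng))
    return [[s[n] for n in names] for s in states]

# Illustrative subset of the rules; ABA has no rule and stays constant.
rules = {"PH": "ABA", "RCN": "ABA", "NIA12": "RCN", "OST": "ABA"}
v_ini = {"ABA": 1, "PH": 0, "RCN": 0, "NIA12": 0, "OST": 0}
data = generate(rules, v_ini, num=3)
for row in data:
    print("".join(str(v) for v in row))
```

Nodes without an entry in `rules` keep their initial value, which mirrors how the constant nodes are held fixed during a simulation.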

Submit results to causality [at] clopinet [dot] com

#1 | Mehreen Saeed | 2008-09-18 09:06:43 | - |

Are the variables in the boolean rule time dependent? So do time steps have to be taken into account when formulating a rule, for example

ABI (at time step t) = (pH at time step (t-1)) or (pH at time step (t-2))

#2 | Jerry Jenkins | 2008-09-22 20:12:14 | In reply to message #1 |

The variables are only dependent on the values from the previous time step { (t-1) in your notation}. The rule that you have presented would not occur in this network.

#3 | Isabelle Guyon | 2008-09-25 00:41:34 | In reply to message #1 |

A few more details (from the PLoS paper):

The simulations are made using asynchronous updates. Every node is updated exactly once during each unit time interval, according to a given order. This order is a permutation of the N nodes in the network, chosen randomly out of a uniform distribution over the set of all N! possible permutations. A new update order is selected at each time step.

#4 | Isabelle Guyon | 2008-09-25 20:08:13 | - |

Erratum: The original file containing the 300 simulations was truncated. It has now been replaced. Please rely on the data file found on this web site for training your models: http://www.causality.inf.ethz.ch/data/SIGNET.zip, not the one posted on the UCI repository. We will correct this other entry as soon as possible.

#5 | Mehreen Saeed | 2008-10-28 08:15:19 | - |

Is there a routine available for converting a Boolean expression to regular DNF?

#6 | Isabelle Guyon | 2008-10-28 16:34:04 | In reply to message #5 |

One way to do it is to generate the truth table and then use a program like Minilog http://en.wikipedia.org/wiki/Minilog to generate the formula.

#7 | Mehreen Saeed | 2008-11-14 09:40:20 | - |

Some rules cannot be inferred from the simulation data. For example the rule involving ABA:

MALATE = PEPC and not ABA and not AnionEM

The value of the ABA variable in the simulation data is always one, so the value of MALATE is always zero, which makes MALATE appear to be a constant.

Maybe it would help to have more simulation data provided for rule extraction.

#8 | Isabelle Guyon | 2008-11-17 19:03:03 | In reply to message #7 |

Please provide the results for the original dataset and explain the problem. You may then also generate new data and provide, for instance, an evolution of performance as a function of data set size. There is a function to generate data from the Matlab code provided. See the updated dataset description.

#9 | Mehreen Saeed | 2008-11-21 08:22:38 | - |

A change should be made to the evaluation system for rules that involve constants. For example consider the following two rules given on the website:

ABA = 1

PH = ABA

Our method detects the rule: PH = 1, which is correct. However, in the evaluation of this rule it generates two possible values of ABA, hence giving us 50% accuracy (whereas it is 100%). This is true for a number of rules of the system.

A change in the actual rules should be made, replacing the two constants ABA and AGB with the constant value 1. The rules with other constants need not be changed, as their values appear as zero or one in the initial simulation vector.