This dataset consists of roughly 700 to 900 single-cell recordings per condition of the abundance of 11 phosphoproteins and phospholipids (PKC, PKA, P38, Jnk (pjnk), Raf (praf), Mek (pmek), Erk (p44/42), Akt (pakts473), PLC-gamma (plcg), PIP2, PIP3) under various experimental conditions in human primary naive CD4+ T cells, primarily downstream of CD3 and CD28 activation. Conceptually, the goal is to unravel protein signalling networks, originally modeled as causal Bayesian networks. The various experimental conditions constitute "interventions" on the system of interest. There is no recording of the "unmanipulated" or "natural" distribution, because the system tends to stay in an 'off' state in the absence of perturbations. Instead, a "general stimulus" or "general perturbation" is applied to the cells to activate the pathway of interest through key upstream receptors: CD3, CD28, and LFA-1. These are combined with specific inhibitors (e.g. Akt inhibitor, Mek inhibitor, etc.), which directly inhibit the ACTIVITY of the target protein. (This is true for all the inhibitors except Psitect, which is instead an ABUNDANCE inhibitor.) The specific perturbations provided by the inhibitors allow some causal interactions to be elucidated. In total, nine different conditions were applied to sets of individual cells: seven downstream of CD3, CD28 and/or ICAM-2, and two using different, specific pathway activators (activating either PKA or PKC). The nine conditions are listed below, with the file names in parentheses:
1) General perturbation (GP1): anti-CD3 + anti-CD28 (cd3cd28.xls), 854 cells
2) General perturbation: GP1 + ICAM-2 (that induces LFA-1) (cd3cd28icam2.xls), 903 cells
3) AKT inactivation: GP1 + Akt inhibitor (cd3cd28+aktinhib.xls), 912 cells
4) PKC inhibition: GP1 + G06976 (cd3cd28+g0076.xls), 724 cells
5) PIP2 inhibition: GP1 + psitectorigenin (cd3cd28+psitect.xls), 811 cells
6) Mek inhibition (can affect Erk abundance because Erk is downstream of Mek): GP1 + U0126 (cd3cd28+u0126.xls), 800 cells
7) Akt inhibition: GP1 + LY294002 (cd3cd28+ly.xls), 849 cells
8) PKC activation: phorbol 12-myristate 13-acetate (pma.xls), 914 cells
9) PKA activation: b2 cyclic adenosine 3',5'-monophosphate (b2camp.xls), 708 cells
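The list above can be encoded as a small lookup table, which is convenient when deciding which cells to treat as interventional for a given variable. This is a minimal sketch: the `CONDITIONS` dictionary, its field names, and the helper functions are hypothetical, and the target labels simply follow the list (e.g. LY294002 is labelled as an Akt intervention, and Psitect acts on PIP2 abundance rather than activity).

```python
# Hypothetical encoding of the nine conditions listed above. Field names are
# illustrative; "target" is the variable directly perturbed (None for the
# general perturbations), following the labels used in the list.
CONDITIONS = {
    "cd3cd28.xls":          {"target": None,   "kind": "general",    "cells": 854},
    "cd3cd28icam2.xls":     {"target": None,   "kind": "general",    "cells": 903},
    "cd3cd28+aktinhib.xls": {"target": "Akt",  "kind": "inhibition", "cells": 912},
    "cd3cd28+g0076.xls":    {"target": "PKC",  "kind": "inhibition", "cells": 724},
    "cd3cd28+psitect.xls":  {"target": "PIP2", "kind": "inhibition", "cells": 811},  # abundance, not activity
    "cd3cd28+u0126.xls":    {"target": "Mek",  "kind": "inhibition", "cells": 800},
    "cd3cd28+ly.xls":       {"target": "Akt",  "kind": "inhibition", "cells": 849},
    "pma.xls":              {"target": "PKC",  "kind": "activation", "cells": 914},
    "b2camp.xls":           {"target": "PKA",  "kind": "activation", "cells": 708},
}


def files_intervening_on(variable):
    """Return (sorted) the files in which `variable` is directly perturbed."""
    return sorted(f for f, c in CONDITIONS.items() if c["target"] == variable)


def total_cells(files=None):
    """Total number of recorded cells, optionally restricted to some files."""
    files = CONDITIONS if files is None else files
    return sum(CONDITIONS[f]["cells"] for f in files)
```

For example, `files_intervening_on("Akt")` returns the Akt-inhibitor and LY294002 files, and `total_cells(["cd3cd28.xls", "cd3cd28icam2.xls"])` gives the size of the general-perturbation subset used in the second task.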
The technique should be contrasted with lysate techniques, which record average activities over many cells. Because it records many single cells individually, this dataset is unusually large and well suited to statistical analysis.
Methods: intracellular multicolor flow cytometry, providing quantitative, simultaneous observations of multiple signaling molecules in many individual cells. Flow cytometry can quantitatively measure a given protein's abundance and can also measure protein modification states such as phosphorylation (in this dataset, all reported quantities are for the phosphoform). Because each cell is treated as an independent observation, flow cytometric data provide a statistically large sample that could enable Bayesian network inference to accurately predict pathway structure.
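Because each cell is treated as an independent observation, a decomposable network score can be evaluated one family (a node plus its parents) at a time. A common way to use interventional data is to drop, from a node's own likelihood term, the cells in which that node was directly manipulated, since its value there was set by the perturbation rather than by its parents. The sketch below applies this idea to a plain BIC score on discretized data. It is not necessarily the score used in the original study; `local_bic` and its arguments are hypothetical, and treating every inhibitor or activator as a perfect intervention is a simplifying assumption.

```python
import numpy as np


def local_bic(data, child, parents, intervened, arity=3):
    """BIC score of one node's family on discretized data (hypothetical helper).

    data:       (n_cells, n_vars) int array with values in 0..arity-1
    child:      column index of the child node
    parents:    list of parent column indices
    intervened: (n_cells,) bool array, True where `child` was directly
                manipulated; those cells are excluded from this family's
                likelihood because the perturbation, not the parents, set
                the child's value.
    """
    rows = data[~intervened]
    n = rows.shape[0]
    # Encode each parent configuration as a single integer index.
    config = np.zeros(n, dtype=int)
    for p in parents:
        config = config * arity + rows[:, p]
    n_configs = arity ** len(parents)
    # Contingency table: parent configuration x child level.
    counts = np.zeros((n_configs, arity))
    np.add.at(counts, (config, rows[:, child]), 1)
    totals = counts.sum(axis=1, keepdims=True)
    # Maximum-likelihood log-likelihood; empty table entries contribute 0.
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(counts > 0, counts * np.log(counts / totals), 0.0)
    loglik = terms.sum()
    # BIC penalty: (arity - 1) free parameters per parent configuration.
    n_params = n_configs * (arity - 1)
    return loglik - 0.5 * n_params * np.log(n)
```

A structure search (for example greedy hill climbing over acyclic graphs) would sum `local_bic` over all nodes to compare candidate networks, building each node's `intervened` mask from the cells of the files that target it.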
- Using a technique of your choice and all available data, infer the causal relationships between variables and compare the resulting causal network to the ground truth of Figure 2 and the inferred network of Figure 3. (Note that, as with all biological pathways, the 'ground truth' pathway is likely inaccurate and/or incomplete; comparisons to it should be made with this in mind.)
- Using a technique of your choice and only subsets of the data (e.g. only the general perturbation data cd3cd28.xls and cd3cd28icam2.xls) repeat the causal discovery experiments.
- Find a method to assess the confidence of the causal relationships uncovered.
- In the data, in addition to the 9 conditions described above, there are five simulated western blot conditions. Devise experiments making use of these additional data.
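For the confidence task, one generic option is the nonparametric bootstrap: resample cells with replacement, rerun structure learning on each resample, and report how often each edge reappears. The sketch assumes a user-supplied `learn_edges` function standing in for whichever method is being assessed; the `correlation_edges` learner is a hypothetical placeholder that detects association only and says nothing about causal direction. Resampling within each condition (the optional `groups` argument) keeps the mix of interventions fixed across resamples.

```python
import numpy as np
from collections import Counter


def bootstrap_edge_confidence(data, learn_edges, n_boot=200, groups=None, seed=0):
    """Fraction of bootstrap resamples in which each edge is recovered.

    data:        (n_cells, n_vars) array
    learn_edges: function mapping an (n_cells, n_vars) array to a set of
                 hashable edges, e.g. (parent, child) tuples
    groups:      optional (n_cells,) condition labels; if given, cells are
                 resampled within each condition
    """
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    if groups is None:
        groups = np.zeros(n, dtype=int)
    # Row indices belonging to each condition.
    members = [np.flatnonzero(groups == g) for g in np.unique(groups)]
    counts = Counter()
    for _ in range(n_boot):
        # Resample cells with replacement, preserving per-condition sizes.
        idx = np.concatenate([rng.choice(m, size=m.size, replace=True) for m in members])
        counts.update(learn_edges(data[idx]))
    return {edge: c / n_boot for edge, c in counts.items()}


def correlation_edges(data, threshold=0.3):
    """Placeholder learner: undirected edges (i, j), i < j, between columns
    whose absolute Pearson correlation exceeds `threshold`. Not causal."""
    corr = np.corrcoef(data, rowvar=False)
    k = corr.shape[0]
    return {(i, j) for i in range(k) for j in range(i + 1, k)
            if abs(corr[i, j]) > threshold}
```

Edges recovered in, say, more than 80% of resamples can then be reported as high-confidence; the cut-off itself is a judgment call.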
|#1||Jin Tian||2008-10-13 16:52:37||-|
Can you provide discretized data?
|#2||Karen Sachs||2008-10-13 17:11:23||In reply to message #1|
Currently it's not in a convenient format (it's in a large super-matrix with all conditions together, and not in binary MATLAB format). I'll try to put it into a better format, but it should be very easy to discretize yourself: the discretization code is readily available from Alex Hartemink's thesis, as referenced in our Methods section.
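Hartemink's thesis describes an information-preserving discretization; the sketch below is a simpler stand-in and is not his method. It assumes equal-frequency (quantile) bins with three levels per variable, and that each condition's sheet has already been loaded as a numeric array with the same column order. The `pooled` flag contrasts computing thresholds on all conditions together with computing them separately for each condition; `quantile_thresholds` and `discretize` are hypothetical names.

```python
import numpy as np


def quantile_thresholds(values, n_levels=3):
    """Cut points splitting `values` into n_levels roughly equal-count bins."""
    qs = np.linspace(0, 1, n_levels + 1)[1:-1]
    return np.quantile(values, qs)


def discretize(tables, n_levels=3, pooled=True):
    """Map each column of each (n_cells, n_vars) array to 0..n_levels-1.

    pooled=True:  thresholds are computed once on all tables stacked together,
                  so a given level means the same abundance in every condition.
    pooled=False: thresholds are computed separately for each table.
    """
    out = []
    if pooled:
        stacked = np.vstack(tables)
        cuts = [quantile_thresholds(stacked[:, j], n_levels)
                for j in range(stacked.shape[1])]
    for t in tables:
        if not pooled:
            cuts = [quantile_thresholds(t[:, j], n_levels) for j in range(t.shape[1])]
        # np.digitize returns 0 below the first cut and len(cuts) above the last.
        out.append(np.column_stack([np.digitize(t[:, j], cuts[j])
                                    for j in range(t.shape[1])]))
    return out
```

Pooled thresholds keep a shift caused by a perturbation (for example, an inhibitor lowering a downstream phosphoprotein) visible as a change in level, whereas per-condition thresholds rescale each file to the same level distribution and can hide it.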
|#3||Akshay Deepak||2008-10-23 22:21:52||In reply to message #2|
Should the data set files be discretized individually, or should all the data be merged, discretized, and then split back into the respective sheets?
|#5||Isabelle Guyon||2018-04-23 20:53:45||In reply to message #4|
If you click on the title, you get to a page where the data can be downloaded. I am trying to fix the broken link.