=============================================================================
*** ChaLearn First Connectomics Challenge: Datasets ***
http://connectomics.chalearn.org/
*** Version 1 - January 2014
=============================================================================

ALL INFORMATION, SOFTWARE, DOCUMENTATION, AND DATA ARE PROVIDED "AS-IS".
CHALEARN, KAGGLE AND/OR OTHER ORGANIZERS DISCLAIM ANY EXPRESSED OR IMPLIED
WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF
MERCHANTABILITY AND FITNESS FOR ANY PARTICULAR PURPOSE, AND THE WARRANTY OF
NON-INFRINGEMENT OF ANY THIRD PARTY'S INTELLECTUAL PROPERTY RIGHTS. IN NO
EVENT SHALL THE ORGANIZERS BE LIABLE FOR ANY SPECIAL, INDIRECT OR
CONSEQUENTIAL DAMAGES OR ANY DAMAGES WHATSOEVER ARISING OUT OF OR IN
CONNECTION WITH THE USE OR PERFORMANCE OF SOFTWARE, DOCUMENTS, MATERIALS,
PUBLICATIONS, OR INFORMATION MADE AVAILABLE FOR THE CHALLENGE.

=============================================================================

The goal of the challenge is to predict whether there is a (directed)
connection between neuron i and neuron j in a network of 1000 neurons
(self-connections permitted). We provide one hour of recordings of the
activity of all neurons, as time series, together with the positions of the
neurons (arranged on a flat surface). The data, which are simulated,
reproduce as faithfully as possible neural activity measured with calcium
fluorescence imaging of neural cultures.

The challenge dataset consists of 13 compressed archives, each including
three files:

1) "fluorescence" file: the time series of neural activities obtained from
   fluorescence signals. The neurons are in columns and the rows are
   time-ordered samples. The signals are sampled at 20 ms intervals.

2) "networkPosition" file: each row represents a neuron. First column = X
   position; second column = Y position. The neurons span a 1 mm^2 square
   area.

3) "network" file: each row is a connection. The columns are of the form
   I,J,W, denoting a connection from neuron I to neuron J with weight W.
   Connections with weight -1 are "blocked" in the simulations (simulating a
   chemical blockage), so you should consider them as absent.

** EXCEPT FOR THE VALIDATION ("VALID") AND TEST DATA ARCHIVES, FOR WHICH THE
NETWORK FILES ARE OMITTED (because this is what you have to predict!) **
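As a quick orientation, here is a minimal loading sketch in Python/NumPy. It
assumes the three files are plain comma-separated text; the file names below
are placeholders for the actual names inside each archive, and neuron indices
in the "network" file are assumed to be 1-based. Connections with weight -1
are treated as absent, as explained above.

    import numpy as np

    # Placeholder file names -- substitute the actual names inside each archive.
    FLUOR_FILE = "fluorescence.txt"      # rows = 20 ms samples, columns = neurons
    POS_FILE   = "networkPositions.txt"  # rows = neurons, columns = X, Y
    NET_FILE   = "network.txt"           # rows = connections, columns = I, J, W

    # Fluorescence time series: shape (n_samples, n_neurons).
    fluorescence = np.loadtxt(FLUOR_FILE, delimiter=",")

    # Neuron positions: shape (n_neurons, 2).
    positions = np.loadtxt(POS_FILE, delimiter=",")

    # Ground-truth connectivity as a binary adjacency matrix
    # (not available for the valid/test archives).
    # A[i, j] = 1 means neuron i connects to neuron j; blocked (-1) connections
    # are treated as absent.
    n_neurons = fluorescence.shape[1]
    A = np.zeros((n_neurons, n_neurons), dtype=int)
    for i, j, w in np.loadtxt(NET_FILE, delimiter=",", ndmin=2):
        if w > 0:                          # drop blocked (-1) connections
            A[int(i) - 1, int(j) - 1] = 1  # assuming 1-based indices in the file

The valid and test archives contain only the first two files; the
corresponding adjacency matrix is what you have to predict.

The 13 archives are the following: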
 1. valid.tgz
    Fluorescence and positional data for the development phase of the
    challenge (so-called "validation" or "valid" data). When results are
    submitted, performance on the validation data is reported immediately
    on-line. N=1000 neurons.

 2. test.tgz
    Fluorescence and positional data for the FINAL test phase of the
    challenge (counting towards the final ranking and the prizes). N=1000
    neurons.

 3. small.tgz
    Six small networks with N=100 neurons. All six have the same
    connectivity degree but different clustering coefficients; they are
    intended for fast checks of the algorithms.

 4. normal-1.tgz
    Network similar to the test and validation networks. N=1000 neurons.

 5. normal-2.tgz
    Network similar to the test and validation networks. N=1000 neurons.

 6. normal-3.tgz
    Network similar to the test and validation networks. N=1000 neurons.

 7. normal-3-highrate.tgz
    Same as normal-3 but with highly active neurons, i.e., a higher firing
    frequency.

 8. normal-4.tgz
    Network similar to the test and validation networks. N=1000 neurons.

 9. normal-4-lownoise.tgz
    Same architecture as normal-4 (and the same spiking data) but with a
    fluorescence signal with a much better signal-to-noise ratio.

10. highcc.tgz
    Network equivalent to the test and validation networks but with a higher
    clustering coefficient on average.

11. lowcc.tgz
    Network equivalent to the test and validation networks but with a lower
    clustering coefficient on average.

12. highcon.tgz
    Network equivalent to the test and validation networks but with a higher
    number of connections per neuron on average.

13. lowcon.tgz
    Network equivalent to the test and validation networks but with a lower
    number of connections per neuron on average.