Causality Workbench - Challenges in Machine Learning


Simple causal effects in time series

Contact: Jean-Philippe Pellet - Submitted: 2011-01-26 17:59 - Views: 10071
  • Authors: Causality workbench team
  • Key facts: This dataset contains artificial time-series data about product sales and promotions. There are 1000 binary promotion variables and 100 continuous product sales variables. The goal is to predict a 1000x100 boolean influence matrix, indicating for each (i,j) element whether the ith promotion has a causal influence on the sales of the jth product.
  • Keywords: time.series, structural.equation.models


The PROMO dataset poses the task of identifying which promotions affect sales. Artificial data about 1000 promotion variables and 100 product sales variables is provided. The goal is to predict a 1000x100 boolean influence matrix, indicating for each (i,j) element whether the ith promotion has a causal influence on the sales of the jth product. Data is provided as time series, with a daily value for each variable over three years (i.e., 1095 days).

Each of the 100 products has a defined seasonal baseline that repeats every year. The seasonal effect can vary from almost nonexistent to major. Promotions act on top of this baseline. Each product is influenced by between 1 and 50 of the 1000 available promotions. Promotions usually increase sales with respect to the baseline, but can occasionally reduce them (e.g., when a similar competing product is promoted, that promotion might have a negative effect on the sales of the current product). Daily variations are added on top of the baseline and the promotion effects.
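The three layers described above (seasonal baseline, promotion effects, daily variations) can be illustrated with a small simulation. This is only a sketch: the page does not say how the layers are combined, so the additive form, the sinusoidal baseline, the Gaussian noise, and all names and default values (`seasonal_baseline`, `simulate_sales`, `mean_level`, `amplitude`, `noise_sd`) are assumptions made for illustration, not the generator used to build PROMO.

```python
import math
import random

DAYS_PER_YEAR = 365
N_DAYS = 3 * DAYS_PER_YEAR  # 1095 days, as in the dataset


def seasonal_baseline(day, mean_level, amplitude):
    """Yearly-periodic baseline (assumed sinusoidal); amplitude=0 means no season."""
    phase = 2 * math.pi * (day % DAYS_PER_YEAR) / DAYS_PER_YEAR
    return mean_level * (1 + amplitude * math.sin(phase))


def simulate_sales(promotions, weights, mean_level=100.0, amplitude=0.3,
                   noise_sd=5.0, seed=0):
    """Sales of one product: baseline + promotion effects + daily noise.

    promotions: list of 0/1 lists, one per promotion, each of length N_DAYS
    weights:    one signed effect per promotion (0 means no influence,
                negative values model a competing product's promotion)
    """
    rng = random.Random(seed)
    sales = []
    for day in range(N_DAYS):
        # Assumed additive effect of every promotion active on this day.
        effect = sum(w * p[day] for w, p in zip(weights, promotions))
        noise = rng.gauss(0.0, noise_sd)
        sales.append(seasonal_baseline(day, mean_level, amplitude) + effect + noise)
    return sales


# Two promotions: one always on with a positive effect, one always off.
promos = [[1] * N_DAYS, [0] * N_DAYS]
series = simulate_sales(promos, weights=[20.0, -10.0])
```

Under this sketch, a nonzero entry in `weights` corresponds to a True entry in the boolean influence matrix for that (promotion, product) pair.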

Each of the 1000 promotions can be seasonal or not; i.e., its on/off pattern can either repeat from one year to the next or be completely different each year. The average time a promotion stays active or inactive, however, is constant for each promotion.
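One way to picture such a promotion is as a two-state on/off process with fixed average run lengths. The sketch below assumes a simple Markov chain with geometric run lengths, and assumes that a seasonal promotion repeats its first-year pattern exactly; neither detail is stated on this page, and the names `on_off_series`, `simulate_promotion`, `mean_on`, and `mean_off` are hypothetical.

```python
import random

DAYS_PER_YEAR = 365
N_YEARS = 3


def on_off_series(n_days, mean_on, mean_off, rng):
    """Binary series whose on/off runs have the given average lengths in days.

    mean_on and mean_off must be >= 1.
    """
    # Start in the "on" state with its long-run probability.
    active = rng.random() < mean_on / (mean_on + mean_off)
    series = []
    for _ in range(n_days):
        series.append(1 if active else 0)
        # Leaving a state with probability 1/m gives runs of mean length m.
        leave_prob = 1.0 / (mean_on if active else mean_off)
        if rng.random() < leave_prob:
            active = not active
    return series


def simulate_promotion(mean_on, mean_off, seasonal, seed=0):
    """Three years of daily 0/1 values for one promotion."""
    rng = random.Random(seed)
    if seasonal:
        # Seasonal: the same yearly pattern repeated three times.
        year = on_off_series(DAYS_PER_YEAR, mean_on, mean_off, rng)
        return year * N_YEARS
    # Non-seasonal: every year is drawn independently.
    return on_off_series(DAYS_PER_YEAR * N_YEARS, mean_on, mean_off, rng)


seasonal_promo = simulate_promotion(mean_on=7, mean_off=30, seasonal=True)
irregular_promo = simulate_promotion(mean_on=7, mean_off=30, seasonal=False)
```

In both cases the average active and inactive durations are governed by the same two parameters, which matches the statement that these averages are constant per promotion.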

The weighted normalized influence matrix is provided for result evaluation. It is normalized so that the maximum positive contribution is 1 and the maximum negative contribution is -1, and each nonzero (i,j) entry is weighted by how much promotion i affects product j. Algorithms can be asked to output either a boolean influence matrix or a weighted matrix similar to the one provided for result evaluation.
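As an illustration of how a boolean prediction could be compared with the provided weighted matrix, the sketch below converts the weighted matrix to a boolean one (nonzero means influence) and computes precision and recall. The official scoring procedure is not described on this page, so this metric is an assumption, and `to_boolean` and `precision_recall` are hypothetical helper names.

```python
def to_boolean(weighted, threshold=0.0):
    """Mark entry (i, j) True when |weight| exceeds the threshold."""
    return [[abs(w) > threshold for w in row] for row in weighted]


def precision_recall(predicted, truth):
    """Compare two boolean matrices of identical shape entry by entry."""
    tp = fp = fn = 0
    for pred_row, true_row in zip(predicted, truth):
        for p, t in zip(pred_row, true_row):
            if p and t:
                tp += 1  # influence predicted and present
            elif p and not t:
                fp += 1  # influence predicted but absent
            elif t and not p:
                fn += 1  # influence present but missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy 3x2 example (3 promotions, 2 products); the real matrix is 1000x100.
weighted_truth = [[1.0, 0.0], [-0.4, 0.0], [0.0, 0.2]]
predicted = [[True, False], [False, True], [False, True]]
precision, recall = precision_recall(predicted, to_boolean(weighted_truth))
```

Negative weights count as influences here because the task asks whether a promotion affects sales at all, regardless of sign.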

Comments / Questions / Answers

#1 Isabelle Guyon 2008-09-15 19:04:32 -

Hi Jean-Philippe,

Can you provide Matlab code for rating the results?


#2 Jianxin Yin 2008-09-18 19:03:19 -

Hi Jean-Philippe,
I have two questions:
(1) Is your influence matrix normalized along each row or along each column?
(2) Do you suppose that the products have a causal effect on each other, or are they only affected by the promotions?

#3 Nicole Kraemer 2008-10-01 11:09:32 -

Dear Jean-Philippe,

Apparently, some of the promotions never took place (e.g., column 257 of promotions is always 0). According to the influence matrix, however, they have a causal effect on some of the products (e.g., promotion 257 has an effect on product 25).

Is this a bug?



#4 Jean-Philippe Pellet 2008-10-03 11:47:00 -

Dear Nicole,

Thanks for pointing this out. This is indeed a bug. I will correct it and upload a new version of the dataset, along with some code to rate the results, as Isabelle asked.


#5 Jean-Philippe Pellet 2008-10-03 11:47:36 -

Dear Jianxin,

(1) The influence matrix is generated such that all nonzero weights lie between -1 and 1, so it has not actually "been normalized"; it is simply generated that way. In a sense, it is normalized along both rows and columns.

(2) The products themselves have no causal effect on each other.


#6 Jean-Philippe Pellet 2008-10-09 18:09:39 In reply to message #3

Dear Nicole,

After discussing this issue with my colleagues, we decided that it is acceptable to keep the dataset as is. The reason is that with real data, it is not rare to get promotion data where some promotions are always on or always off. In spite of this, the promotions that are always on may have a causal effect; similarly, the ones that are always off might have had a causal effect if they had been on. That causal effect cannot be assessed from the data, but every participant faces the same limitation, so the rules stay fair even with this additional (realistic) problem of not being able to assess the causal effects of promotions that are always on or always off.


#7 Eugene Tuv 2008-10-15 08:30:59 -

Hi Jean-Philippe,
2 questions:
1) Is the reported subset of promotions optimal (maximum information, minimal non-redundant set)? If yes, is it the unique minimal set?
2) Can your ranking be derived from the data? Any info on your ranking method?

#8 Jean-Philippe Pellet 2008-10-29 16:47:27 -

Dear Eugene,

1. The reported subset is not guaranteed to be optimal (especially taking into account the problem previously discussed with Nicole). This is done so that it mirrors the conditions found in real-world problems.

2. In order to know your ranking, you'll have to wait until the final results and the comparison among the participants have been established...


