The PROMO dataset poses the task of identifying which promotions affect sales. Artificial data for 1000 promotion variables and 100 product sales series is provided. The goal is to predict a 1000x100 boolean influence matrix, where element (i,j) indicates whether the ith promotion has a causal influence on the sales of the jth product. The data is provided as time series, with a daily value for each variable over three years (i.e., 1095 days).
Each of the 100 products has a defined seasonal baseline, repeating over the years. The seasonal effect can vary from almost nonexistent to major. On top of this baseline are promotions. Each product is influenced by between 1 and 50 of the 1000 available promotions. Promotions usually increase sales with respect to the baseline, but can occasionally reduce them (e.g., when a similar competing product is promoted, that promotion might have a negative effect on the sales of the current product). Finally, daily variations are added on top.
Each of the 1000 promotions can be seasonal or not; i.e., its pattern can repeat from one year to the next or be completely different. The average time a promotion stays active or inactive, however, is constant for each promotion.
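To make the description above concrete, here is a minimal toy sketch of how series of this kind could be simulated. It is not the generator used for PROMO, whose details are not given here. The additive combination, the sinusoidal baseline, the Gaussian daily noise, the geometric on/off switching, and every name and constant in the code are assumptions chosen for illustration only.

```python
import math
import random

DAYS_PER_YEAR = 365


def make_promotion(n_days, mean_on, mean_off, seasonal, rng):
    """Binary on/off series with fixed mean active and inactive durations."""
    length = DAYS_PER_YEAR if seasonal else n_days
    series, active = [], False
    for _ in range(length):
        series.append(1 if active else 0)
        # Assumption: geometric run lengths, i.e. the current state is left
        # with probability 1 / (mean duration of that state).
        p_switch = 1.0 / (mean_on if active else mean_off)
        if rng.random() < p_switch:
            active = not active
    if seasonal:
        # A seasonal promotion repeats its one-year pattern every year.
        series = [series[t % DAYS_PER_YEAR] for t in range(n_days)]
    return series


def make_sales(n_days, promotions, weights, amplitude, noise_sd, rng):
    """Assumed model: seasonal baseline + weighted active promotions + noise."""
    sales = []
    for t in range(n_days):
        phase = 2 * math.pi * (t % DAYS_PER_YEAR) / DAYS_PER_YEAR
        baseline = 10.0 + amplitude * math.sin(phase)
        effect = sum(w * promotions[i][t] for i, w in weights.items())
        sales.append(baseline + effect + rng.gauss(0.0, noise_sd))
    return sales


rng = random.Random(0)
n_days = 3 * DAYS_PER_YEAR  # 1095 days, as in the dataset
# Five toy promotions; even-numbered ones are seasonal.
promotions = [
    make_promotion(n_days, 7, 30, seasonal=(i % 2 == 0), rng=rng)
    for i in range(5)
]
# Hypothetical influences on one product: promotion 0 helps, promotion 3 hurts.
weights = {0: 1.0, 3: -0.4}
sales = make_sales(n_days, promotions, weights, amplitude=2.0, noise_sd=0.5, rng=rng)
```

In this sketch the `weights` dictionary plays the role of one column of the influence matrix: promotions absent from it have no causal effect on the product.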
The weighted normalized influence matrix is provided for result evaluation. It is normalized so that the maximum positive contribution is 1 and the maximum negative contribution is -1, and each nonzero (i,j) entry is weighted by how much promotion i affects product j. Algorithms can be requested to output either a boolean influence matrix, or a weighted matrix similar to the one provided for result evaluation.
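As an illustration of how a boolean prediction might be compared with the weighted matrix, the sketch below computes precision and recall over the nonzero entries, plus a recall in which each true influence counts in proportion to the absolute value of its weight. This is only one plausible scoring scheme, assumed here for illustration. It is not the official rating procedure, and the function name `score_boolean` is hypothetical.

```python
def score_boolean(predicted, truth):
    """Compare a boolean prediction with a weighted ground-truth matrix.

    Both arguments are lists of rows (promotions) of equal shape. A truth
    entry is treated as a real influence whenever its weight is nonzero.
    """
    tp = fp = fn = 0
    found_weight = total_weight = 0.0
    for pred_row, true_row in zip(predicted, truth):
        for p, w in zip(pred_row, true_row):
            is_edge = w != 0
            total_weight += abs(w)
            if p and is_edge:
                tp += 1
                found_weight += abs(w)
            elif p:
                fp += 1
            elif is_edge:
                fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    # Assumption: missing a strong influence costs more than missing a weak one.
    weighted_recall = found_weight / total_weight if total_weight else 0.0
    return precision, recall, weighted_recall


# Toy example with 3 promotions and 2 products.
truth = [[1.0, 0.0], [0.0, -0.5], [0.25, 0.0]]
predicted = [[True, False], [False, False], [True, True]]
precision, recall, weighted_recall = score_boolean(predicted, truth)
```

Taking the absolute value means that negative influences, such as a competing product's promotion, count as much as positive ones of the same magnitude.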
| #1 | Isabelle Guyon | 2008-09-15 19:04:32 | - |
Hi Jean-Philippe,
Can you provide Matlab code for rating the results?
Isabelle
| #2 | Jianxin Yin | 2008-09-18 19:03:19 | - |
hi Jean-Philippe,
I have two questions:
(1). Is your influence matrix normalized along each row or along each column?
(2). Do you suppose that the products have causal effect on each other? Or they are only affected by the promotions?
Thanks.
| #3 | Nicole Kraemer | 2008-10-01 11:09:32 | - |
Dear Jean-Philippe,
apparently, some of the promotions never took place (e.g. column 257 of promotions is 0). According to the influence matrix, they have a causal effect on some of the products (e.g. promotion 257 has an effect on product 25).
Is this a bug?
Best,
Nicole
| #4 | Jean-Philippe Pellet | 2008-10-03 11:47:00 | - |
Dear Nicole,
Thanks for pointing this out. This is indeed a bug. I will correct it and upload a new version of the dataset, as well as some code to rate the results, as asked by Isabelle.
Best,
J.-P.
| #5 | Jean-Philippe Pellet | 2008-10-03 11:47:36 | - |
Dear Jianxin,
(1) The influence matrix is determined in a way such that all nonzero weights are between -1 and 1, so actually it has not "been normalized" but is generated like that. So in a way it is normalized along both rows and columns.
(2) The products themselves have no causal effect on each other.
Best,
J.-P.
| #6 | Jean-Philippe Pellet | 2008-10-09 18:09:39 | In reply to message #3 |
Dear Nicole,
After discussing this issue with my colleagues, we concluded that it is not a bad thing to keep the dataset as is. The reason is that with real data, it is not rare to get promotion data where some promotions are always on or always off. In spite of this, the promotions that are always on may have a causal effect; similarly, the ones that are always off might have had a causal effect if they had been on. We are then unable to assess that causal effect, but every participant has the same limitation, so the rules stay fair, even with this additional (realistic) problem of not being able to assess the causal effects of promotions that are always on or always off.
Best,
J.-P.
| #7 | Eugene Tuv | 2008-10-15 08:30:59 | - |
Hi Jean-Philippe,
2 questions:
1) is the reported subset of promotions optimal (max information, minimal non-redundant set)? if yes - is it the unique minimal set?
2) can your ranking be derived from the data? any info on your ranking method?
thanks,
-eugene
| #8 | Jean-Philippe Pellet | 2008-10-29 16:47:27 | - |
Dear Eugene,
1. The reported subset is not guaranteed to be optimal (especially taking into account the problem previously discussed with Nicole). This is done so that it mirrors the conditions we can find in real-world problems.
2. In order to know your ranking, you'll have to wait until the final results and the comparison among the participants have been established...
Best,
J.-P.