Causality Workbench - Challenges in Machine Learning


Causal discovery in web logs

Contact: Cristian Grozea - Submitted: 2008-12-07 02:36


Given real data, the anonymized logs of a web server,
determine the causal structure - which pages link to other pages and so lead to visits of them.
The ground truth is known with certainty from the referrer information, but it is withheld to allow an objective evaluation.
The problem is motivated by the trend towards privacy and its implications for electronic data storage.

Data format - Input:
A matrix of 512 days by 20 pages of integers; each entry is the number of visits to that page during that day.
The calendar dates are also provided for those who need them.

Data format - Output:
A 20 by 20 matrix whose entry at position (u,v) is the probability that a visit to page 'u' causes a visit to page 'v'.
Thus, 1 means 100% causal implication (deterministic: each visit to page 'u' causes a visit to page 'v'), while 0 means that visits to page 'u' have no causal effect on visits to page 'v'.
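
To make the two formats concrete, here is a minimal sketch of a naive baseline that maps a days-by-pages count matrix to a pages-by-pages matrix of values in [0, 1]. It is not a method proposed by this entry: the function name `slope_baseline` and the heuristic itself (the least-squares slope of the daily counts of 'v' on those of 'u', clipped to [0, 1]) are assumptions chosen only for illustration. The heuristic is purely associational, so overall traffic trends will confound it.

```python
from statistics import fmean


def slope_baseline(counts):
    """Map a days-by-pages count matrix to a pages-by-pages score matrix.

    counts: list of rows, one per day; each row lists the visit counts per page.
    Entry (u, v) of the result is the regression slope of page v's daily counts
    on page u's daily counts, clipped to [0, 1]. This is an illustrative
    heuristic (an assumption), not a causal discovery algorithm.
    """
    n_pages = len(counts[0])
    # Column view: one list of daily counts per page.
    cols = [[row[p] for row in counts] for p in range(n_pages)]
    means = [fmean(c) for c in cols]
    out = [[0.0] * n_pages for _ in range(n_pages)]
    for u in range(n_pages):
        var_u = sum((x - means[u]) ** 2 for x in cols[u])
        if var_u == 0:
            continue  # a page with constant traffic carries no signal
        for v in range(n_pages):
            if u == v:
                continue  # self-loops are left at 0 (an assumption)
            cov = sum(
                (x - means[u]) * (y - means[v])
                for x, y in zip(cols[u], cols[v])
            )
            out[u][v] = min(1.0, max(0.0, cov / var_u))
    return out
```

For the real data, `counts` would hold 512 rows of 20 integers each, and the result would be the required 20 by 20 matrix. Unlike a plain correlation, the slope is asymmetric in 'u' and 'v', which at least allows the two directions of an arc to receive different scores.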

Since the ground truth is available, we will compute the correlation between the submitted arc strengths and the transition probabilities measured on a hold-out dataset of the same size.
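
The scoring could be sketched as follows. The entry does not say which correlation coefficient is used or whether the diagonal entries (u,u) count, so this sketch assumes Pearson correlation over the flattened matrices and skips the diagonal by default; the name `arc_score` and the `skip_diagonal` parameter are hypothetical.

```python
from math import sqrt


def arc_score(predicted, truth, skip_diagonal=True):
    """Pearson correlation between two square matrices, entry by entry.

    predicted: submitted arc strengths, truth: measured transition
    probabilities. The choice of Pearson correlation and the handling of
    the diagonal are assumptions, not details confirmed by the entry.
    """
    n = len(truth)
    xs, ys = [], []
    for u in range(n):
        for v in range(n):
            if skip_diagonal and u == v:
                continue
            xs.append(predicted[u][v])
            ys.append(truth[u][v])
    mx = sum(xs) / len(xs)
    my = sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        return 0.0  # correlation is undefined for a constant matrix
    return sxy / sqrt(sxx * syy)
```

Under this reading only the relative pattern of arc strengths matters: a submission that is a positive rescaling of the true transition probabilities would still score 1.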
