Monday, 5 July 2021
Causal discovery between time series
This thesis aims to give broad coverage of the central concepts and principles of causation, in particular those involved in the emerging approaches to causal discovery from time series.
After reviewing concepts and algorithms, we first present a new approach that infers a summary graph of the causal system underlying the observational time series while relaxing the idealized setting of equal sampling rates, and we discuss the assumptions underlying its validity. The gist of our proposal lies in the introduction of the causal temporal mutual information measure, which can detect independence and conditional independence between two time series, and in making apparent a connection between entropy and the probability raising principle, which can be used to build new rules for orienting the direction of causation. Moreover, building on this base method, we propose several extensions, namely to handle hidden confounders, to infer a window causal graph given a summary graph, and to consider sequences instead of time series.
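As a rough illustration of how an information-theoretic measure can reveal temporal dependence between two series, the sketch below estimates a plain lagged mutual information with a histogram (plug-in) estimator. It is not the causal temporal mutual information introduced in the thesis; the function names, the bin count, and the synthetic data are assumptions made only for this example.

```python
import numpy as np

def mutual_information(x, y, bins=8):
    """Plug-in (histogram) estimate of I(X; Y) in nats."""
    joint, _, _ = np.histogram2d(x, y, bins=bins)
    pxy = joint / joint.sum()                 # joint probabilities
    px = pxy.sum(axis=1, keepdims=True)       # marginal of X, shape (bins, 1)
    py = pxy.sum(axis=0, keepdims=True)       # marginal of Y, shape (1, bins)
    mask = pxy > 0                            # skip empty cells to avoid log(0)
    return float(np.sum(pxy[mask] * np.log(pxy[mask] / (px @ py)[mask])))

def lagged_mi(x, y, lag=1, bins=8):
    """Estimate I(X_{t-lag}; Y_t): how informative the past of x is about y."""
    return mutual_information(x[:-lag], y[lag:], bins=bins)

# Hypothetical synthetic data: x drives y with a lag of one step, z is unrelated.
rng = np.random.default_rng(0)
n = 5000
x = rng.normal(size=n)
z = rng.normal(size=n)
y = np.empty(n)
y[0] = rng.normal()
y[1:] = 0.8 * x[:-1] + 0.3 * rng.normal(size=n - 1)

mi_xy = lagged_mi(x, y)   # past of x versus present of y
mi_yx = lagged_mi(y, x)   # past of y versus present of x
mi_zy = lagged_mi(z, y)   # past of z versus present of y

print(f"I(x_(t-1); y_t) = {mi_xy:.3f}")
print(f"I(y_(t-1); x_t) = {mi_yx:.3f}")
print(f"I(z_(t-1); y_t) = {mi_zy:.3f}")
```

In this toy setting the first estimate should be clearly larger than the other two, which stay close to zero. On its own, such an asymmetry only hints at a direction; a sound conclusion rests on assumptions of the kind discussed in the thesis.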
Secondly, we focus on the discovery of causal relations from a statistical distribution that is not entirely faithful to the real causal graph, and on distinguishing a common cause from an intermediate cause even in the absence of a time indicator. The key aspect of our answer to this problem is the reliance on the additive noise principle to infer a directed supergraph that contains the causal graph. To converge toward the causal graph, we use in a second step a new measure called the temporal causation entropy, which prunes, for each node of the directed supergraph, the parents that are conditionally independent of their child. Furthermore, we explore complementary extensions of our second base method that involve a pairwise strategy which, through multitask learning and a denoising technique, reduces the number of functions that need to be estimated.
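To make the additive noise principle concrete, here is a minimal sketch of the classical bivariate test: regress each variable on the other and prefer the direction in which the residuals look independent of the regressor. This is not the thesis's algorithm. The polynomial regression, the HSIC dependence score with a median-heuristic Gaussian kernel, and the synthetic data are all assumptions chosen for brevity.

```python
import numpy as np

def hsic(a, b):
    """Biased HSIC estimate with Gaussian kernels (median-heuristic bandwidth).

    Values near zero suggest independence; larger values suggest dependence.
    """
    n = len(a)

    def gram(v):
        d2 = (v[:, None] - v[None, :]) ** 2        # squared pairwise distances
        return np.exp(-d2 / np.median(d2[d2 > 0]))

    H = np.eye(n) - np.full((n, n), 1.0 / n)       # centering matrix
    return float(np.trace(gram(a) @ H @ gram(b) @ H)) / n**2

def anm_score(cause, effect, degree=3):
    """Fit effect = f(cause) + noise with a polynomial f, then score the
    dependence between the residuals and the putative cause."""
    coeffs = np.polyfit(cause, effect, degree)
    residuals = effect - np.polyval(coeffs, cause)
    return hsic(cause, residuals)

# Hypothetical synthetic data generated as y = x^3 + noise, so x causes y.
rng = np.random.default_rng(0)
n = 400
x = rng.uniform(-2.0, 2.0, size=n)
y = x**3 + rng.normal(size=n)

score_forward = anm_score(x, y)    # residuals of y given x versus x
score_backward = anm_score(y, x)   # residuals of x given y versus y
direction = "x -> y" if score_forward < score_backward else "y -> x"

print(f"forward score:  {score_forward:.5f}")
print(f"backward score: {score_backward:.5f}")
print("preferred direction:", direction)
```

The direction with the lower score is the one in which an additive noise model fits better. A practical method would replace the raw score comparison with a proper independence test and a more flexible regressor.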
We perform an extensive experimental comparison of the proposed algorithms on both synthetic and real datasets and demonstrate their promising practical performance: they improve time complexity while preserving accuracy.
Updated on 29 June 2021