The Annals of Statistics

Learning high-dimensional directed acyclic graphs with latent and selection variables

Diego Colombo, Marloes H. Maathuis, Markus Kalisch, and Thomas S. Richardson


Abstract

We consider the problem of learning causal information between random variables in directed acyclic graphs (DAGs) when allowing arbitrarily many latent and selection variables. The FCI (Fast Causal Inference) algorithm has been explicitly designed to infer conditional independence and causal information in such settings. However, FCI is computationally infeasible for large graphs. We therefore propose the new RFCI algorithm, which is much faster than FCI. In some situations the output of RFCI is slightly less informative, in particular with respect to conditional independence information. However, we prove that any causal information in the output of RFCI is correct in the asymptotic limit. We also define a class of graphs on which the outputs of FCI and RFCI are identical. We prove consistency of FCI and RFCI in sparse high-dimensional settings, and demonstrate in simulations that the estimation performances of the algorithms are very similar. All software is implemented in the R-package pcalg.
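For readers who want to try the methods, the sketch below shows one way the FCI and RFCI estimators in pcalg might be called on simulated Gaussian data. It is a minimal illustration, not taken from the paper: the simulated DAG (randomDAG, rmvDAG), the sample size, and the significance level alpha are arbitrary choices, and latent and selection variables are omitted for simplicity.

## Minimal sketch (not from the paper): FCI and RFCI via the R-package pcalg
## on simulated multivariate Gaussian data. The DAG, sample size and alpha
## are illustrative choices; latent and selection variables are omitted.
library(pcalg)

set.seed(42)
p   <- 10                                # number of observed variables
n   <- 1000                              # sample size
dag <- randomDAG(p, prob = 0.3)          # random DAG with edge probability 0.3
dat <- rmvDAG(n, dag)                    # Gaussian data generated from the DAG

## Sufficient statistics for the Gaussian conditional independence test
suffStat <- list(C = cor(dat), n = n)

## RFCI: the faster algorithm proposed in the paper
rfci.fit <- rfci(suffStat, indepTest = gaussCItest,
                 alpha = 0.01, labels = as.character(1:p))

## FCI for comparison (computationally infeasible for large graphs)
fci.fit <- fci(suffStat, indepTest = gaussCItest,
               alpha = 0.01, labels = as.character(1:p))

## Both return objects of class fciAlgo; compare the estimated adjacency matrices
all.equal(rfci.fit@amat, fci.fit@amat)

Both calls return objects of class fciAlgo whose adjacency matrices encode the estimated partial ancestral graph, so comparing them gives a quick check of how similar the two outputs are on a given data set.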

Article information

Source
Ann. Statist., Volume 40, Number 1 (2012), 294-321.

Dates
First available in Project Euclid: 4 April 2012

Permanent link to this document
https://projecteuclid.org/euclid.aos/1333567191

Digital Object Identifier
doi:10.1214/11-AOS940

Mathematical Reviews number (MathSciNet)
MR3014308

Zentralblatt MATH identifier
1246.62131

Subjects
Primary: 62H12 (Estimation); 62M45 (Neural nets and related approaches); 62-04 (Explicit machine computation and programs, not the theory of computation or programming)
Secondary: 68T30 (Knowledge representation)

Keywords
Causal structure learning; FCI algorithm; RFCI algorithm; maximal ancestral graphs (MAGs); partial ancestral graphs (PAGs); high-dimensionality; sparsity; consistency

Citation

Colombo, Diego; Maathuis, Marloes H.; Kalisch, Markus; Richardson, Thomas S. Learning high-dimensional directed acyclic graphs with latent and selection variables. Ann. Statist. 40 (2012), no. 1, 294–321. doi:10.1214/11-AOS940. https://projecteuclid.org/euclid.aos/1333567191


References

  • [1] Aho, A., Hopcroft, J. and Ullman, J. D. (1974). The Design and Analysis of Computer Algorithms. Addison-Wesley, Boston, MA.
  • [2] Ali, R. A., Richardson, T. S. and Spirtes, P. (2009). Markov equivalence for ancestral graphs. Ann. Statist. 37 2808–2837.
  • [3] Andersson, S. A., Madigan, D. and Perlman, M. D. (1997). A characterization of Markov equivalence classes for acyclic digraphs. Ann. Statist. 25 505–541.
  • [4] Chickering, D. M. (2002). Learning equivalence classes of Bayesian-network structures. J. Mach. Learn. Res. 2 445–498.
  • [5] Colombo, D., Maathuis, M. H., Kalisch, M. and Richardson, T. S. (2012). Supplement to “Learning high-dimensional directed acyclic graphs with latent and selection variables.” DOI:10.1214/11-AOS940SUPP.
  • [6] Cooper, G. (1995). Causal discovery from data in the presence of selection bias. In Preliminary Papers of the Fifth International Workshop on Artificial Intelligence and Statistics (D. Fisher, ed.) 140–150.
  • [7] Dawid, A. P. (1980). Conditional independence for statistical operations. Ann. Statist. 8 598–617.
  • [8] Kalisch, M. and Bühlmann, P. (2007). Estimating high-dimensional directed acyclic graphs with the PC-algorithm. J. Mach. Learn. Res. 8 613–636.
  • [9] Kalisch, M., Mächler, M., Colombo, D., Maathuis, M. H. and Bühlmann, P. (2012). Causal inference using graphical models with the R package pcalg. J. Statist. Software. To appear.
  • [10] Maathuis, M. H., Colombo, D., Kalisch, M. and Bühlmann, P. (2010). Predicting causal effects in large-scale systems from observational data. Nat. Methods 7 247–248.
  • [11] Maathuis, M. H., Kalisch, M. and Bühlmann, P. (2009). Estimating high-dimensional intervention effects from observational data. Ann. Statist. 37 3133–3164.
  • [12] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • [13] Pearl, J. (2000). Causality. Models, Reasoning, and Inference. Cambridge Univ. Press, Cambridge.
  • [14] Pearl, J. (2009). Causal inference in statistics: An overview. Stat. Surv. 3 96–146.
  • [15] Ramsey, J., Zhang, J. and Spirtes, P. (2006). Adjacency-faithfulness and conservative causal inference. In Proceedings of the 22nd Annual Conference on Uncertainty in Artificial Intelligence. AUAI Press, Arlington, VA.
  • [16] Richardson, T. and Spirtes, P. (2002). Ancestral graph Markov models. Ann. Statist. 30 962–1030.
  • [17] Richardson, T. S. and Spirtes, P. (2003). Causal inference via ancestral graph models. In Highly Structured Stochastic Systems. Oxford Statistical Science Series 27 83–113. Oxford Univ. Press, Oxford.
  • [18] Robins, J. M., Hernán, M. A. and Brumback, B. (2000). Marginal structural models and causal inference in epidemiology. Epidemiology 11 550–560.
  • [19] Spirtes, P. (2001). An anytime algorithm for causal inference. In Proc. of the Eighth International Workshop on Artificial Intelligence and Statistics 213–221. Morgan Kaufmann, San Francisco.
  • [20] Spirtes, P., Glymour, C. and Scheines, R. (2000). Causation, Prediction, and Search, 2nd ed. MIT Press, Cambridge, MA.
  • [21] Spirtes, P., Meek, C. and Richardson, T. (1999). An algorithm for causal inference in the presence of latent variables and selection bias. In Computation, Causation, and Discovery 211–252. AAAI Press, Menlo Park, CA.
  • [22] Verma, T. and Pearl, J. (1990). Equivalence and synthesis of causal models. In Proceedings of the Sixth Annual Conference on Uncertainty in Artificial Intelligence 255–270. Elsevier, New York.
  • [23] Zhang, J. (2008). Causal reasoning with ancestral graphs. J. Mach. Learn. Res. 9 1437–1474.
  • [24] Zhang, J. (2008). On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artificial Intelligence 172 1873–1896.
  • [25] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.

Supplemental materials

  • Supplementary material: Supplement to “Learning high-dimensional directed acyclic graphs with latent and selection variables”. All proofs, a description of the Adaptive Anytime FCI algorithm, pseudocodes, and two additional examples can be found in the supplementary document [5].