The Annals of Statistics

High-dimensional consistency in score-based and hybrid structure learning

Preetam Nandy, Alain Hauser, and Marloes H. Maathuis

Full-text: Open access


Main approaches for learning Bayesian networks can be classified as constraint-based, score-based or hybrid methods. Although high-dimensional consistency results are available for constraint-based methods like the PC algorithm, such results have not been proved for score-based or hybrid methods, and most of the hybrid methods have not even been shown to be consistent in the classical setting where the number of variables remains fixed and the sample size tends to infinity. In this paper, we show that consistency of hybrid methods based on greedy equivalence search (GES) can be achieved in the classical setting with adaptive restrictions on the search space that depend on the current state of the algorithm. Moreover, we prove consistency of GES and adaptively restricted GES (ARGES) in several sparse high-dimensional settings. ARGES scales well to sparse graphs with thousands of variables and our simulation study indicates that both GES and ARGES generally outperform the PC algorithm.
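To make the distinction between score-based and hybrid search concrete, the following is a minimal sketch and not the authors' GES or ARGES. It assumes a linear Gaussian model scored with BIC, searches over DAGs rather than equivalence classes, only adds edges (no backward phase), and uses a fixed set of allowed adjacencies where ARGES would adapt the restriction to the current state of the search. All function names (`local_bic`, `creates_cycle`, `greedy_forward`, `skeleton`) are hypothetical.

```python
import numpy as np

def local_bic(data, child, parents):
    """Gaussian BIC contribution of one node given its parents (higher is better)."""
    n = data.shape[0]
    y = data[:, child]
    # Regress the child on an intercept plus its parents.
    X = np.column_stack([np.ones(n)] + [data[:, q] for q in parents])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    rss = np.sum((y - X @ beta) ** 2)
    # Profile log-likelihood (up to constants) minus the BIC penalty.
    return -0.5 * n * np.log(rss / n) - 0.5 * np.log(n) * len(parents)

def creates_cycle(parents, src, dst):
    """True if adding src -> dst closes a directed cycle (dst is an ancestor of src)."""
    stack, seen = [src], set()
    while stack:
        node = stack.pop()
        if node == dst:
            return True
        if node not in seen:
            seen.add(node)
            stack.extend(parents[node])
    return False

def greedy_forward(data, allowed=None):
    """Add the single best-scoring edge until no addition improves the BIC.

    `allowed` is an optional set of frozenset({i, j}) adjacencies; passing it
    mimics a hybrid method that restricts the search space (assumption: the
    restriction is fixed here, not adaptive).
    """
    p = data.shape[1]
    parents = {j: [] for j in range(p)}
    while True:
        best_gain, best_edge = 0.0, None
        for i in range(p):
            for j in range(p):
                if i == j or i in parents[j]:
                    continue
                if allowed is not None and frozenset((i, j)) not in allowed:
                    continue
                if creates_cycle(parents, i, j):
                    continue
                gain = (local_bic(data, j, parents[j] + [i])
                        - local_bic(data, j, parents[j]))
                if gain > best_gain:
                    best_gain, best_edge = gain, (i, j)
        if best_edge is None:
            return parents
        parents[best_edge[1]].append(best_edge[0])

def skeleton(parents):
    """Undirected adjacencies of the learned DAG."""
    return {frozenset((i, j)) for j, ps in parents.items() for i in ps}

# Simulated chain X0 -> X1 -> X2 (illustrative data, not from the paper).
rng = np.random.default_rng(0)
n = 2000
x0 = rng.normal(size=n)
x1 = 0.8 * x0 + rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
data = np.column_stack([x0, x1, x2])

full = greedy_forward(data)                                   # unrestricted search
restricted = greedy_forward(data, allowed={frozenset((0, 1))})  # restricted search
```

Edge orientations returned by such a search are only meaningful up to Markov equivalence, which is why the sketch compares skeletons rather than directed edges.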

Article information

Ann. Statist., Volume 46, Number 6A (2018), 3151-3183.

Received: May 2016
Revised: June 2017
First available in Project Euclid: 7 September 2018

Primary: 62H12: Estimation; 62-09: Graphical methods; 62F12: Asymptotic properties of estimators

Keywords: Bayesian network; directed acyclic graph (DAG); linear structural equation model (linear SEM); structure learning; greedy equivalence search (GES); score-based method; hybrid method; high-dimensional data; consistency


Nandy, Preetam; Hauser, Alain; Maathuis, Marloes H. High-dimensional consistency in score-based and hybrid structure learning. Ann. Statist. 46 (2018), no. 6A, 3151–3183. doi:10.1214/17-AOS1654.


Supplemental materials

  • Supplement to “High-dimensional consistency in score-based and hybrid structure learning.” All proofs, additional simulation results, and additional details for Example 1 can be found in the supplementary material.