## The Annals of Statistics

### $\ell_{0}$-penalized maximum likelihood for sparse directed acyclic graphs

#### Abstract

We consider the problem of regularized maximum likelihood estimation for the structure and parameters of a high-dimensional, sparse directed acyclic graphical (DAG) model with a Gaussian distribution, or equivalently, of a Gaussian structural equation model. We show that the $\ell_{0}$-penalized maximum likelihood estimator of a DAG has about the same number of edges as the minimal-edge I-MAP (a DAG with a minimal number of edges representing the distribution), and that it converges in Frobenius norm. We allow the number of nodes $p$ to be much larger than the sample size $n$ but assume a sparsity condition and that any representation of the true DAG has at least a fixed proportion of its nonzero edge weights above the noise level. Our results rely neither on the faithfulness assumption nor on the restrictive strong faithfulness condition, both of which are required for methods based on conditional independence testing such as the PC-algorithm.
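
For orientation, the kind of estimator the abstract describes can be sketched as follows. The notation is an assumption made for illustration and is not taken from the abstract: $\mathbf{X}$ denotes the $n \times p$ data matrix with columns $\mathbf{X}_j$, $B = (\beta_{k,j})$ is a $p \times p$ matrix of edge weights with columns $B_j$ whose nonzero pattern forms a DAG, $\sigma_j^2$ are the error variances, and $\lambda$ is a tuning parameter whose scaling may differ from the one used in the paper. A Gaussian structural equation model can be written as

$$
X_j = \sum_{k \neq j} \beta_{k,j} X_k + \varepsilon_j, \qquad \varepsilon_j \sim \mathcal{N}(0, \sigma_j^2) \ \text{independent}, \qquad j = 1, \ldots, p,
$$

and, after profiling out the variances $\sigma_j^2$, an $\ell_{0}$-penalized maximum likelihood estimator would take a form such as

$$
\hat{B} \in \arg\min_{B \,:\, \operatorname{supp}(B) \text{ is a DAG}} \left\{ \sum_{j=1}^{p} \log\!\left( \frac{\lVert \mathbf{X}_j - \mathbf{X} B_j \rVert_2^2}{n} \right) + \lambda^2 \lVert B \rVert_0 \right\},
$$

where $\lVert B \rVert_0$ counts the nonzero entries of $B$, that is, the number of edges in the corresponding DAG.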

#### Article information

Source
Ann. Statist., Volume 41, Number 2 (2013), 536-567.

Dates
First available in Project Euclid: 26 April 2013

https://projecteuclid.org/euclid.aos/1366980557

Digital Object Identifier
doi:10.1214/13-AOS1085

Mathematical Reviews number (MathSciNet)
MR3099113

Zentralblatt MATH identifier
1267.62037

Subjects
Primary: 62F12: Asymptotic properties of estimators
Secondary: 62F30: Inference under constraints

#### Citation

van de Geer, Sara; Bühlmann, Peter. $\ell_{0}$-penalized maximum likelihood for sparse directed acyclic graphs. Ann. Statist. 41 (2013), no. 2, 536–567. doi:10.1214/13-AOS1085. https://projecteuclid.org/euclid.aos/1366980557
