## The Annals of Statistics

### Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models

#### Abstract

Covariance estimation and selection for high-dimensional multivariate datasets is a fundamental problem in modern statistics. Gaussian directed acyclic graph (DAG) models are a popular class of models used for this purpose. Gaussian DAG models introduce sparsity in the Cholesky factor of the inverse covariance matrix, and the sparsity pattern in turn corresponds to specific conditional independence assumptions on the underlying variables. A variety of priors have been developed in recent years for Bayesian inference in DAG models, yet crucial convergence and sparsity selection properties for these models have not been thoroughly investigated. Most of these priors are adaptations/generalizations of the Wishart distribution in the DAG context. In this paper, we consider a flexible and general class of these “DAG-Wishart” priors with multiple shape parameters. Under mild regularity assumptions, we establish strong graph selection consistency and posterior convergence rates for estimation when the number of variables $p$ is allowed to grow at an appropriate subexponential rate with the sample size $n$.
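
As an illustration of how sparsity in the Cholesky factor encodes a DAG, the following minimal Python sketch builds a three-variable example. It is not taken from the paper: the parameterization $\Omega = L D^{-1} L^{T}$ (unit lower-triangular $L$, diagonal $D$ of conditional variances, parents carrying larger indices than their children), the helper names `precision_from_dag` and `modified_cholesky`, and all numerical values are assumptions made for illustration only.

```python
import numpy as np


def precision_from_dag(L, d):
    """Return Omega = L D^{-1} L^T for unit lower-triangular L and variances d."""
    return L @ np.diag(1.0 / d) @ L.T


def modified_cholesky(omega):
    """Recover (L, d) from Omega using the ordinary Cholesky factor C = L D^{-1/2}."""
    C = np.linalg.cholesky(omega)  # lower triangular, positive diagonal
    c = np.diag(C)                 # c[j] = d[j] ** -0.5
    return C / c, 1.0 / c**2       # divide column j of C by c[j]


# Hypothetical chain DAG 3 -> 2 -> 1 (0-based indices 2 -> 1 -> 0).
# L[i, j] may be nonzero for i > j only if variable i is a parent of variable j.
L = np.array([
    [1.0,  0.0, 0.0],
    [0.8,  1.0, 0.0],   # variable 2 is a parent of variable 1
    [0.0, -0.5, 1.0],   # variable 3 is a parent of variable 2, not of variable 1
])
d = np.array([1.0, 2.0, 0.5])  # illustrative conditional variances

omega = precision_from_dag(L, d)
L_rec, d_rec = modified_cholesky(omega)

# The missing edge 3 -> 1 shows up as a zero in the precision matrix, i.e.
# variables 1 and 3 are conditionally independent given variable 2.
print(omega[0, 2] == 0.0)
print(np.allclose(L_rec, L), np.allclose(d_rec, d))
```

In this toy case the zero in position (3, 1) of $L$ is exactly the absent edge, and the same sparsity pattern is recovered from $\Omega$ alone once the variable ordering is fixed.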

#### Article information

Source
Ann. Statist., Volume 47, Number 1 (2019), 319-348.

Dates
Revised: January 2018
First available in Project Euclid: 30 November 2018

https://projecteuclid.org/euclid.aos/1543568590

Digital Object Identifier
doi:10.1214/18-AOS1689

Mathematical Reviews number (MathSciNet)
MR3909935

Zentralblatt MATH identifier
07036203

Subjects
Primary: 62F15: Bayesian inference
Secondary: 62G20: Asymptotic properties

#### Citation

Cao, Xuan; Khare, Kshitij; Ghosh, Malay. Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models. Ann. Statist. 47 (2019), no. 1, 319–348. doi:10.1214/18-AOS1689. https://projecteuclid.org/euclid.aos/1543568590

#### Supplemental materials

• Supplement to “Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models”. DOI:10.1214/18-AOS1689SUPP. This supplemental file contains additional proofs for theorems and technical lemmas.