## The Annals of Statistics

### Latent variable graphical model selection via convex optimization

#### Abstract

Suppose we observe samples of a subset of a collection of random variables. No additional information is provided about the number of latent variables, nor about the relationship between the latent and observed variables. Is it possible to discover the number of latent components, and to learn a statistical model over the entire collection of variables? We address this question in the setting in which the latent and observed variables are jointly Gaussian, with the conditional statistics of the observed variables conditioned on the latent variables being specified by a graphical model. As a first step we give natural conditions under which such latent-variable Gaussian graphical models are identifiable given marginal statistics of only the observed variables. Essentially these conditions require that the conditional graphical model among the observed variables is sparse, while the effect of the latent variables is “spread out” over most of the observed variables. Next we propose a tractable convex program based on regularized maximum-likelihood for model selection in this latent-variable setting; the regularizer uses both the $\ell_{1}$ norm and the nuclear norm. Our modeling framework can be viewed as a combination of dimensionality reduction (to identify latent variables) and graphical modeling (to capture remaining statistical structure not attributable to the latent variables), and it consistently estimates both the number of latent components and the conditional graphical model structure among the observed variables. These results are applicable in the high-dimensional setting in which the number of latent/observed variables grows with the number of samples of the observed variables. The geometric properties of the algebraic varieties of sparse matrices and of low-rank matrices play an important role in our analysis.
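The regularizer described in the abstract pairs an $\ell_1$ penalty (promoting a sparse conditional precision matrix among the observed variables) with a nuclear-norm penalty (promoting a low-rank term attributable to the latent variables). A minimal sketch of the two building blocks, assuming a proximal-splitting view of such a program: the proximal operator of the $\ell_1$ norm is entrywise soft-thresholding, and the proximal operator of the nuclear norm is singular-value thresholding. The function names below are illustrative and not from the paper.

```python
import numpy as np

def soft_threshold(X, t):
    """Prox of t * ||X||_1 (entrywise l1 norm): shrink each entry toward 0."""
    return np.sign(X) * np.maximum(np.abs(X) - t, 0.0)

def sv_threshold(X, t):
    """Prox of t * ||X||_* (nuclear norm): soft-threshold the singular values."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

# Both operators shrink the same amount, but in different "coordinates":
# soft_threshold acts on matrix entries (encouraging sparsity), while
# sv_threshold acts on the spectrum (encouraging low rank).
A = np.array([[3.0, 0.0],
              [0.0, -1.0]])
print(soft_threshold(A, 1.0))  # entries shrunk toward zero
print(sv_threshold(A, 1.0))    # singular values shrunk toward zero
```

For this diagonal example the two operators happen to agree; on a generic dense matrix, soft-thresholding zeroes out small entries while singular-value thresholding reduces the rank.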

#### Article information

**Source:** Ann. Statist. Volume 40, Number 4 (2012), 1935–1967.

**Dates:** First available: 30 October 2012

**Permanent link:** http://projecteuclid.org/euclid.aos/1351602527

**Digital Object Identifier:** doi:10.1214/11-AOS949

**Mathematical Reviews number (MathSciNet):** MR3059067

**Subjects:** Primary: 62F12 (Asymptotic properties of estimators); 62H12 (Estimation)

#### Citation

Chandrasekaran, Venkat; Parrilo, Pablo A.; Willsky, Alan S. Latent variable graphical model selection via convex optimization. The Annals of Statistics 40 (2012), no. 4, 1935--1967. doi:10.1214/11-AOS949. http://projecteuclid.org/euclid.aos/1351602527.

#### References

• [1] Bickel, P. J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
• [2] Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
• [3] Candès, E. J., Li, X., Ma, Y. and Wright, J. (2011). Robust principal component analysis? J. ACM 58 Art. 11, 37.
• [4] Candès, E. J. and Recht, B. (2009). Exact matrix completion via convex optimization. Found. Comput. Math. 9 717–772.
• [5] Candès, E. J., Romberg, J. and Tao, T. (2006). Robust uncertainty principles: Exact signal reconstruction from highly incomplete frequency information. IEEE Trans. Inform. Theory 52 489–509.
• [6] Chandrasekaran, V., Parrilo, P. A. and Willsky, A. S. (2011). Supplement to “Latent variable graphical model selection via convex optimization.” DOI:10.1214/11-AOS949SUPP.
• [7] Chandrasekaran, V., Sanghavi, S., Parrilo, P. A. and Willsky, A. S. (2011). Rank-sparsity incoherence for matrix decomposition. SIAM J. Optim. 21 572–596.
• [8] Davidson, K. R. and Szarek, S. J. (2001). Local operator theory, random matrices and Banach spaces. In Handbook of the Geometry of Banach Spaces, Vol. I 317–366. North-Holland, Amsterdam.
• [9] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. J. Roy. Statist. Soc. Ser. B 39 1–38.
• [10] Donoho, D. L. (2006). For most large underdetermined systems of linear equations the minimal $l_1$-norm solution is also the sparsest solution. Comm. Pure Appl. Math. 59 797–829.
• [11] Donoho, D. L. (2006). Compressed sensing. IEEE Trans. Inform. Theory 52 1289–1306.
• [12] El Karoui, N. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36 2717–2756.
• [13] Elidan, G., Nachman, I. and Friedman, N. (2007). “Ideal parent” structure learning for continuous variable Bayesian networks. J. Mach. Learn. Res. 8 1799–1833.
• [14] Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. J. Econometrics 147 186–197.
• [15] Fazel, M. (2002). Matrix rank minimization with applications. Ph.D. thesis, Dept. Elec. Eng., Stanford Univ.
• [16] Horn, R. A. and Johnson, C. R. (1990). Matrix Analysis. Cambridge Univ. Press, Cambridge.
• [17] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
• [18] Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254–4278.
• [19] Lauritzen, S. L. (1996). Graphical Models. Oxford Statistical Science Series 17. Oxford Univ. Press, New York.
• [20] Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88 365–411.
• [21] Löfberg, J. (2004). YALMIP: A toolbox for modeling and optimization in MATLAB. In Proceedings of the CACSD Conference, Taiwan. Available at http://control.ee.ethz.ch/~joloef/yalmip.php.
• [22] Marčenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues in certain sets of random matrices. Mat. Sb. 72 507–536.
• [23] Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
• [24] Meinshausen, N. and Bühlmann, P. (2010). Stability selection. J. R. Stat. Soc. Ser. B Stat. Methodol. 72 417–473.
• [25] Ortega, J. M. and Rheinboldt, W. C. (1970). Iterative Solution of Nonlinear Equations in Several Variables. Academic Press, New York.
• [26] Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence. Electron. J. Stat. 4 935–980.
• [27] Recht, B., Fazel, M. and Parrilo, P. A. (2010). Guaranteed minimum-rank solutions of linear matrix equations via nuclear norm minimization. SIAM Rev. 52 471–501.
• [28] Rockafellar, R. T. (1996). Convex Analysis. Princeton Univ. Press, Princeton, NJ.
• [29] Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494–515.
• [30] Speed, T. P. and Kiiveri, H. T. (1986). Gaussian Markov distributions over finite graphs. Ann. Statist. 14 138–150.
• [31] Toh, K. C., Todd, M. J. and Tütüncü, R. H. (1999). SDPT3—a MATLAB software package for semidefinite-quadratic-linear programming. Available at http://www.math.nus.edu.sg/~mattohkc/sdpt3.html.
• [32] Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_1$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
• [33] Wang, C., Sun, D. and Toh, K.-C. (2010). Solving log-determinant optimization problems by a Newton-CG primal proximal point algorithm. SIAM J. Optim. 20 2994–3013.
• [34] Watson, G. A. (1992). Characterization of the subdifferential of some matrix norms. Linear Algebra Appl. 170 33–45.
• [35] Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515–534.
• [36] Wu, W. B. and Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90 831–844.
• [37] Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.