## Electronic Journal of Statistics

### High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence

#### Abstract

Given i.i.d. observations of a random vector $X \in \mathbb{R}^p$, we study the problem of estimating both its covariance matrix $\Sigma^*$ and its inverse covariance or concentration matrix $\Theta^* = (\Sigma^*)^{-1}$. When $X$ is multivariate Gaussian, the non-zero structure of $\Theta^*$ is specified by the graph of an associated Gaussian Markov random field, and a popular estimator for such sparse $\Theta^*$ is the $\ell_1$-regularized Gaussian MLE. This estimator is sensible even for non-Gaussian $X$, since it corresponds to minimizing an $\ell_1$-penalized log-determinant Bregman divergence. We analyze its performance under high-dimensional scaling, in which the number of nodes in the graph $p$, the number of edges $s$, and the maximum node degree $d$ are allowed to grow as a function of the sample size $n$. In addition to the parameters $(p,s,d)$, our analysis identifies other key quantities that control rates: (a) the $\ell_\infty$-operator norm of the true covariance matrix $\Sigma^*$; (b) the $\ell_\infty$-operator norm of the sub-matrix $\Gamma^*_{SS}$, where $S$ indexes the graph edges and $\Gamma^* = (\Theta^*)^{-1} \otimes (\Theta^*)^{-1}$; (c) a mutual incoherence or irrepresentability measure on the matrix $\Gamma^*$; and (d) the rate of decay $1/f(n,\delta)$ on the probabilities $\mathbb{P}[|\widehat{\Sigma}^n_{ij} - \Sigma^*_{ij}| > \delta]$, where $\widehat{\Sigma}^n$ is the sample covariance based on $n$ samples. Our first result establishes consistency of our estimate $\widehat{\Theta}$ in the elementwise maximum-norm. This in turn allows us to derive convergence rates in Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degrees $d=o(\sqrt{s})$. In our second result, we show that with probability converging to one, the estimate $\widehat{\Theta}$ correctly specifies the zero pattern of the concentration matrix $\Theta^*$. We illustrate our theoretical results via simulations for various graphs and problem parameters, showing good correspondence between the theoretical predictions and the behavior observed in simulations.
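
For readers who want a concrete handle on the estimator named in the abstract, the following is a minimal sketch of its objective function. It assumes the usual form of the $\ell_1$-penalized log-determinant program, $\operatorname{tr}(\Theta \widehat{\Sigma}^n) - \log\det\Theta + \lambda_n \sum_{i \neq j} |\Theta_{ij}|$ over positive definite $\Theta$, with the penalty applied to off-diagonal entries only; the abstract does not spell out this formula, so treat that form as an assumption. The function name `penalized_logdet_objective` and the parameter `lam` (standing in for $\lambda_n$) are illustrative choices, not names from the paper. The sketch only evaluates the objective; it does not solve the optimization problem.

```python
import numpy as np


def penalized_logdet_objective(theta, sample_cov, lam):
    """Evaluate tr(theta @ sample_cov) - log det(theta) + lam * sum_{i != j} |theta_ij|.

    Assumes `theta` is symmetric. Returns +inf when `theta` is not positive
    definite, since the log-determinant term is only defined on that cone.
    The off-diagonal-only penalty is an assumption of this sketch.
    """
    theta = np.asarray(theta, dtype=float)
    sample_cov = np.asarray(sample_cov, dtype=float)

    # A Cholesky factorization succeeds only for positive definite input,
    # so it doubles as the feasibility check.
    try:
        chol = np.linalg.cholesky(theta)
    except np.linalg.LinAlgError:
        return float("inf")

    # log det(theta) = 2 * sum(log(diag(L))) when theta = L @ L.T
    log_det = 2.0 * np.log(np.diag(chol)).sum()

    # l1 norm of the off-diagonal entries only
    off_diag_l1 = np.abs(theta).sum() - np.abs(np.diag(theta)).sum()

    return float(np.trace(theta @ sample_cov) - log_det + lam * off_diag_l1)


if __name__ == "__main__":
    sample_cov = np.eye(2)  # toy sample covariance, for illustration only
    dense = np.array([[1.0, 0.5], [0.5, 1.0]])
    print(penalized_logdet_objective(np.eye(2), sample_cov, lam=0.1))
    print(penalized_logdet_objective(dense, sample_cov, lam=0.1))
```

In this toy case the diagonal candidate scores lower than the dense one, which matches the intuition that the penalty discourages off-diagonal entries unless the data support them.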

#### Article information

Source
Electron. J. Statist., Volume 5 (2011), 935-980.

Dates
First available in Project Euclid: 15 September 2011

https://projecteuclid.org/euclid.ejs/1316092865

Digital Object Identifier
doi:10.1214/11-EJS631

Mathematical Reviews number (MathSciNet)
MR2836766

Zentralblatt MATH identifier
1274.62190

Subjects
Primary: 62F12: Asymptotic properties of estimators
Secondary: 62F30: Inference under constraints

#### Citation

Ravikumar, Pradeep; Wainwright, Martin J.; Raskutti, Garvesh; Yu, Bin. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electron. J. Statist. 5 (2011), 935–980. doi:10.1214/11-EJS631. https://projecteuclid.org/euclid.ejs/1316092865
