Electronic Journal of Statistics

High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence

Pradeep Ravikumar, Martin J. Wainwright, Garvesh Raskutti, and Bin Yu

Full-text: Open access

Abstract

Given i.i.d. observations of a random vector X ∈ ℝ^p, we study the problem of estimating both its covariance matrix Σ* and its inverse covariance or concentration matrix Θ* = (Σ*)^{-1}. When X is multivariate Gaussian, the non-zero structure of Θ* is specified by the graph of an associated Gaussian Markov random field; and a popular estimator for such sparse Θ* is the ℓ1-regularized Gaussian MLE. This estimator is sensible even for non-Gaussian X, since it corresponds to minimizing an ℓ1-penalized log-determinant Bregman divergence. We analyze its performance under high-dimensional scaling, in which the number of nodes in the graph p, the number of edges s, and the maximum node degree d are allowed to grow as a function of the sample size n. In addition to the parameters (p, s, d), our analysis identifies other key quantities that control rates: (a) the ℓ∞-operator norm of the true covariance matrix Σ*; (b) the ℓ∞-operator norm of the sub-matrix Γ*_{SS}, where S indexes the graph edges, and Γ* = (Θ*)^{-1} ⊗ (Θ*)^{-1}; (c) a mutual incoherence or irrepresentability measure on the matrix Γ*; and (d) the rate of decay 1/f(n, δ) of the probabilities {|Σ̂^n_{ij} − Σ*_{ij}| > δ}, where Σ̂^n is the sample covariance based on n samples. Our first result establishes consistency of our estimate Θ̂ in the elementwise maximum norm. This in turn allows us to derive convergence rates in Frobenius and spectral norms, with improvements upon existing results for graphs with maximum node degree d = o(√s). In our second result, we show that with probability converging to one, the estimate Θ̂ correctly specifies the zero pattern of the concentration matrix Θ*. We illustrate our theoretical results via simulations for various graphs and problem parameters, showing good correspondences between the theoretical predictions and behavior in simulations.
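As a concrete illustration of the estimator studied here (a minimal sketch, not the authors' implementation): the ℓ1-regularized Gaussian MLE solves min over Θ ≻ 0 of trace(Θ Σ̂^n) − log det Θ + λn‖Θ‖_{1,off}, and scikit-learn's GraphicalLasso fits a program of this off-diagonal-penalized form, with its alpha parameter playing the role of λn. The chain graph, sample size, and regularization level below are illustrative choices, not values taken from the paper.

    # Minimal sketch of the l1-penalized log-determinant (graphical lasso) estimator.
    # Not the authors' code; uses scikit-learn's GraphicalLasso as an off-the-shelf solver.
    import numpy as np
    from sklearn.covariance import GraphicalLasso

    rng = np.random.default_rng(0)

    # Sparse ground-truth precision matrix Theta* for a chain graph on p = 5 nodes.
    p = 5
    theta_star = np.eye(p) + 0.4 * (np.eye(p, k=1) + np.eye(p, k=-1))
    sigma_star = np.linalg.inv(theta_star)

    # n i.i.d. Gaussian samples with covariance Sigma*.
    n = 500
    X = rng.multivariate_normal(mean=np.zeros(p), cov=sigma_star, size=n)

    # Fit the l1-penalized Gaussian MLE; alpha is the regularization level lambda_n.
    model = GraphicalLasso(alpha=0.05).fit(X)
    theta_hat = model.precision_

    # Elementwise maximum-norm error, the norm controlled by the paper's first result.
    print("max |Theta_hat - Theta*| =", np.abs(theta_hat - theta_star).max())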

Article information

Source
Electron. J. Statist. Volume 5 (2011), 935-980.

Dates
First available in Project Euclid: 15 September 2011

Permanent link to this document
http://projecteuclid.org/euclid.ejs/1316092865

Digital Object Identifier
doi:10.1214/11-EJS631

Mathematical Reviews number (MathSciNet)
MR2836766

Subjects
Primary: 62F12: Asymptotic properties of estimators
Secondary: 62F30: Inference under constraints

Keywords
Covariance; concentration; precision; sparsity; Gaussian graphical models; ℓ1 regularization

Citation

Ravikumar, Pradeep; Wainwright, Martin J.; Raskutti, Garvesh; Yu, Bin. High-dimensional covariance estimation by minimizing ℓ1-penalized log-determinant divergence. Electron. J. Statist. 5 (2011), 935–980. doi:10.1214/11-EJS631. http://projecteuclid.org/euclid.ejs/1316092865.


