The Annals of Statistics

Sparsistency and rates of convergence in large covariance matrix estimation

Clifford Lam and Jianqing Fan


Abstract

This paper studies the sparsistency and rates of convergence for estimating sparse covariance and precision matrices based on penalized likelihood with nonconvex penalty functions. Here, sparsistency refers to the property that all parameters that are zero are actually estimated as zero with probability tending to one. Depending on the application, sparsity may occur a priori in the covariance matrix, its inverse or its Cholesky decomposition. We study these three sparsity exploration problems under a unified framework with a general penalty function. We show that the rates of convergence for these problems under the Frobenius norm are of order $(s_n \log p_n/n)^{1/2}$, where $s_n$ is the number of nonzero elements, $p_n$ is the size of the covariance matrix and $n$ is the sample size. This explicitly spells out that the contribution of high-dimensionality is merely a logarithmic factor. The conditions on the rate at which the tuning parameter $\lambda_n$ goes to 0 are made explicit and compared across different penalties. As a result, for the $L_1$-penalty, to guarantee sparsistency and the optimal rate of convergence, the number of nonzero elements must be small: $s_n' = O(p_n)$ at most, among the $O(p_n^2)$ parameters, for estimating a sparse covariance or correlation matrix, a sparse precision or inverse correlation matrix, or a sparse Cholesky factor, where $s_n'$ is the number of nonzero off-diagonal entries. On the other hand, using the SCAD or hard-thresholding penalty functions, there is no such restriction.
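For orientation, the display below sketches the kind of penalized-likelihood objective the abstract refers to, written for the precision case $\Omega = \Sigma^{-1}$ under a Gaussian working likelihood with sample covariance $S_n$; the exact formulation, and the covariance and Cholesky variants, are given in the paper itself. The SCAD penalty is shown through its derivative, following Fan and Li (2001), cited in the references.

```latex
% Sketch of a penalized likelihood for a sparse precision matrix
% \Omega = \Sigma^{-1}: minimize over positive-definite \Omega, with
% sample covariance S_n and penalty p_{\lambda_n} on off-diagonal entries.
\hat{\Omega} = \arg\min_{\Omega \succ 0}
  \Big\{ \operatorname{tr}(S_n \Omega) - \log \det \Omega
         + \sum_{i \neq j} p_{\lambda_n}\!\big(|\omega_{ij}|\big) \Big\}.

% The SCAD penalty of Fan and Li (2001), defined through its derivative
% (a > 2, with a = 3.7 a common default):
p'_{\lambda}(t) = \lambda \Big\{ I(t \le \lambda)
  + \frac{(a\lambda - t)_+}{(a-1)\lambda}\, I(t > \lambda) \Big\},
  \qquad t \ge 0.

% The Frobenius-norm rate stated in the abstract, with \Omega_0 the true
% matrix, s_n nonzero elements, dimension p_n and sample size n:
\| \hat{\Omega} - \Omega_0 \|_F = O_P\big( (s_n \log p_n / n)^{1/2} \big).
```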

Article information

Source
Ann. Statist., Volume 37, Number 6B (2009), 4254-4278.

Dates
First available in Project Euclid: 23 October 2009

Permanent link to this document
https://projecteuclid.org/euclid.aos/1256303543

Digital Object Identifier
doi:10.1214/09-AOS720

Mathematical Reviews number (MathSciNet)
MR2572459

Zentralblatt MATH identifier
1191.62101

Subjects
Primary: 62F12: Asymptotic properties of estimators
Secondary: 62J07: Ridge regression; shrinkage estimators

Keywords
Covariance matrix; high-dimensionality; consistency; nonconcave penalized likelihood; sparsistency; asymptotic normality

Citation

Lam, Clifford; Fan, Jianqing. Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 (2009), no. 6B, 4254--4278. doi:10.1214/09-AOS720. https://projecteuclid.org/euclid.aos/1256303543



References

  • Bai, Z. and Silverstein, J. W. (2006). Spectral Analysis of Large Dimensional Random Matrices. Science Press, Beijing.
  • Bickel, P. J. and Levina, E. (2008a). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
  • Bickel, P. J. and Levina, E. (2008b). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
  • Cai, T., Zhang, C.-H. and Zhou, H. (2008). Optimal rates of convergence for covariance matrix estimation. Technical report, The Wharton School, Univ. Pennsylvania.
  • d’Aspremont, A., Banerjee, O. and El Ghaoui, L. (2008). First-order methods for sparse covariance selection. SIAM J. Matrix Anal. Appl. 30 56–66.
  • Dempster, A. P. (1972). Covariance selection. Biometrics 28 157–175.
  • Diggle, P. and Verbyla, A. (1998). Nonparametric estimation of covariance structure in longitudinal data. Biometrics 54 401–415.
  • El Karoui, N. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36 2717–2756.
  • Fan, J., Feng, Y. and Wu, Y. (2009). Network exploration via the adaptive LASSO and SCAD penalties. Ann. Appl. Stat. 3 521–541.
  • Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • Fan, J. and Peng, H. (2004). Nonconcave penalized likelihood with a diverging number of parameters. Ann. Statist. 32 928–961.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical LASSO. Biostatistics 9 432–441.
  • Huang, J., Horowitz, J. and Ma, S. (2008). Asymptotic properties of bridge estimators in sparse high-dimensional regression models. Ann. Statist. 36 587–613.
  • Huang, J., Liu, N., Pourahmadi, M. and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93 85–98.
  • Levina, E., Rothman, A. J. and Zhu, J. (2008). Sparse estimation of large covariance matrices via a nested Lasso penalty. Ann. Appl. Stat. 2 245–263.
  • Meier, L., van de Geer, S. and Bühlmann, P. (2008). The group Lasso for logistic regression. J. R. Stat. Soc. Ser. B Stat. Methodol. 70 53–71.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the Lasso. Ann. Statist. 34 1436–1462.
  • Pourahmadi, M. (1999). Joint mean-covariance models with applications to longitudinal data: Unconstrained parameterisation. Biometrika 86 677–690.
  • Ravikumar, P., Lafferty, J., Liu, H. and Wasserman, L. (2007). Sparse additive models. In Advances in Neural Information Processing Systems 20. MIT Press, Cambridge, MA.
  • Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494–515.
  • Smith, M. and Kohn, R. (2002). Parsimonious covariance matrix estimation for longitudinal data. J. Amer. Statist. Assoc. 97 1141–1153.
  • Wagaman, A. S. and Levina, E. (2008). Discovering sparse covariance structures with the Isomap. J. Comput. Graph. Statist. 18. To appear.
  • Wong, F., Carter, C. and Kohn, R. (2003). Efficient estimation of covariance selection models. Biometrika 90 809–830.
  • Wu, W. B. and Pourahmadi, M. (2003). Nonparametric estimation of large covariance matrices of longitudinal data. Biometrika 90 831–844.
  • Yuan, M. and Lin, Y. (2007). Model selection and estimation in the Gaussian graphical model. Biometrika 94 19–35.
  • Zhang, C. H. (2007). Penalized linear unbiased selection. Technical report 2007-003, The Statistics Dept., Rutgers Univ.
  • Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.
  • Zou, H. (2006). The adaptive Lasso and its oracle properties. J. Amer. Statist. Assoc. 101 1418–1429.
  • Zou, H. and Li, R. (2008). One-step sparse estimates in nonconcave penalized likelihood models (with discussion). Ann. Statist. 36 1509–1533.