The Annals of Statistics

Covariance regularization by thresholding

Peter J. Bickel and Elizaveta Levina


This paper considers regularizing, by hard thresholding, a covariance matrix of p variables estimated from n observations. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and (log p)/n → 0, and we obtain explicit rates. The results are uniform over families of covariance matrices that satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general cross-validation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data.
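The procedure described above can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' code: `hard_threshold_cov` zeros out small off-diagonal entries of the sample covariance (keeping the diagonal intact, a common implementation choice), and `select_threshold` is a simple version of the random-split resampling scheme, choosing the candidate threshold whose estimate on one half of the data is closest in Frobenius norm to the raw sample covariance of the other half. The function names, the candidate grid `ts`, and the number of splits are all illustrative choices.

```python
import numpy as np

def hard_threshold_cov(X, t):
    """Sample covariance of X (n x p) with off-diagonal entries
    smaller than t in absolute value set to zero (hard thresholding)."""
    S = np.cov(X, rowvar=False)
    T = np.where(np.abs(S) >= t, S, 0.0)
    # Keep the diagonal (the variances) regardless of the threshold.
    np.fill_diagonal(T, np.diag(S))
    return T

def select_threshold(X, ts, n_splits=20, rng=None):
    """Pick a threshold from the candidates `ts` by random-split
    validation: threshold the covariance of one half of the sample and
    compare it, in Frobenius norm, with the unthresholded sample
    covariance of the other half, averaging over random splits."""
    rng = np.random.default_rng(rng)
    n = X.shape[0]
    n1 = n // 2
    risks = np.zeros(len(ts))
    for _ in range(n_splits):
        idx = rng.permutation(n)
        S2 = np.cov(X[idx[n1:]], rowvar=False)   # validation half
        for j, t in enumerate(ts):
            T1 = hard_threshold_cov(X[idx[:n1]], t)
            risks[j] += np.linalg.norm(T1 - S2, "fro")
    return ts[int(np.argmin(risks))]
```

Consistent with the rate in the abstract, the threshold should scale like sqrt((log p)/n), so a natural candidate grid is a range of constants times that quantity.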

Article information

Ann. Statist. Volume 36, Number 6 (2008), 2577-2604.

First available: 5 January 2009

Primary: 62H12 (Estimation)
Secondary: 62F12 (Asymptotic properties of estimators); 62G09 (Resampling methods)

Keywords: covariance estimation; regularization; sparsity; thresholding; large p small n; high dimension low sample size


Bickel, Peter J.; Levina, Elizaveta. Covariance regularization by thresholding. Ann. Statist. 36 (2008), no. 6, 2577–2604. doi:10.1214/08-AOS600.

