The Annals of Statistics

Covariance regularization by thresholding

Peter J. Bickel and Elizaveta Levina


Abstract

This paper considers regularizing a covariance matrix of p variables estimated from n observations, by hard thresholding. We show that the thresholded estimate is consistent in the operator norm as long as the true covariance matrix is sparse in a suitable sense, the variables are Gaussian or sub-Gaussian, and (log p)/n→0, and obtain explicit rates. The results are uniform over families of covariance matrices which satisfy a fairly natural notion of sparsity. We discuss an intuitive resampling scheme for threshold selection and prove a general cross-validation result that justifies this approach. We also compare thresholding to other covariance estimators in simulations and on an example from climate data.
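The estimator studied in the abstract is simple to state: zero out every entry of the sample covariance matrix whose magnitude falls below a threshold. A minimal NumPy sketch follows; the default threshold uses the rate √(log p / n) from the abstract with an illustrative constant of 1, which is an assumption made here for concreteness — the paper itself selects the threshold by the resampling scheme it describes.

```python
import numpy as np

def hard_threshold_cov(X, t=None):
    """Hard-thresholding covariance estimator.

    X : (n, p) data matrix, one observation per row.
    t : threshold; if None, defaults to sqrt(log(p) / n) -- the rate
        from the paper with an illustrative constant of 1 (the paper
        instead chooses the threshold by cross-validation/resampling).
    """
    n, p = X.shape
    S = np.cov(X, rowvar=False)  # p x p sample covariance
    if t is None:
        t = np.sqrt(np.log(p) / n)
    # Keep entries at least t in magnitude, zero out the rest.
    return np.where(np.abs(S) >= t, S, 0.0)
```

The thresholded matrix is symmetric but not guaranteed positive definite; the paper's result is that, over suitably sparse covariance classes with Gaussian or sub-Gaussian variables, it is consistent in operator norm as (log p)/n → 0.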

Article information

Source
Ann. Statist. Volume 36, Number 6 (2008), 2577–2604.

Dates
First available: 5 January 2009

Permanent link to this document
http://projecteuclid.org/euclid.aos/1231165180

Digital Object Identifier
doi:10.1214/08-AOS600

Mathematical Reviews number (MathSciNet)
MR2387969

Zentralblatt MATH identifier
05503371

Subjects
Primary: 62H12: Estimation
Secondary: 62F12: Asymptotic properties of estimators; 62G09: Resampling methods

Keywords
Covariance estimation; regularization; sparsity; thresholding; large p small n; high dimension low sample size

Citation

Bickel, Peter J.; Levina, Elizaveta. Covariance regularization by thresholding. The Annals of Statistics 36 (2008), no. 6, 2577--2604. doi:10.1214/08-AOS600. http://projecteuclid.org/euclid.aos/1231165180.

