Annals of Statistics

Regularized estimation of large covariance matrices

Peter J. Bickel and Elizaveta Levina



This paper considers estimating a covariance matrix of p variables from n observations by either banding or tapering the sample covariance matrix, or estimating a banded version of the inverse of the covariance. We show that these estimates are consistent in the operator norm as long as (log p)/n→0, and obtain explicit rates. The results are uniform over some fairly natural well-conditioned families of covariance matrices. We also introduce an analogue of the Gaussian white noise model and show that if the population covariance is embeddable in that model and well-conditioned, then the banded approximations produce consistent estimates of the eigenvalues and associated eigenvectors of the covariance matrix. The results can be extended to smooth versions of banding and to non-Gaussian distributions with sufficiently short tails. A resampling approach is proposed for choosing the banding parameter in practice. This approach is illustrated numerically on both simulated and real data.
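The banding operator at the heart of the abstract simply zeroes out all entries of the sample covariance matrix more than k positions off the diagonal. Below is a minimal NumPy sketch of that idea, not the authors' code: the AR(1)-style population covariance, the sample sizes, and the choice k = 5 are illustrative assumptions, and the operator norm (largest singular value) is the error metric the abstract refers to.

```python
import numpy as np

def band(M, k):
    """Band a square matrix: keep entries with |i - j| <= k, zero the rest (illustrative sketch)."""
    p = M.shape[0]
    i, j = np.indices((p, p))
    return np.where(np.abs(i - j) <= k, M, 0.0)

rng = np.random.default_rng(0)
p, n = 30, 200  # hypothetical dimensions for illustration

# Toy AR(1)-style population covariance: entries decay away from the diagonal,
# so a banded approximation is reasonable.
Sigma = 0.5 ** np.abs(np.subtract.outer(np.arange(p), np.arange(p)))

X = rng.multivariate_normal(np.zeros(p), Sigma, size=n)
S = np.cov(X, rowvar=False)   # sample covariance matrix
S_banded = band(S, k=5)       # banded estimator B_k(S); k would be chosen by resampling in practice

# Operator-norm errors (ord=2 gives the largest singular value)
err_sample = np.linalg.norm(S - Sigma, 2)
err_banded = np.linalg.norm(S_banded - Sigma, 2)
```

In the paper, the banding parameter k is chosen by a resampling scheme rather than fixed in advance; the sketch above just shows the estimator itself.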

Article information

Ann. Statist., Volume 36, Number 1 (2008), 199-227.

First available in Project Euclid: 1 February 2008

Primary: 62H12: Estimation
Secondary: 62F12: Asymptotic properties of estimators; 62G09: Resampling methods

Keywords: covariance matrix; regularization; banding; Cholesky decomposition


Bickel, Peter J.; Levina, Elizaveta. Regularized estimation of large covariance matrices. Ann. Statist. 36 (2008), no. 1, 199--227. doi:10.1214/009053607000000758.


