The Annals of Statistics

Nonparametric eigenvalue-regularized precision or covariance matrix estimator

Clifford Lam

Full-text: Open access

Abstract

We introduce nonparametric regularization of the eigenvalues of a sample covariance matrix through splitting of the data (NERCOME), and prove that NERCOME enjoys asymptotically optimal nonlinear shrinkage of eigenvalues with respect to the Frobenius norm. One advantage of NERCOME is its computational speed when the dimension is not too large. We prove that NERCOME is positive definite almost surely, as long as the true covariance matrix is, even when the dimension is larger than the sample size. With respect to Stein’s loss function, the inverse of our estimator is asymptotically the optimal precision matrix estimator. Asymptotic efficiency loss is defined through comparison with an ideal estimator that assumes knowledge of the true covariance matrix. We show that the asymptotic efficiency loss of NERCOME is almost surely zero with a suitable split location of the data. We also show that all of the aforementioned optimality results hold for data with a factor structure. Our method avoids the need to first estimate any unknowns from a factor model and directly gives the covariance or precision matrix estimator, which can be useful when factor analysis is not the ultimate goal. We compare the performance of our estimators with that of other methods through extensive simulations and real data analysis.
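The data-splitting idea described in the abstract can be sketched as follows: eigenvectors are computed from one portion of the sample, and the corresponding eigenvalues are regularized against the sample covariance of the other portion, then the result is averaged over random splits. This is a minimal illustrative sketch, not the paper's exact prescription; the split fraction, the number of random splits, and the function name `nercome` are assumptions made here for demonstration.

```python
import numpy as np

def nercome(X, split_frac=0.5, n_splits=50, rng=None):
    """Sketch of a NERCOME-style estimator via data splitting.

    X : (n, p) data matrix, rows are observations (assumed centered).
    split_frac and n_splits are illustrative choices, not the tuning
    analyzed in the paper.
    """
    rng = np.random.default_rng(rng)
    n, p = X.shape
    n1 = int(n * split_frac)
    est = np.zeros((p, p))
    for _ in range(n_splits):
        idx = rng.permutation(n)
        X1, X2 = X[idx[:n1]], X[idx[n1:]]
        # Eigenvectors come from the first split...
        S1 = X1.T @ X1 / X1.shape[0]
        _, P1 = np.linalg.eigh(S1)
        # ...eigenvalues are regularized against the second split:
        S2 = X2.T @ X2 / X2.shape[0]
        d = np.diag(P1.T @ S2 @ P1)  # quadratic forms p_i' S2 p_i >= 0
        est += (P1 * d) @ P1.T       # P1 diag(d) P1'
    return est / n_splits
```

Because each regularized eigenvalue is a nonnegative quadratic form in an independent sample covariance, the resulting estimator is positive semidefinite by construction, in line with the positive-definiteness claim in the abstract.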

Article information

Source
Ann. Statist., Volume 44, Number 3 (2016), 928-953.

Dates
Received: January 2015
Revised: September 2015
First available in Project Euclid: 11 April 2016

Permanent link to this document
https://projecteuclid.org/euclid.aos/1460381682

Digital Object Identifier
doi:10.1214/15-AOS1393

Mathematical Reviews number (MathSciNet)
MR3485949

Zentralblatt MATH identifier
1341.62124

Subjects
Primary: 62H12: Estimation
Secondary: 62G20: Asymptotic properties; 15B52: Random matrices

Keywords
High-dimensional data analysis; covariance matrix; Stieltjes transform; data splitting; nonlinear shrinkage; factor model

Citation

Lam, Clifford. Nonparametric eigenvalue-regularized precision or covariance matrix estimator. Ann. Statist. 44 (2016), no. 3, 928–953. doi:10.1214/15-AOS1393. https://projecteuclid.org/euclid.aos/1460381682

References

  • Abadir, K. M., Distaso, W. and Žikeš, F. (2014). Design-free estimation of variance matrices. J. Econometrics 181 165–180.
  • Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices, 2nd ed. Springer, New York.
  • Bai, Z. D. and Yin, Y. Q. (1993). Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix. Ann. Probab. 21 1275–1294.
  • Bickel, P. J. and Levina, E. (2008a). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
  • Bickel, P. J. and Levina, E. (2008b). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
  • Cai, T. T. and Zhou, H. H. (2012). Optimal rates of convergence for sparse covariance matrix estimation. Ann. Statist. 40 2389–2420.
  • DeMiguel, V. and Nogales, F. J. (2009). A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Management Science 55 798–812.
  • Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. J. Econometrics 147 186–197.
  • Fan, J., Liao, Y. and Mincheva, M. (2011). High-dimensional covariance matrix estimation in approximate factor models. Ann. Statist. 39 3320–3356.
  • Fan, J., Liao, Y. and Mincheva, M. (2013). Large covariance estimation by thresholding principal orthogonal complements. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75 603–680.
  • Friedman, J., Hastie, T. and Tibshirani, R. (2008). Sparse inverse covariance estimation with the graphical lasso. Biostatistics 9 432–441.
  • Huang, J. Z., Liu, N., Pourahmadi, M. and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93 85–98.
  • James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob. Contributions to the Theory of Statistics 1 361–379. Univ. California Press, Berkeley, CA.
  • Lam, C. (2015). Supplement to “Nonparametric eigenvalue-regularized precision or covariance matrix estimator.” DOI:10.1214/15-AOS1393SUPP.
  • Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254–4278.
  • Lam, C. and Yao, Q. (2012). Factor modeling for high-dimensional time series: Inference for the number of factors. Ann. Statist. 40 694–726.
  • Lam, C., Yao, Q. and Bathia, N. (2011). Estimation of latent factors for high-dimensional time series. Biometrika 98 901–918.
  • Ledoit, O. and Péché, S. (2011). Eigenvectors of some large sample covariance matrix ensembles. Probab. Theory Related Fields 151 233–264.
  • Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88 365–411.
  • Ledoit, O. and Wolf, M. (2012). Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist. 40 1024–1060.
  • Ledoit, O. and Wolf, M. (2013a). Optimal estimation of a large-dimensional covariance matrix under Stein’s loss. ECON Working Papers 122, Dept. Economics, Univ. Zürich.
  • Ledoit, O. and Wolf, M. (2013b). Spectrum estimation: A unified framework for covariance matrix estimation and PCA in large dimensions. ECON Working Papers 105, Dept. Economics, Univ. Zürich.
  • Marčenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Math. USSR-Sb. 1 457–483.
  • Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
  • Pourahmadi, M. (2007). Cholesky decompositions and estimation of a covariance matrix: Orthogonality of variance-correlation parameters. Biometrika 94 1006–1013.
  • Rothman, A. J., Levina, E. and Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 104 177–186.
  • Silverstein, J. W. and Choi, S.-I. (1995). Analysis of the limiting spectral distribution of large-dimensional random matrices. J. Multivariate Anal. 54 295–309.
  • Stein, C. (1975). Estimation of a covariance matrix. In Rietz Lecture, 39th Annual Meeting IMS. Atlanta, GA.
  • Stein, C. (1986). Lectures on the theory of estimation of many parameters. J. Sov. Math. 34 1373–1403.
  • Stock, J. and Watson, M. (2005). Implications of dynamic factor models for VAR analysis. NBER Working Paper No. 11467.
  • Won, J.-H., Lim, J., Kim, S.-J. and Rajaratnam, B. (2013). Condition-number-regularized covariance estimation. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75 427–450.

Supplemental materials

  • Simulations and proofs of theorems in the paper. We present five profiles of simulations comparing the performance of NERCOME to that of other state-of-the-art methods. We also present the proofs of Theorem 1, Theorem 3, Lemma 1, Theorem 5 and Theorem 6 in the paper.