The Annals of Statistics

Optimal shrinkage of eigenvalues in the spiked covariance model

David Donoho, Matan Gavish, and Iain Johnstone

Abstract

We show that in a common high-dimensional covariance model, the choice of loss function has a profound effect on optimal estimation.

In an asymptotic framework based on the spiked covariance model and use of orthogonally invariant estimators, we show that optimal estimation of the population covariance matrix boils down to design of an optimal shrinker $\eta$ that acts elementwise on the sample eigenvalues. Indeed, to each loss function there corresponds a unique admissible eigenvalue shrinker $\eta^{*}$ dominating all other shrinkers. The shape of the optimal shrinker is determined by the choice of loss function and, crucially, by inconsistency of both eigenvalues and eigenvectors of the sample covariance matrix.
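Concretely, the inconsistency rests on two limiting maps from the spiked-model literature (see, e.g., [4], [6], [59]): a population spike $\ell$ exceeding the detection threshold $1+\sqrt{\gamma}$, where $\gamma = p/n$, produces a sample eigenvalue biased upward to $\lambda(\ell) = \ell + \gamma\ell/(\ell-1)$, while the corresponding sample eigenvector attains only a limiting squared cosine $c^2(\ell) < 1$ with its population counterpart. The following minimal Python sketch of these maps (assuming unit noise level; an illustration, not the paper's code supplement [16]) makes the inconsistency concrete:

    import numpy as np

    # Spiked-model limits for aspect ratio gamma = p/n and unit noise; see,
    # e.g., Baik & Silverstein (2006) [4] and Paul (2007) [59].

    def sample_eigenvalue(ell, gamma):
        """Almost-sure limit of the sample eigenvalue paired with a population
        spike ell; spikes below 1 + sqrt(gamma) are absorbed into the bulk."""
        if ell <= 1 + np.sqrt(gamma):
            return (1 + np.sqrt(gamma)) ** 2  # bulk edge
        return ell + gamma * ell / (ell - 1)

    def cosine_squared(ell, gamma):
        """Limiting squared cosine between the sample and population
        eigenvectors of a spike ell (0 below the detection threshold)."""
        if ell <= 1 + np.sqrt(gamma):
            return 0.0
        return (1 - gamma / (ell - 1) ** 2) / (1 + gamma / (ell - 1))

    # Example: with gamma = 0.5, a population spike ell = 3 appears as the
    # inflated sample eigenvalue 3.75, and only c^2 = 0.7 of the sample
    # eigenvector's energy points in the population direction.
    print(sample_eigenvalue(3.0, 0.5), cosine_squared(3.0, 0.5))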

Details of these phenomena and closed-form formulas for the optimal eigenvalue shrinkers are worked out for a menagerie of 26 loss functions for covariance estimation found in the literature, including the Stein, Entropy, Divergence, Fréchet, Bhattacharya/Matusita, Frobenius Norm, Operator Norm, Nuclear Norm and Condition Number losses.
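To give the flavor of two of these (a hedged sketch, not the paper's code supplement [16]): for Frobenius norm loss the optimal shrinker takes the form $\eta(\lambda) = \ell c^2 + s^2$ with $s^2 = 1 - c^2$, while for operator norm loss it reduces to pure debiasing, $\eta(\lambda) = \ell$, where $\ell = \ell(\lambda)$ inverts the eigenvalue-bias map above the bulk edge $(1+\sqrt{\gamma})^2$:

    import numpy as np

    def ell_of_lambda(lam, gamma):
        """Population spike implied by a sample eigenvalue lam: inverts
        lambda(ell) = ell + gamma*ell/(ell - 1) above the bulk edge."""
        if lam <= (1 + np.sqrt(gamma)) ** 2:
            return 1.0  # indistinguishable from noise: return the noise level
        b = lam + 1 - gamma
        return (b + np.sqrt(b * b - 4 * lam)) / 2

    def cosine_squared(ell, gamma):
        """Limiting squared cosine between sample and population eigenvectors."""
        if ell <= 1 + np.sqrt(gamma):
            return 0.0
        return (1 - gamma / (ell - 1) ** 2) / (1 + gamma / (ell - 1))

    def frobenius_shrinker(lam, gamma):
        """Optimal shrinker for Frobenius norm loss: eta = ell*c^2 + s^2."""
        ell = ell_of_lambda(lam, gamma)
        c2 = cosine_squared(ell, gamma)
        return ell * c2 + (1 - c2)

    def operator_shrinker(lam, gamma):
        """Optimal shrinker for operator norm loss: eta = ell (debiasing)."""
        return ell_of_lambda(lam, gamma)

    # With gamma = 0.5, the sample eigenvalue 3.75 debiases to ell = 3.0, but
    # the Frobenius-loss shrinker pulls it further down to 2.4 to compensate
    # for eigenvector error (c^2 = 0.7).
    print(frobenius_shrinker(3.75, 0.5), operator_shrinker(3.75, 0.5))

Note how the two losses disagree on how far to shrink the same sample eigenvalue; this is the sense in which the choice of loss function shapes the optimal estimator.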

Article information

Source
Ann. Statist., Volume 46, Number 4 (2018), 1742-1778.

Dates
Received: March 2014
Revised: May 2017
First available in Project Euclid: 27 June 2018

Permanent link to this document
https://projecteuclid.org/euclid.aos/1530086432

Digital Object Identifier
doi:10.1214/17-AOS1601

Mathematical Reviews number (MathSciNet)
MR3819116

Zentralblatt MATH identifier
06936477

Subjects
Primary: 62C20: Minimax procedures 62H25: Factor analysis and principal components; correspondence analysis
Secondary: 90C25: Convex programming 90C22: Semidefinite programming

Keywords
Covariance estimation; optimal shrinkage; Stein loss; entropy loss; divergence loss; Fréchet distance; Bhattacharya/Matusita affinity; condition number loss; high-dimensional asymptotics; spiked covariance

Citation

Donoho, David; Gavish, Matan; Johnstone, Iain. Optimal shrinkage of eigenvalues in the spiked covariance model. Ann. Statist. 46 (2018), no. 4, 1742--1778. doi:10.1214/17-AOS1601. https://projecteuclid.org/euclid.aos/1530086432


References

  • [1] Bai, Z. and Silverstein, J. W. (2010). Spectral Analysis of Large Dimensional Random Matrices, 2nd ed. Springer Series in Statistics. Springer, New York.
  • [2] Bai, Z. and Yao, J. (2008). Central limit theorems for eigenvalues in a spiked population model. Ann. Inst. Henri Poincaré Probab. Stat. 44 447–474.
  • [3] Baik, J., Ben Arous, G. and Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab. 33 1643–1697.
  • [4] Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. J. Multivariate Anal. 97 1382–1408.
  • [5] Benaych-Georges, F., Guionnet, A. and Maïda, M. (2011). Fluctuations of the extreme eigenvalues of finite rank deformations of random matrices. Electron. J. Probab. 16 1621–1662.
  • [6] Benaych-Georges, F. and Nadakuditi, R. R. (2011). The eigenvalues and eigenvectors of finite, low rank perturbations of large random matrices. Adv. Math. 227 494–521.
  • [7] Berger, J. (1982). Estimation in continuous exponential families: Bayesian estimation subject to risk restrictions and inadmissibility results. In Statistical Decision Theory and Related Topics, III, Vol. 1 (West Lafayette, Ind., 1981) 109–141. Academic Press, New York.
  • [8] Bhatia, R. (1997). Matrix Analysis. Graduate Texts in Mathematics 169. Springer, New York.
  • [9] Brown, L. D. and Greenshtein, E. (2009). Nonparametric empirical Bayes and compound decision approaches to estimation of a high-dimensional vector of normal means. Ann. Statist. 37 1685–1704.
  • [10] Cacoullos, T. and Olkin, I. (1965). On the bias of functions of characteristic roots of a random matrix. Biometrika 52 87–94.
  • [11] Chen, Y., Wiesel, A., Eldar, Y. C. and Hero, A. O. (2010). Shrinkage algorithms for MMSE covariance estimation. IEEE Trans. Signal Process. 58 5016–5029.
  • [12] Daniels, M. J. and Kass, R. E. (2001). Shrinkage estimators for covariance matrices. Biometrics 57 1173–1184.
  • [13] Dey, D. K. and Srinivasan, C. (1985). Estimation of a covariance matrix under Stein’s loss. Ann. Statist. 13 1581–1591.
  • [14] Donoho, D. and Gavish, M. (2014). Minimax risk of matrix denoising by singular value thresholding. Ann. Statist. 42 2413–2440.
  • [15] Donoho, D., Gavish, M. and Johnstone, I. (2018). Supplement to “Optimal shrinkage of eigenvalues in the spiked covariance model.” DOI:10.1214/17-AOS1601SUPP.
  • [16] Donoho, D. L., Gavish, M. and Johnstone, I. M. (2016). Code supplement to “Optimal shrinkage of eigenvalues in the spiked covariance model”. Available at http://purl.stanford.edu/xy031gt1574.
  • [17] Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over $l_{p}$-balls for $l_{q}$-error. Probab. Theory Related Fields 99 277–303.
  • [18] Dowson, D. C. and Landau, B. V. (1982). The Fréchet distance between multivariate normal distributions. J. Multivariate Anal. 12 450–455.
  • [19] Dryden, I. L., Koloydenko, A. and Zhou, D. (2009). Non-Euclidean statistics for covariance matrices, with applications to diffusion tensor imaging. Ann. Appl. Stat. 3 1102–1123.
  • [20] Efron, B. and Morris, C. (1976). Multivariate empirical Bayes and estimation of covariance matrices. Ann. Statist. 4 22–32.
  • [21] El Karoui, N. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36 2717–2756.
  • [22] El Karoui, N. (2008). Spectrum estimation for large dimensional covariance matrices using random matrix theory. Ann. Statist. 36 2757–2790.
  • [23] Fan, J., Fan, Y. and Lv, J. (2008). High dimensional covariance matrix estimation using a factor model. J. Econometrics 147 186–197.
  • [24] Förstner, W. and Moonen, B. (1999). A metric for covariance matrices. Quo Vadis Geodesia 113–128.
  • [25] Gavish, M. and Donoho, D. L. (2014). The optimal hard threshold for singular values is $4/\sqrt{3}$. IEEE Trans. Inform. Theory 60 5040–5053.
  • [26] Gavish, M. and Donoho, D. L. (2017). Optimal shrinkage of singular values. IEEE Trans. Inform. Theory 63 2137–2152.
  • [27] Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab. 8 252–261.
  • [28] Gupta, A. K. and Ofori-Nyarko, S. (1995). Improved minimax estimators of normal covariance and precision matrices. Statistics 26 19–25.
  • [29] Haff, L. R. (1979). Estimation of the inverse covariance matrix: Random mixtures of the inverse Wishart matrix and the identity. Ann. Statist. 7 1264–1276.
  • [30] Haff, L. R. (1979). An identity for the Wishart distribution with applications. J. Multivariate Anal. 9 531–544.
  • [31] Haff, L. R. (1980). Empirical Bayes estimation of the multivariate normal covariance matrix. Ann. Statist. 8 586–597.
  • [32] Huang, J. Z., Liu, N., Pourahmadi, M. and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93 85–98.
  • [33] James, A. T. (1954). Normal multivariate analysis and the orthogonal group. Ann. Math. Stat. 25 40–75.
  • [34] James, W. and Stein, C. (1961). Estimation with quadratic loss. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob., Vol. I 361–379. Univ. California Press, Berkeley, CA.
  • [35] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
  • [36] Johnstone, I. M. (2018). Tail sums of Wishart and GUE eigenvalues beyond the bulk edge. Aust. N. Z. J. Stat. To appear. Available at arXiv:1704.06398.
  • [37] Kailath, T. (1967). The divergence and Bhattacharyya distance measures in signal selection. IEEE Trans. Commun. Technol. 15 52–60.
  • [38] Konno, Y. (1991). On estimation of a matrix of normal means with unknown covariance matrix. J. Multivariate Anal. 36 44–55.
  • [39] Krishnamoorthy, K. and Gupta, A. K. (1989). Improved minimax estimation of a normal precision matrix. Canad. J. Statist. 17 91–102.
  • [41] Kritchman, S. and Nadler, B. (2009). Non-parametric detection of the number of signals: Hypothesis testing and random matrix theory. IEEE Trans. Signal Process. 57 3930–3941.
  • [42] Kubokawa, T. (1989). Improved estimation of a covariance matrix under quadratic loss. Statist. Probab. Lett. 8 69–71.
  • [43] Kubokawa, T. and Konno, Y. (1990). Estimating the covariance matrix and the generalized variance under a symmetric loss. Ann. Inst. Statist. Math. 42 331–343.
  • [44] Ledoit, O. and Péché, S. (2011). Eigenvectors of some large sample covariance matrix ensembles. Probab. Theory Related Fields 151 233–264.
  • [45] Ledoit, O. and Wolf, M. (2004). A well-conditioned estimator for large-dimensional covariance matrices. J. Multivariate Anal. 88 365–411.
  • [46] Ledoit, O. and Wolf, M. (2012). Nonlinear shrinkage estimation of large-dimensional covariance matrices. Ann. Statist. 40 1024–1060.
  • [47] Ledoux, M. (2001). The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs 89. Amer. Math. Soc., Providence, RI.
  • [48] Lenglet, C., Rousson, M., Deriche, R. and Faugeras, O. (2006). Statistics on the manifold of multivariate normal distributions: Theory and application to diffusion tensor MRI processing. J. Math. Imaging Vision 25 423–444.
  • [49] Lin, S. P. and Perlman, M. D. (1985). A Monte Carlo comparison of four estimators of a covariance matrix. In Multivariate Analysis VI (Pittsburgh, PA, 1983) 411–429. North-Holland, Amsterdam.
  • [50] Loh, W.-L. (1991). Estimating covariance matrices. Ann. Statist. 19 283–296.
  • [51] Marčenko, V. A. and Pastur, L. A. (1967). Distribution of eigenvalues for some sets of random matrices. Sb. Math. 1 457–483.
  • [52] Matusita, K. (1967). On the notion of affinity of several distributions and some of its applications. Ann. Inst. Statist. Math. 19 181–192.
  • [53] Muirhead, R. J. (1987). Developments in eigenvalue estimation. In Advances in Multivariate Statistical Analysis. Theory Decis. Lib. Ser. B: Math. Statist. Methods 277–288. Reidel, Dordrecht.
  • [54] NIST Digital Library of Mathematical Functions. Available at http://dlmf.nist.gov/, Release 1.0.9 of 2014-08-29. Online companion to [56].
  • [55] Olkin, I. and Pukelsheim, F. (1982). The distance between two random vectors with given dispersion matrices. Linear Algebra Appl. 48 257–263.
  • [56] Olver, F. W. J., Lozier, D. W., Boisvert, R. F. and Clark, C. W., eds. (2010). NIST Handbook of Mathematical Functions. Cambridge University Press, New York, NY. Print companion to [54].
  • [57] Pal, N. (1993). Estimating the normal dispersion matrix and the precision matrix from a decision-theoretic point of view: A review. Statist. Papers 34 1–26.
  • [58] Passemier, D. and Yao, J. (2013). Variance estimation and goodness-of-fit test in a high-dimensional strict factor model. Available at arXiv:1308.3890.
  • [59] Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sinica 17 1617–1642.
  • [60] Selliah, J. B. (1964). Estimation and Testing Problems in a Wishart Distribution. ProQuest LLC, Ann Arbor, MI. Thesis (Ph.D.)—Stanford Univ.
  • [61] Shabalin, A. A. and Nobel, A. B. (2013). Reconstruction of a low-rank matrix in the presence of Gaussian noise. J. Multivariate Anal. 118 67–76.
  • [62] Sharma, D. and Krishnamoorthy, K. (1985). Empirical Bayes estimators of normal covariance matrix. Sankhyā, Ser. A 47 247–254.
  • [63] Sinha, B. K. and Ghosh, M. (1987). Inadmissibility of the best equivariant estimators of the variance-covariance matrix, the precision matrix, and the generalized variance under entropy loss. Statist. Decisions 5 201–227.
  • [64] Stein, C. (1956). Some problems in multivariate analysis. Technical Report, Department of Statistics, Stanford Univ. Available at http://statistics.stanford.edu/~ckirby/techreports/ONR/CHE%20ONR%2006.pdf.
  • [65] Stein, C. (1986). Lectures on the theory of estimation of many parameters. J. Math. Sci. 34 1373–1403.
  • [66] Sun, D. and Sun, X. (2005). Estimation of the multivariate normal precision and covariance matrices in a star-shape model. Ann. Inst. Statist. Math. 57 455–484.
  • [67] Tracy, C. A. and Widom, H. (1996). On orthogonal and symplectic matrix ensembles. Comm. Math. Phys. 177 727–754.
  • [68] van der Vaart, H. R. (1961). On certain characteristics of the distribution of the latent roots of a symmetric random matrix under general conditions. Ann. Math. Stat. 32 864–873.
  • [69] Won, J.-H., Lim, J., Kim, S.-J. and Rajaratnam, B. (2013). Condition-number-regularized covariance estimation. J. R. Stat. Soc. Ser. B. Stat. Methodol. 75 427–450.
  • [70] Yang, R. and Berger, J. O. (1994). Estimation of a covariance matrix using the reference prior. Ann. Statist. 22 1195–1211.

Supplemental materials

  • Proofs and additional results. The supplementary material provides proofs omitted from the main text for reasons of space, together with auxiliary lemmas used in those proofs. Notably, it proves Lemma 4 and gives detailed derivations of the 17 explicit formulas for optimal shrinkers, as summarized in Table 2. It also offers a detailed study of the large-$\lambda$ asymptotics (asymptotic slope and asymptotic shift) of the optimal shrinkers discovered in this paper, tabulates the asymptotic behavior of each optimal shrinker, and studies the asymptotic percent improvement of the optimal shrinkers over naive hard thresholding of the sample covariance eigenvalues (a Monte Carlo sketch of this last comparison appears below).
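As an illustration of that comparison, here is a minimal Monte Carlo sketch in Python (hypothetical code, not the paper's code supplement [16]; it assumes unit noise level, a single spike, Frobenius loss, and takes "hard thresholding" to keep sample eigenvalues above the bulk edge unchanged while resetting the rest to the noise level):

    import numpy as np

    def frobenius_shrinker(lam, gamma):
        # eta = ell*c^2 + s^2, with ell recovered from lam above the bulk edge
        if lam <= (1 + np.sqrt(gamma)) ** 2:
            return 1.0
        b = lam + 1 - gamma
        ell = (b + np.sqrt(b * b - 4 * lam)) / 2
        c2 = (1 - gamma / (ell - 1) ** 2) / (1 + gamma / (ell - 1))
        return ell * c2 + (1 - c2)

    rng = np.random.default_rng(0)
    n, p = 2000, 1000
    gamma = p / n
    ell = 4.0                                  # one spike above 1 + sqrt(gamma)

    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(ell)                    # rank-one spiked population
    Sigma = np.eye(p)
    Sigma[0, 0] = ell

    lam, V = np.linalg.eigh(X.T @ X / n)       # sample eigen-decomposition

    edge = (1 + np.sqrt(gamma)) ** 2
    eta_hard = np.where(lam > edge, lam, 1.0)  # naive hard thresholding
    eta_opt = np.array([frobenius_shrinker(l, gamma) for l in lam])

    for name, eta in [("hard threshold", eta_hard), ("optimal shrinker", eta_opt)]:
        Sigma_hat = (V * eta) @ V.T            # V @ diag(eta) @ V.T
        loss = np.linalg.norm(Sigma_hat - Sigma, "fro") ** 2
        print(f"{name}: Frobenius loss {loss:.2f}")

In this setup the hard threshold retains the biased sample eigenvalue as-is, while the optimal shrinker debiases it and shrinks further to compensate for eigenvector misalignment, yielding a visibly smaller Frobenius loss.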