The Annals of Statistics

Statistical eigen-inference from large Wishart matrices

N. Raj Rao, James A. Mingo, Roland Speicher, and Alan Edelman

Full-text: Open access

Abstract

We consider settings where the observations are drawn from a zero-mean multivariate (real or complex) normal distribution with the population covariance matrix having eigenvalues of arbitrary multiplicity. We assume that the eigenvectors of the population covariance matrix are unknown and focus on inferential procedures that are based on the sample eigenvalues alone (i.e., “eigen-inference”).

Results found in the literature establish the asymptotic normality of the fluctuation in the trace of powers of the sample covariance matrix. We develop concrete algorithms for analytically computing the limiting quantities and the covariance of the fluctuations. We exploit the asymptotic normality of the trace of powers of the sample covariance matrix to develop eigenvalue-based procedures for testing and estimation. Specifically, we formulate a simple test of hypotheses for the population eigenvalues and a technique for estimating the population eigenvalues in settings where the cumulative distribution function of the (nonrandom) population eigenvalues has a staircase structure.

Monte Carlo simulations are used to demonstrate the superiority of the proposed methodologies over classical techniques and the robustness of the proposed techniques in high-dimensional, (relatively) small sample size settings. The improved performance results from the fact that the proposed inference procedures are “global” (in a sense that we describe) and exploit “global” information, thereby overcoming the inherent biases that cripple classical inference procedures, which are “local” and rely on “local” information.
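
The role played by traces of powers of the sample covariance matrix can be illustrated with a small simulation. The sketch below is not the authors' algorithm. It only checks two standard finite-sample identities for a real Wishart matrix against Monte Carlo averages: the expectations of tr(S)/n and tr(S^2)/n, and the variance of tr(S). The dimensions, the two-step "staircase" spectrum, the seed, and all function names are assumptions chosen for illustration.

```python
import numpy as np

def sample_trace_moments(eigs, m, rng):
    """Return (tr S / n, tr S^2 / n) for one real sample covariance S = X X^T / m."""
    n = eigs.size
    # Eigenvalue statistics do not depend on the population eigenvectors,
    # so a diagonal population covariance is used (illustrative assumption).
    X = np.sqrt(eigs)[:, None] * rng.standard_normal((n, m))
    S = X @ X.T / m
    # For symmetric S, tr(S^2) equals the sum of its squared entries.
    return np.trace(S) / n, np.sum(S * S) / n

def expected_trace_moments(eigs, m):
    """Exact expectations of (tr S / n, tr S^2 / n) for real Gaussian data."""
    n = eigs.size
    m1, m2 = eigs.mean(), np.mean(eigs ** 2)
    c = n / m  # dimension-to-sample-size ratio
    return m1, (1 + 1 / m) * m2 + c * m1 ** 2

rng = np.random.default_rng(0)
n, m = 40, 80
# Hypothetical staircase spectrum: eigenvalue 2 and eigenvalue 1, each with multiplicity n/2.
eigs = np.repeat([2.0, 1.0], n // 2)

draws = np.array([sample_trace_moments(eigs, m, rng) for _ in range(2000)])
mc_mean = draws.mean(axis=0)
exact = expected_trace_moments(eigs, m)

# Fluctuation of the unnormalized trace: Var(tr S) = 2 tr(Sigma^2) / m in the real case.
var_trace_mc = np.var(n * draws[:, 0])
var_trace_exact = 2 * np.sum(eigs ** 2) / m

print("Monte Carlo means:", mc_mean)
print("Exact expectations:", exact)
print("Var(tr S): simulated", var_trace_mc, "exact", var_trace_exact)
```

The second moment shows why a naive plug-in estimate is biased when n is comparable to m: tr(S^2)/n concentrates near m2 + c*m1^2 rather than m2, where m1 and m2 are the first two moments of the population eigenvalues and c = n/m.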

Article information

Source
Ann. Statist., Volume 36, Number 6 (2008), 2850-2885.

Dates
First available in Project Euclid: 5 January 2009

Permanent link to this document
https://projecteuclid.org/euclid.aos/1231165187

Digital Object Identifier
doi:10.1214/07-AOS583

Mathematical Reviews number (MathSciNet)
MR2485015

Zentralblatt MATH identifier
1168.62056

Subjects
Primary: 62510 62E20: Asymptotic distribution theory 15A52

Keywords
Sample covariance matrices; random matrix theory; Wishart matrices; second order freeness; free probability; eigen-inference; linear statistics

Citation

Rao, N. Raj; Mingo, James A.; Speicher, Roland; Edelman, Alan. Statistical eigen-inference from large Wishart matrices. Ann. Statist. 36 (2008), no. 6, 2850–2885. doi:10.1214/07-AOS583. https://projecteuclid.org/euclid.aos/1231165187


