The Annals of Statistics

On the distribution of the largest eigenvalue in principal components analysis

Iain M. Johnstone
Source: Ann. Statist. Volume 29, Number 2 (2001), 295-327.

Abstract

Let x(1) denote the square of the largest singular value of an n × p matrix X, all of whose entries are independent standard Gaussian variates. Equivalently, x(1) is the largest principal component variance of the covariance matrix $X'X$, or the largest eigenvalue of a p-variate Wishart distribution on n degrees of freedom with identity covariance.
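The two characterizations of x(1) can be checked numerically. The following is a minimal sketch assuming NumPy; the helper name `largest_eig_two_ways` is hypothetical. It draws one n × p matrix of independent N(0, 1) entries and computes x(1) both as the squared top singular value of X and as the top eigenvalue of $X'X$.

```python
import numpy as np


def largest_eig_two_ways(n, p, seed=0):
    """Return x(1) computed two ways for one n x p matrix of iid N(0, 1) entries."""
    rng = np.random.default_rng(seed)
    X = rng.standard_normal((n, p))
    # Singular values are returned in descending order, so [0] is the largest.
    s_max = np.linalg.svd(X, compute_uv=False)[0]
    # eigvalsh returns eigenvalues of the symmetric matrix X'X in ascending order.
    lam_max = np.linalg.eigvalsh(X.T @ X)[-1]
    return s_max**2, lam_max


sq_sv, top_eig = largest_eig_two_ways(n=20, p=5)
print(sq_sv, top_eig)  # the two values agree up to floating-point rounding
```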

Consider the limit of large p and n with $n/p = \gamma \ge 1$. When centered by $\mu_p = (\sqrt{n-1} + \sqrt{p})^2$ and scaled by $\sigma_p = (\sqrt{n-1} + \sqrt{p})(1/\sqrt{n-1} + 1/\sqrt{p})^{1/3}$, the distribution of x(1) approaches the Tracy-Widom law of order 1, which is defined in terms of the Painlevé II differential equation and can be numerically evaluated and tabulated in software. Simulations show the approximation to be informative for n and p as small as 5.
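A minimal sketch of how the approximation can be used in practice, assuming NumPy and SciPy. It computes $\mu_p$ and $\sigma_p$, evaluates the order-1 Tracy-Widom distribution function through the standard Painlevé II representation $F_1(s) = \exp\{-\tfrac12 \int_s^\infty [q(x) + (x - s) q^2(x)]\,dx\}$, where $q$ is the Hastings-McLeod solution of $q'' = xq + 2q^3$ with $q(x) \sim \mathrm{Ai}(x)$ as $x \to \infty$, and simulates standardized values of x(1) for comparison. The function names, the backward-integration cutoff `s0 = 8`, and the solver tolerances are our own assumptions, not taken from the paper. Integrating backward from Airy initial data is one common approach; it is accurate over the bulk of the distribution but loses accuracy far into the left tail.

```python
import numpy as np
from scipy.integrate import solve_ivp
from scipy.special import airy


def center_scale(n, p):
    """Centering mu_p and scaling sigma_p as given in the abstract (assumes n >= p)."""
    a, b = np.sqrt(n - 1), np.sqrt(p)
    mu = (a + b) ** 2
    sigma = (a + b) * (1.0 / a + 1.0 / b) ** (1.0 / 3.0)
    return mu, sigma


def tw1_cdf(s, s0=8.0):
    """Approximate F1(s) = exp(-1/2 * int_s^inf [q(x) + (x - s) q(x)^2] dx).

    q solves Painleve II, q'' = x q + 2 q^3, with q(x) ~ Ai(x) as x -> inf.
    Integrates backward from the assumed cutoff s0, starting from Airy values
    and treating every integral over (s0, inf) as zero.
    """
    if s >= s0:
        return 1.0  # Ai(8) is roughly 5e-8, so 1 - F1(s0) is negligible

    def rhs(x, y):
        q, dq, iq, iq2, w = y
        return [
            dq,                # q'
            x * q + 2 * q**3,  # q'' from Painleve II
            -q,                # iq(x)  = int_x^inf q(t) dt
            -q**2,             # iq2(x) = int_x^inf q(t)^2 dt
            -iq2,              # w(x)   = int_x^inf (t - x) q(t)^2 dt
        ]

    ai, aip, _, _ = airy(s0)
    y0 = [ai, aip, 0.0, 0.0, 0.0]
    sol = solve_ivp(rhs, (s0, s), y0, method="DOP853", rtol=1e-12, atol=1e-14)
    iq, w = sol.y[2, -1], sol.y[4, -1]
    return float(np.exp(-0.5 * (iq + w)))


def simulate_standardized(n, p, reps, seed=0):
    """Draw (x(1) - mu_p) / sigma_p for reps independent n x p Gaussian matrices."""
    rng = np.random.default_rng(seed)
    mu, sigma = center_scale(n, p)
    out = np.empty(reps)
    for r in range(reps):
        X = rng.standard_normal((n, p))
        out[r] = (np.linalg.eigvalsh(X.T @ X)[-1] - mu) / sigma
    return out


z = simulate_standardized(n=101, p=64, reps=1000)
print(np.mean(z <= 0.9793), tw1_cdf(0.9793))
```

The value 0.9793 is, to our knowledge, the tabulated 95th percentile of the order-1 Tracy-Widom law, so both printed numbers should be close to 0.95; treat that constant as an assumption rather than a figure from this page.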

The limit is derived via a corresponding result for complex Wishart matrices using methods from random matrix theory. The result suggests that some aspects of large p multivariate distribution theory may be easier to apply in practice than their fixed p counterparts.

Primary Subjects: 62H25, 62F20
Secondary Subjects: 33C45, 60H25
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1009210544
Digital Object Identifier: doi:10.1214/aos/1009210544
Mathematical Reviews number (MathSciNet): MR1863961
Zentralblatt MATH identifier: 1016.62078

References

Aldous, D. and Diaconis, P. (1999). Longest increasing subsequences: from patience sorting to the Baik-Deift-Johansson theorem. Bull. Amer. Math. Soc. 36 413-432.
Mathematical Reviews (MathSciNet): MR2000g:60013
Digital Object Identifier: doi:10.1090/S0273-0979-99-00796-X
Anderson, T. W. (1963). Asymptotic theory for principal component analysis. Ann. Math. Statist. 34 122-148.
Mathematical Reviews (MathSciNet): MR145620
Project Euclid: euclid.aoms/1177704248
Anderson, T. W. (1996). R. A. Fisher and multivariate analysis. Statist. Sci. 11 20-34.
Zentralblatt MATH: 0955.62504
Mathematical Reviews (MathSciNet): MR1437125
Digital Object Identifier: doi:10.1214/ss/1032209662
Project Euclid: euclid.ss/1032209662
Bai, Z. D. (1999). Methodologies in spectral analysis of large dimensional random matrices: a review. Statist. Sinica 9 611-677.
Mathematical Reviews (MathSciNet): MR1711663
Zentralblatt MATH: 0949.60077
Baik, J., Deift, P. and Johansson, K. (1999). On the distribution of the length of the longest increasing subsequence of random permutations. J. Amer. Math. Soc. 12 1119-1178.
Mathematical Reviews (MathSciNet): MR2000e:05006
Zentralblatt MATH: 0932.05001
Digital Object Identifier: doi:10.1090/S0894-0347-99-00307-0
Baker, T. H., Forrester, P. J. and Pearce, P. A. (1998). Random matrix ensembles with an effective extensive external charge. J. Phys. A 31 6087-6101.
Zentralblatt MATH: 0912.15030
Mathematical Reviews (MathSciNet): MR1637735
Digital Object Identifier: doi:10.1088/0305-4470/31/29/002
Basor, E. L. (1997). Distribution functions for random variables for ensembles of positive Hermitian matrices. Comm. Math. Phys. 188 327-350.
Mathematical Reviews (MathSciNet): MR99b:82046
Zentralblatt MATH: 0905.47016
Digital Object Identifier: doi:10.1007/s002200050167
Buja, A., Hastie, T. and Tibshirani, R. (1995). Penalized discriminant analysis. Ann. Statist. 23 73-102.
Mathematical Reviews (MathSciNet): MR96g:62114
Zentralblatt MATH: 0821.62031
Digital Object Identifier: doi:10.1214/aos/1176324456
Project Euclid: euclid.aos/1176324456
Constantine, A. G. (1963). Some non-central distribution problems in multivariate analysis. Ann. Math. Statist. 34 1270-1285.
Mathematical Reviews (MathSciNet): MR181056
Zentralblatt MATH: 0123.36801
Digital Object Identifier: doi:10.1214/aoms/1177703863
Project Euclid: euclid.aoms/1177703863
Deift, P. (1999a). Integrable systems and combinatorial theory. Notices Amer. Math. Soc. 47 631-640.
Deift, P. (1999b). Orthogonal Polynomials and Random Matrices: A Riemann-Hilbert Approach. Amer. Math. Soc., Providence, RI.
Dunster, T. M. (1989). Uniform asymptotic expansions for Whittaker's confluent hypergeometric functions. SIAM J. Math. Anal. 20 744-760.
Mathematical Reviews (MathSciNet): MR90e:33012
Zentralblatt MATH: 0673.33003
Digital Object Identifier: doi:10.1137/0520052
Dyson, F. J. (1970). Correlations between eigenvalues of a random matrix. Comm. Math. Phys. 19 235-250.
Mathematical Reviews (MathSciNet): MR43:4398
Zentralblatt MATH: 0221.62019
Digital Object Identifier: doi:10.1007/BF01646824
Project Euclid: euclid.cmp/1103842703
Eaton, M. L. (1983). Multivariate Statistics: A Vector Space Approach. Wiley, New York.
Mathematical Reviews (MathSciNet): MR716321
Edelman, A. (1988). Eigenvalues and condition numbers of random matrices. SIAM J. Matrix Anal. Appl. 9 543-560.
Mathematical Reviews (MathSciNet): MR89j:15039
Zentralblatt MATH: 0678.15019
Digital Object Identifier: doi:10.1137/0609045
Edelman, A. (1991). The distribution and moments of the smallest eigenvalue of a random matrix of Wishart type. Linear Algebra Appl. 159 55-80.
Mathematical Reviews (MathSciNet): MR92i:62028
Zentralblatt MATH: 0738.15010
Digital Object Identifier: doi:10.1016/0024-3795(91)90076-9
Erdélyi, A. (1960). Asymptotic forms for Laguerre polynomials. J. Indian Math. Soc. 24 235-250.
Mathematical Reviews (MathSciNet): MR23:A1073
Forrester, P. J. (1993). The spectrum edge of random matrix ensembles. Nuclear Phys. B 402 709-728.
Mathematical Reviews (MathSciNet): MR94h:82031
Zentralblatt MATH: 1043.82538
Digital Object Identifier: doi:10.1016/0550-3213(93)90126-A
Forrester, P. J. (2000). Painlevé transcendent evaluation of the scaled distribution of the smallest eigenvalue in the Laguerre orthogonal and symplectic ensembles. Technical report. www.lanl.gov arXiv:nlin.SI/0005064.
Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab. 8 252-261.
Mathematical Reviews (MathSciNet): MR81m:60046
Zentralblatt MATH: 0428.60039
Digital Object Identifier: doi:10.1214/aop/1176994775
Project Euclid: euclid.aop/1176994775
Gohberg, I. C. and Krein, M. G. (1969). Introduction to the Theory of Linear Non-selfadjoint Operators. Amer. Math. Soc., Providence, RI.
Mathematical Reviews (MathSciNet): MR246142
Zentralblatt MATH: 0181.13504
Hastings, S. P. and McLeod, J. B. (1980). A boundary value problem associated with the second Painlevé transcendent and the Korteweg-de Vries equation. Arch. Rational Mech. Anal. 73 31-51.
Horn, R. A. and Johnson, C. R. (1985). Matrix Analysis. Cambridge Univ. Press.
Mathematical Reviews (MathSciNet): MR87e:15001
James, A. T. (1964). Distributions of matrix variates and latent roots derived from normal samples. Ann. Math. Statist. 35 475-501.
Zentralblatt MATH: 0121.36605
Mathematical Reviews (MathSciNet): MR181057
Digital Object Identifier: doi:10.1214/aoms/1177703550
Project Euclid: euclid.aoms/1177703550
Johansson, K. (1998). On fluctuations of eigenvalues of random Hermitian matrices. Duke Math. J. 91 151-204.
Mathematical Reviews (MathSciNet): MR1487983
Zentralblatt MATH: 1039.82504
Digital Object Identifier: doi:10.1215/S0012-7094-98-09108-6
Project Euclid: euclid.dmj/1077231893
Johansson, K. (2000). Shape fluctuations and random matrices. Comm. Math. Phys. 209 437-476.
Mathematical Reviews (MathSciNet): MR2001h:60177
Zentralblatt MATH: 0969.15008
Digital Object Identifier: doi:10.1007/s002200050027
Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, New York.
Mathematical Reviews (MathSciNet): MR81h:62003
Marčenko, V. A. and Pastur, L. A. (1967). Distributions of eigenvalues of some sets of random matrices. Math. USSR-Sb. 1 507-536.
Mathematical Reviews (MathSciNet): MR208649
Mehta, M. L. (1991). Random Matrices, 2nd ed. Academic Press, New York.
Mathematical Reviews (MathSciNet): MR92f:82002
Muirhead, R. J. (1974). Powers of the largest latent root test of Σ = I. Comm. Statist. 3 513-524.
Mathematical Reviews (MathSciNet): MR52:4526
Digital Object Identifier: doi:10.1080/03610927408827154
Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York.
Mathematical Reviews (MathSciNet): MR84c:62073
Zentralblatt MATH: 0556.62028
Olver, F. W. J. (1974). Asymptotics and Special Functions. Academic Press, New York.
Mathematical Reviews (MathSciNet): MR55:8655
Preisendorfer, R. W. (1988). Principal Component Analysis in Meteorology and Oceanography. North-Holland, Amsterdam.
Riesz, F. and Sz.-Nagy, B. (1955). Functional Analysis. Ungar, New York.
Mathematical Reviews (MathSciNet): MR17,175i
Soshnikov, A. (1999). Universality at the edge of the spectrum in Wigner random matrices. Comm. Math. Phys. 207 697-733.
Mathematical Reviews (MathSciNet): MR2001i:82037
Zentralblatt MATH: 1062.82502
Digital Object Identifier: doi:10.1007/s002200050743
Soshnikov, A. (2001). A note on universality of the distribution of the largest eigenvalues in certain classes of sample covariance matrices. Technical report. www.lanl.gov arXiv:math.PR/0104113.
Stein, C. (1969). Multivariate analysis I. Technical report, Dept. Statistics Stanford Univ., pages 79-81. (Notes prepared by M. L. Eaton in 1966.)
Szegő, G. (1967). Orthogonal Polynomials, 3rd ed. Amer. Math. Soc., Providence.
Mathematical Reviews (MathSciNet): MR46:9631
Temme, N. M. (1990). Asymptotic estimates for Laguerre polynomials. J. Appl. Math. Phys. (ZAMP) 41 114-126.
Mathematical Reviews (MathSciNet): MR91f:33004
Zentralblatt MATH: 0688.33007
Digital Object Identifier: doi:10.1007/BF00946078
Tracy, C. A. and Widom, H. (1994). Level-spacing distributions and the Airy kernel. Comm. Math. Phys. 159 151-174.
Mathematical Reviews (MathSciNet): MR95e:82003
Zentralblatt MATH: 0789.35152
Digital Object Identifier: doi:10.1007/BF02100489
Project Euclid: euclid.cmp/1104254495
Tracy, C. A. and Widom, H. (1996). On orthogonal and symplectic matrix ensembles. Comm. Math. Phys. 177 727-754.
Mathematical Reviews (MathSciNet): MR97a:82055
Zentralblatt MATH: 0851.60101
Digital Object Identifier: doi:10.1007/BF02099545
Project Euclid: euclid.cmp/1104286442
Tracy, C. A. and Widom, H. (1998). Correlation functions, cluster functions, and spacing distributions for random matrices. J. Statist. Phys. 92 809-835.
Mathematical Reviews (MathSciNet): MR99m:82030
Zentralblatt MATH: 0942.60099
Digital Object Identifier: doi:10.1023/A:1023084324803
Tracy, C. A. and Widom, H. (1999). Airy kernel and Painlevé II. Technical report. www.lanl.gov solv-int/9901004. To appear in CRM Proceedings and Lecture Notes: "Isomonodromic Deformations and Applications in Physics," J. Harnad, ed.
Tracy, C. A. and Widom, H. (2000). The distribution of the largest eigenvalue in the Gaussian ensembles. In Calogero-Moser-Sutherland Models (J. van Diejen and L. Vinet, eds.) 461-472. Springer, New York.
Mathematical Reviews (MathSciNet): MR1844228
Wachter, K. W. (1976). Probability plotting points for principal components. In Ninth Interface Symposium Computer Science and Statistics (D. Hoaglin and R. Welsch, eds.) 299-308. Prindle, Weber and Schmidt, Boston.
Widom, H. (1999). On the relation between orthogonal, symplectic and unitary ensembles. J. Statist. Phys. 94 347-363.
Mathematical Reviews (MathSciNet): MR2000e:82024
Zentralblatt MATH: 0935.60090
Digital Object Identifier: doi:10.1023/A:1004516918143
Wigner, E. P. (1955). Characteristic vectors of bordered matrices of infinite dimensions. Ann. Math. 62 548-564.
Mathematical Reviews (MathSciNet): MR17,1097c
Digital Object Identifier: doi:10.2307/1970079
Wigner, E. P. (1958). On the distribution of the roots of certain symmetric matrices. Ann. Math. 67 325-328.
Mathematical Reviews (MathSciNet): MR20:2029
Digital Object Identifier: doi:10.2307/1970008
Wilks, S. S. (1962). Mathematical Statistics. Wiley, New York.
Mathematical Reviews (MathSciNet): MR26:1949

2013 © Institute of Mathematical Statistics