The Annals of Statistics

Finite sample approximation results for principal component analysis: A matrix perturbation approach

Boaz Nadler


Abstract

Principal component analysis (PCA) is a standard tool for dimensionality reduction of a set of n observations (samples), each with p variables. In this paper, using a matrix perturbation approach, we study the nonasymptotic relation between the eigenvalues and eigenvectors of PCA computed on a finite sample of size n and those of the limiting population PCA as n→∞. In the spirit of finite sample bounds common in machine learning, we present a theorem, holding with high probability, on the closeness between the leading eigenvalue and eigenvector of sample PCA and those of population PCA under a spiked covariance model. In addition, we consider the relation between finite sample PCA and the asymptotic results in the joint limit p, n→∞ with p/n=c. We present a matrix perturbation view of the “phase transition phenomenon,” and a simple linear-algebra based derivation of the eigenvalue and eigenvector overlap in this asymptotic limit. Moreover, our analysis also applies to finite p, n, where we show that although there is no sharp phase transition as in the joint limit, the eigenvector of sample PCA may exhibit a sharp “loss of tracking,” either as a function of the noise level or as a function of the sample size n, suddenly losing its relation to the true eigenvector of the population covariance matrix. This occurs when the eigenvalue arising from the signal crosses the largest eigenvalue arising from the noise, whose eigenvector points in a random direction.
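As a concrete illustration of the phase transition discussed in the abstract, the following is a minimal simulation sketch (ours, not taken from the paper), assuming Python with NumPy. It draws n samples from a rank-one spiked covariance model Σ = I + λvvᵀ with p/n = c fixed, and compares the leading sample eigenvalue and its squared overlap with the population eigenvector against the standard asymptotic predictions (Baik, Ben Arous and Péché; Paul): above the transition (λ > √c) the top sample eigenvalue converges to (1+λ)(1+c/λ) and the squared overlap to (1−c/λ²)/(1+c/λ); at or below it, the eigenvalue converges to the bulk edge (1+√c)² and the overlap to zero. All variable names are ours.

```python
# Minimal sketch (assumes NumPy); not code from the paper.
# Simulates a rank-one spiked covariance model Sigma = I + lam * v v^T
# and compares the leading sample eigenvalue and eigenvector overlap
# with the standard asymptotic predictions in the limit p, n -> inf,
# p/n = c.
import numpy as np

rng = np.random.default_rng(0)
p, n = 400, 800                       # aspect ratio c = p/n = 0.5
c = p / n
v = np.zeros(p)
v[0] = 1.0                            # population leading eigenvector

for lam in [0.3, np.sqrt(c), 1.0, 3.0]:   # spike strengths around sqrt(c)
    # n i.i.d. samples from N(0, I + lam * v v^T): scale the first
    # coordinate of white Gaussian noise by sqrt(1 + lam).
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(1.0 + lam)
    S = X.T @ X / n                   # p x p sample covariance
    evals, evecs = np.linalg.eigh(S)  # eigenvalues in ascending order
    l1, u1 = evals[-1], evecs[:, -1]
    overlap2 = float(u1 @ v) ** 2     # squared overlap with true vector

    if lam > np.sqrt(c):              # above the phase transition
        l1_pred = (1 + lam) * (1 + c / lam)
        ov_pred = (1 - c / lam**2) / (1 + c / lam)
    else:                             # at/below: eigenvalue at bulk edge
        l1_pred = (1 + np.sqrt(c)) ** 2
        ov_pred = 0.0
    print(f"lam={lam:.2f}: l1={l1:.3f} (pred {l1_pred:.3f}), "
          f"overlap^2={overlap2:.3f} (pred {ov_pred:.3f})")
```

At finite p and n the transition is smoothed out, and a single run with λ near √c can land on either side of the predictions; this is the regime in which the “loss of tracking” described above occurs.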

Article information

Source
Ann. Statist. Volume 36, Number 6 (2008), 2791–2817.

Dates
First available in Project Euclid: 5 January 2009

Permanent link to this document
https://projecteuclid.org/euclid.aos/1231165185

Digital Object Identifier
doi:10.1214/08-AOS618

Mathematical Reviews number (MathSciNet)
MR2485013

Zentralblatt MATH identifier
1168.62058

Subjects
Primary: 62H25: Factor analysis and principal components; correspondence analysis. 62E17: Approximations to distributions (nonasymptotic)
Secondary: 15A42: Inequalities involving eigenvalues and eigenvectors

Keywords
Principal component analysis; spiked covariance model; random matrix theory; matrix perturbation; phase transition

Citation

Nadler, Boaz. Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36 (2008), no. 6, 2791–2817. doi:10.1214/08-AOS618. https://projecteuclid.org/euclid.aos/1231165185

