Open Access
Sharp detection in PCA under correlations: All eigenvalues matter
Edgar Dobriban
Ann. Statist. 45(4): 1810-1833 (August 2017). DOI: 10.1214/16-AOS1514

Abstract

Principal component analysis (PCA) is a widely used method for dimension reduction. In high-dimensional data, the “signal” eigenvalues corresponding to weak principal components (PCs) do not necessarily separate from the bulk of the “noise” eigenvalues. Therefore, popular tests based on the largest eigenvalue have little power to detect weak PCs. In the special case of the spiked model, certain tests asymptotically equivalent to linear spectral statistics (LSS)—averaging effects over all eigenvalues—were recently shown to achieve some power.
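The failure of the largest eigenvalue to separate from the bulk can be illustrated numerically. The following is a minimal sketch, not taken from the paper, assuming the standard rank-one spiked model with identity noise covariance: the population covariance is I + ell * e1 e1^T, and the top sample eigenvalue leaves the bulk only when ell exceeds sqrt(p/n). The function name, the spike direction e1, and the values of n, p, and ell are all illustrative assumptions.

```python
import numpy as np

def top_sample_eigenvalue(n, p, ell, rng):
    """Largest eigenvalue of the sample covariance under a rank-one spiked model.

    Assumption (illustrative): population covariance I + ell * e1 e1^T,
    Gaussian data, spike placed on the first coordinate.
    """
    X = rng.standard_normal((n, p))
    X[:, 0] *= np.sqrt(1.0 + ell)      # first coordinate has variance 1 + ell
    S = X.T @ X / n                    # p x p sample covariance
    return np.linalg.eigvalsh(S)[-1]   # eigvalsh returns eigenvalues in ascending order

rng = np.random.default_rng(0)
n, p = 1000, 500
gamma = p / n                                  # aspect ratio
bulk_edge = (1.0 + np.sqrt(gamma)) ** 2        # upper edge of the Marchenko-Pastur bulk

weak = top_sample_eigenvalue(n, p, ell=0.3, rng=rng)    # below sqrt(gamma): hidden in the bulk
strong = top_sample_eigenvalue(n, p, ell=3.0, rng=rng)  # above sqrt(gamma): separates

print(f"bulk edge      : {bulk_edge:.3f}")
print(f"weak spike top : {weak:.3f}")
print(f"strong spike top: {strong:.3f}")
```

In this sketch the weak spike leaves the top eigenvalue near the bulk edge, which is why a test based on that eigenvalue alone has little to work with.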

We consider a “local alternatives” model for the spectrum of covariance matrices that allows a general correlation structure. We develop new tests to detect PCs in this model. While the top eigenvalue contains little information, due to the strong correlations between the eigenvalues we can detect weak PCs by averaging over all eigenvalues using LSS. We show that it is possible to find the optimal LSS, by solving a certain integral equation. To solve this equation, we develop efficient algorithms that build on our recent method for computing the limit empirical spectrum [Dobriban (2015)]. The solvability of this equation also presents a new perspective on phase transitions in spiked models.
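A linear spectral statistic sums a fixed test function over all sample eigenvalues rather than looking at the top one. The sketch below only shows how such a statistic is computed; it does not implement the paper's optimal LSS, which requires solving the integral equation mentioned above. The helper name `linear_spectral_statistic` and the two test functions (identity and logarithm) are illustrative assumptions.

```python
import numpy as np

def linear_spectral_statistic(eigenvalues, f):
    """Return sum_i f(lambda_i) over all sample eigenvalues.

    `f` is any vectorized test function; the optimal choice described in
    the abstract is NOT computed here.
    """
    return float(np.sum(f(np.asarray(eigenvalues))))

rng = np.random.default_rng(1)
n, p = 400, 100                      # p < n, so the sample covariance is positive definite
X = rng.standard_normal((n, p))
S = X.T @ X / n                      # p x p sample covariance
eigs = np.linalg.eigvalsh(S)         # all eigenvalues, ascending

trace_stat = linear_spectral_statistic(eigs, lambda x: x)   # equals trace(S)
logdet_stat = linear_spectral_statistic(eigs, np.log)       # equals log det(S)

print(f"trace LSS  : {trace_stat:.4f}")
print(f"log-det LSS: {logdet_stat:.4f}")
```

Both examples use every eigenvalue, so a small shift spread across the spectrum contributes to the statistic even when no single eigenvalue stands out.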

Citation

Edgar Dobriban. "Sharp detection in PCA under correlations: All eigenvalues matter." Ann. Statist. 45(4): 1810-1833, August 2017. https://doi.org/10.1214/16-AOS1514

Information

Received: 1 February 2016; Revised: 1 August 2016; Published: August 2017
First available in Project Euclid: 28 June 2017

zbMATH: 06773292
MathSciNet: MR3670197
Digital Object Identifier: 10.1214/16-AOS1514

Subjects:
Primary: 62H25
Secondary: 45B05, 62H15

Keywords: linear integral equation, linear spectral statistic, optimal testing, principal component analysis, random matrix theory

Rights: Copyright © 2017 Institute of Mathematical Statistics
