Electronic Journal of Statistics

High dimension low sample size asymptotics of robust PCA

Yi-Hui Zhou and J. S. Marron

Full-text: Open access

Abstract

Conventional principal component analysis is highly susceptible to outliers. In particular, a single sufficiently outlying data point can draw the leading principal component toward itself. In this paper, we use asymptotics to study the effects of outliers on high dimension, low sample size data. The non-robust nature of conventional principal component analysis is verified through its inconsistency under multivariate Gaussian assumptions with a single spike in the covariance structure, in the presence of a contaminating outlier. In the same setting, the robust method of spherical principal components is consistent with the population eigenvector for the spike model, even in the presence of contamination.
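The contrast described in the abstract can be illustrated with a small simulation. The following is a minimal sketch, not the authors' procedure: the sample size, dimension, spike size (d to the power 1.5), outlier position, and the use of the coordinate-wise median as the robust center are all assumptions chosen for illustration, and the function names are hypothetical. It draws Gaussian data with a single spike along the first coordinate axis, adds one outlier along an orthogonal axis, and compares the angle between the spike direction and the leading direction found by conventional PCA and by spherical PCA (PCA applied after projecting the centered data onto the unit sphere).

```python
import numpy as np


def leading_direction(rows):
    """Leading right singular vector of a matrix with observations in rows."""
    _, _, vt = np.linalg.svd(rows, full_matrices=False)
    return vt[0]


def classical_pc1(x):
    """First principal direction of conventional, mean-centered PCA."""
    return leading_direction(x - x.mean(axis=0))


def spherical_pc1(x):
    """First principal direction after projecting the data onto the unit sphere.

    Assumption: the coordinate-wise median is used as a simple robust center;
    other robust centers (for example the spatial median) could be used instead.
    """
    centered = x - np.median(x, axis=0)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    norms[norms == 0] = 1.0  # leave a point that sits exactly at the center unchanged
    return leading_direction(centered / norms)


def angle_deg(v, u):
    """Angle in degrees between two directions, ignoring sign."""
    c = abs(v @ u) / (np.linalg.norm(v) * np.linalg.norm(u))
    return float(np.degrees(np.arccos(np.clip(c, 0.0, 1.0))))


# Illustrative settings (assumptions, not values taken from the paper).
rng = np.random.default_rng(0)
n, d = 20, 1000
spike = float(d) ** 1.5          # variance along the spike direction

u = np.zeros(d)
u[0] = 1.0                       # population eigenvector of the spike

x = rng.standard_normal((n, d))  # unit-variance Gaussian noise
x[:, 0] *= np.sqrt(spike)        # inflate variance along the spike direction

outlier = np.zeros(d)
outlier[1] = 5000.0              # single contaminating point, orthogonal to u
x_contaminated = np.vstack([x, outlier])

angle_classical = angle_deg(classical_pc1(x_contaminated), u)
angle_spherical = angle_deg(spherical_pc1(x_contaminated), u)

print(f"conventional PCA angle to spike direction: {angle_classical:.1f} degrees")
print(f"spherical PCA angle to spike direction:    {angle_spherical:.1f} degrees")
```

Under these assumed settings the conventional direction is expected to be pulled toward the outlier, while the spherical direction should stay close to the spike, because projection onto the sphere gives every observation, including the outlier, the same length. A single finite-sample run only illustrates the effect; it does not verify the asymptotic consistency results.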

Article information

Source
Electron. J. Statist., Volume 9, Number 1 (2015), 204-218.

Dates
First available in Project Euclid: 9 February 2015

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1423491229

Digital Object Identifier
doi:10.1214/15-EJS992

Mathematical Reviews number (MathSciNet)
MR3312407

Zentralblatt MATH identifier
1307.62160

Keywords
Outlier; robustness; spherical PCA; spike model

Citation

Zhou, Yi-Hui; Marron, J. S. High dimension low sample size asymptotics of robust PCA. Electron. J. Statist. 9 (2015), no. 1, 204-218. doi:10.1214/15-EJS992. https://projecteuclid.org/euclid.ejs/1423491229

