The Annals of Statistics

Sparse principal component analysis and iterative thresholding

Zongming Ma



Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features $p$ is comparable to, or even much larger than, the sample size $n$. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance.
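To make the approach concrete, here is a minimal Python sketch of the general iterative-thresholding recipe for a single-spike model: alternate a multiplication step with the sample covariance, an entrywise hard-thresholding step, and a QR orthonormalization, starting from a diagonal-thresholding initializer in the spirit of Johnstone and Lu [11]. The function names, the threshold scale $\sqrt{\log p / n}$, and the tuning constants `gamma` and `alpha` below are illustrative assumptions, not the paper's exact specification.

```python
import numpy as np

def spiked_data(n, p, lam, support, rng):
    """Draw n samples from the single-spike model N(0, I + lam * v v^T),
    where v is a unit vector supported on the given coordinates."""
    v = np.zeros(p)
    v[support] = 1.0 / np.sqrt(len(support))
    w = rng.standard_normal(n)                    # spike factor scores
    Z = rng.standard_normal((n, p))               # isotropic noise
    X = np.sqrt(lam) * np.outer(w, v) + Z
    return X, v

def it_sparse_pca(X, k=1, gamma=1.0, alpha=3.0, n_iter=30):
    """Illustrative iterative thresholding for k sparse leading eigenvectors."""
    n, p = X.shape
    S = X.T @ X / n                               # sample covariance
    thr = gamma * np.sqrt(np.log(p) / n)          # entrywise threshold level
    # Diagonal-thresholding initialization: keep coordinates whose sample
    # variance exceeds the noise level by a multiple of sqrt(log p / n).
    sel = np.flatnonzero(np.diag(S) > 1.0 + alpha * np.sqrt(np.log(p) / n))
    if sel.size < k:                              # fallback if too few survive
        sel = np.argsort(np.diag(S))[-max(k, 5):]
    vals, vecs = np.linalg.eigh(S[np.ix_(sel, sel)])
    Q = np.zeros((p, k))
    Q[sel, :] = vecs[:, -k:]                      # top-k eigenvectors, embedded
    for _ in range(n_iter):
        T = S @ Q                                 # multiplication step
        T[np.abs(T) < thr] = 0.0                  # entrywise hard thresholding
        Q, _ = np.linalg.qr(T)                    # orthonormalization step
    return Q
```

With a strong spike and a sparse leading eigenvector, the thresholding step zeroes out most noise coordinates at each iteration, so the returned basis is itself sparse and closely aligned with the true spike direction.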

Article information

Ann. Statist., Volume 41, Number 2 (2013), 772–801.

First available in Project Euclid: 8 May 2013


Primary: 62H12 (Estimation)
Secondary: 62G20 (Asymptotic properties); 62H25 (Factor analysis and principal components; correspondence analysis)

Keywords: dimension reduction; high-dimensional statistics; principal component analysis; principal subspace; sparsity; spiked covariance model; thresholding


Ma, Zongming. Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 (2013), no. 2, 772--801. doi:10.1214/13-AOS1097.



  • [1] Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877–2921.
  • [2] Anderson, T. W. (1963). Asymptotic theory for principal component analysis. Ann. Math. Statist. 34 122–148.
  • [3] d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434–448 (electronic).
  • [4] Davis, C. and Kahan, W. M. (1970). The rotation of eigenvectors by a perturbation. III. SIAM J. Numer. Anal. 7 1–46.
  • [5] Donoho, D. L. (1993). Unconditional bases are optimal bases for data compression and for statistical estimation. Appl. Comput. Harmon. Anal. 1 100–115.
  • [6] Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
  • [7] Golub, G. H. and Van Loan, C. F. (1996). Matrix Computations, 3rd ed. Johns Hopkins Univ. Press, Baltimore, MD.
  • [8] Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 24 417–441, 498–520.
  • [9] Hoyle, D. C. and Rattray, M. (2004). Principal-component-analysis eigenvalue spectra from data with symmetry-breaking structure. Phys. Rev. E (3) 69 026124.
  • [10] Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
  • [11] Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
  • [12] Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. J. Comput. Graph. Statist. 12 531–547.
  • [13] Jung, S. and Marron, J. S. (2009). PCA consistency in high dimension, low sample size context. Ann. Statist. 37 4104–4130.
  • [14] Lu, A. Y. (2002). Sparse principal component analysis for functional data. Ph.D. thesis, Stanford Univ., Stanford, CA.
  • [15] Ma, Z. (2013). Supplement to “Sparse principal component analysis and iterative thresholding.” DOI:10.1214/13-AOS1097SUPP.
  • [16] Mallat, S. (2009). A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press, New York.
  • [17] Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36 2791–2817.
  • [18] Nadler, B. (2009). Discussion of “On consistency and sparsity for principal components analysis in high dimensions,” by I. M. Johnstone and A. Y. Lu. J. Amer. Statist. Assoc. 104 694–697.
  • [19] Onatski, A. (2012). Asymptotics of the principal components estimator of large factor models with weakly influential factors. J. Econometrics 168 244–258.
  • [20] Paul, D. (2005). Nonparametric estimation of principal components. Ph.D. thesis, Stanford Univ.
  • [21] Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sinica 17 1617–1642.
  • [22] Paul, D. and Johnstone, I. M. (2007). Augmented sparse principal component analysis for high dimensional data. Available at arXiv:1202.1242v1.
  • [23] Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philos. Mag. Ser. 6 2 559–572.
  • [24] Ramsay, J. O. and Silverman, B. W. (2005). Functional Data Analysis, 2nd ed. Springer, New York.
  • [25] Reimann, P., Van den Broeck, C. and Bex, G. J. (1996). A Gaussian scenario for unsupervised learning. J. Phys. A 29 3521–3535.
  • [26] Shen, D., Shen, H. and Marron, J. S. (2011). Consistency of sparse PCA in high dimension, low sample size contexts.
  • [27] Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Anal. 99 1015–1034.
  • [28] Stewart, G. W. and Sun, J. G. (1990). Matrix Perturbation Theory. Academic Press, Boston, MA.
  • [29] Tsay, R. S. (2005). Analysis of Financial Time Series, 2nd ed. Wiley, Hoboken, NJ.
  • [30] Ulfarsson, M. O. and Solo, V. (2008). Sparse variable PCA using geodesic steepest descent. IEEE Trans. Signal Process. 56 5823–5832.
  • [31] Varmuza, K. and Filzmoser, P. (2009). Introduction to Multivariate Statistical Analysis in Chemometrics. CRC Press, Boca Raton, FL.
  • [32] Wax, M. and Kailath, T. (1985). Detection of signals by information theoretic criteria. IEEE Trans. Acoust. Speech Signal Process. 33 387–392.
  • [33] Wedin, P.-Å. (1972). Perturbation bounds in connection with singular value decomposition. Nordisk Tidskr. Informationsbehandling (BIT) 12 99–111.
  • [34] Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515–534.
  • [35] Yuan, X. T. and Zhang, T. (2011). Truncated power method for sparse eigenvalue problems. Available at arXiv:1112.2679v1.
  • [36] Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265–286.

Supplemental materials

  • Supplementary material: Supplement to “Sparse principal component analysis and iterative thresholding”. The supplement gives proofs of Corollaries 3.1 and 3.2, Proposition 3.1, and all the claims in Section 6.