Sparse principal component analysis and iterative thresholding

Zongming Ma

doi:10.1214/13-AOS1097

Ann. Statist. 41(2): 772-801 (April 2013). DOI: 10.1214/13-AOS1097

Abstract

Principal component analysis (PCA) is a classical dimension reduction method which projects data onto the principal subspace spanned by the leading eigenvectors of the covariance matrix. However, it behaves poorly when the number of features $p$ is comparable to, or even much larger than, the sample size $n$. In this paper, we propose a new iterative thresholding approach for estimating principal subspaces in the setting where the leading eigenvectors are sparse. Under a spiked covariance model, we find that the new approach recovers the principal subspace and leading eigenvectors consistently, and even optimally, in a range of high-dimensional sparse settings. Simulated examples also demonstrate its competitive performance.
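The abstract does not spell out the algorithm, so the following is only a minimal sketch of one generic form of iterative thresholding for sparse principal subspaces: orthogonal (power) iteration on the sample covariance matrix with a thresholding step inserted before each re-orthonormalization. The function names, the soft-thresholding rule, the fixed threshold level, the initialization from the highest-variance coordinates, and the simulation settings are all assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np


def soft_threshold(x, t):
    """Shrink every entry toward zero by t; entries with |x| <= t become 0."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def initial_basis(S, n_keep, k):
    """Hypothetical initializer: PCA restricted to the n_keep coordinates
    with the largest sample variances, padded with zeros elsewhere."""
    p = S.shape[0]
    keep = np.argsort(np.diag(S))[-n_keep:]
    _, vecs = np.linalg.eigh(S[np.ix_(keep, keep)])  # ascending eigenvalues
    Q0 = np.zeros((p, k))
    Q0[keep, :] = vecs[:, -k:]  # eigenvectors of the k largest eigenvalues
    return Q0


def iterative_thresholding_pca(S, Q0, threshold, n_iter=50):
    """Orthogonal iteration with an entrywise thresholding step.

    Each pass multiplies by S, soft-thresholds the result, and
    re-orthonormalizes the columns with a QR factorization.
    The threshold is a user-chosen tuning parameter (an assumption here).
    """
    Q = Q0
    for _ in range(n_iter):
        T = soft_threshold(S @ Q, threshold)
        Q, _ = np.linalg.qr(T)
    return Q


def simulate_spiked_data(n, p, support_size, spike, rng):
    """Mean-zero data with covariance I + spike * v v^T, where v is sparse."""
    v = np.zeros(p)
    v[:support_size] = 1.0 / np.sqrt(support_size)
    scores = np.sqrt(spike) * rng.standard_normal(n)
    X = np.outer(scores, v) + rng.standard_normal((n, p))
    return X, v


# Illustrative run with p > n (all settings are assumed, not from the paper).
rng = np.random.default_rng(0)
X, v = simulate_spiked_data(n=200, p=500, support_size=10, spike=20.0, rng=rng)
S = X.T @ X / X.shape[0]  # sample covariance of mean-zero data
Q = iterative_thresholding_pca(S, initial_basis(S, n_keep=20, k=1), threshold=2.0)
print("alignment with true eigenvector:", abs(float(Q[:, 0] @ v)))
print("nonzero loadings:", np.count_nonzero(Q))
```

In this sketch the thresholding step is what distinguishes the procedure from ordinary orthogonal iteration: small coordinates, which in a sparse spiked model are mostly noise, are set to zero on every pass, so the estimate stays sparse even when $p$ exceeds $n$.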
