## The Annals of Statistics

### Minimax bounds for sparse PCA with noisy high-dimensional data

#### Abstract

We study the problem of estimating the leading eigenvectors of a high-dimensional population covariance matrix based on independent Gaussian observations. We establish a lower bound on the minimax risk of estimators under the $l_{2}$ loss, in the joint limit as dimension and sample size increase to infinity, under various models of sparsity for the population eigenvectors. The lower bound on the risk points to the existence of different regimes of sparsity of the eigenvectors. We also propose a new method for estimating the eigenvectors by a two-stage coordinate selection scheme.
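The abstract does not spell out the two-stage coordinate selection scheme, so the following is *not* that method. It is a minimal sketch of the setting only, built on three illustrative assumptions. First, a single-spike covariance $\lambda v v^{T} + \sigma^{2} I_{p}$ with a $k$-sparse unit vector $v$. Second, a one-step variance-based coordinate selection in the spirit of diagonal thresholding (Johnstone and Lu, 2009). Third, the sign-invariant $l_{2}$ loss $\min_{s = \pm 1} \lVert \hat{v} - s v \rVert_{2}$, since eigenvectors are determined only up to sign. The helper names and the threshold constant `alpha` are hypothetical choices for illustration.

```python
import numpy as np


def l2_loss(v_hat, v):
    # Eigenvectors are defined only up to sign, so compare against both +v and -v.
    return min(np.linalg.norm(v_hat - v), np.linalg.norm(v_hat + v))


def simulate_single_spike(n, v, spike, sigma=1.0, seed=None):
    # Assumed model (illustrative): x_i = sqrt(spike) * u_i * v + sigma * z_i,
    # u_i ~ N(0, 1), z_i ~ N(0, I_p), so Cov(x_i) = spike * v v^T + sigma^2 * I_p.
    rng = np.random.default_rng(seed)
    u = rng.standard_normal(n)
    z = rng.standard_normal((n, v.size))
    return np.sqrt(spike) * np.outer(u, v) + sigma * z


def diagonal_thresholding_pca(X, sigma=1.0, alpha=3.0):
    # Step 1 (hypothetical threshold): keep coordinates whose sample variance
    # exceeds sigma^2 * (1 + alpha * sqrt(log(p) / n)).
    n, p = X.shape
    S = X.T @ X / n  # sample covariance; the simulated mean is zero
    threshold = sigma**2 * (1.0 + alpha * np.sqrt(np.log(p) / n))
    selected = np.flatnonzero(np.diag(S) > threshold)
    v_hat = np.zeros(p)
    if selected.size == 0:
        return v_hat, selected
    # Step 2: leading eigenvector of S restricted to the selected coordinates,
    # padded with zeros elsewhere; eigh returns eigenvalues in ascending order.
    _, eigvecs = np.linalg.eigh(S[np.ix_(selected, selected)])
    v_hat[selected] = eigvecs[:, -1]
    return v_hat, selected


if __name__ == "__main__":
    n, p, k, spike = 200, 1000, 10, 20.0
    v = np.zeros(p)
    v[:k] = 1.0 / np.sqrt(k)  # k-sparse unit vector
    X = simulate_single_spike(n, v, spike, seed=0)
    v_hat, selected = diagonal_thresholding_pca(X)
    print("selected coordinates:", selected)
    print("l2 loss:", l2_loss(v_hat, v))
```

The constant `alpha` trades false inclusion of noise coordinates against missed signal coordinates. With a weaker spike or a less sparse $v$, a variance-based selection like this one can fail to recover the support at all.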

#### Article information

Source
Ann. Statist., Volume 41, Number 3 (2013), 1055–1084.

Dates
First available in Project Euclid: 13 June 2013

Permanent link to this document
https://projecteuclid.org/euclid.aos/1371150893

Digital Object Identifier
doi:10.1214/12-AOS1014

Mathematical Reviews number (MathSciNet)
MR3113803

Zentralblatt MATH identifier
1292.62071

Subjects
Primary: 62G20: Asymptotic properties
Secondary: 62H25: Factor analysis and principal components; correspondence analysis

#### Citation

Birnbaum, Aharon; Johnstone, Iain M.; Nadler, Boaz; Paul, Debashis. Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Statist. 41 (2013), no. 3, 1055–1084. doi:10.1214/12-AOS1014. https://projecteuclid.org/euclid.aos/1371150893

#### References

• Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877–2921.
• Anderson, T. W. (1963). Asymptotic theory for principal component analysis. Ann. Math. Statist. 34 122–148.
• Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. J. Multivariate Anal. 97 1382–1408.
• Bickel, P. J. and Levina, E. (2008a). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
• Bickel, P. J. and Levina, E. (2008b). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
• Birgé, L. (2001). A new look at an old result: Fano’s lemma. Technical report, Univ. Paris 6.
• Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106 672–684.
• Cai, T. T., Ma, Z. and Wu, Y. (2012). Sparse PCA: Optimal rates and adaptive estimation. Technical report. Available at arXiv:1211.1309.
• Cai, T. T., Zhang, C.-H. and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 2118–2144.
• Cai, T. T. and Zhou, H. H. (2012). Minimax estimation of large covariance matrices under $l_{1}$ norm. Statist. Sinica 22 1319–1378.
• d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434–448 (electronic).
• Davidson, K. R. and Szarek, S. J. (2001). Local operator theory, random matrices and Banach spaces. In Handbook of the Geometry of Banach Spaces, Vol. I (W. B. Johnson and J. Lindenstrauss, eds.) 317–366. North-Holland, Amsterdam.
• El Karoui, N. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36 2717–2756.
• Johnstone, I. M. (2001). Chi-square oracle inequalities. In State of the Art in Probability and Statistics (Leiden, 1999) (M. de Gunst, C. Klaassen and A. van der Vaart, eds.). Institute of Mathematical Statistics Lecture Notes—Monograph Series 36 399–418. IMS, Beachwood, OH.
• Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
• Jolliffe, I. T. (2002). Principal Component Analysis. Springer, Berlin.
• Kato, T. (1980). Perturbation Theory for Linear Operators. Springer, New York.
• Kritchman, S. and Nadler, B. (2008). Determining the number of components in a factor model from limited noisy data. Chemometrics and Intelligent Laboratory Systems 94 19–32.
• Lu, A. Y. (2002). Sparse principal components analysis for functional data. Ph.D. thesis, Stanford Univ., Stanford, CA.
• Ma, Z. (2011). Sparse principal component analysis and iterative thresholding. Technical report, Dept. Statistics, The Wharton School, Univ. Pennsylvania, Philadelphia, PA.
• Muirhead, R. J. (1982). Aspects of Multivariate Statistical Theory. Wiley, New York.
• Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36 2791–2817.
• Nadler, B. (2009). Discussion of “On consistency and sparsity for principal components analysis in high dimensions.” J. Amer. Statist. Assoc. 104 694–697.
• Onatski, A. (2006). Determining the number of factors from empirical distribution of eigenvalues. Technical report, Columbia Univ.
• Paul, D. (2005). Nonparametric estimation of principal components. Ph.D. thesis, Stanford Univ., Stanford, CA.
• Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sinica 17 1617–1642.
• Paul, D. and Johnstone, I. M. (2007). Augmented sparse principal component analysis for high dimensional data. Technical report, Univ. California, Davis. Available at arXiv:1202.1242.
• Rothman, A. J., Levina, E. and Zhu, J. (2009). Generalized thresholding of large covariance matrices. J. Amer. Statist. Assoc. 104 177–186.
• Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Anal. 99 1015–1034.
• Shen, D., Shen, H. and Marron, J. S. (2011). Consistency of sparse PCA in high dimension, low sample size contexts. Technical report. Available at arXiv:1104.4289v1.
• Tipping, M. E. and Bishop, C. M. (1999). Probabilistic principal component analysis. J. R. Stat. Soc. Ser. B Stat. Methodol. 61 611–622.
• van Trees, H. L. (2002). Optimum Array Processing. Wiley, New York.
• Vu, V. Q. and Lei, J. (2012). Minimax rates of estimation for sparse PCA in high dimensions. Technical report. Available at arXiv:1202.0786.
• Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515–534.
• Yang, Y. and Barron, A. (1999). Information-theoretic determination of minimax rates of convergence. Ann. Statist. 27 1564–1599.
• Zong, C. (1999). Sphere Packings. Springer, New York.
• Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265–286.