The Annals of Statistics

Minimax sparse principal subspace estimation in high dimensions

Vincent Q. Vu and Jing Lei

Full-text: Open access


We study sparse principal components analysis in high dimensions, where $p$ (the number of variables) can be much larger than $n$ (the number of observations), and analyze the problem of estimating the subspace spanned by the principal eigenvectors of the population covariance matrix. We introduce two complementary notions of $\ell_{q}$ subspace sparsity: row sparsity and column sparsity. We prove nonasymptotic lower and upper bounds on the minimax subspace estimation error for $0\leq q\leq1$. The bounds are optimal for row sparse subspaces and nearly optimal for column sparse subspaces, they apply to general classes of covariance matrices, and they show that $\ell_{q}$ constrained estimates can achieve optimal minimax rates without restrictive spiked covariance conditions. Interestingly, the form of the rates matches known results for sparse regression when the effective noise variance is defined appropriately. Our proof employs a novel variational $\sin\Theta$ theorem that may be useful in other regularized spectral estimation problems.

Article information

Ann. Statist. Volume 41, Number 6 (2013), 2905-2947.

First available in Project Euclid: 1 January 2014

Permanent link to this document

Digital Object Identifier

Mathematical Reviews number (MathSciNet)

Zentralblatt MATH identifier

Primary: 62H25: Factor analysis and principal components; correspondence analysis
Secondary: 62H12: Estimation 62C20: Minimax procedures

Principal components analysis subspace estimation sparsity high-dimensional statistics minimax bounds random matrices


Vu, Vincent Q.; Lei, Jing. Minimax sparse principal subspace estimation in high dimensions. Ann. Statist. 41 (2013), no. 6, 2905--2947. doi:10.1214/13-AOS1151.

Export citation


  • Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877–2921.
  • Bhatia, R. (1997). Matrix Analysis. Graduate Texts in Mathematics 169. Springer, New York.
  • Birnbaum, A., Johnstone, I. M., Nadler, B. and Paul, D. (2013). Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Statist. 41 1055–1084.
  • Chen, X., Zou, C. and Cook, R. D. (2010). Coordinate-independent sparse sufficient dimension reduction and variable selection. Ann. Statist. 38 3696–3723.
  • Chikuse, Y. (2003). Statistics on Special Manifolds. Lecture Notes in Statistics 174. Springer, New York.
  • d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434–448 (electronic).
  • Donoho, D. L. and Johnstone, I. M. (1994). Minimax risk over $l_{p}$-balls for $l_{q}$-error. Probab. Theory Related Fields 99 277–303.
  • Edelman, A., Arias, T. A. and Smith, S. T. (1999). The geometry of algorithms with orthogonality constraints. SIAM J. Matrix Anal. Appl. 20 303–353.
  • Gilbert, E. N. (1952). A comparison of signalling alphabets. Bell System Technical Journal 31 504–522.
  • Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. Journal of Educational Psychology 24 498–520.
  • Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
  • Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. J. Comput. Graph. Statist. 12 531–547.
  • Journée, M., Nesterov, Y., Richtárik, P. and Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11 517–553.
  • Ledoux, M. (2001). The Concentration of Measure Phenomenon. Mathematical Surveys and Monographs 89. Amer. Math. Soc., Providence, RI.
  • Lounici, K. (2013). Sparse principal component analysis with missing observations. In High Dimensional Probability VI (C. Houdré, D. M. Mason, J. Rosiński and J. A. Wellner, eds.). Progress in Probability 66 327–356. Springer, Basel.
  • Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 772–801.
  • Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. 1896. Springer, Berlin.
  • Mendelson, S. (2010). Empirical processes with a bounded $\psi_{1}$ diameter. Geom. Funct. Anal. 20 988–1027.
  • Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36 2791–2817.
  • Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
  • Pajor, A. (1998). Metric entropy of the Grassmann manifold. In Convex Geometric Analysis (Berkeley, CA, 1996). Mathematical Sciences Research Institute Publications 34 181–188. Cambridge Univ. Press, Cambridge.
  • Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sinica 17 1617–1642.
  • Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philosophical Magazine 2 559–572.
  • Raskutti, G., Wainwright, M. J. and Yu, B. (2011). Minimax rates of estimation for high-dimensional linear regression over $\ell_{q}$-balls. IEEE Trans. Inform. Theory 57 6976–6994.
  • Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Anal. 99 1015–1034.
  • Shen, D., Shen, H. and Marron, J. S. (2013). Consistency of sparse PCA in high dimension, low sample size contexts. J. Multivariate Anal. 115 317–333.
  • Stewart, G. W. and Sun, J. G. (1990). Matrix Perturbation Theory. Academic Press, Boston, MA.
  • Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. J. R. Stat. Soc. Ser. B Stat. Methodol. 58 267–288.
  • van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
  • Varšamov, R. R. (1957). The evaluation of signals in codes with correction of errors. Dokl. Akad. Nauk SSSR (N.S) 117 739–741.
  • Vu, V. Q. and Lei, J. (2012a). Minimax rates of estimation for sparse PCA in high dimensions. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics (AISTATS). JMLR Workshop and Conference Proceedings Volume 22.
  • Vu, V. Q. and Lei, J. (2012b). Squared-norm empirical process in Banach space. Available at arXiv:1312.1005.
  • Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515–534.
  • Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam (D. Pollard, E. Torgersen and G. L. Yang, eds.) 423–435. Springer, New York.
  • Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B Stat. Methodol. 68 49–67.
  • Zhao, P., Rocha, G. and Yu, B. (2009). The composite absolute penalties family for grouped and hierarchical variable selection. Ann. Statist. 37 3468–3497.
  • Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265–286.