## The Annals of Statistics

### Sparsistency and agnostic inference in sparse PCA

#### Abstract

The presence of a sparse “truth” has been a constant assumption in the theoretical analysis of sparse PCA and is often implicit in its methodological development. This naturally raises questions about the properties of sparse PCA methods and how they depend on the assumption of sparsity. Under what conditions can the relevant variables be selected consistently if the truth is assumed to be sparse? What can be said about the results of sparse PCA without assuming a sparse and unique truth? We answer these questions by investigating the properties of the recently proposed Fantope projection and selection (FPS) method in the high-dimensional setting. Our results provide general sufficient conditions for sparsistency of the FPS estimator. These conditions are weak and can hold in situations where other estimators are known to fail. On the other hand, without assuming sparsity or identifiability, we show that FPS provides a sparse, linear dimension-reducing transformation that is close to the best possible in terms of maximizing the predictive covariance.
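For orientation, the FPS estimator referred to above can be written as the following convex program, as formulated in Vu, Cho, Lei and Rohe (2013). The notation is an assumption introduced here and does not appear in the abstract: $S$ is the $p \times p$ sample covariance matrix, $d$ the target subspace dimension, $\lambda \ge 0$ the regularization parameter, and $\|X\|_{1,1}$ the sum of the absolute values of the entries of $X$.

$$
\widehat{X} \in \arg\max_{X \in \mathcal{F}^d} \; \langle S, X \rangle - \lambda \|X\|_{1,1},
\qquad
\mathcal{F}^d = \bigl\{ X \in \mathbb{R}^{p \times p} : X = X^\top,\ 0 \preceq X \preceq I,\ \operatorname{tr}(X) = d \bigr\}.
$$

With $\lambda = 0$ the maximum is attained at the projection onto a $d$-dimensional principal subspace of $S$, so the penalty term is what encourages sparsity in the solution.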

#### Article information

Source
Ann. Statist. Volume 43, Number 1 (2015), 299-322.

Dates
First available in Project Euclid: 6 February 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1423230081

Digital Object Identifier
doi:10.1214/14-AOS1273

Mathematical Reviews number (MathSciNet)
MR3311861

Zentralblatt MATH identifier
1308.62125

Subjects
Primary: 62H12: Estimation

#### Citation

Lei, Jing; Vu, Vincent Q. Sparsistency and agnostic inference in sparse PCA. Ann. Statist. 43 (2015), no. 1, 299–322. doi:10.1214/14-AOS1273. https://projecteuclid.org/euclid.aos/1423230081

#### References

• Akaike, H. (1973). Information theory and an extension of the likelihood principle. In Proceedings of the Second International Symposium of Information Theory. Akadémiai Kiadó, Budapest.
• Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877–2921.
• Berk, R. H. (1966). Limiting behavior of posterior distributions when the model is incorrect. Ann. Math. Statist. 37 51–58; Correction, Ibid 37 745–746.
• Berthet, Q. and Rigollet, P. (2013a). Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 1780–1815.
• Berthet, Q. and Rigollet, P. (2013b). Computational lower bounds for sparse PCA. Preprint. Available at arXiv:1304.0828.
• Birnbaum, A., Johnstone, I. M., Nadler, B. and Paul, D. (2013). Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Statist. 41 1055–1084.
• Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2010). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 1–122.
• Bühlmann, P. and van de Geer, S. (2011). Statistics for High-Dimensional Data: Methods, Theory and Applications. Springer, Heidelberg.
• Buja, A., Hastie, T. and Tibshirani, R. (1989). Linear smoothers and additive models. Ann. Statist. 17 453–555.
• Cai, T. T., Ma, Z. and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist. 41 3074–3110.
• d’Aspremont, A., Bach, F. and El Ghaoui, L. (2008). Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9 1269–1294.
• d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434–448 (electronic).
• Deshpande, Y. and Montanari, A. (2013). Finding hidden cliques of size $\sqrt{N/e}$ in nearly linear time. Preprint. Available at arXiv:1304.7047.
• Fan, J. and Li, R. (2001). Variable selection via nonconcave penalized likelihood and its oracle properties. J. Amer. Statist. Assoc. 96 1348–1360.
• Greenshtein, E. and Ritov, Y. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli 10 971–988.
• Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York.
• Hotelling, H. (1933). Analysis of a complex of statistical variables into principal components. J. Educ. Psychol. 498–520.
• Huber, P. J. (1967). The behavior of maximum likelihood estimates under nonstandard conditions. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66), Vol. I: Statistics 221–233. Univ. California Press, Berkeley.
• Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
• Jolliffe, I. T., Trendafilov, N. T. and Uddin, M. (2003). A modified principal component technique based on the LASSO. J. Comput. Graph. Statist. 12 531–547.
• Journée, M., Nesterov, Y., Richtárik, P. and Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11 517–553.
• Kearns, M. J., Schapire, R. E. and Sellie, L. M. (1994). Toward efficient agnostic learning. Mach. Learn. 17 115–141.
• Krauthgamer, R., Nadler, B. and Vilenchik, D. (2013). Do semidefinite relaxations solve sparse PCA up to the information limit? Preprint. Available at arXiv:1306.3690.
• Lam, C. and Fan, J. (2009). Sparsistency and rates of convergence in large covariance matrix estimation. Ann. Statist. 37 4254–4278.
• Lounici, K. (2013). Sparse principal component analysis with missing observations. Progr. Probab. 66 327–356.
• Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 772–801.
• Mackey, L. W. (2009). Deflation methods for sparse PCA. In Advances in Neural Information Processing Systems 21 (D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, eds.) 1017–1024. Curran Associates, Red Hook, NY.
• Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
• Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
• Overton, M. L. and Womersley, R. S. (1992). On the sum of the largest eigenvalues of a symmetric matrix. SIAM J. Matrix Anal. Appl. 13 41–45.
• Paul, D. and Johnstone, I. M. (2012). Augmented sparse principal component analysis for high dimensional data. Preprint. Available at arXiv:1202.1242.
• Pearson, K. (1901). On lines and planes of closest fit to systems of points in space. Philos. Mag. 2 559–572.
• Ravikumar, P., Wainwright, M. J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing $\ell_1$-penalized log-determinant divergence. Electron. J. Stat. 5 935–980.
• Rothman, A. J., Bickel, P. J., Levina, E. and Zhu, J. (2008). Sparse permutation invariant covariance estimation. Electron. J. Stat. 2 494–515.
• Shen, H. and Huang, J. Z. (2008). Sparse principal component analysis via regularized low rank matrix approximation. J. Multivariate Anal. 99 1015–1034.
• van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes: With Applications to Statistics. Springer, New York.
• Vu, V. Q. and Lei, J. (2012). Minimax rates of estimation for sparse PCA in high dimensions. In Proc. Fifteenth International Conference on Artificial Intelligence and Statistics JMLR W&CP 22 1278–1286.
• Vu, V. Q. and Lei, J. (2013). Minimax sparse principal subspace estimation in high dimensions. Ann. Statist. 41 2905–2947.
• Vu, V. Q., Cho, J., Lei, J. and Rohe, K. (2013). Fantope projection and selection: A near-optimal convex relaxation of sparse PCA. In Advances in Neural Information Processing Systems (NIPS) 26 (C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani and K. Q. Weinberger, eds.) 2670–2678. Curran Associates, Red Hook, NY.
• Wainwright, M. J. (2009). Sharp thresholds for high-dimensional and noisy sparsity recovery using $\ell_1$-constrained quadratic programming (Lasso). IEEE Trans. Inform. Theory 55 2183–2202.
• White, H. (1982). Maximum likelihood estimation of misspecified models. Econometrica 50 1–25.
• Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515–534.
• Yuan, X.-T. and Zhang, T. (2013). Truncated power method for sparse eigenvalue problems. J. Mach. Learn. Res. 14 899–925.
• Zhao, P. and Yu, B. (2006). On model selection consistency of Lasso. J. Mach. Learn. Res. 7 2541–2563.
• Zou, H., Hastie, T. and Tibshirani, R. (2006). Sparse principal component analysis. J. Comput. Graph. Statist. 15 265–286.