The Annals of Statistics

Optimal detection of sparse principal components in high dimension

Quentin Berthet and Philippe Rigollet

Full-text: Open access

Abstract

We perform a finite sample analysis of the detection levels for sparse principal components of a high-dimensional covariance matrix. Our minimax optimal test is based on a sparse eigenvalue statistic. Alas, computing this test is known to be NP-complete in general, and we describe a computationally efficient alternative test using convex relaxations. Our relaxation is also proved to detect sparse principal components at near optimal detection levels, and it performs well on simulated datasets. Moreover, using polynomial time reductions from theoretical computer science, we bring significant evidence that our results cannot be improved, thus revealing an inherent trade-off between statistical and computational performance.
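The sparse eigenvalue statistic mentioned above is the largest eigenvalue over all k×k principal submatrices of the empirical covariance matrix; enumerating supports makes its combinatorial cost (and hence the motivation for a convex relaxation) concrete. A minimal brute-force sketch, assuming a spiked covariance model for the toy data; function names, the seed, and parameter values are illustrative, not from the paper:

```python
import numpy as np
from itertools import combinations

def k_sparse_max_eigenvalue(sigma_hat, k):
    """Largest eigenvalue of any k x k principal submatrix of sigma_hat.

    Brute force over all (p choose k) supports, which is exponential in k
    and illustrates why a convex relaxation is used instead.
    """
    p = sigma_hat.shape[0]
    best = -np.inf
    for support in combinations(range(p), k):
        idx = np.ix_(support, support)
        top = np.linalg.eigvalsh(sigma_hat[idx])[-1]  # eigvalsh sorts ascending
        best = max(best, top)
    return best

# Toy data: n samples in dimension p with a k-sparse spike (alternative
# hypothesis of the spiked covariance model); all values are illustrative.
rng = np.random.default_rng(0)
n, p, k, theta = 200, 10, 3, 2.0
v = np.zeros(p)
v[:k] = 1.0 / np.sqrt(k)                    # k-sparse unit vector
cov = np.eye(p) + theta * np.outer(v, v)    # spiked covariance
X = rng.multivariate_normal(np.zeros(p), cov, size=n)
sigma_hat = X.T @ X / n                     # empirical covariance matrix
stat = k_sparse_max_eigenvalue(sigma_hat, k)
print(stat)  # tends to exceed 1 + theta/2 under the spike, near 1 under the null
```

The statistic dominates the top eigenvalue of every individual k×k block by construction, which is what the test exploits when comparing against a null-calibrated threshold.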

Article information

Source
Ann. Statist., Volume 41, Number 4 (2013), 1780-1815.

Dates
First available in Project Euclid: 5 September 2013

Permanent link to this document
https://projecteuclid.org/euclid.aos/1378386239

Digital Object Identifier
doi:10.1214/13-AOS1127

Mathematical Reviews number (MathSciNet)
MR3127849

Zentralblatt MATH identifier
1277.62155

Subjects
Primary: 62H25: Factor analysis and principal components; correspondence analysis
Secondary: 62F04; 90C22: Semidefinite programming

Keywords
High-dimensional detection; sparse principal component analysis; spiked covariance model; semidefinite relaxation; minimax lower bounds; planted clique

Citation

Berthet, Quentin; Rigollet, Philippe. Optimal detection of sparse principal components in high dimension. Ann. Statist. 41 (2013), no. 4, 1780--1815. doi:10.1214/13-AOS1127. https://projecteuclid.org/euclid.aos/1378386239



References

  • Addario-Berry, L., Broutin, N., Devroye, L. and Lugosi, G. (2010). On combinatorial testing problems. Ann. Statist. 38 3063–3092.
  • Alon, N., Krivelevich, M. and Sudakov, B. (1998). Finding a large hidden clique in a random graph. Random Structures Algorithms 13 457–466.
  • Alon, U., Barkai, N., Notterman, D. A., Gish, K., Ybarra, S., Mack, D. and Levine, A. J. (1999). Broad patterns of gene expression revealed by clustering analysis of tumor and normal colon tissues probed by oligonucleotide arrays. Proc. Natl. Acad. Sci. USA 96 6745–6750.
  • Alon, N., Andoni, A., Kaufman, T., Matulef, K., Rubinfeld, R. and Xie, N. (2007). Testing $k$-wise and almost $k$-wise independence. In STOC’07—Proceedings of the 39th Annual ACM Symposium on Theory of Computing 496–505. ACM, New York.
  • Alon, N., Arora, S., Manokaran, R., Moshkovitz, D. and Weinstein, O. (2011). On the inapproximability of the densest $\kappa$-subgraph problem. Unpublished manuscript.
  • Ames, B. P. W. and Vavasis, S. A. (2011). Nuclear norm minimization for the planted clique and biclique problems. Math. Program. 129 69–89.
  • Amini, A. A. and Wainwright, M. J. (2009). High-dimensional analysis of semidefinite relaxations for sparse principal components. Ann. Statist. 37 2877–2921.
  • Arias-Castro, E., Bubeck, S. and Lugosi, G. (2012). Detection of correlations. Ann. Statist. 40 412–435.
  • Arias-Castro, E., Candès, E. J. and Durand, A. (2011). Detection of an anomalous cluster in a network. Ann. Statist. 39 278–304.
  • Arias-Castro, E., Candès, E. J. and Plan, Y. (2011). Global testing under sparse alternatives: ANOVA, multiple comparisons and the higher criticism. Ann. Statist. 39 2533–2556.
  • Bach, F., Ahipasaoglu, S. D. and d’Aspremont, A. (2010). Convex relaxations for subset selection. Preprint. Available at arXiv:1006.3601v1.
  • Bai, Z. D. (1999). Methodologies in spectral analysis of large-dimensional random matrices, a review. Statist. Sinica 9 611–677.
  • Baik, J., Ben Arous, G. and Péché, S. (2005). Phase transition of the largest eigenvalue for nonnull complex sample covariance matrices. Ann. Probab. 33 1643–1697.
  • Baik, J. and Silverstein, J. W. (2006). Eigenvalues of large sample covariance matrices of spiked population models. J. Multivariate Anal. 97 1382–1408.
  • Bandeira, A. S., Dobriban, E., Mixon, D. G. and Sawin, W. F. (2012). Certifying the restricted isometry property is hard. Preprint. Available at arXiv:1204.1580.
  • Baraud, Y. (2002). Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8 577–606.
  • Bickel, P. J. and Levina, E. (2008). Covariance regularization by thresholding. Ann. Statist. 36 2577–2604.
  • Birnbaum, A., Johnstone, I. M., Nadler, B. and Paul, D. (2013). Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Statist. 41 1055–1084.
  • Boyd, S. and Vandenberghe, L. (2004). Convex Optimization. Cambridge Univ. Press, Cambridge.
  • Brubaker, S. C. and Vempala, S. S. (2009). Random tensors and planted cliques. In Approximation, Randomization, and Combinatorial Optimization. Lecture Notes in Computer Science 5687 406–419. Springer, Berlin.
  • Butucea, C. and Ingster, Y. I. (2013). Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli. To appear. Available at arXiv:1109.0898v1.
  • Cai, T. T., Ma, Z. and Wu, Y. (2012). Sparse PCA: Optimal rates and adaptive estimation. Preprint. Available at arXiv:1211.1309.
  • Cai, T. T., Zhang, C.-H. and Zhou, H. H. (2010). Optimal rates of convergence for covariance matrix estimation. Ann. Statist. 38 2118–2144.
  • Chandrasekaran, V. and Jordan, M. I. (2013). Computational and statistical tradeoffs via convex relaxation. Proc. Natl. Acad. Sci. 110 E1181–E1190.
  • Chen, X. (2011). Adaptive elastic-net sparse principal component analysis for pathway association testing. Stat. Appl. Genet. Mol. Biol. 10 Art. 48, 23.
  • d’Aspremont, A., Bach, F. and El Ghaoui, L. (2008). Optimal solutions for sparse principal component analysis. J. Mach. Learn. Res. 9 1269–1294.
  • d’Aspremont, A., Bach, F. and Ghaoui, L. E. (2012). Approximation bounds for sparse principal component analysis. Preprint. Available at arXiv:1205.0121.
  • d’Aspremont, A., El Ghaoui, L., Jordan, M. I. and Lanckriet, G. R. G. (2007). A direct formulation for sparse PCA using semidefinite programming. SIAM Rev. 49 434–448 (electronic).
  • Dekel, Y., Gurel-Gurevich, O. and Peres, Y. (2011). Finding hidden cliques in linear time with high probability. In ANALCO11—Workshop on Analytic Algorithmics and Combinatorics 67–75. SIAM, Philadelphia, PA.
  • Donoho, D. and Jin, J. (2004). Higher criticism for detecting sparse heterogeneous mixtures. Ann. Statist. 32 962–994.
  • El Karoui, N. (2008). Operator norm consistent estimation of large-dimensional sparse covariance matrices. Ann. Statist. 36 2717–2756.
  • Feige, U. and Krauthgamer, R. (2000). Finding and certifying a large hidden clique in a semirandom graph. Random Structures Algorithms 16 195–208.
  • Feige, U. and Krauthgamer, R. (2003). The probable value of the Lovász–Schrijver relaxations for maximum independent set. SIAM J. Comput. 32 345–370 (electronic).
  • Feige, U. and Ron, D. (2010). Finding hidden cliques in linear time. In 21st International Meeting on Probabilistic, Combinatorial, and Asymptotic Methods in the Analysis of Algorithms (AofA’10) 189–203. Assoc. Discrete Math. Theor. Comput. Sci., Nancy.
  • Feldman, V., Grigorescu, E., Reyzin, L., Vempala, S. and Xiao, Y. (2013). Statistical algorithms and a lower bound for planted clique. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing, STOC’13 655–664. ACM, New York.
  • Féral, D. and Péché, S. (2009). The largest eigenvalues of sample covariance matrices for a spiked population: Diagonal case. J. Math. Phys. 50 073302, 33.
  • Frieze, A. and Kannan, R. (2008). A new approach to the planted clique problem. In FSTTCS 2008: IARCS Annual Conference on Foundations of Software Technology and Theoretical Computer Science. LIPIcs. Leibniz Int. Proc. Inform. 2 187–198. Schloss Dagstuhl. Leibniz-Zent. Inform., Wadern.
  • Geman, S. (1980). A limit theorem for the norm of random matrices. Ann. Probab. 8 252–261.
  • Goemans, M. X. and Williamson, D. P. (1995). Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming. J. Assoc. Comput. Mach. 42 1115–1145.
  • Hazan, E. and Krauthgamer, R. (2011). How hard is it to approximate the best Nash equilibrium? SIAM J. Comput. 40 79–91.
  • Ingster, Y. I. (1982). The asymptotic efficiency of tests for a simple hypothesis against a composite alternative. Teor. Veroyatn. Primen. 27 587–592.
  • Ingster, Y. I., Tsybakov, A. B. and Verzelen, N. (2010). Detection boundary in sparse regression. Electron. J. Stat. 4 1476–1526.
  • Jenatton, R., Obozinski, G. and Bach, F. (2010). Structured sparse principal component analysis. In Proceedings of the International Conference on Artificial Intelligence and Statistics (AISTATS).
  • Jerrum, M. (1992). Large cliques elude the Metropolis process. Random Structures Algorithms 3 347–359.
  • Johnstone, I. M. (2001). On the distribution of the largest eigenvalue in principal components analysis. Ann. Statist. 29 295–327.
  • Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
  • Journée, M., Nesterov, Y., Richtárik, P. and Sepulchre, R. (2010). Generalized power method for sparse principal component analysis. J. Mach. Learn. Res. 11 517–553.
  • Juels, A. and Peinado, M. (2000). Hiding cliques for cryptographic security. Des. Codes Cryptogr. 20 269–280.
  • Karp, R. M. (1972). Reducibility among combinatorial problems. In Complexity of Computer Computations (Proc. Sympos., IBM Thomas J. Watson Res. Center, Yorktown Heights, N.Y., 1972) 85–103. Plenum, New York.
  • Koiran, P. and Zouzias, A. (2012). Hidden cliques and the certification of the restricted isometry property. Preprint. Available at arXiv:1211.0665.
  • Krivelevich, M. and Vu, V. H. (2002). Approximating the independence number and the chromatic number in expected polynomial time. J. Comb. Optim. 6 143–155.
  • Kučera, L. (1995). Expected complexity of graph partitioning problems. Discrete Appl. Math. 57 193–212.
  • Laurent, B. and Massart, P. (2000). Adaptive estimation of a quadratic functional by model selection. Ann. Statist. 28 1302–1338.
  • Loh, P.-L. and Wainwright, M. J. (2012). High-dimensional regression with noisy and missing data: Provable guarantees with nonconvexity. Ann. Statist. 40 1637–1664.
  • Lu, Z. and Zhang, Y. (2012). An augmented Lagrangian approach for sparse principal component analysis. Math. Program. 135 149–193.
  • Ma, S. (2011). Alternating direction method of multipliers for sparse principal component analysis. Preprint. Available at arXiv:1111.6703v1.
  • Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 772–801.
  • Nadler, B. (2008). Finite sample approximation results for principal component analysis: A matrix perturbation approach. Ann. Statist. 36 2791–2817.
  • Nesterov, Y. (2003). Introductory Lectures on Convex Optimization. Springer, New York.
  • Nesterov, Y. and Nemirovskii, A. (1994). Interior-Point Polynomial Algorithms in Convex Programming. SIAM Studies in Applied Mathematics 13. SIAM, Philadelphia, PA.
  • Onatski, A., Moreira, M. J. and Hallin, M. (2013). Asymptotic power of sphericity tests for high-dimensional data. Ann. Statist. 41 1204–1231.
  • Paul, D. (2007). Asymptotics of sample eigenstructure for a large dimensional spiked covariance model. Statist. Sinica 17 1617–1642.
  • Paul, D. and Johnstone, I. M. (2012). Augmented sparse principal component analysis for high dimensional data. Preprint. Available at arXiv:1202.1242v1.
  • Rossman, B. (2010). Average-case complexity of detecting cliques. Ph.D. thesis, Massachusetts Institute of Technology.
  • Shen, D., Shen, H. and Marron, J. S. (2013). Consistency of sparse PCA in high dimension, low sample size contexts. J. Multivariate Anal. 115 317–333.
  • Spencer, J. (1994). Ten Lectures on the Probabilistic Method, 2nd ed. CBMS-NSF Regional Conference Series in Applied Mathematics 64. SIAM, Philadelphia, PA.
  • Sun, X. and Nobel, A. B. (2008). On the size and recovery of submatrices of ones in a random binary matrix. J. Mach. Learn. Res. 9 2431–2453.
  • Sun, X. and Nobel, A. B. (2013). On the maximal size of large-average and ANOVA-fit submatrices in a Gaussian random matrix. Bernoulli 19 275–294.
  • Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York. Revised and extended from the 2004 French original, translated by Vladimir Zaiats.
  • Vershynin, R. (2012). Introduction to the non-asymptotic analysis of random matrices. In Compressed Sensing 210–268. Cambridge Univ. Press, Cambridge.
  • Verzelen, N. (2012). Minimax risks for sparse regressions: Ultra-high dimensional phenomenons. Electron. J. Stat. 6 38–90.
  • Vu, V. and Lei, J. (2012). Minimax rates of estimation for sparse PCA in high dimensions. In Proceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics (AISTATS), April 21–23, 2012, La Palma, Canary Islands. JMLR W&CP 22 1278–1286.
  • Wright, J., Ganesh, A., Yang, A., Zhou, Z. and Ma, Y. (2011). Sparsity and robustness in face recognition. Preprint. Available at arXiv:1111.1014.
  • Yin, Y. Q., Bai, Z. D. and Krishnaiah, P. R. (1988). On the limit of the largest eigenvalue of the large-dimensional sample covariance matrix. Probab. Theory Related Fields 78 509–521.