Open Access
June 2015
Do semidefinite relaxations solve sparse PCA up to the information limit?
Robert Krauthgamer, Boaz Nadler, Dan Vilenchik
Ann. Statist. 43(3): 1300-1322 (June 2015). DOI: 10.1214/15-AOS1310

Abstract

Estimating the leading principal components of data, assuming they are sparse, is a central task in modern high-dimensional statistics. Many algorithms were developed for this sparse PCA problem, from simple diagonal thresholding to sophisticated semidefinite programming (SDP) methods. A key theoretical question is under what conditions such algorithms can recover the sparse principal components. We study this question for a single-spike model with an $\ell_{0}$-sparse eigenvector, in the asymptotic regime as dimension $p$ and sample size $n$ both tend to infinity. Amini and Wainwright [Ann. Statist. 37 (2009) 2877–2921] proved that for sparsity levels $k\geq\Omega(n/\log p)$, no algorithm, efficient or not, can reliably recover the sparse eigenvector. In contrast, for $k\leq O(\sqrt{n/\log p})$, diagonal thresholding is consistent. It was further conjectured that an SDP approach may close this gap between computational and information limits. We prove that when $k\geq\Omega(\sqrt{n})$, the proposed SDP approach, at least in its standard usage, cannot recover the sparse spike. In fact, we conjecture that in the single-spike model, no computationally efficient algorithm can recover a spike of $\ell_{0}$-sparsity $k\geq\Omega(\sqrt{n})$. Finally, we present empirical results suggesting that up to sparsity levels $k=O(\sqrt{n})$, recovery is possible by a simple covariance thresholding algorithm.
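
As a rough illustration of the diagonal thresholding idea mentioned in the abstract, the following minimal sketch draws samples from a single-spike model and selects the coordinates with the largest sample variances. It is not the authors' code. The function names, the parameter values ($n=2000$, $p=200$, $k=5$, spike strength `beta`), the equal-magnitude positive entries of the sparse eigenvector, and the assumption that $k$ is known are all illustrative choices, and only the support-selection step is shown.

```python
import random


def sample_single_spike(n, p, support, beta, rng):
    """Draw n samples x = sqrt(beta) * u * v + noise.

    v is a unit vector with equal entries 1/sqrt(k) on `support` (an
    illustrative choice); u and the noise entries are standard normal.
    """
    k = len(support)
    amp = (beta / k) ** 0.5  # sqrt(beta) times the nonzero entry 1/sqrt(k)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]
        for j in support:
            x[j] += amp * u
        rows.append(x)
    return rows


def diagonal_thresholding(rows, k):
    """Return the sorted indices of the k largest sample variances.

    The data are mean zero by construction, so the uncentered second
    moment is used as the diagonal of the sample covariance.
    """
    n = len(rows)
    p = len(rows[0])
    variances = [sum(row[j] ** 2 for row in rows) / n for j in range(p)]
    top = sorted(range(p), key=lambda j: variances[j], reverse=True)[:k]
    return sorted(top)


rng = random.Random(0)  # fixed seed so the run is reproducible
true_support = [3, 17, 42, 99, 150]  # hypothetical sparse support, k = 5
rows = sample_single_spike(n=2000, p=200, support=true_support, beta=3.0, rng=rng)
estimated_support = diagonal_thresholding(rows, k=len(true_support))
print("support recovered:", estimated_support == true_support)
```

In this sketch the spiked coordinates have population variance $1+\beta/k$ while the remaining coordinates have variance 1, so the selection step works when that gap is large relative to the sampling fluctuations of order $\sqrt{\log p/n}$, in line with the $k\leq O(\sqrt{n/\log p})$ regime quoted above.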

Citation

Robert Krauthgamer, Boaz Nadler, Dan Vilenchik. "Do semidefinite relaxations solve sparse PCA up to the information limit?" Ann. Statist. 43 (3) 1300-1322, June 2015. https://doi.org/10.1214/15-AOS1310

Information

Received: 1 September 2014; Revised: 1 January 2015; Published: June 2015
First available in Project Euclid: 15 May 2015

zbMATH: 1320.62138
MathSciNet: MR3346704
Digital Object Identifier: 10.1214/15-AOS1310

Subjects:
Primary: 62H25
Secondary: 62F12

Keywords: convex relaxation, high-dimensional statistics, integrality gap, principal component analysis, random matrices, semidefinite programming, sparsity, spectral analysis, spiked covariance ensembles, Wishart ensembles

Rights: Copyright © 2015 Institute of Mathematical Statistics
