High-dimensional analysis of semidefinite relaxations for sparse principal components

Arash A. Amini; Martin J. Wainwright

doi:10.1214/08-AOS664

Ann. Statist. 37(5B): 2877-2921 (October 2009). DOI: 10.1214/08-AOS664

Abstract

Principal component analysis (PCA) is a classical method for dimensionality reduction based on extracting the dominant eigenvectors of the sample covariance matrix. However, PCA is well known to behave poorly in the “large p, small n” setting, in which the problem dimension p is comparable to or larger than the sample size n. This paper studies PCA in this high-dimensional regime, but under the additional assumption that the maximal eigenvector is sparse, say, with at most k nonzero components. We consider a spiked covariance model in which a base matrix is perturbed by adding a k-sparse maximal eigenvector, and we analyze two computationally tractable methods for recovering the support set of this maximal eigenvector, as follows: (a) a simple diagonal thresholding method, which transitions from success to failure as a function of the rescaled sample size θ_dia(n, p, k) = n/[k² log(p−k)]; and (b) a more sophisticated semidefinite programming (SDP) relaxation, which succeeds once the rescaled sample size θ_sdp(n, p, k) = n/[k log(p−k)] is larger than a critical threshold. In addition, we prove that no method, including the best method which has exponential-time complexity, can succeed in recovering the support if the order parameter θ_sdp(n, p, k) is below a threshold. Our results thus highlight an interesting trade-off between computational and statistical efficiency in high-dimensional inference.
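To make the two rescaled sample sizes and the diagonal thresholding idea concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the base matrix is taken to be the identity, the sparsity k is assumed known, the spike strength `beta` and all function names are hypothetical, and the estimator simply keeps the k coordinates with the largest sample second moments.

```python
import math
import random


def theta_dia(n, p, k):
    """Rescaled sample size n / [k^2 log(p - k)] from the abstract."""
    return n / (k ** 2 * math.log(p - k))


def theta_sdp(n, p, k):
    """Rescaled sample size n / [k log(p - k)] from the abstract."""
    return n / (k * math.log(p - k))


def sample_spiked(n, p, support, beta, rng):
    """Draw n zero-mean samples with covariance I + beta * z z^T.

    Assumption for illustration: identity base matrix, and z is the unit
    vector with entries 1/sqrt(k) on `support` and 0 elsewhere.
    """
    k = len(support)
    amp = math.sqrt(beta / k)
    rows = []
    for _ in range(n):
        g = rng.gauss(0.0, 1.0)  # shared spike coefficient
        x = [rng.gauss(0.0, 1.0) for _ in range(p)]  # isotropic noise
        for j in support:
            x[j] += amp * g
        rows.append(x)
    return rows


def diagonal_thresholding(rows, k):
    """Indices of the k largest diagonal entries of the sample second-moment matrix."""
    n = len(rows)
    p = len(rows[0])
    diag = [sum(x[j] ** 2 for x in rows) / n for j in range(p)]
    top = sorted(range(p), key=lambda j: diag[j], reverse=True)[:k]
    return sorted(top)


rng = random.Random(0)  # fixed seed so the run is reproducible
n, p, k, beta = 2000, 50, 5, 3.0
support = [0, 1, 2, 3, 4]
rows = sample_spiked(n, p, support, beta, rng)
estimate = diagonal_thresholding(rows, k)
```

In this setup the on-support variances equal 1 + beta/k while the off-support variances equal 1, so the estimator can only separate them when n is large relative to k² log(p−k). Note also that θ_sdp = k · θ_dia by definition, which is the factor-of-k gap in sample size that the abstract attributes to the SDP relaxation.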

Citation

Arash A. Amini, Martin J. Wainwright. "High-dimensional analysis of semidefinite relaxations for sparse principal components." Ann. Statist. 37 (5B) 2877-2921, October 2009. https://doi.org/10.1214/08-AOS664

Information

Published: October 2009

First available in Project Euclid: 17 July 2009

zbMATH: 1173.62049

MathSciNet: MR2541450

Digital Object Identifier: 10.1214/08-AOS664

Subjects:

Primary: 62H25

Secondary: 62F12

Keywords: convex relaxation, high-dimensional statistics, principal component analysis, random matrices, semidefinite programming, sparsity, spectral analysis, spiked covariance ensembles, Wishart ensembles
