The Annals of Statistics

Sparse CCA: Adaptive estimation and computational barriers

Chao Gao, Zongming Ma, and Harrison H. Zhou

Canonical correlation analysis is a classical technique for exploring the relationship between two sets of variables. It has important applications in analyzing high-dimensional datasets originating from genomics, imaging and other fields. This paper considers adaptive minimax and computationally tractable estimation of leading sparse canonical coefficient vectors in high dimensions. Under a Gaussian canonical pair model, we first establish separate minimax estimation rates for canonical coefficient vectors of each set of random variables under no structural assumption on marginal covariance matrices. Second, we propose a computationally feasible estimator to attain the optimal rates adaptively under an additional sample size condition. Finally, we show that a sample size condition of this kind is needed for any randomized polynomial-time estimator to be consistent, assuming hardness of certain instances of the planted clique detection problem. As a byproduct, we obtain the first computational lower bounds for sparse PCA under the Gaussian single spiked covariance model.

Ann. Statist. Volume 45, Number 5 (2017), 2074-2101.

Received: August 2015
Revised: September 2016
First available in Project Euclid: 31 October 2017

Primary: 62H12: Estimation
Secondary: 62C20: Minimax procedures

Keywords: Convex programming; group-Lasso; minimax rates; computational complexity; planted clique; sparse CCA (SCCA); sparse PCA (SPCA)


Gao, Chao; Ma, Zongming; Zhou, Harrison H. Sparse CCA: Adaptive estimation and computational barriers. Ann. Statist. 45 (2017), no. 5, 2074-2101. doi:10.1214/16-AOS1519.

Supplemental materials

  • Supplement to “Sparse CCA: Adaptive estimation and computational barriers”. The supplement presents additional proofs and technical details, implementation details of (18), and numerical studies.