## The Annals of Statistics

### Sparse CCA: Adaptive estimation and computational barriers

#### Abstract

Canonical correlation analysis is a classical technique for exploring the relationship between two sets of variables. It has important applications in analyzing high-dimensional datasets originating from genomics, imaging and other fields. This paper considers adaptive minimax and computationally tractable estimation of leading sparse canonical coefficient vectors in high dimensions. Under a Gaussian canonical pair model, we first establish separate minimax estimation rates for canonical coefficient vectors of each set of random variables under no structural assumption on marginal covariance matrices. Second, we propose a computationally feasible estimator that attains the optimal rates adaptively under an additional sample size condition. Finally, we show that a sample size condition of this kind is needed for any randomized polynomial-time estimator to be consistent, assuming hardness of certain instances of the planted clique detection problem. As a byproduct, we obtain the first computational lower bounds for sparse PCA under the Gaussian single spiked covariance model.
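For readers unfamiliar with the underlying quantity being estimated, the sketch below illustrates classical (non-sparse) CCA: the canonical correlations are the singular values of the whitened cross-covariance matrix $\Sigma_x^{-1/2}\Sigma_{xy}\Sigma_y^{-1/2}$. This is a minimal NumPy illustration of the population object the paper studies, not the sparse estimator proposed in the paper (which additionally exploits sparsity of the leading canonical coefficient vectors).

```python
import numpy as np


def cca_top(X, Y):
    """Leading sample canonical correlation between data matrices X and Y.

    Computes the top singular value of the whitened cross-covariance
    Lx^{-1} Sxy Ly^{-T}, where Sx = Lx Lx^T and Sy = Ly Ly^T are
    Cholesky factorizations of the sample covariance matrices.
    """
    n = len(X)
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    Sx = X.T @ X / n
    Sy = Y.T @ Y / n
    Sxy = X.T @ Y / n
    Lx = np.linalg.cholesky(Sx)
    Ly = np.linalg.cholesky(Sy)
    # Singular values of Lx^{-1} Sxy Ly^{-T} are the sample
    # canonical correlations (Cholesky whitening of both blocks).
    K = np.linalg.solve(Lx, Sxy) @ np.linalg.inv(Ly).T
    return np.linalg.svd(K, compute_uv=False)[0]


# Toy Gaussian pair with one correlated direction (rho = 0.8):
# the top canonical correlation should be close to 0.8.
rng = np.random.default_rng(0)
n = 20000
z = rng.standard_normal(n)
X = np.column_stack([z, rng.standard_normal((n, 3))])
Y = np.column_stack([0.8 * z + 0.6 * rng.standard_normal(n),
                     rng.standard_normal((n, 2))])
print(cca_top(X, Y))
```

In the high-dimensional regime considered in the paper, this plug-in approach breaks down (the sample covariance matrices are ill-conditioned or singular), which is what motivates sparsity assumptions on the canonical coefficient vectors.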

#### Article information

Source
Ann. Statist. Volume 45, Number 5 (2017), 2074-2101.

Dates
Revised: September 2016
First available in Project Euclid: 31 October 2017

https://projecteuclid.org/euclid.aos/1509436828

Digital Object Identifier
doi:10.1214/16-AOS1519

Subjects
Primary: 62H12: Estimation
Secondary: 62C20: Minimax procedures

#### Citation

Gao, Chao; Ma, Zongming; Zhou, Harrison H. Sparse CCA: Adaptive estimation and computational barriers. Ann. Statist. 45 (2017), no. 5, 2074–2101. doi:10.1214/16-AOS1519. https://projecteuclid.org/euclid.aos/1509436828

#### References

• [1] Anderson, T. W. (1958). An Introduction to Multivariate Statistical Analysis. Wiley, New York.
• [2] Arora, S. and Barak, B. (2009). Computational Complexity: A Modern Approach. Cambridge Univ. Press, Cambridge.
• [3] Avants, B. B., Cook, P. A., Ungar, L., Gee, J. C. and Grossman, M. (2010). Dementia induces correlated reductions in white matter integrity and cortical thickness: A multivariate neuroimaging study with sparse canonical correlation analysis. NeuroImage 50 1004–1016.
• [4] Bao, Z., Hu, J., Pan, G. and Zhou, W. (2014). Canonical correlation coefficients of high-dimensional normal vectors: Finite rank case. Preprint. Available at arXiv:1407.7194.
• [5] Belloni, A., Chernozhukov, V. and Wang, L. (2011). Square-root lasso: Pivotal recovery of sparse signals via conic programming. Biometrika 98 791–806.
• [6] Berthet, Q. and Rigollet, P. (2013). Complexity theoretic lower bounds for sparse principal component detection. In Conference on Learning Theory 1046–1066.
• [7] Bickel, P. J., Ritov, Y. and Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. Ann. Statist. 37 1705–1732.
• [8] Birnbaum, A., Johnstone, I. M., Nadler, B. and Paul, D. (2013). Minimax bounds for sparse PCA with noisy high-dimensional data. Ann. Statist. 41 1055–1084.
• [9] Blum, L., Cucker, F., Shub, M. and Smale, S. (2012). Complexity and Real Computation. Springer, Berlin.
• [10] Boyd, S., Parikh, N., Chu, E., Peleato, B. and Eckstein, J. (2011). Distributed optimization and statistical learning via the alternating direction method of multipliers. Found. Trends Mach. Learn. 3 1–122.
• [11] Bunea, F., Lederer, J. and She, Y. (2014). The group square-root lasso: Theoretical properties and fast algorithms. IEEE Trans. Inform. Theory 60 1313–1325.
• [12] Cai, T. T., Ma, Z. and Wu, Y. (2013). Sparse PCA: Optimal rates and adaptive estimation. Ann. Statist. 41 3074–3110.
• [13] Chen, M., Gao, C., Ren, Z. and Zhou, H. H. (2013). Sparse CCA via precision adjusted iterative thresholding. Preprint. Available at arXiv:1311.6186.
• [14] Douglas, J. Jr. and Rachford, H. H. Jr. (1956). On the numerical solution of heat conduction problems in two and three space variables. Trans. Amer. Math. Soc. 82 421–439.
• [15] Feldman, V., Grigorescu, E., Reyzin, L., Vempala, S. S. and Xiao, Y. (2013). Statistical algorithms and a lower bound for detecting planted cliques. In STOC’13—Proceedings of the 2013 ACM Symposium on Theory of Computing 655–664. ACM, New York.
• [16] Gao, C., Ma, Z., Ren, Z. and Zhou, H. H. (2015). Minimax estimation in sparse canonical correlation analysis. Ann. Statist. 43 2168–2197.
• [17] Gao, C., Ma, Z. and Zhou, H. H. (2017). Supplement to “Sparse CCA: Adaptive Estimation and Computational Barriers.” DOI:10.1214/16-AOS1519SUPP.
• [18] Hajek, B., Wu, Y. and Xu, J. (2014). Computational lower bounds for community detection on random graphs. Preprint. Available at arXiv:1406.6625.
• [19] Hardoon, D. R. and Shawe-Taylor, J. (2011). Sparse canonical correlation analysis. Mach. Learn. 83 331–353.
• [20] Hotelling, H. (1936). Relations between two sets of variates. Biometrika 28 321–377.
• [21] Johnstone, I. M. (2008). Multivariate analysis and Jacobi ensembles: Largest eigenvalue, Tracy–Widom limits and rates of convergence. Ann. Statist. 36 2638–2716.
• [22] Johnstone, I. M. and Lu, A. Y. (2009). On consistency and sparsity for principal components analysis in high dimensions. J. Amer. Statist. Assoc. 104 682–693.
• [23] Lê Cao, K.-A., Martin, P. G. P., Robert-Granié, C. and Besse, P. (2009). Sparse canonical methods for biological data integration: Application to a cross-platform study. BMC Bioinformatics 10 34.
• [24] Lounici, K., Pontil, M., van de Geer, S. and Tsybakov, A. B. (2011). Oracle inequalities and optimal inference under group sparsity. Ann. Statist. 39 2164–2204.
• [25] Ma, Z. (2013). Sparse principal component analysis and iterative thresholding. Ann. Statist. 41 772–801.
• [26] Ma, Z. and Wu, Y. (2013). Computational barriers in minimax submatrix detection. Preprint. Available at arXiv:1309.5914.
• [27] Mardia, K. V., Kent, J. T. and Bibby, J. M. (1979). Multivariate Analysis. Academic Press, London.
• [28] Negahban, S. N., Ravikumar, P., Wainwright, M. J. and Yu, B. (2012). A unified framework for high-dimensional analysis of $M$-estimators with decomposable regularizers. Statist. Sci. 27 538–557.
• [29] Parkhomenko, E., Tritchler, D. and Beyene, J. (2009). Sparse canonical correlation analysis with application to genomic data integration. Stat. Appl. Genet. Mol. Biol. 8 Art. 1, 36.
• [30] Rossman, B. (2010). Average-case complexity of detecting cliques. Ph.D. thesis, Massachusetts Institute of Technology, Cambridge, MA.
• [31] Sun, T. and Zhang, C.-H. (2012). Scaled sparse linear regression. Biometrika 99 879–898.
• [32] Vu, V. Q., Cho, J., Lei, J. and Rohe, K. (2013). Fantope projection and selection: A near-optimal convex relaxation of sparse PCA. In Advances in Neural Information Processing Systems 2670–2678.
• [33] Waaijenborg, S. and Zwinderman, A. H. (2009). Sparse canonical correlation analysis for identifying, connecting and completing gene-expression networks. BMC Bioinformatics 10 315.
• [34] Wang, T., Berthet, Q. and Samworth, R. J. (2014). Statistical and computational trade-offs in estimation of sparse principal components. Preprint. Available at arXiv:1408.5369.
• [35] Watson, G. A. (1993). On matrix approximation problems with Ky Fan $k$ norms. Numer. Algorithms 5 263–272.
• [36] Wiesel, A., Kliger, M. and Hero, A. O., III (2008). A greedy approach to sparse canonical correlation analysis. Preprint. Available at arXiv:0801.2748.
• [37] Witten, D. M., Tibshirani, R. and Hastie, T. (2009). A penalized matrix decomposition, with applications to sparse principal components and canonical correlation analysis. Biostatistics 10 515–534.
• [38] Yuan, M. and Lin, Y. (2006). Model selection and estimation in regression with grouped variables. J. R. Stat. Soc. Ser. B. Stat. Methodol. 68 49–67.
• [39] Zhang, Y., Wainwright, M. J. and Jordan, M. I. (2014). Lower bounds on the performance of polynomial-time algorithms for sparse linear regression. Preprint. Available at arXiv:1402.1918.

#### Supplemental materials

• Supplement to “Sparse CCA: Adaptive estimation and computational barriers”. The supplement presents additional proofs and technical details, implementation details of (18), and numerical studies.