The Annals of Statistics

Canonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank case

Zhigang Bao, Jiang Hu, Guangming Pan, and Wang Zhou

Abstract

Consider a Gaussian vector $\mathbf{z}=(\mathbf{x}',\mathbf{y}')'$, consisting of two sub-vectors $\mathbf{x}$ and $\mathbf{y}$ with dimensions $p$ and $q$, respectively. With $n$ independent observations of $\mathbf{z}$, we study the correlation between $\mathbf{x}$ and $\mathbf{y}$ from the perspective of canonical correlation analysis. We investigate the high-dimensional case: both $p$ and $q$ are proportional to the sample size $n$. Denote by $\Sigma_{uv}$ the population cross-covariance matrix of random vectors $\mathbf{u}$ and $\mathbf{v}$, and denote by $S_{uv}$ the sample counterpart. The canonical correlation coefficients between $\mathbf{x}$ and $\mathbf{y}$ are the square roots of the nonzero eigenvalues of the canonical correlation matrix $\Sigma_{xx}^{-1}\Sigma_{xy}\Sigma_{yy}^{-1}\Sigma_{yx}$. In this paper, we focus on the case where $\Sigma_{xy}$ is of finite rank $k$, that is, there are $k$ nonzero canonical correlation coefficients, whose squares are denoted by $r_{1}\geq\cdots\geq r_{k}>0$. We study the sample counterparts of $r_{i},i=1,\ldots,k$, that is, the largest $k$ eigenvalues of the sample canonical correlation matrix $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$, denoted by $\lambda_{1}\geq\cdots\geq\lambda_{k}$. We show that there exists a threshold $r_{c}\in(0,1)$, such that for each $i\in\{1,\ldots,k\}$, when $r_{i}\leq r_{c}$, $\lambda_{i}$ converges almost surely to the right edge of the limiting spectral distribution of the sample canonical correlation matrix, denoted by $d_{+}$. When $r_{i}>r_{c}$, $\lambda_{i}$ possesses an almost sure limit in $(d_{+},1]$, from which we can in turn recover the $r_{i}$’s and thus obtain an estimate of them in the high-dimensional scenario. We also obtain the limiting distribution of the $\lambda_{i}$’s under appropriate normalization. Specifically, $\lambda_{i}$ has Gaussian-type fluctuations if $r_{i}>r_{c}$, and follows the Tracy–Widom distribution if $r_{i}<r_{c}$.
Some applications of our results are also discussed.
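
The objects in the abstract can be explored numerically. The following is a minimal sketch, not the authors' code: it simulates a mean-zero Gaussian pair with $\Sigma_{xx}=I_p$, $\Sigma_{yy}=I_q$ and a rank-$k$ cross-covariance whose squared canonical correlations are $r_1,\ldots,r_k$, and computes the eigenvalues of $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$ as squared singular values of $Q_x'Q_y$, where $Q_x$ and $Q_y$ come from QR factorizations of the data matrices. The function names, the simulation design, and the closed form used for $d_{+}$ (the right edge of the Wachter-type law with $c_1=p/n$, $c_2=q/n$) are assumptions of this sketch; the abstract itself states none of them.

```python
import numpy as np


def sample_cca_eigenvalues(x, y):
    """Eigenvalues of S_xx^{-1} S_xy S_yy^{-1} S_yx, in decreasing order.

    x is (n, p) and y is (n, q); rows are observations, assumed mean zero.
    Returns the min(p, q) largest eigenvalues (the rest are zero).
    """
    qx, _ = np.linalg.qr(x)  # reduced QR: qx has orthonormal columns
    qy, _ = np.linalg.qr(y)
    # Squared singular values of qx' qy equal the nonzero eigenvalues above.
    s = np.linalg.svd(qx.T @ qy, compute_uv=False)
    return np.clip(s, 0.0, 1.0) ** 2


def simulate(n, p, q, r, seed=0):
    """Hypothetical design: Sigma_xx = I, Sigma_yy = I, and the first k
    coordinates of x and y are correlated with squared correlations r."""
    rng = np.random.default_rng(seed)
    r = np.asarray(r, dtype=float)
    k = r.size  # requires k <= min(p, q)
    y = rng.standard_normal((n, q))
    x = rng.standard_normal((n, p))
    x[:, :k] = np.sqrt(r) * y[:, :k] + np.sqrt(1.0 - r) * x[:, :k]
    return x, y


def right_edge(c1, c2):
    """Assumed closed form for d_+ (not given in the abstract)."""
    return c1 + c2 - 2 * c1 * c2 + 2 * np.sqrt(c1 * c2 * (1 - c1) * (1 - c2))


if __name__ == "__main__":
    n, p, q = 2000, 200, 100
    x, y = simulate(n, p, q, r=[0.8, 0.5])
    lam = sample_cca_eigenvalues(x, y)
    print("largest sample eigenvalues:", lam[:4])
    print("assumed right edge d_+:", right_edge(p / n, q / n))
```

In such a run one would expect the eigenvalues associated with large $r_i$ to separate from the bulk while the remaining ones stay near the edge. The QR route is chosen only to avoid forming explicit inverses of $S_{xx}$ and $S_{yy}$.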

Article information

Source
Ann. Statist., Volume 47, Number 1 (2019), 612-640.

Dates
Received: June 2017
Revised: March 2018
First available in Project Euclid: 30 November 2018

Permanent link to this document
https://projecteuclid.org/euclid.aos/1543568600

Digital Object Identifier
doi:10.1214/18-AOS1704

Mathematical Reviews number (MathSciNet)
MR3909944

Zentralblatt MATH identifier
07036213

Subjects
Primary:
62H20: Measures of association (correlation, canonical correlation, etc.)
60B20: Random matrices (probabilistic aspects; for algebraic aspects see 15B52)
60F99: None of the above, but in this section

Keywords
Canonical correlation analysis; random matrices; MANOVA ensemble; high-dimensional data; finite rank perturbation; largest eigenvalues

Citation

Bao, Zhigang; Hu, Jiang; Pan, Guangming; Zhou, Wang. Canonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank case. Ann. Statist. 47 (2019), no. 1, 612–640. doi:10.1214/18-AOS1704. https://projecteuclid.org/euclid.aos/1543568600


Supplemental materials

  • Supplement to “Canonical correlation coefficients of high-dimensional Gaussian vectors: Finite rank case”. In this supplementary material, we present some simulation results and prove Theorems 2.1 and 2.3, Lemmas 6.1–6.3 and 7.3–7.4, and Proposition 7.1.