## Bernoulli

• Volume 24, Number 4B (2018), 3447-3468.

### A unified matrix model including both CCA and F matrices in multivariate analysis: The largest eigenvalue and its applications

#### Abstract

Let $\mathbf{Z}_{M_{1}\times N}=\mathbf{T}^{\frac{1}{2}}\mathbf{X}$, where $\mathbf{T}$ is a positive definite matrix with $(\mathbf{T}^{\frac{1}{2}})^{2}=\mathbf{T}$ and $\mathbf{X}$ consists of independent random variables with mean zero and variance one. This paper proposes a unified matrix model $\mathbf{\Omega}=(\mathbf{Z}\mathbf{U}_{2}\mathbf{U}_{2}^{T}\mathbf{Z}^{T})^{-1}\mathbf{Z}\mathbf{U}_{1}\mathbf{U}_{1}^{T}\mathbf{Z}^{T},$ where $\mathbf{U}_{1}$ and $\mathbf{U}_{2}$ are isometric matrices of dimensions $N\times N_{1}$ and $N\times(N-N_{2})$, respectively, such that $\mathbf{U}_{1}^{T}\mathbf{U}_{1}=\mathbf{I}_{N_{1}}$, $\mathbf{U}_{2}^{T}\mathbf{U}_{2}=\mathbf{I}_{N-N_{2}}$ and $\mathbf{U}_{1}^{T}\mathbf{U}_{2}=\mathbf{0}$. Moreover, $\mathbf{U}_{1}$ and $\mathbf{U}_{2}$ (random or nonrandom) are independent of $\mathbf{Z}_{M_{1}\times N}$, and, with probability tending to one, $\operatorname{rank}(\mathbf{U}_{1})=N_{1}$ and $\operatorname{rank}(\mathbf{U}_{2})=N-N_{2}$. We establish the asymptotic Tracy–Widom distribution of the largest eigenvalue of $\mathbf{\Omega}$ under moment assumptions on $\mathbf{X}$ when $N_{1}$, $N_{2}$ and $M_{1}$ are comparable.
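To make the definition of $\mathbf{\Omega}$ concrete, the following is a minimal Python/NumPy sketch that draws one instance of the model and computes the eigenvalues of $\mathbf{\Omega}$ through a Cholesky factorization of $\mathbf{Z}\mathbf{U}_{2}\mathbf{U}_{2}^{T}\mathbf{Z}^{T}$. The helper names (`omega_eigenvalues`, `simulate_omega_eigenvalues`), the diagonal $\mathbf{T}$, the Gaussian entries of $\mathbf{X}$ and the random construction of $\mathbf{U}_{1}$, $\mathbf{U}_{2}$ are illustrative assumptions, not the paper's setup. The sketch requires $N_{1}\le N_{2}$, so that the mutually orthogonal columns of $\mathbf{U}_{1}$ and $\mathbf{U}_{2}$ fit in $\mathbb{R}^{N}$, and $N-N_{2}\ge M_{1}$, so that the inverse exists.

```python
import numpy as np


def omega_eigenvalues(Z, U1, U2):
    """Eigenvalues of Omega = (Z U2 U2^T Z^T)^{-1} Z U1 U1^T Z^T, ascending.

    Omega = A^{-1} B with A = Z U2 U2^T Z^T (positive definite when
    N - N2 >= M1 and Z U2 has full row rank) and B = Z U1 U1^T Z^T
    (positive semidefinite). With the Cholesky factorization A = L L^T,
    Omega is similar to the symmetric matrix L^{-1} B L^{-T}, so a
    symmetric eigensolver applies and all eigenvalues are real and >= 0.
    """
    ZU1 = Z @ U1
    ZU2 = Z @ U2
    A = ZU2 @ ZU2.T
    B = ZU1 @ ZU1.T
    L = np.linalg.cholesky(A)
    C = np.linalg.solve(L, B)      # L^{-1} B
    C = np.linalg.solve(L, C.T)    # L^{-1} (L^{-1} B)^T = L^{-1} B L^{-T}
    C = (C + C.T) / 2              # remove rounding asymmetry
    return np.linalg.eigvalsh(C)


def simulate_omega_eigenvalues(M1, N, N1, N2, seed=0):
    """Draw one instance of the model and return the eigenvalues of Omega.

    Illustrative assumptions (not from the paper): T is diagonal with
    entries in [1, 2], X has i.i.d. standard normal entries, and U1, U2
    are the first N1 and next N - N2 columns of a random orthonormal matrix.
    """
    if N1 > N2 or N - N2 < M1:
        raise ValueError("need N1 <= N2 and N - N2 >= M1")
    rng = np.random.default_rng(seed)
    T_half = np.diag(np.sqrt(rng.uniform(1.0, 2.0, size=M1)))
    X = rng.standard_normal((M1, N))
    Z = T_half @ X                                  # Z = T^{1/2} X
    # Reduced QR of a Gaussian matrix yields orthonormal columns; splitting
    # them gives U1^T U1 = I, U2^T U2 = I and U1^T U2 = 0 by construction.
    Q, _ = np.linalg.qr(rng.standard_normal((N, N1 + N - N2)))
    U1, U2 = Q[:, :N1], Q[:, N1:]
    return omega_eigenvalues(Z, U1, U2)


eigs = simulate_omega_eigenvalues(M1=50, N=400, N1=100, N2=200)
print("largest eigenvalue of Omega:", eigs[-1])
```

Using a diagonal $\mathbf{T}$ loses nothing here: since $\mathbf{Z}=\mathbf{T}^{\frac{1}{2}}\mathbf{X}$ with $\mathbf{T}^{\frac{1}{2}}$ invertible, $\mathbf{\Omega}$ is similar to the matrix obtained by replacing $\mathbf{Z}$ with $\mathbf{X}$, so its eigenvalues do not depend on $\mathbf{T}$.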

The asymptotic distributions of the largest eigenvalues of the matrices used in canonical correlation analysis (CCA) and of F matrices (both centered and non-centered versions) can both be obtained from that of $\mathbf{\Omega}$ by selecting appropriate matrices $\mathbf{U}_{1}$ and $\mathbf{U}_{2}$. Moreover, with suitable choices of $\mathbf{U}_{1}$ and $\mathbf{U}_{2}$, $\mathbf{\Omega}$ can be applied to some multivariate testing problems that neither type of matrix can handle. To illustrate this, we explore two further applications. The first is the MANOVA approach to testing the equality of several high-dimensional mean vectors, where $\mathbf{U}_{1}$ and $\mathbf{U}_{2}$ are chosen to be two nonrandom matrices. The second is testing the unknown parameter matrix in the multivariate linear model, where $\mathbf{U}_{1}$ and $\mathbf{U}_{2}$ are random. For each application, we develop theoretical results and conduct numerical studies to investigate the empirical performance.
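The reductions to CCA and MANOVA can be checked numerically. The Python/NumPy sketch below uses two choices of $\mathbf{U}_{1}$ and $\mathbf{U}_{2}$ that are natural for these problems but are assumptions, not necessarily the exact constructions in the paper. For CCA between $\mathbf{X}$ ($p\times n$) and $\mathbf{Y}$ ($q\times n$), $\mathbf{U}_{1}$ spans the row space of $\mathbf{Y}$ and $\mathbf{U}_{2}$ its orthogonal complement, so $\mathbf{U}_{1}\mathbf{U}_{1}^{T}=\mathbf{P}_{Y}=\mathbf{Y}^{T}(\mathbf{Y}\mathbf{Y}^{T})^{-1}\mathbf{Y}$ is random. Writing $\mathbf{A}=\mathbf{X}(\mathbf{I}-\mathbf{P}_{Y})\mathbf{X}^{T}$ and $\mathbf{B}=\mathbf{X}\mathbf{P}_{Y}\mathbf{X}^{T}$, we have $\mathbf{A}+\mathbf{B}=\mathbf{X}\mathbf{X}^{T}$. Hence $(\mathbf{A}+\mathbf{B})^{-1}\mathbf{B}v=r^{2}v$ if and only if $\mathbf{A}^{-1}\mathbf{B}v=\frac{r^{2}}{1-r^{2}}v$, so each squared sample canonical correlation $r^{2}$ maps to the eigenvalue $r^{2}/(1-r^{2})$ of $\mathbf{\Omega}$ with $\mathbf{Z}=\mathbf{X}$. For a one-way MANOVA with $k$ groups, $\mathbf{U}_{1}$ spans the between-group contrasts and $\mathbf{U}_{2}$ the within-group residual space. Both depend only on the group labels, so they are nonrandom, and the largest eigenvalue of $\mathbf{\Omega}$ equals Roy's largest root $\lambda_{\max}(\mathbf{E}^{-1}\mathbf{H})$, with $\mathbf{H}$ and $\mathbf{E}$ the between- and within-group sums of squares and products. Function names and simulated data are hypothetical.

```python
import numpy as np


def omega_eigenvalues(Z, U1, U2):
    """Ascending eigenvalues of (Z U2 U2^T Z^T)^{-1} Z U1 U1^T Z^T."""
    ZU1, ZU2 = Z @ U1, Z @ U2
    L = np.linalg.cholesky(ZU2 @ ZU2.T)
    C = np.linalg.solve(L, np.linalg.solve(L, ZU1 @ ZU1.T).T)  # L^{-1} B L^{-T}
    return np.linalg.eigvalsh((C + C.T) / 2)


def cca_via_omega(X, Y):
    """Squared sample canonical correlations r^2 and the eigenvalues of Omega.

    X is p x n and Y is q x n (columns are observations; no centering).
    Assumed choice: U1 spans the row space of Y and U2 its orthogonal
    complement, so U1 U1^T = P_Y and U2 U2^T = I - P_Y (random, via Y).
    """
    q = Y.shape[0]
    Q, _ = np.linalg.qr(Y.T, mode="complete")  # n x n orthogonal matrix
    U1, U2 = Q[:, :q], Q[:, q:]
    mu = omega_eigenvalues(X, U1, U2)
    # Direct route: eigenvalues of (X X^T)^{-1} X P_Y X^T.
    P_Y = Y.T @ np.linalg.solve(Y @ Y.T, Y)
    M = np.linalg.solve(X @ X.T, X @ P_Y @ X.T)
    r2 = np.sort(np.linalg.eigvals(M).real)
    return r2, mu


def manova_via_omega(Z, groups):
    """Roy's largest root of E^{-1} H for one-way MANOVA, computed two ways.

    Z is p x N (columns are observations), groups holds N group labels.
    Assumed choice: U1 is an orthonormal basis of range(P_G - 11^T/N)
    (between-group contrasts, rank k - 1) and U2 one of range(I - P_G)
    (within-group residuals, rank N - k); both depend only on the labels.
    """
    N = Z.shape[1]
    labels = np.unique(groups)
    G = (groups[:, None] == labels[None, :]).astype(float)  # N x k indicators
    P_G = G @ np.linalg.solve(G.T @ G, G.T)
    # Eigenvectors of a projection with eigenvalue 1 span its range.
    w1, V1 = np.linalg.eigh(P_G - np.ones((N, N)) / N)
    w2, V2 = np.linalg.eigh(np.eye(N) - P_G)
    U1, U2 = V1[:, w1 > 0.5], V2[:, w2 > 0.5]
    # Direct route: hypothesis (between) and error (within) SSP matrices.
    means = np.stack([Z[:, groups == g].mean(axis=1) for g in labels], axis=1)
    grand = Z.mean(axis=1)
    H = sum(c * np.outer(m - grand, m - grand)
            for c, m in zip(G.sum(axis=0), means.T))
    resid = Z - means @ G.T
    E = resid @ resid.T
    roy_direct = np.max(np.linalg.eigvals(np.linalg.solve(E, H)).real)
    return roy_direct, omega_eigenvalues(Z, U1, U2)[-1]


rng = np.random.default_rng(0)
X, Y = rng.standard_normal((5, 60)), rng.standard_normal((8, 60))
r2, mu = cca_via_omega(X, Y)
print(np.allclose(mu, r2 / (1 - r2)))  # checks mu = r^2 / (1 - r^2)

groups = np.repeat(np.arange(3), [20, 25, 30])
Z = rng.standard_normal((6, groups.size))
print(manova_via_omega(Z, groups))
```

In the MANOVA construction, $\mathbf{U}_{2}$ is orthogonal to every group indicator and $\mathbf{U}_{1}$ is orthogonal to the all-ones vector. Under the null hypothesis of equal means, the common mean is therefore annihilated in both $\mathbf{Z}\mathbf{U}_{1}$ and $\mathbf{Z}\mathbf{U}_{2}$, leaving only the noise term $\mathbf{T}^{\frac{1}{2}}\mathbf{X}$. This is what makes the null distribution of Roy's root an instance of the result for $\mathbf{\Omega}$.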

#### Article information

Source
Bernoulli, Volume 24, Number 4B (2018), 3447-3468.

Dates
Revised: June 2017
First available in Project Euclid: 18 April 2018

Permanent link to this document
https://projecteuclid.org/euclid.bj/1524038759

Digital Object Identifier
doi:10.3150/17-BEJ965

Mathematical Reviews number (MathSciNet)
MR3788178

Zentralblatt MATH identifier
06869881

#### Citation

Han, Xiao; Pan, Guangming; Yang, Qing. A unified matrix model including both CCA and F matrices in multivariate analysis: The largest eigenvalue and its applications. Bernoulli 24 (2018), no. 4B, 3447–3468. doi:10.3150/17-BEJ965. https://projecteuclid.org/euclid.bj/1524038759

#### Supplemental materials

• Supplement to “A unified matrix model including both CCA and F matrices in multivariate analysis: The largest eigenvalue and its applications”. We provide the detailed proof of Theorem 2.1 in the supplementary file.