## The Annals of Statistics

### Independence test for high dimensional data based on regularized canonical correlation coefficients

#### Abstract

This paper proposes a new statistic to test independence between two high dimensional random vectors $\mathbf{X}:p_{1}\times1$ and $\mathbf{Y}:p_{2}\times1$. The proposed statistic is based on the sum of regularized sample canonical correlation coefficients of $\mathbf{X}$ and $\mathbf{Y}$. Its asymptotic distribution under the null hypothesis is established as a corollary of general central limit theorems (CLTs) for linear statistics of classical and regularized sample canonical correlation coefficients when $p_{1}$ and $p_{2}$ are both comparable to the sample size $n$. As applications of the test, various dependence structures, such as factor models, ARCH models and a general uncorrelated but dependent case, are investigated by simulation. As an empirical application, the proposed test detects cross-sectional dependence in the daily stock returns of companies from different sections of the New York Stock Exchange (NYSE).
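The statistic described above is a linear spectral statistic: a sum $\sum_i f(\lambda_i)$ over the eigenvalues $\lambda_i$ of a (regularized) sample canonical correlation matrix, where $f(x)=x$ gives the plain sum. For the classical matrix $S_{xx}^{-1}S_{xy}S_{yy}^{-1}S_{yx}$ these eigenvalues are the squared sample canonical correlations. The Python sketch below computes such a sum under stated assumptions: rows are observations, samples are mean-centered, and regularization is taken to be a ridge term $t\mathbf{I}$ added to both sample covariance blocks, which may differ from the exact definition used in the paper. The helper names (`canonical_corr_matrix`, `linear_spectral_statistic`, `factor_model_pair`) are hypothetical, and the centering and scaling constants that the paper's CLT supplies to turn the raw sum into a test statistic are not reproduced here.

```python
import numpy as np


def canonical_corr_matrix(x, y, t=0.0):
    """Return (Sxx + t I)^{-1} Sxy (Syy + t I)^{-1} Syx as a p1 x p1 matrix.

    x: n x p1 sample, y: n x p2 sample (rows are observations).
    t = 0 gives the classical sample canonical correlation matrix, whose
    eigenvalues are the squared sample canonical correlations.
    t > 0 is an ASSUMED ridge-type regularization, used for illustration only.
    """
    n = x.shape[0]
    xc = x - x.mean(axis=0)          # center each coordinate
    yc = y - y.mean(axis=0)
    sxx = xc.T @ xc / n              # p1 x p1 sample covariance
    syy = yc.T @ yc / n              # p2 x p2 sample covariance
    sxy = xc.T @ yc / n              # p1 x p2 cross-covariance
    a = np.linalg.solve(sxx + t * np.eye(sxx.shape[0]), sxy)    # (Sxx + tI)^{-1} Sxy
    b = np.linalg.solve(syy + t * np.eye(syy.shape[0]), sxy.T)  # (Syy + tI)^{-1} Syx
    return a @ b


def linear_spectral_statistic(x, y, f=lambda lam: lam, t=0.0):
    """Sum of f(lambda_i) over the eigenvalues of the canonical correlation matrix."""
    # The matrix is similar to a symmetric PSD matrix, so its eigenvalues are
    # real; .real drops round-off imaginary parts returned by eigvals.
    lam = np.linalg.eigvals(canonical_corr_matrix(x, y, t)).real
    return float(np.sum(f(lam)))


def factor_model_pair(n, p1, p2, k=1, noise=1.0, rng=None):
    """Hypothetical generator: X and Y load on the same k latent factors."""
    rng = np.random.default_rng(rng)
    f = rng.standard_normal((n, k))  # shared latent factors
    x = f @ rng.standard_normal((k, p1)) + noise * rng.standard_normal((n, p1))
    y = f @ rng.standard_normal((k, p2)) + noise * rng.standard_normal((n, p2))
    return x, y


# Usage: compare an independent pair with a factor-model (dependent) pair.
rng = np.random.default_rng(0)
n, p1, p2 = 400, 40, 60
x_ind = rng.standard_normal((n, p1))
y_ind = rng.standard_normal((n, p2))
x_dep, y_dep = factor_model_pair(n, p1, p2, k=2, rng=1)

s_ind = linear_spectral_statistic(x_ind, y_ind)
s_dep = linear_spectral_statistic(x_dep, y_dep)
```

In the usage lines, the factor-model pair shares latent factors, so its leading canonical correlations are pushed toward 1 and its raw sum is expected to exceed that of the independent pair; judging significance would still require the null centering and scaling from the paper's CLT. With `t = 0` the solves need $S_{xx}$ and $S_{yy}$ to be invertible (so $p_1, p_2 < n$), whereas a positive `t` keeps them well defined even when a dimension exceeds $n$.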

#### Article information

Source
Ann. Statist., Volume 43, Number 2 (2015), 467–500.

Dates
First available in Project Euclid: 24 February 2015

Permanent link to this document
https://projecteuclid.org/euclid.aos/1424787425

Digital Object Identifier
doi:10.1214/14-AOS1284

Mathematical Reviews number (MathSciNet)
MR3316187

Zentralblatt MATH identifier
1344.60027

#### Citation

Yang, Yanrong; Pan, Guangming. Independence test for high dimensional data based on regularized canonical correlation coefficients. Ann. Statist. 43 (2015), no. 2, 467–500. doi:10.1214/14-AOS1284. https://projecteuclid.org/euclid.aos/1424787425


#### Supplemental materials

• Supplementary material: Supplement to “Independence test for high dimensional data based on regularized canonical correlation coefficients” (DOI: 10.1214/14-AOS1284SUPP). The supplementary material is divided into Appendices A and B. Appendix A contains some useful lemmas and the proofs of all theorems and of Propositions 4–5; Appendix B provides a theorem on the CLT for a sample covariance matrix plus a perturbation matrix.