## The Annals of Statistics

### Testing for independence of large dimensional vectors

#### Abstract

In this paper, new tests for the independence of two high-dimensional vectors are investigated. We consider the case where the dimension of the vectors increases with the sample size and propose multivariate analysis of variance-type statistics for the hypothesis of a block diagonal covariance matrix. The asymptotic properties of the new test statistics are investigated under the null hypothesis and the alternative hypothesis using random matrix theory. For this purpose, we study the weak convergence of linear spectral statistics of central and (conditionally) noncentral Fisher matrices. In particular, we derive a central limit theorem for linear spectral statistics of large dimensional (conditionally) noncentral Fisher matrices, which is then used to analyse the power of the tests under the alternative.
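As a rough illustration of the kind of statistic described above, the sketch below partitions a sample covariance matrix into blocks and computes three classical MANOVA-type quantities (Wilks, Lawley-Hotelling, and Bartlett-Nanda-Pillai) from the eigenvalues of a Fisher-type matrix. The function name `manova_type_statistics`, the choice of these three statistics, and the absence of any centering or scaling are assumptions made for illustration; the abstract does not give the paper's exact definitions or their asymptotic normalisation.

```python
import numpy as np


def manova_type_statistics(x, p1):
    """Compute raw MANOVA-type statistics for H0: Sigma_12 = 0.

    x  : (n, p) array, rows are observations.
    p1 : number of leading columns forming the first sub-vector.

    Assumes n - 1 > p so that the matrices inverted below are nonsingular.
    No asymptotic centering or scaling is applied (illustrative only).
    """
    s = np.cov(x, rowvar=False)          # p x p sample covariance
    s11, s12 = s[:p1, :p1], s[:p1, p1:]
    s21, s22 = s[p1:, :p1], s[p1:, p1:]

    # "Hypothesis" part: the portion of S22 explained by the first block.
    t = s21 @ np.linalg.solve(s11, s12)
    # "Error" part: the Schur complement of S11 in S.
    w = s22 - t

    # Eigenvalues of the Fisher-type matrix W^{-1} T (real and nonnegative
    # in exact arithmetic; clip tiny negative round-off).
    eig = np.linalg.eigvals(np.linalg.solve(w, t)).real
    eig = np.clip(eig, 0.0, None)

    return {
        # log(det(W) / det(W + T)) = -sum(log(1 + lambda_i))
        "wilks_log": -np.sum(np.log1p(eig)),
        # trace(W^{-1} T)
        "lawley_hotelling": np.sum(eig),
        # trace(T (W + T)^{-1})
        "pillai": np.sum(eig / (1.0 + eig)),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    sample = rng.standard_normal((200, 10))  # independent sub-vectors
    print(manova_type_statistics(sample, p1=4))
```

All three quantities are linear spectral statistics of the same matrix, that is, sums of a fixed function over its eigenvalues. Turning them into tests would require the centering and variance terms from a central limit theorem of the kind the abstract mentions, which this sketch does not attempt.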

The theoretical results are illustrated by means of a simulation study in which we also compare the new tests with several alternatives, in particular with the commonly used corrected likelihood ratio test. It is demonstrated that the latter test does not keep its nominal level if the dimension of one sub-vector is relatively small compared to the dimension of the other sub-vector. On the other hand, the tests proposed in this paper provide a reasonable approximation of the nominal level in such situations. Moreover, we observe that one of the proposed tests is the most powerful under a variety of correlation scenarios.

#### Article information

Source
Ann. Statist., Volume 47, Number 5 (2019), 2977-3008.

Dates
Revised: May 2018
First available in Project Euclid: 3 August 2019

https://projecteuclid.org/euclid.aos/1564797870

Digital Object Identifier
doi:10.1214/18-AOS1771

Mathematical Reviews number (MathSciNet)
MR3988779

#### Citation

Bodnar, Taras; Dette, Holger; Parolya, Nestor. Testing for independence of large dimensional vectors. Ann. Statist. 47 (2019), no. 5, 2977–3008. doi:10.1214/18-AOS1771. https://projecteuclid.org/euclid.aos/1564797870

#### Supplemental materials

• Supplement to “Testing for independence of large dimensional vectors.” The supplementary material contains the proofs of Theorem 1 and Lemmas 1–2, as well as additional simulations provided in Figures 10–14.