## Bernoulli

• Volume 25, Number 2 (2019), 1472-1503.

### An extreme-value approach for testing the equality of large U-statistic based correlation matrices

#### Abstract

There has been increasing interest in testing the equality of large Pearson correlation matrices. In many applications, however, it is more important to test the equality of large rank-based correlation matrices, since they are more robust to outliers and nonlinearity. Unlike the Pearson case, testing the equality of large rank-based correlation matrices has not been well explored and requires new methods and theory. In this paper, we provide a framework for testing the equality of two large U-statistic based correlation matrices, which include rank-based correlation matrices as special cases. Our approach exploits extreme value statistics and the jackknife estimator for uncertainty assessment, and it is valid under a fully nonparametric model. We develop a general theory for testing the equality of U-statistic based correlation matrices, then apply it to the problem of testing large Kendall’s tau correlation matrices and demonstrate the optimality of the proposed test. To prove this optimality, we develop a novel construction of least favorable distributions for correlation matrix comparison.
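To make the ingredients named in the abstract concrete, the following is a minimal sketch, not the authors' procedure. It assumes that each Kendall's tau entry is computed as a second-order U-statistic, that its variance is estimated with the standard leave-one-out jackknife, and that the two samples are compared through the maximum squared standardized difference over all off-diagonal entries. The function names `tau_and_jackknife_var` and `max_type_statistic` are hypothetical, and the calibration of the statistic against an extreme-value limit (the step that yields a rejection threshold) is deliberately left out because the abstract does not specify it.

```python
from itertools import combinations
import random


def _sign(v):
    """Return -1, 0, or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def tau_and_jackknife_var(x, y):
    """Kendall's tau of (x, y) and a jackknife estimate of its variance.

    Tau is the U-statistic with kernel sign(x_a - x_b) * sign(y_a - y_b).
    Requires at least 3 observations.
    """
    n = len(x)
    q = [0] * n  # q[i] = sum of kernel values over all pairs containing i
    for a, b in combinations(range(n), 2):
        h = _sign(x[a] - x[b]) * _sign(y[a] - y[b])
        q[a] += h
        q[b] += h
    total = sum(q) / 2  # each unordered pair was counted twice in q
    tau = 2.0 * total / (n * (n - 1))
    # Leave-one-out estimates: drop every pair that involves observation i.
    loo = [2.0 * (total - q_i) / ((n - 1) * (n - 2)) for q_i in q]
    mean_loo = sum(loo) / n
    var = (n - 1) / n * sum((v - mean_loo) ** 2 for v in loo)
    return tau, var


def max_type_statistic(X, Y):
    """Largest squared standardized difference between tau entries.

    X and Y are lists of rows with the same number of columns.
    Assumption: the jackknife variances are strictly positive.
    """
    cols_x = list(zip(*X))
    cols_y = list(zip(*Y))
    d = len(cols_x)
    best = 0.0
    for j, k in combinations(range(d), 2):
        tx, vx = tau_and_jackknife_var(cols_x[j], cols_x[k])
        ty, vy = tau_and_jackknife_var(cols_y[j], cols_y[k])
        best = max(best, (tx - ty) ** 2 / (vx + vy))
    return best


if __name__ == "__main__":
    rng = random.Random(0)  # synthetic data, for illustration only
    X = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(40)]
    Y = [[rng.gauss(0, 1) for _ in range(4)] for _ in range(50)]
    print(max_type_statistic(X, Y))
```

Taking a maximum over entries rather than a sum is what makes this an extreme-value style statistic: it is sensitive to a few large entrywise differences, at the cost of needing a limiting distribution for the maximum to set the threshold.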

#### Article information

Source
Bernoulli, Volume 25, Number 2 (2019), 1472-1503.

Dates
Revised: February 2018
First available in Project Euclid: 6 March 2019

https://projecteuclid.org/euclid.bj/1551862857

Digital Object Identifier
doi:10.3150/18-BEJ1027

#### Citation

Zhou, Cheng; Han, Fang; Zhang, Xin-Sheng; Liu, Han. An extreme-value approach for testing the equality of large U-statistic based correlation matrices. Bernoulli 25 (2019), no. 2, 1472–1503. doi:10.3150/18-BEJ1027. https://projecteuclid.org/euclid.bj/1551862857

#### References

• [1] Anderson, T.W. (2003). An Introduction to Multivariate Statistical Analysis, 3rd ed. Wiley Series in Probability and Statistics. Hoboken, NJ: Wiley-Interscience [John Wiley & Sons].
• [2] Aslam, S. and Rocke, D.M. (2005). A robust testing procedure for the equality of covariance matrices. Comput. Statist. Data Anal. 49 863–874.
• [3] Bai, Z., Jiang, D., Yao, J.-F. and Zheng, S. (2009). Corrections to LRT on large-dimensional covariance matrix by RMT. Ann. Statist. 37 3822–3840.
• [4] Bai, Z. and Zhou, W. (2008). Large sample covariance matrices without independence structures in columns. Statist. Sinica 18 425–442.
• [5] Bai, Z.D. and Yin, Y.Q. (1993). Limit of the smallest eigenvalue of a large-dimensional sample covariance matrix. Ann. Probab. 21 1275–1294.
• [6] Bao, Z., Lin, L.-C., Pan, G. and Zhou, W. (2015). Spectral statistics of large dimensional Spearman’s rank correlation matrix and its application. Ann. Statist. 43 2588–2623.
• [7] Baraud, Y. (2002). Non-asymptotic minimax rates of testing in signal detection. Bernoulli 8 577–606.
• [8] Bickel, P.J. and Levina, E. (2008). Regularized estimation of large covariance matrices. Ann. Statist. 36 199–227.
• [9] Cai, T. and Liu, W. (2011). Adaptive thresholding for sparse covariance matrix estimation. J. Amer. Statist. Assoc. 106 672–684.
• [10] Cai, T., Liu, W. and Xia, Y. (2013). Two-sample covariance matrix testing and support recovery in high-dimensional and sparse settings. J. Amer. Statist. Assoc. 108 265–277.
• [11] Cai, T., Liu, W. and Xia, Y. (2014). Two-sample test of high dimensional means under dependence. J. R. Stat. Soc. Ser. B. Stat. Methodol. 76 349–372.
• [12] El Maache, H. and Lepage, Y. (2003). Spearman’s rho and Kendall’s tau for multivariate data sets. In Mathematical Statistics and Applications: Festschrift for Constance van Eeden. Institute of Mathematical Statistics Lecture Notes—Monograph Series 42 113–130. IMS, Beachwood, OH.
• [13] Embrechts, P., Lindskog, F. and McNeil, A. (2003). Modelling dependence with copulas and applications to risk management. Handbook of Heavy Tailed Distributions in Finance 8 329–384.
• [14] Fang, H.-B., Fang, K.-T. and Kotz, S. (2002). The meta-elliptical distributions with given marginals. J. Multivariate Anal. 82 1–16.
• [15] Giedd, J.N., Blumenthal, J., Molloy, E. and Castellanos, F.X. (2001). Brain imaging of attention deficit/hyperactivity disorder. Ann. N.Y. Acad. Sci. 931 33–49.
• [16] Han, F., Chen, S. and Liu, H. (2017). Distribution-free tests of independence in high dimensions. Biometrika 104 813–828.
• [17] Han, F., Xu, S. and Zhou, W.-X. (2018). On Gaussian comparison inequality and its application to spectral analysis of large random matrices. Bernoulli 24 1787–1833.
• [18] Han, F., Zhao, T. and Liu, H. (2013). CODA: High dimensional copula discriminant analysis. J. Mach. Learn. Res. 14 629–671.
• [19] Ho, J.W.K., Stefani, M., dos Remedios, C.G. and Charleston, M.A. (2008). Differential variability analysis of gene expression and its application to human diseases. Bioinformatics 24 i390–i398.
• [20] Hoeffding, W. (1948). A class of statistics with asymptotically normal distribution. Ann. Math. Stat. 19 293–325.
• [21] Hu, R., Qiu, X. and Glazko, G. (2010). A new gene selection procedure based on the covariance distance. Bioinformatics 26 348–354.
• [22] Hu, R., Qiu, X., Glazko, G., Klebanov, L. and Yakovlev, A. (2009). Detecting intergene correlation changes in microarray analysis: A new approach to gene selection. BMC Bioinform. 10 20.
• [23] Jiang, D., Jiang, T. and Yang, F. (2012). Likelihood ratio tests for covariance matrices of high-dimensional normal distributions. J. Statist. Plann. Inference 142 2241–2256.
• [24] Kendall, M.G. (1938). A new measure of rank correlation. Biometrika 30 81–93.
• [25] Klüppelberg, C. and Kuhn, G. (2009). Copula structure analysis. J. R. Stat. Soc. Ser. B. Stat. Methodol. 71 737–753.
• [26] Kruskal, W.H. (1958). Ordinal measures of association. J. Amer. Statist. Assoc. 53 814–861.
• [27] Li, J. and Chen, S.X. (2012). Two sample tests for high-dimensional covariance matrices. Ann. Statist. 40 908–940.
• [28] Liu, H., Han, F., Yuan, M., Lafferty, J. and Wasserman, L. (2012). High-dimensional semiparametric Gaussian copula graphical models. Ann. Statist. 40 2293–2326.
• [29] Lopez-Paz, D., Hennig, P. and Schölkopf, B. (2013). The randomized dependence coefficient. In Advances in Neural Information Processing Systems 1–9.
• [30] Lou, H., Henriksen, L. and Bruhn, P. (1990). Focal cerebral dysfunction in developmental learning disabilities. Lancet 335 8–11.
• [31] Mai, Q. and Zou, H. (2015). Sparse semiparametric discriminant analysis. J. Multivariate Anal. 135 175–188.
• [32] Markowitz, H.M. (1991). Foundations of portfolio theory. J. Finance 46 469–477.
• [33] Muirhead, R.J. (1982). Aspects of Multivariate Statistical Theory. New York: John Wiley & Sons, Inc. Wiley Series in Probability and Mathematical Statistics.
• [34] Nagao, H. (1973). On some test criteria for covariance matrix. Ann. Statist. 1 700–709.
• [35] O’Brien, P.C. (1992). Robust procedures for testing equality of covariance matrices. Biometrics 48 819–827.
• [36] Ravikumar, P., Wainwright, M.J., Raskutti, G. and Yu, B. (2011). High-dimensional covariance estimation by minimizing $\ell_{1}$-penalized log-determinant divergence. Electron. J. Stat. 5 935–980.
• [37] Roy, S.N. (1957). Some Aspects of Multivariate Analysis. New York: Wiley; Calcutta: Indian Statistical Institute.
• [38] Schott, J.R. (2007). A test for the equality of covariance matrices when the dimension is large relative to the sample sizes. Comput. Statist. Data Anal. 51 6535–6542.
• [39] Shafritz, K.M., Marchione, K.E., Gore, J.C., Shaywitz, S.E. and Shaywitz, B.A. (2004). The effects of methylphenidate on neural systems of attention in attention deficit hyperactivity disorder. Amer. J. Psychiatry 161 1990–1997.
• [40] Spearman, C. (1904). The proof and measurement of association between two things. Amer. J. Psychology 15 72–101.
• [41] Srivastava, M.S. and Du, M. (2008). A test for the mean vector with fewer observations than the dimension. J. Multivariate Anal. 99 386–402.
• [42] Srivastava, M.S. and Yanagihara, H. (2010). Testing the equality of several covariance matrices with fewer observations than the dimension. J. Multivariate Anal. 101 1319–1329.
• [43] Székely, G.J., Rizzo, M.L. and Bakirov, N.K. (2007). Measuring and testing dependence by correlation of distances. Ann. Statist. 35 2769–2794.
• [44] Yufeng, Z., Yong, H., Chaozhe, Z., Qingjiu, C., Manqiu, S., Meng, L., Lixia, T., Tianzi, J. and Yufeng, W. (2007). Altered baseline brain activity in children with ADHD revealed by resting-state functional MRI. Brain and Development 29 83–91.
• [45] Zhao, T., Roeder, K. and Liu, H. (2014). Positive semidefinite rank-based correlation matrix estimation with application to semiparametric graph estimation. J. Comput. Graph. Statist. 23 895–922.
• [46] Zhou, C., Han, F., Zhang, X.-S. and Liu, H. (2019). Supplement to “An extreme-value approach for testing the equality of large U-statistic based correlation matrices.” DOI:10.3150/18-BEJ1027SUPP.
• [47] Zhou, W. (2007). Asymptotic distribution of the largest off-diagonal entry of correlation matrices. Trans. Amer. Math. Soc. 359 5345–5363.
• [48] Zou, Q., Zhu, C., Yang, Y., Zuo, X., Long, X., Cao, Q., Wang, Y. and Zang, Y. (2008). An improved approach to detection of amplitude of low-frequency fluctuation (ALFF) for resting-state fMRI: Fractional ALFF. J. Neurosci. Methods 172 137–141.

#### Supplemental materials

• Technical Proofs and More Simulation for “An extreme-value approach for testing the equality of large U-statistic based correlation matrices”. We provide additional proofs and simulations in the Supplementary Material (Zhou et al. [46]), which consists of six parts, Supplements A–F. Supplements A–D prove the theorems that are not proven in Appendix A, Supplement E introduces some useful definitions, and Supplement F presents more simulation results.