The Annals of Statistics

Two-sample and ANOVA tests for high dimensional means

Abstract

This paper considers testing the equality of two high dimensional means. Two approaches are used to formulate $L_{2}$-type tests with better power when the two high dimensional mean vectors differ only in sparsely populated coordinates and the differences are faint. One is to apply thresholding, which removes the non-signal-bearing dimensions and so reduces the variance of the test statistics. The other is to transform the data via the precision matrix for signal enhancement. It is shown that thresholding and data transformation lead to attractive detection boundaries for the tests. Furthermore, we demonstrate explicitly the effects of precision matrix estimation on the detection boundary for the test with thresholding and data transformation. Extension to multi-sample ANOVA tests is also investigated. Numerical studies are performed to confirm the theoretical findings and demonstrate the practical implementations.
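
To make the two ideas in the abstract concrete, the following is a minimal Python sketch under stated assumptions. It is not the statistic defined in the paper. The function name `thresholded_l2_statistic`, the threshold level $2s\log p$, the unpooled variance estimate, and the use of a known precision matrix `omega` are all illustrative choices, and the sketch omits the centering and standardization a real test would need.

```python
import numpy as np


def thresholded_l2_statistic(x, y, s=0.5, omega=None):
    """Illustrative thresholded L2-type two-sample statistic.

    Hypothetical sketch only: the threshold 2*s*log(p) and the unpooled
    variance estimate are assumptions, not the paper's definitions.
    """
    x = np.asarray(x, dtype=float)  # shape (n1, p)
    y = np.asarray(y, dtype=float)  # shape (n2, p)
    if omega is not None:
        # Transform each observation by a precision matrix that is
        # assumed known here; the paper studies the estimated case.
        x = x @ omega
        y = y @ omega
    n1, p = x.shape
    n2 = y.shape[0]
    # Coordinate-wise difference of the two sample means.
    diff = x.mean(axis=0) - y.mean(axis=0)
    # Coordinate-wise variance estimate of that difference.
    var = x.var(axis=0, ddof=1) / n1 + y.var(axis=0, ddof=1) / n2
    t2 = diff ** 2 / var
    # Keep only coordinates whose squared statistic exceeds the threshold,
    # so that noise-only dimensions do not inflate the variance of the sum.
    lam = 2.0 * s * np.log(p)
    keep = t2 > lam
    return float(np.sum(t2[keep])), keep
```

Calling the function on two data matrices with the same number of columns returns the sum of the squared statistics that survive the threshold, together with a boolean mask of the retained coordinates. Passing `omega` applies the transformation before thresholding.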

Article information

Source
Ann. Statist., Volume 47, Number 3 (2019), 1443–1474.

Dates
Revised: July 2017
First available in Project Euclid: 13 February 2019

Permanent link to this document
https://projecteuclid.org/euclid.aos/1550026845

Digital Object Identifier
doi:10.1214/18-AOS1720

Mathematical Reviews number (MathSciNet)
MR3911118

Zentralblatt MATH identifier
07053514

Subjects
Primary: 62H15: Hypothesis testing
Secondary: 62E20: Asymptotic distribution theory

Citation

Chen, Song Xi; Li, Jun; Zhong, Ping-Shou. Two-sample and ANOVA tests for high dimensional means. Ann. Statist. 47 (2019), no. 3, 1443–1474. doi:10.1214/18-AOS1720. https://projecteuclid.org/euclid.aos/1550026845

Supplemental materials

• Supplement to “Two-sample and ANOVA tests for high dimensional means”. The Supplementary Material provides the proofs of lemmas, propositions and Theorems 2, 3 and 5. It also includes extra simulation results and an empirical study.