Tests for high-dimensional data based on means, spatial signs and spatial ranks

Anirvan Chakraborty; Probal Chaudhuri

doi:10.1214/16-AOS1467

Ann. Statist. 45(2): 771-799 (April 2017). DOI: 10.1214/16-AOS1467

Abstract

Tests based on mean vectors and spatial signs and ranks for a zero mean in one-sample problems and for the equality of means in two-sample problems have been studied in the recent literature for high-dimensional data with the dimension larger than the sample size. For the above testing problems, we show that under suitable sequences of alternatives, the powers of the mean-based tests and the tests based on spatial signs and ranks tend to be the same as the data dimension tends to infinity for any sample size when the coordinate variables satisfy appropriate mixing conditions. Further, their limiting powers do not depend on the heaviness of the tails of the distributions. This is in striking contrast to the asymptotic results obtained in the classical multivariate setting. On the other hand, we show that in the presence of stronger dependence among the coordinate variables, the spatial-sign- and rank-based tests for high-dimensional data can be asymptotically more powerful than the mean-based tests if, in addition to the data dimension, the sample size also tends to infinity. The sizes of some mean-based tests for high-dimensional data studied in the recent literature are observed to be significantly different from their nominal levels. This is due to the inadequacy of the asymptotic approximations used for the distributions of those test statistics. However, our asymptotic approximations for the tests based on spatial signs and ranks are observed to work well when the tests are applied to a variety of simulated and real datasets.
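The abstract does not define spatial signs or spatial ranks. For orientation, the sketch below uses the standard definitions from the multivariate nonparametrics literature: the spatial sign of a nonzero vector x is x/||x||, and the spatial rank of x with respect to a sample is the average of the spatial signs of x - X_i. The function names and the toy data are assumptions made for illustration; this is not the paper's test statistic.

```python
import math

def spatial_sign(x):
    """Return x / ||x||, or the zero vector when x is the zero vector."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * len(x)
    return [v / norm for v in x]

def spatial_rank(x, sample):
    """Average of the spatial signs of x - X_i over the sample points X_i."""
    total = [0.0] * len(x)
    for xi in sample:
        s = spatial_sign([a - b for a, b in zip(x, xi)])
        total = [t + v for t, v in zip(total, s)]
    return [t / len(sample) for t in total]

# Toy data (hypothetical): three observations in dimension 4.
sample = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [0.0, 0.0, -3.0, 0.0],
]
signs = [spatial_sign(x) for x in sample]
rank_of_origin = spatial_rank([0.0, 0.0, 0.0, 0.0], sample)
```

A spatial sign keeps only the direction of an observation, so every nonzero entry of `signs` has unit length however large the original observation was.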

Citation

Anirvan Chakraborty and Probal Chaudhuri "Tests for high-dimensional data based on means, spatial signs and spatial ranks," The Annals of Statistics 45(2), 771-799, (April 2017). https://doi.org/10.1214/16-AOS1467

Received: 1 May 2015; Published: April 2017
