The Annals of Statistics

Tests for high-dimensional data based on means, spatial signs and spatial ranks

Anirvan Chakraborty and Probal Chaudhuri

Abstract

Tests based on mean vectors and spatial signs and ranks for a zero mean in one-sample problems and for the equality of means in two-sample problems have been studied in the recent literature for high-dimensional data with the dimension larger than the sample size. For the above testing problems, we show that under suitable sequences of alternatives, the powers of the mean-based tests and the tests based on spatial signs and ranks tend to be the same as the data dimension tends to infinity for any sample size when the coordinate variables satisfy appropriate mixing conditions. Further, their limiting powers do not depend on the heaviness of the tails of the distributions. This is in striking contrast to the asymptotic results obtained in the classical multivariate setting. On the other hand, we show that in the presence of stronger dependence among the coordinate variables, the spatial-sign- and rank-based tests for high-dimensional data can be asymptotically more powerful than the mean-based tests if, in addition to the data dimension, the sample size also tends to infinity. The sizes of some mean-based tests for high-dimensional data studied in the recent literature are observed to be significantly different from their nominal levels. This is due to the inadequacy of the asymptotic approximations used for the distributions of those test statistics. However, our asymptotic approximations for the tests based on spatial signs and ranks are observed to work well when the tests are applied to a variety of simulated and real datasets.
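
For readers unfamiliar with the terminology, the spatial sign of a vector x is commonly defined as x/||x||, and the spatial rank of x relative to a sample as the average of the spatial signs of the differences between x and the sample points. The minimal Python sketch below illustrates only these standard definitions. It is not the test statistics studied in the article, and the function names and the toy sample are illustrative assumptions.

```python
import math


def spatial_sign(x):
    """Return x / ||x|| (Euclidean norm), or the zero vector if x is zero."""
    norm = math.sqrt(sum(v * v for v in x))
    if norm == 0.0:
        return [0.0] * len(x)
    return [v / norm for v in x]


def spatial_rank(x, sample):
    """Return the average of the spatial signs of x - y over all y in sample."""
    total = [0.0] * len(x)
    for y in sample:
        s = spatial_sign([a - b for a, b in zip(x, y)])
        total = [t + u for t, u in zip(total, s)]
    return [t / len(sample) for t in total]


# Toy sample of three points in dimension 2 (hypothetical data).
sample = [[3.0, 4.0], [0.0, -2.0], [-1.0, 0.0]]
signs = [spatial_sign(x) for x in sample]
ranks = [spatial_rank(x, sample) for x in sample]
```

Each spatial sign has unit length regardless of how far the observation lies from the origin, which is why sign- and rank-based procedures are usually regarded as insensitive to heavy tails.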

Article information

Source
Ann. Statist., Volume 45, Number 2 (2017), 771-799.

Dates
Received: May 2015
Revised: March 2016
First available in Project Euclid: 16 May 2017

Permanent link to this document
https://projecteuclid.org/euclid.aos/1494921957

Digital Object Identifier
doi:10.1214/16-AOS1467

Mathematical Reviews number (MathSciNet)
MR3650400

Zentralblatt MATH identifier
1368.62147

Subjects
Primary: 62H15: Hypothesis testing; 62G10: Hypothesis testing
Secondary: 60G10: Stationary processes; 62E20: Asymptotic distribution theory

Keywords
ARMA processes; heavy tailed distributions; permutation tests; $\rho$-mixing; randomly scaled $\rho$-mixing; spherical distributions; stationary sequences

Citation

Chakraborty, Anirvan; Chaudhuri, Probal. Tests for high-dimensional data based on means, spatial signs and spatial ranks. Ann. Statist. 45 (2017), no. 2, 771–799. doi:10.1214/16-AOS1467. https://projecteuclid.org/euclid.aos/1494921957


Supplemental materials

  • Supplement to “Tests for high-dimensional data based on means, spatial signs and spatial ranks”. This supplemental article contains additional mathematical details related to the proof of part (a) of Theorem 3.3 and the detailed results of the simulation study done in Section 5 of the paper.