The Annals of Applied Statistics

DISCO analysis: A nonparametric extension of analysis of variance

Maria L. Rizzo and Gábor J. Székely

Full-text: Open access

Abstract

In classical analysis of variance, dispersion is measured by considering squared distances of sample elements from the sample mean. We consider a measure of dispersion for univariate or multivariate responses based on all pairwise distances between sample elements, and derive an analogous distance components (DISCO) decomposition for powers of distance with exponent in (0, 2]. The ANOVA F statistic is obtained when the index (exponent) is 2. For each index in (0, 2), this decomposition determines a nonparametric test for the multi-sample hypothesis of equal distributions that is statistically consistent against general alternatives.
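The decomposition summarized in the abstract can be sketched in Python. The formulas follow our reading of the DISCO construction and should be checked against the paper: g_α(A, B) is the mean of |a − b|^α over all pairs a in A, b in B; total dispersion T_α = (N/2) g_α(pooled, pooled); within-sample dispersion W_α = Σ_j (n_j/2) g_α(A_j, A_j); between-sample dispersion S_α = Σ_{j<k} (n_j n_k / 2N) [2 g_α(A_j, A_k) − g_α(A_j, A_j) − g_α(A_k, A_k)]; and F_α = [S_α/(K − 1)] / [W_α/(N − K)]. All function names are hypothetical and are not the interface of the authors' disco R package. Calibrating F_α by randomly permuting group labels is an assumption of this sketch; the abstract does not say how the test is carried out.

```python
import math
import random
from itertools import combinations

# Sketch only: formulas follow our reading of the DISCO decomposition;
# function names are hypothetical, not the disco R package API.


def _as_points(sample):
    # Wrap scalar observations as 1-tuples so univariate and multivariate data share one path.
    return [(float(x),) if isinstance(x, (int, float)) else tuple(map(float, x)) for x in sample]


def g_alpha(a, b, alpha):
    # Mean of |x - y|^alpha over all pairs x in a, y in b (Euclidean norm).
    return sum(math.dist(x, y) ** alpha for x in a for y in b) / (len(a) * len(b))


def disco_decomposition(samples, alpha=1.0):
    """Return (T, S, W, F) for K samples, where T = S + W."""
    if not 0 < alpha <= 2:
        raise ValueError("alpha must lie in (0, 2]")
    samples = [_as_points(s) for s in samples]
    k = len(samples)
    sizes = [len(s) for s in samples]
    n = sum(sizes)
    pooled = [x for s in samples for x in s]
    total = n / 2 * g_alpha(pooled, pooled, alpha)
    within = sum(nj / 2 * g_alpha(s, s, alpha) for nj, s in zip(sizes, samples))
    between = sum(
        sizes[j] * sizes[m] / (2 * n)
        * (2 * g_alpha(samples[j], samples[m], alpha)
           - g_alpha(samples[j], samples[j], alpha)
           - g_alpha(samples[m], samples[m], alpha))
        for j, m in combinations(range(k), 2)
    )
    f_stat = (between / (k - 1)) / (within / (n - k))
    return total, between, within, f_stat


def disco_permutation_test(samples, alpha=1.0, n_perm=199, seed=0):
    # Assumed calibration: permutation p-value for H0 that all K distributions are equal.
    sizes = [len(s) for s in samples]
    pooled = [x for s in samples for x in s]  # new list, so the inputs are not reordered
    observed = disco_decomposition(samples, alpha)[3]
    rng = random.Random(seed)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        groups, start = [], 0
        for nj in sizes:
            groups.append(pooled[start:start + nj])
            start += nj
        if disco_decomposition(groups, alpha)[3] >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


def anova_f(groups):
    # Classical one-way ANOVA F for univariate data, for comparison with alpha = 2.
    k = len(groups)
    n = sum(len(grp) for grp in groups)
    means = [sum(grp) / len(grp) for grp in groups]
    grand = sum(sum(grp) for grp in groups) / n
    ssb = sum(len(grp) * (m - grand) ** 2 for grp, m in zip(groups, means))
    sse = sum((x - m) ** 2 for grp, m in zip(groups, means) for x in grp)
    return (ssb / (k - 1)) / (sse / (n - k))


if __name__ == "__main__":
    data = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0], [5.0, 5.5, 7.0]]
    # With alpha = 2 the two F values agree up to rounding.
    print(disco_decomposition(data, alpha=2.0)[3], anova_f(data))
    print(disco_permutation_test(data, alpha=1.0))
```

With exponent 2 the sketch reproduces the one-way ANOVA F, the special case named in the abstract, because (n/2) g_2(A, A) equals the sum of squared deviations from the sample mean. For exponents in (0, 2) the abstract states that the resulting test is consistent against general alternatives. The pairwise sums cost O(N²) distance evaluations, so this pure-Python version is meant only for small samples.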

Article information

Source
Ann. Appl. Stat., Volume 4, Number 2 (2010), 1034–1055.

Dates
First available in Project Euclid: 3 August 2010

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1280842151

Digital Object Identifier
doi:10.1214/09-AOAS245

Mathematical Reviews number (MathSciNet)
MR2758432

Zentralblatt MATH identifier
1194.62054

Keywords
Distance components, DISCO, multisample problem, test equal distributions, multivariate, nonparametric MANOVA extension

Citation

Rizzo, Maria L.; Székely, Gábor J. DISCO analysis: A nonparametric extension of analysis of variance. Ann. Appl. Stat. 4 (2010), no. 2, 1034–1055. doi:10.1214/09-AOAS245. https://projecteuclid.org/euclid.aoas/1280842151


