Statistics Surveys

Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules

Michael P. Fay and Michael A. Proschan

Full-text: Open access

Abstract

In a mathematical approach to hypothesis tests, we start with a clearly defined set of hypotheses and choose the test with the best properties for those hypotheses. In practice, we often start with less precise hypotheses. For example, often a researcher wants to know which of two groups generally has the larger responses, and either a t-test or a Wilcoxon-Mann-Whitney (WMW) test could be acceptable. Although both t-tests and WMW tests are usually associated with quite different hypotheses, the decision rule and p-value from either test could be associated with many different sets of assumptions, which we call perspectives. It is useful to have many of the different perspectives to which a decision rule may be applied collected in one place, since each perspective allows a different interpretation of the associated p-value. Here we collect many such perspectives for the two-sample t-test, the WMW test and other related tests. We discuss validity and consistency under each perspective and discuss recommendations between the tests in light of these many different perspectives. Finally, we briefly discuss a decision rule for testing genetic neutrality where knowledge of the many perspectives is vital to the proper interpretation of the decision rule.
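As a rough illustration of how the two decision rules can be computed on the same two-sample data, here is a minimal Python sketch. It is not taken from the paper: the data are made up, the function names (`welch_t`, `mann_whitney`, `wmw_z`, `two_sided_p`) are hypothetical, and the normal approximations used for the p-values are an assumption chosen to keep the example dependency-free.

```python
from math import sqrt
from statistics import NormalDist, mean, variance


def welch_t(x, y):
    """Welch's t statistic for the difference in means (y minus x)."""
    se = sqrt(variance(x) / len(x) + variance(y) / len(y))
    return (mean(y) - mean(x)) / se


def mann_whitney(x, y):
    """Return (U, U / (m * n)); the ratio estimates P(X < Y) + 0.5 * P(X = Y)."""
    u = sum(1.0 if xi < yj else 0.5 if xi == yj else 0.0
            for xi in x for yj in y)
    return u, u / (len(x) * len(y))


def wmw_z(x, y):
    """Normal-approximation z score for U (no tie correction; an assumption)."""
    m, n = len(x), len(y)
    u, _ = mann_whitney(x, y)
    return (u - m * n / 2) / sqrt(m * n * (m + n + 1) / 12)


def two_sided_p(z):
    """Two-sided p-value from a standard normal reference distribution."""
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Made-up data: group_b contains one extreme value.
group_a = [1.0, 2.0, 3.0, 4.0, 5.0]
group_b = [2.5, 3.5, 4.5, 5.5, 60.0]

t_stat = welch_t(group_a, group_b)
u_stat, prob_estimate = mann_whitney(group_a, group_b)
z_stat = wmw_z(group_a, group_b)

print("Welch t:", t_stat, "approx. p:", two_sided_p(t_stat))
print("WMW U:", u_stat, "estimate of P(X < Y):", prob_estimate)
print("WMW z:", z_stat, "approx. p:", two_sided_p(z_stat))
```

The t statistic targets a difference in means, while U divided by the number of pairs estimates the probability that a response from one group exceeds a response from the other. The same data can therefore support different interpretations depending on the assumptions adopted. With samples this small, an exact or permutation reference distribution would normally be preferred over the normal approximation used here.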

Article information

Source
Statist. Surv. Volume 4 (2010), 1-39.

Dates
First available in Project Euclid: 22 February 2010

Permanent link to this document
http://projecteuclid.org/euclid.ssu/1266847666

Digital Object Identifier
doi:10.1214/09-SS051

Mathematical Reviews number (MathSciNet)
MR2595125

Zentralblatt MATH identifier
1188.62154

Keywords
Behrens-Fisher problem; interval censored data; nonparametric Behrens-Fisher problem; Tajima’s D; t-test; Wilcoxon rank sum test

Citation

Fay, Michael P.; Proschan, Michael A. Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules. Statist. Surv. 4 (2010), 1-39. doi:10.1214/09-SS051. http://projecteuclid.org/euclid.ssu/1266847666.


