Statistics Surveys
- Statist. Surv.
- Volume 4 (2010), 1-39.
Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules
Michael P. Fay and Michael A. Proschan
Full-text: Open access
Abstract
In a mathematical approach to hypothesis tests, we start with a clearly defined set of hypotheses and choose the test with the best properties for those hypotheses. In practice, we often start with less precise hypotheses. For example, often a researcher wants to know which of two groups generally has the larger responses, and either a t-test or a Wilcoxon-Mann-Whitney (WMW) test could be acceptable. Although both t-tests and WMW tests are usually associated with quite different hypotheses, the decision rule and p-value from either test could be associated with many different sets of assumptions, which we call perspectives. It is useful to have many of the different perspectives to which a decision rule may be applied collected in one place, since each perspective allows a different interpretation of the associated p-value. Here we collect many such perspectives for the two-sample t-test, the WMW test and other related tests. We discuss validity and consistency under each perspective and discuss recommendations between the tests in light of these many different perspectives. Finally, we briefly discuss a decision rule for testing genetic neutrality where knowledge of the many perspectives is vital to the proper interpretation of the decision rule.
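To illustrate how one decision rule can be read under a shared set of assumptions, the sketch below computes exact one-sided permutation p-values for two score choices: the raw responses (equivalent, for fixed group sizes, to ranking permutations by the difference in means) and midranks (the exact WMW rank-sum rule). It assumes only that the pooled observations are exchangeable under the null. The function names `midranks` and `one_sided_perm_p` and the sample data are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def midranks(values):
    """Return 1-based ranks of values, averaging the ranks of tied observations."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of observations tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def one_sided_perm_p(x, y, scores=None):
    """Exact permutation p-value for 'group x tends to have larger responses'.

    scores=None uses the raw responses (difference-in-means ordering);
    scores=midranks gives the exact Wilcoxon-Mann-Whitney rank-sum test.
    Enumerates every relabelling, so it is only practical for small samples.
    """
    pooled = list(x) + list(y)
    s = list(pooled) if scores is None else scores(pooled)
    n = len(x)
    observed = sum(s[:n])
    hits = total = 0
    for idx in combinations(range(len(pooled)), n):
        total += 1
        if sum(s[i] for i in idx) >= observed:
            hits += 1
    return hits / total


# Hypothetical data: one large response in the first group.
x = [0, 5, 6, 70]
y = [1, 2, 3, 4]
print("raw-score permutation p:", one_sided_perm_p(x, y))
print("rank-sum (WMW) permutation p:", one_sided_perm_p(x, y, midranks))
```

Both p-values are valid under the same exchangeability null, yet they can differ because each score choice orders the relabellings differently, which is one reason the interpretation attached to a p-value depends on the perspective adopted.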
Article information
Source
Statist. Surv., Volume 4 (2010), 1-39.
Dates
First available in Project Euclid: 22 February 2010
Permanent link to this document
https://projecteuclid.org/euclid.ssu/1266847666
Digital Object Identifier
doi:10.1214/09-SS051
Mathematical Reviews number (MathSciNet)
MR2595125
Zentralblatt MATH identifier
1188.62154
Keywords
Behrens-Fisher problem; interval censored data; nonparametric Behrens-Fisher problem; Tajima’s D; t-test; Wilcoxon rank sum test
Citation
Fay, Michael P.; Proschan, Michael A. Wilcoxon-Mann-Whitney or t-test? On assumptions for hypothesis tests and multiple interpretations of decision rules. Statist. Surv. 4 (2010), 1-39. doi:10.1214/09-SS051. https://projecteuclid.org/euclid.ssu/1266847666
The American Statistical Association, the Bernoulli Society, the Institute of Mathematical Statistics, and the Statistical Society of Canada
