The Emperor’s new tests

Michael D. Perlman; Lang Wu

doi:10.1214/ss/1009212517


Statist. Sci. 14(4): 355-369 (November 1999). DOI: 10.1214/ss/1009212517

Abstract

In the past two decades, striking examples of allegedly inferior likelihood ratio tests (LRTs) have appeared in the statistical literature. These examples, which arise in multiparameter hypothesis-testing problems, share several features. In each case the null hypothesis is composite, the size-α LRT is not similar and hence biased, and competing size-α tests can be constructed that are less biased, or even unbiased, and that dominate the LRT in the sense of being everywhere more powerful. It is therefore asserted that in these examples and, by implication, in many other testing problems, the LR criterion produces “inferior,” “deficient,” “undesirable,” or “flawed” statistical procedures.
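To illustrate "not similar and hence biased," consider a standard two-parameter problem chosen here for illustration. The abstract does not say which examples the paper treats. Assume X1 ~ N(μ1, 1) and X2 ~ N(μ2, 1) are independent with known unit variances, and test H0: min(μ1, μ2) ≤ 0 against H1: μ1 > 0 and μ2 > 0. Under these assumptions the size-α LRT rejects H0 exactly when min(X1, X2) > z_α. The Python sketch below uses a hypothetical helper, min_test_power, to compute this test's exact rejection probability. On the null boundary the probability ranges from α² at (0, 0) up to α as μ2 grows, so the test is not similar. At alternatives near the origin its power falls below α, so it is biased.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # standard normal N(0, 1)


def min_test_power(mu1: float, mu2: float, alpha: float = 0.05) -> float:
    """Rejection probability of the (assumed) LRT 'reject H0 iff min(X1, X2) > z_alpha'
    when X1 ~ N(mu1, 1) and X2 ~ N(mu2, 1) are independent."""
    c = STD_NORMAL.inv_cdf(1 - alpha)  # critical value z_alpha
    p1 = 1 - STD_NORMAL.cdf(c - mu1)   # P(X1 > c)
    p2 = 1 - STD_NORMAL.cdf(c - mu2)   # P(X2 > c)
    return p1 * p2                     # independence: P(both exceed c)


if __name__ == "__main__":
    alpha = 0.05
    # Points on the null boundary {min(mu1, mu2) = 0}: the probability is not constant.
    for mu2 in (0.0, 1.0, 3.0, 10.0):
        print(f"mu = (0, {mu2:4.1f}): rejection prob = {min_test_power(0.0, mu2, alpha):.4f}")
    # A point inside the alternative, near the origin: power below alpha.
    print(f"mu = (0.1, 0.1): power = {min_test_power(0.1, 0.1, alpha):.4f}")
```

Because the two coordinates are independent, the rejection probability has a closed form and no simulation is needed. The printed values show the variation along the boundary and the sub-α power at a nearby alternative.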

This message, which appears to be proliferating, is wrong. In each example it is the allegedly superior test that is flawed, not the LRT. At worst, the “superior” tests provide unwarranted and inappropriate inferences and have been deemed scientifically unacceptable by applied statisticians. This reinforces the well-documented but oft-neglected fact that the Neyman-Pearson desideratum of a more or most powerful size-α test may be scientifically inappropriate; the same is true of the criteria of unbiasedness and α-admissibility. Although the LR criterion is not infallible, we believe that it remains a generally reasonable first option for non-Bayesian parametric hypothesis-testing problems.

Citation


Michael D. Perlman. Lang Wu. "The Emperor’s new tests." Statist. Sci. 14 (4) 355 - 369, November 1999. https://doi.org/10.1214/ss/1009212517

Information

Published: November 1999

First available in Project Euclid: 24 December 2001

zbMATH: 1059.62515

MathSciNet: MR1765215

Digital Object Identifier: 10.1214/ss/1009212517

Keywords: α-admissibility, bioequivalence problem, d-admissibility, Fisher-Neyman debate, hypothesis test, likelihood ratio test, multiple endpoints in clinical trials, multivariate one-sided alternatives, order-restricted hypotheses, power, significance test, size-α test, test for qualitative interactions, unbiased test
