Statistical Science

The Emperor’s new tests

Michael D. Perlman and Lang Wu

Abstract

In the past two decades, striking examples of allegedly inferior likelihood ratio tests (LRT) have appeared in the statistical literature. These examples, which arise in multiparameter hypothesis testing problems, have several common features. In each case the null hypothesis is composite, the size α LRT is not similar and hence biased, and competing size α tests can be constructed that are less biased, or even unbiased, and that dominate the LRT in the sense of being everywhere more powerful. It is therefore asserted that in these examples and, by implication, many other testing problems, the LR criterion produces “inferior,” “deficient,” “undesirable,” or “flawed” statistical procedures.

This message, which appears to be proliferating, is wrong. In each example it is the allegedly superior test that is flawed, not the LRT. At worst, the “superior” tests provide unwarranted and inappropriate inferences and have been deemed scientifically unacceptable by applied statisticians. This reinforces the well-documented but oft-neglected fact that the Neyman-Pearson theory desideratum of a more or most powerful size α test may be scientifically inappropriate; the same is true for the criteria of unbiasedness and α-admissibility. Although the LR criterion is not infallible, we believe that it remains a generally reasonable first option for non-Bayesian parametric hypothesis-testing problems.

Article information

Source
Statist. Sci., Volume 14, Number 4 (1999), 355-369.

Dates
First available in Project Euclid: 24 December 2001

Permanent link to this document
https://projecteuclid.org/euclid.ss/1009212517

Digital Object Identifier
doi:10.1214/ss/1009212517

Mathematical Reviews number (MathSciNet)
MR1765215

Zentralblatt MATH identifier
1059.62515

Keywords
Hypothesis test; significance test; likelihood ratio test; power; size α test; unbiased test; α-admissibility; d-admissibility; order-restricted hypotheses; multiple endpoints in clinical trials; test for qualitative interactions; bioequivalence problem; multivariate one-sided alternatives; Fisher-Neyman debate

Citation

Perlman, Michael D.; Wu, Lang. The Emperor’s new tests. Statist. Sci. 14 (1999), no. 4, 355-369. doi:10.1214/ss/1009212517. https://projecteuclid.org/euclid.ss/1009212517
