Statistical Science

Could Fisher, Jeffreys and Neyman Have Agreed on Testing?

James O. Berger

Full-text: Open access

Abstract

Ronald Fisher advocated testing using p-values, Harold Jeffreys proposed use of objective posterior probabilities of hypotheses and Jerzy Neyman recommended testing with fixed error probabilities. Each was quite critical of the other approaches. Most troubling for statistics and science is that the three approaches can lead to quite different practical conclusions.
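How far apart the conclusions can be is shown by the p-value calibration of Sellke, Bayarri and Berger (2001), listed in the references. For a precise null hypothesis and p < 1/e, the Bayes factor of H0 to H1 is bounded below by -e p log(p). With equal prior probabilities of H0 and H1, that bound gives a lower bound on the posterior probability of H0. The Python sketch below assumes those equal prior probabilities, and its function names are illustrative only. At p = 0.05 it gives a lower bound of roughly 0.29 on Pr(H0 | x), far from the 0.05 that the p-value is often read as.

```python
import math


def bayes_factor_lower_bound(p: float) -> float:
    """Lower bound -e * p * ln(p) on the Bayes factor of H0 to H1.

    The calibration is only valid for 0 < p < 1/e.
    """
    if not 0.0 < p < 1.0 / math.e:
        raise ValueError("calibration applies only for 0 < p < 1/e")
    return -math.e * p * math.log(p)


def posterior_prob_lower_bound(p: float) -> float:
    """Lower bound on Pr(H0 | data), assuming Pr(H0) = Pr(H1) = 1/2."""
    b = bayes_factor_lower_bound(p)
    # With equal prior odds, posterior odds equal the Bayes factor.
    return b / (1.0 + b)


if __name__ == "__main__":
    for p in (0.05, 0.01, 0.001):
        print(f"p = {p}: Pr(H0 | x) >= {posterior_prob_lower_bound(p):.3f}")
```

The bound applies to any prior in a broad class of alternatives, so it understates rather than overstates the gap between a p-value and the posterior probability of the null.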

This article discusses the conditional frequentist approach to testing, which is argued to provide the basis for a methodological unification of the approaches of Fisher, Jeffreys and Neyman. The idea is to follow Fisher in using p-values to define the "strength of evidence" in data and in conditioning on that strength of evidence. Neyman is then followed by computing Type I and Type II error probabilities, but conditional on the strength of evidence in the data. The resulting conditional frequentist error probabilities equal the objective posterior probabilities of the hypotheses advocated by Jeffreys.
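For two simple hypotheses, the conditional test described in the abstract can be sketched as follows. The sketch is in the spirit of Berger, Brown and Wolpert (1994) and Wolpert (1996), both listed in the references.

1. Compute the likelihood ratio B(x) = f0(x)/f1(x).
2. Compute a p-value against each hypothesis, p0 and p1.
3. Reject H0 when p0 <= p1.
4. Condition on the strength of evidence S = max(p0, p1).
5. Report the conditional error probability B/(1 + B) after a rejection, or 1/(1 + B) after an acceptance.

The normal model X_1, ..., X_n iid N(theta, 1), the hypotheses theta = 0 versus theta = 1, and the function name are assumptions made for illustration. This is a minimal sketch, not code from the article.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()  # standard normal, used for the p-value CDFs


def conditional_test(xbar: float, n: int) -> dict:
    """Conditional frequentist test of H0: theta = 0 vs H1: theta = 1.

    Hypothetical example assuming X_1, ..., X_n iid N(theta, 1);
    xbar is the observed sample mean.
    """
    root_n = math.sqrt(n)
    # Likelihood ratio B(x) = f0(x) / f1(x); it depends on the data only via xbar.
    b = math.exp(n * (0.5 - xbar))
    # B decreases in xbar, so large xbar (small B) is evidence against H0.
    p0 = 1.0 - _STD_NORMAL.cdf(root_n * xbar)      # P0(B(X) <= B(x)) = P0(Xbar >= xbar)
    p1 = _STD_NORMAL.cdf(root_n * (xbar - 1.0))    # P1(B(X) >= B(x)) = P1(Xbar <= xbar)
    # Fisher-style strength of evidence, used as the conditioning statistic.
    s = max(p0, p1)
    if p0 <= p1:
        # Reject H0; conditional Type I error equals Pr(H0 | x) under equal priors.
        error = b / (1.0 + b)
        decision = "reject H0"
    else:
        # Accept H0; conditional Type II error equals Pr(H1 | x) under equal priors.
        error = 1.0 / (1.0 + b)
        decision = "accept H0"
    return {"decision": decision, "conditional_error": error, "p0": p0, "p1": p1, "s": s}


if __name__ == "__main__":
    for xbar in (0.0, 0.5, 0.9):
        result = conditional_test(xbar, n=4)
        print(xbar, result["decision"], round(result["conditional_error"], 3))
```

With equal prior probabilities of the two hypotheses, the reported conditional error probability is exactly the posterior probability that the decision is wrong. This is the agreement between the Neyman-style error report and Jeffreys' posterior probability that the abstract describes.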

Article information

Source
Statist. Sci. Volume 18, Issue 1 (2003), 1-32.

Dates
First available in Project Euclid: 23 June 2003

Permanent link to this document
https://projecteuclid.org/euclid.ss/1056397485

Digital Object Identifier
doi:10.1214/ss/1056397485

Mathematical Reviews number (MathSciNet)
MR1997064

Zentralblatt MATH identifier
1048.62006

Keywords
p-values; posterior probabilities of hypotheses; Type I and Type II error probabilities; conditional testing.

Citation

Berger, James O. Could Fisher, Jeffreys and Neyman Have Agreed on Testing? Statist. Sci. 18 (2003), no. 1, 1--32. doi:10.1214/ss/1056397485. https://projecteuclid.org/euclid.ss/1056397485

References

  • Barnett, V. (1999). Comparative Statistical Inference, 3rd ed. Wiley, New York.
  • Basu, D. (1975). Statistical information and likelihood (with discussion). Sankhyā Ser. A 37 1--71.
  • Basu, D. (1977). On the elimination of nuisance parameters. J. Amer. Statist. Assoc. 72 355--366.
  • Bayarri, M. J. and Berger, J. (2000). $P$-values for composite null models (with discussion). J. Amer. Statist. Assoc. 95 1127--1142, 1157--1170.
  • Berger, J. (1985a). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer, New York.
  • Berger, J. (1985b). The frequentist viewpoint and conditioning. In Proc. Berkeley Conference in Honor of Jack Kiefer and Jerzy Neyman (L. Le Cam and R. Olshen, eds.) 1 15--44. Wadsworth, Belmont, CA.
  • Berger, J. and Berry, D. (1988). Statistical analysis and the illusion of objectivity. American Scientist 76 159--165.
  • Berger, J., Boukai, B. and Wang, Y. (1997). Unified frequentist and Bayesian testing of a precise hypothesis (with discussion). Statist. Sci. 12 133--160.
  • Berger, J., Boukai, B. and Wang, Y. (1999). Simultaneous Bayesian--frequentist sequential testing of nested hypotheses. Biometrika 86 79--92.
  • Berger, J., Brown, L. and Wolpert, R. (1994). A unified conditional frequentist and Bayesian test for fixed and sequential simple hypothesis testing. Ann. Statist. 22 1787--1807.
  • Berger, J. and Delampady, M. (1987). Testing precise hypotheses (with discussion). Statist. Sci. 2 317--352.
  • Berger, J. and Guglielmi, A. (2001). Bayesian and conditional frequentist testing of a parametric model versus nonparametric alternatives. J. Amer. Statist. Assoc. 96 174--184.
  • Berger, J. and Mortera, J. (1999). Default Bayes factors for non-nested hypothesis testing. J. Amer. Statist. Assoc. 94 542--554.
  • Berger, J. and Sellke, T. (1987). Testing a point null hypothesis: The irreconcilability of $p$ values and evidence (with discussion). J. Amer. Statist. Assoc. 82 112--139.
  • Berger, J. and Wolpert, R. L. (1988). The Likelihood Principle, 2nd ed. (with discussion). IMS, Hayward, CA.
  • Birnbaum, A. (1961). On the foundations of statistical inference: Binary experiments. Ann. Math. Statist. 32 414--435.
  • Bjørnstad, J. (1996). On the generalization of the likelihood function and the likelihood principle. J. Amer. Statist. Assoc. 91 791--806.
  • Braithwaite, R. B. (1953). Scientific Explanation. Cambridge Univ. Press.
  • Brown, L. D. (1978). A contribution to Kiefer's theory of conditional confidence procedures. Ann. Statist. 6 59--71.
  • Carlson, R. (1976). The logic of tests of significance (with discussion). Philos. Sci. 43 116--128.
  • Casella, G. and Berger, R. (1987). Reconciling Bayesian and frequentist evidence in the one-sided testing problem (with discussion). J. Amer. Statist. Assoc. 82 106--111, 123--139.
  • Cox, D. R. (1958). Some problems connected with statistical inference. Ann. Math. Statist. 29 357--372.
  • Dass, S. (2001). Unified Bayesian and conditional frequentist testing for discrete distributions. Sankhyā Ser. B 63 251--269.
  • Dass, S. and Berger, J. (2003). Unified conditional frequentist and Bayesian testing of composite hypotheses. Scand. J. Statist. 30 193--210.
  • Delampady, M. and Berger, J. (1990). Lower bounds on Bayes factors for multinomial distributions, with application to chi-squared tests of fit. Ann. Statist. 18 1295--1316.
  • Edwards, W., Lindman, H. and Savage, L. J. (1963). Bayesian statistical inference for psychological research. Psychological Review 70 193--242.
  • Efron, B. and Gous, A. (2001). Scales of evidence for model selection: Fisher versus Jeffreys (with discussion). In Model Selection (P. Lahiri, ed.) 208--256. IMS, Hayward, CA.
  • Fisher, R. A. (1925). Statistical Methods for Research Workers. Oliver and Boyd, Edinburgh (10th ed., 1946).
  • Fisher, R. A. (1935). The logic of inductive inference (with discussion). J. Roy. Statist. Soc. 98 39--82.
  • Fisher, R. A. (1955). Statistical methods and scientific induction. J. Roy. Statist. Soc. Ser. B 17 69--78.
  • Fisher, R. A. (1973). Statistical Methods and Scientific Inference, 3rd ed. Macmillan, London.
  • Gibbons, J. and Pratt, J. (1975). $P$-values: Interpretation and methodology. Amer. Statist. 29 20--25.
  • Good, I. J. (1958). Significance tests in parallel and in series. J. Amer. Statist. Assoc. 53 799--813.
  • Good, I. J. (1992). The Bayes/non-Bayes compromise: A brief review. J. Amer. Statist. Assoc. 87 597--606.
  • Goodman, S. (1992). A comment on replication, $p$-values and evidence. Statistics in Medicine 11 875--879.
  • Goodman, S. (1993). $P$-values, hypothesis tests, and likelihood: Implications for epidemiology of a neglected historical debate. American Journal of Epidemiology 137 485--496.
  • Goodman, S. (1999a). Toward evidence-based medical statistics. 1: The $p$-value fallacy. Annals of Internal Medicine 130 995--1004.
  • Goodman, S. (1999b). Toward evidence-based medical statistics. 2: The Bayes factor. Annals of Internal Medicine 130 1005--1013.
  • Hacking, I. (1965). Logic of Statistical Inference. Cambridge Univ. Press.
  • Hall, P. and Selinger, B. (1986). Statistical significance: Balancing evidence against doubt. Austral. J. Statist. 28 354--370.
  • Hubbard, R. (2000). Minding one's $p$'s and $\alpha$'s: Confusion in the reporting and interpretation of results of classical statistical tests in marketing research. Technical Report, College of Business and Public Administration, Drake Univ.
  • Jeffreys, H. (1961). Theory of Probability, 3rd ed. Oxford Univ. Press.
  • Johnstone, D. J. (1997). Comparative classical and Bayesian interpretations of statistical compliance tests in auditing. Accounting and Business Research 28 53--82.
  • Kalbfleisch, J. D. and Sprott, D. A. (1973). Marginal and conditional likelihoods. Sankhyā Ser. A 35 311--328.
  • Kiefer, J. (1976). Admissibility of conditional confidence procedures. Ann. Statist. 4 836--865.
  • Kiefer, J. (1977). Conditional confidence statements and confidence estimators (with discussion). J. Amer. Statist. Assoc. 72 789--827.
  • Kyburg, H. E., Jr. (1974). The Logical Foundations of Statistical Inference. Reidel, Boston.
  • Laplace, P. S. (1812). Théorie Analytique des Probabilités. Courcier, Paris.
  • Lehmann, E. (1993). The Fisher, Neyman--Pearson theories of testing hypotheses: One theory or two? J. Amer. Statist. Assoc. 88 1242--1249.
  • Matthews, R. (1998). The great health hoax. The Sunday Telegraph, September 13.
  • Morrison, D. E. and Henkel, R. E., eds. (1970). The Significance Test Controversy. A Reader. Aldine, Chicago.
  • Neyman, J. (1961). Silver jubilee of my dispute with Fisher. J. Operations Res. Soc. Japan 3 145--154.
  • Neyman, J. (1977). Frequentist probability and frequentist statistics. Synthese 36 97--131.
  • Neyman, J. and Pearson, E. S. (1933). On the problem of the most efficient tests of statistical hypotheses. Philos. Trans. Roy. Soc. London Ser. A 231 289--337.
  • Paulo, R. (2002a). Unified Bayesian and conditional frequentist testing in the one- and two-sample exponential distribution problem. Technical Report, Duke Univ.
  • Paulo, R. (2002b). Simultaneous Bayesian--frequentist tests for the drift of Brownian motion. Technical Report, Duke Univ.
  • Pearson, E. S. (1955). Statistical concepts in their relation to reality. J. Roy. Statist. Soc. Ser. B 17 204--207.
  • Pearson, E. S. (1962). Some thoughts on statistical inference. Ann. Math. Statist. 33 394--403.
  • Reid, N. (1995). The roles of conditioning in inference (with discussion). Statist. Sci. 10 138--157, 173--199.
  • Robins, J. M., van der Vaart, A. and Ventura, V. (2000). Asymptotic distribution of $p$ values in composite null models (with discussion). J. Amer. Statist. Assoc. 95 1143--1167, 1171--1172.
  • Royall, R. M. (1997). Statistical Evidence: A Likelihood Paradigm. Chapman and Hall, New York.
  • Savage, L. J. (1976). On rereading R. A. Fisher (with discussion). Ann. Statist. 4 441--500.
  • Seidenfeld, T. (1979). Philosophical Problems of Statistical Inference. Reidel, Boston.
  • Sellke, T., Bayarri, M. J. and Berger, J. O. (2001). Calibration of $p$-values for testing precise null hypotheses. Amer. Statist. 55 62--71.
  • Spielman, S. (1974). The logic of tests of significance. Philos. Sci. 41 211--226.
  • Spielman, S. (1978). Statistical dogma and the logic of significance testing. Philos. Sci. 45 120--135.
  • Sterne, J. A. C. and Davey Smith, G. (2001). Sifting the evidence---what's wrong with significance tests? British Medical Journal 322 226--231.
  • Welch, B. and Peers, H. (1963). On formulae for confidence points based on integrals of weighted likelihoods. J. Roy. Statist. Soc. Ser. B 25 318--329.
  • Wolpert, R. L. (1996). Testing simple hypotheses. In Data Analysis and Information Systems (H. H. Bock and W. Polasek, eds.) 7 289--297. Springer, Heidelberg.
  • Zabell, S. (1992). R. A. Fisher and the fiducial argument. Statist. Sci. 7 369--387.

See also

  • Includes: Ronald Christensen. Comment.
  • Includes: Wesley O. Johnson. Comment.
  • Includes: Michael Lavine. Comment.
  • Includes: Subhash R. Lele. Comment.
  • Includes: Deborah G. Mayo. Comment.
  • Includes: Luis R. Pericchi. Comment.
  • Includes: N. Reid. Comment.
  • Includes: James O. Berger. Rejoinder.