Statistical Science

Could Fisher, Jeffreys and Neyman Have Agreed on Testing?

James O. Berger

Source: Statist. Sci. Volume 18, Issue 1 (2003), 1-32.

Abstract

Ronald Fisher advocated testing using p-values, Harold Jeffreys proposed use of objective posterior probabilities of hypotheses and Jerzy Neyman recommended testing with fixed error probabilities. Each was quite critical of the other approaches. Most troubling for statistics and science is that the three approaches can lead to quite different practical conclusions.

This article focuses on discussion of the conditional frequentist approach to testing, which is argued to provide the basis for a methodological unification of the approaches of Fisher, Jeffreys and Neyman. The idea is to follow Fisher in using p-values to define the "strength of evidence" in data and to follow his approach of conditioning on strength of evidence; then follow Neyman by computing Type I and Type II error probabilities, but do so conditional on the strength of evidence in the data. The resulting conditional frequentist error probabilities equal the objective posterior probabilities of the hypotheses advocated by Jeffreys.
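
As a rough illustration of the idea in the preceding paragraph, the sketch below works through the simplest case, a test of two simple hypotheses about a normal mean with known unit variance. The abstract gives no formulas, so everything specific here is an assumption made for illustration: the normal model, the function names, the rule of rejecting when the p-value under the null is no larger than the p-value under the alternative, and the reported conditional error probabilities B/(1+B) and 1/(1+B), where B is the likelihood ratio of the null to the alternative. With equal prior probabilities these two quantities coincide with the posterior probabilities of the null and of the alternative, respectively.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conditional_test(xbar, n, theta0=0.0, theta1=1.0):
    """Illustrative conditional test of H0: theta = theta0 vs H1: theta = theta1.

    Assumes xbar is the mean of n observations from N(theta, 1) and that
    theta1 > theta0. All names and the model are hypothetical choices
    made for this sketch.
    """
    root_n = math.sqrt(n)
    # p-value under H0: chance of a sample mean at least as large as xbar.
    p0 = 1.0 - normal_cdf(root_n * (xbar - theta0))
    # p-value under H1: chance of a sample mean at least as small as xbar.
    p1 = normal_cdf(root_n * (xbar - theta1))
    # Likelihood ratio of H0 to H1 evaluated at the observed mean.
    b = math.exp(-0.5 * n * ((xbar - theta0) ** 2 - (xbar - theta1) ** 2))
    if p0 <= p1:
        # Reject H0 and report the conditional Type I error probability,
        # which equals the posterior probability of H0 under equal priors.
        return {"decision": "reject H0", "p0": p0, "p1": p1,
                "bayes_factor": b, "conditional_error": b / (1.0 + b)}
    # Accept H0 and report the conditional Type II error probability,
    # which equals the posterior probability of H1 under equal priors.
    return {"decision": "accept H0", "p0": p0, "p1": p1,
            "bayes_factor": b, "conditional_error": 1.0 / (1.0 + b)}


if __name__ == "__main__":
    result = conditional_test(xbar=1.0, n=4)
    print(result["decision"], round(result["conditional_error"], 4))
```

In this sketch the reported error depends on the observed data through B, unlike a fixed-level test, whose error probability is the same for every rejected sample.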

Keywords: p-values; posterior probabilities of hypotheses; Type I and Type II error probabilities; conditional testing.

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1056397485
Digital Object Identifier: doi:10.1214/ss/1056397485
Mathematical Reviews number (MathSciNet): MR1997064
Zentralblatt MATH identifier: 02068939

2009 © Institute of Mathematical Statistics