Ronald Fisher advocated testing using p-values; Harold Jeffreys proposed using objective posterior probabilities of hypotheses; and Jerzy Neyman recommended testing with fixed error probabilities. Each was quite critical of the other approaches. Most troubling for statistics and science is that the three approaches can lead to quite different practical conclusions.
This article discusses the conditional frequentist approach to testing, which is argued to provide the basis for a methodological unification of the approaches of Fisher, Jeffreys and Neyman. The idea is to follow Fisher in using p-values to define the "strength of evidence" in data and in conditioning on that strength of evidence, and then to follow Neyman in computing Type I and Type II error probabilities, but conditional on the strength of evidence in the data. The resulting conditional frequentist error probabilities equal the objective posterior probabilities of the hypotheses advocated by Jeffreys.
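As a rough illustration of the idea, the sketch below works through the simplest setting: two simple hypotheses about a normal mean with known standard error, H0: theta = theta0 versus H1: theta = theta1 with theta1 > theta0, and equal prior weight on each hypothesis. The function name `conditional_test`, the normal setup and the equal prior weights are assumptions made for this example and are not taken from the abstract. It computes a p-value under each hypothesis, conditions on S = max(p0, p1) as the strength of evidence, and reports the conditional error probability, which in this symmetric case takes the form B/(1+B) or 1/(1+B), with B the likelihood ratio of H0 to H1.

```python
import math


def normal_cdf(z):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def conditional_test(xbar, theta0, theta1, se):
    """Conditional frequentist test of H0: theta=theta0 vs H1: theta=theta1.

    Illustrative assumptions: xbar ~ N(theta, se^2), theta1 > theta0,
    and equal prior probabilities for the two hypotheses.
    """
    z0 = (xbar - theta0) / se
    z1 = (xbar - theta1) / se

    # p-value under each hypothesis: large xbar counts against H0,
    # small xbar counts against H1.
    p0 = 1.0 - normal_cdf(z0)
    p1 = normal_cdf(z1)

    # Likelihood ratio (Bayes factor) of H0 to H1 for normal densities.
    b = math.exp(-0.5 * (z0 ** 2 - z1 ** 2))

    # Posterior probability of H0 under equal prior probabilities.
    post_h0 = b / (1.0 + b)

    # Reject H0 when the data are more extreme under H0 than under H1.
    if p0 <= p1:
        decision = "reject H0"
        conditional_error = post_h0        # conditional Type I error
    else:
        decision = "accept H0"
        conditional_error = 1.0 - post_h0  # conditional Type II error

    return {
        "p0": p0,
        "p1": p1,
        "strength_of_evidence": max(p0, p1),  # conditioning statistic S
        "bayes_factor": b,
        "decision": decision,
        "conditional_error": conditional_error,
    }


if __name__ == "__main__":
    result = conditional_test(xbar=2.0, theta0=0.0, theta1=1.0, se=1.0)
    for key, value in result.items():
        print(key, value)
```

With xbar = 2, theta0 = 0, theta1 = 1 and se = 1, the p-value under H0 is about 0.023, while the reported conditional error probability for rejecting H0 is about 0.18, which is also the posterior probability of H0 under the assumed equal prior weights. The gap between the two numbers is the kind of practical disagreement the abstract describes.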
Includes: Ronald Christensen. Comment.
Includes: Wesley O. Johnson. Comment.
Includes: Michael Lavine. Comment.
Includes: Subhash R. Lele. Comment.
Includes: Deborah G. Mayo. Comment.
Includes: Luis R. Pericchi. Comment.
Includes: N. Reid. Comment.
Includes: James O. Berger. Rejoinder.