International Statistical Review

Critique of p-Values

Bill Thompson


This paper generalizes the notion of p-value to obtain a system for assessing evidence in favor of an hypothesis. It is not quite a quantification in that evidence is a pair of numbers (the p-value and the p-value with null and alternative interchanged) with evidence for the alternative being claimed when the first number is small and the second is at least moderately large. Traditional significance tests present p-values as a measure of evidence against a theory. This usage is rarely called for since scientists usually wish to accept theories (for the time being), not just not reject them; they are more interested in evidence for a theory. P-values are not just good or bad for this purpose; their efficacy depends on specifics. We find that a single p-value does not measure evidence for a simple hypothesis relative to a simple alternative, but consideration of both p-values leads to a satisfactory theory. This consideration does not, in general, extend to composite hypotheses since there, best evidence calls for optimization of a bivariate objective function. But in some cases, notably one-sided tests for the exponential family, the optimization can be solved, and a single p-value does provide an appealing measure of best evidence for a hypothesis. One possible extension of this theory is proposed and illustrated with a practical safety analysis problem involving the difference of two random variables.
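
The pair-of-p-values idea for a simple hypothesis against a simple alternative can be illustrated with a small sketch. It is not taken from the paper: it assumes a normal sample mean with known standard deviation, a null value `mu0` below an alternative value `mu1`, and purely illustrative cutoffs (`small`, `moderate`) for "small" and "at least moderately large". The function names are hypothetical.

```python
import math


def normal_cdf(z):
    """Standard normal CDF, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def p_value_pair(xbar, mu0, mu1, sigma, n):
    """Return (p0, p1) for simple H0: mu = mu0 versus simple H1: mu = mu1.

    Assumes mu1 > mu0 and a sample mean xbar ~ N(mu, sigma^2 / n).
    p0 is the usual p-value under H0 (large xbar is extreme).
    p1 is the p-value with null and alternative interchanged
    (small xbar is extreme when mu1 plays the role of the null).
    """
    se = sigma / math.sqrt(n)
    p0 = 1.0 - normal_cdf((xbar - mu0) / se)
    p1 = normal_cdf((xbar - mu1) / se)
    return p0, p1


def supports_alternative(p0, p1, small=0.05, moderate=0.3):
    """Illustrative rule: first number small, second at least moderately large.

    The cutoffs are assumptions made for this sketch, not values from the paper.
    """
    return p0 <= small and p1 >= moderate


if __name__ == "__main__":
    # Two hypothetical observed means with mu0 = 0, mu1 = 1, sigma = 1, n = 16.
    for xbar in (0.5, 1.0):
        p0, p1 = p_value_pair(xbar, mu0=0.0, mu1=1.0, sigma=1.0, n=16)
        print(xbar, round(p0, 5), round(p1, 5), supports_alternative(p0, p1))
```

In this setup an observed mean halfway between the two hypothesized values gives a small p-value under either hypothesis, so a single small p-value against the null does not by itself count as evidence for the alternative; an observed mean at `mu1` gives a small first number and a moderately large second one.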

Article information

Internat. Statist. Rev., Volume 74, Number 1 (2006), 1-14.

First available in Project Euclid: 29 March 2006


Keywords: p-value; Exponential family; Resolving disagreement; Evidence for a theory; Safety analysis


Thompson, Bill. Critique of p-Values. Internat. Statist. Rev. 74 (2006), no. 1, 1-14.


