Statistical Science

Harold Jeffreys’s Theory of Probability Revisited

Christian P. Robert, Nicolas Chopin, and Judith Rousseau



Published exactly seventy years ago, Jeffreys’s Theory of Probability (1939) has had a unique impact on the Bayesian community and is now considered one of the main classics in Bayesian statistics as well as the initiator of the objective Bayes school. In particular, its advances on the derivation of noninformative priors and on the scaling of Bayes factors have had a lasting impact on the field. However, the book reflects the characteristics of its time, especially in terms of mathematical rigor. In this paper we point out the fundamental aspects of this reference work, especially the thorough coverage of testing problems and the construction of noninformative priors for both estimation and testing based on functional divergences. Our major aim here is to help modern readers navigate this difficult text and concentrate on the passages that are still relevant today.

Article information

Statist. Sci., Volume 24, Number 2 (2009), 141–172.

First available in Project Euclid: 14 January 2010


Keywords: Bayesian foundations; noninformative prior; σ-finite measure; Jeffreys’s prior; Kullback divergence; tests; Bayes factor; p-values; goodness of fit


Robert, Christian P.; Chopin, Nicolas; Rousseau, Judith. Harold Jeffreys’s Theory of Probability Revisited. Statist. Sci. 24 (2009), no. 2, 141–172. doi:10.1214/09-STS284.



  • Aldrich, J. (2008). R. A. Fisher on Bayes and Bayes’ theorem. Bayesian Anal. 3 161–170.
  • Balasubramanian, V. (1997). Statistical inference, Occam’s razor, and statistical mechanics on the space of probability distributions. Neural Comput. 9 349–368.
  • Basu, D. (1988). Statistical Information and Likelihood: A Collection of Critical Essays by Dr. D. Basu. Springer, New York.
  • Bauwens, L. (1984). Bayesian Full Information of Simultaneous Equations Models Using Integration by Monte Carlo. Lecture Notes in Economics and Mathematical Systems 232. Springer, New York.
  • Bayarri, M. and Garcia-Donato, G. (2007). Extending conventional priors for testing general hypotheses in linear models. Biometrika 94 135–152.
  • Bayes, T. (1763). An essay towards solving a problem in the doctrine of chances. Phil. Trans. Roy. Soc. 53 370–418.
  • Beaumont, M., Zhang, W. and Balding, D. (2002). Approximate Bayesian computation in population genetics. Genetics 162 2025–2035.
  • Berger, J. (1985). Statistical Decision Theory and Bayesian Analysis, 2nd ed. Springer, New York.
  • Berger, J. and Bernardo, J. (1992). On the development of the reference prior method. In Bayesian Statistics 4 (J. Berger, J. Bernardo, A. Dawid and A. Smith, eds.) 35–49. Oxford Univ. Press, London.
  • Berger, J., Bernardo, J. and Sun, D. (2009). Natural induction: An objective Bayesian approach. Rev. R. Acad. Cien. Serie A Mat. 103 125–135.
  • Berger, J., Boukai, B. and Wang, Y. (1997). Unified frequentist and Bayesian testing of a precise hypothesis (with discussion). Statist. Sci. 12 133–160.
  • Berger, J. and Jefferys, W. (1992). Sharpening Ockham’s razor on a Bayesian strop. Amer. Scientist 80 64–72.
  • Berger, J. and Pericchi, L. (1996). The intrinsic Bayes factor for model selection and prediction. J. Amer. Statist. Assoc. 91 109–122.
  • Berger, J., Pericchi, L. and Varshavsky, J. (1998). Bayes factors and marginal distributions in invariant situations. Sankhyā Ser. A 60 307–321.
  • Berger, J., Philippe, A. and Robert, C. (1998). Estimation of quadratic functions: Reference priors for non-centrality parameters. Statist. Sinica 8 359–375.
  • Berger, J. and Robert, C. (1990). Subjective hierarchical Bayes estimation of a multivariate normal mean: On the frequentist interface. Ann. Statist. 18 617–651.
  • Berger, J. and Sellke, T. (1987). Testing a point-null hypothesis: The irreconcilability of significance levels and evidence (with discussion). J. Amer. Statist. Assoc. 82 112–122.
  • Berger, J. and Wolpert, R. (1988). The Likelihood Principle, 2nd ed. IMS Lecture Notes—Monograph Series 9. IMS, Hayward.
  • Bernardo, J. (1979). Reference posterior distributions for Bayesian inference (with discussion). J. Roy. Statist. Soc. Ser. B 41 113–147.
  • Bernardo, J. and Smith, A. (1994). Bayesian Theory. Wiley, New York.
  • Billingsley, P. (1986). Probability and Measure, 2nd ed. Wiley, New York.
  • Broemeling, L. and Broemeling, A. (2003). Studies in the history of probability and statistics XLVIII: The Bayesian contributions of Ernest Lhoste. Biometrika 90 728–731.
  • Casella, G. and Berger, R. (2001). Statistical Inference, 2nd ed. Wadsworth, Belmont, CA.
  • Dacunha-Castelle, D. and Gassiat, E. (1999). Testing the order of a model using locally conic parametrization: Population mixtures and stationary ARMA processes. Ann. Statist. 27 1178–1209.
  • Darmois, G. (1935). Sur les lois de probabilité à estimation exhaustive. Comptes Rendus Acad. Sciences Paris 200 1265–1266.
  • Dawid, A. (1984). Present position and potential developments: Some personal views. Statistical theory. The prequential approach (with discussion). J. Roy. Statist. Soc. Ser. A 147 278–292.
  • Dawid, A. (2004). Probability, causality and the empirical world: A Bayes–de Finetti–Popper–Borel synthesis. Statist. Sci. 19 44–57.
  • Dawid, A., Stone, M. and Zidek, J. (1973). Marginalization paradoxes in Bayesian and structural inference (with discussion). J. Roy. Statist. Soc. Ser. B 35 189–233.
  • de Finetti, B. (1974). Theory of Probability, vol. 1. Wiley, New York.
  • de Finetti, B. (1975). Theory of Probability, vol. 2. Wiley, New York.
  • DeGroot, M. (1970). Optimal Statistical Decisions. McGraw-Hill, New York.
  • DeGroot, M. (1973). Doing what comes naturally: Interpreting a tail area as a posterior probability or as a likelihood ratio. J. Amer. Statist. Assoc. 68 966–969.
  • Diaconis, P. and Ylvisaker, D. (1985). Quantifying prior opinion. In Bayesian Statistics 2 (J. Bernardo, M. DeGroot, D. Lindley and A. Smith, eds.) 163–175. North-Holland, Amsterdam.
  • Earman, J. (1992). Bayes or Bust. MIT Press, Cambridge, MA.
  • Feller, W. (1970). An Introduction to Probability Theory and Its Applications, vol. 1. Wiley, New York.
  • Feller, W. (1971). An Introduction to Probability Theory and Its Applications, vol. 2. Wiley, New York.
  • Fienberg, S. (2006). When did Bayesian inference become “Bayesian”? Bayesian Anal. 1 1–40.
  • Ghosh, M. and Meeden, G. (1984). A new Bayesian analysis of a random effects model. J. Roy. Statist. Soc. Ser. B 43 474–482.
  • Good, I. (1962). Theory of Probability by Harold Jeffreys. J. Roy. Statist. Soc. Ser. A 125 487–489.
  • Good, I. (1980). The contributions of Jeffreys to Bayesian statistics. In Bayesian Analysis in Econometrics and Statistics: Essays in Honor of Harold Jeffreys 21–34. North-Holland, Amsterdam.
  • Gouriéroux, C. and Monfort, A. (1996). Statistics and Econometric Models. Cambridge Univ. Press.
  • Gradshteyn, I. and Ryzhik, I. (1980). Tables of Integrals, Series and Products. Academic Press, New York.
  • Haldane, J. (1932). A note on inverse probability. Proc. Cambridge Philos. Soc. 28 55–61.
  • Huzurbazar, V. (1976). Sufficient Statistics. Marcel Dekker, New York.
  • Jaakkola, T. and Jordan, M. (2000). Bayesian parameter estimation via variational methods. Statist. Comput. 10 25–37.
  • Jeffreys, H. (1931). Scientific Inference, 1st ed. Cambridge Univ. Press.
  • Jeffreys, H. (1937). Scientific Inference, 2nd ed. Cambridge Univ. Press.
  • Jeffreys, H. (1939). Theory of Probability, 1st ed. The Clarendon Press, Oxford.
  • Jeffreys, H. (1948). Theory of Probability, 2nd ed. The Clarendon Press, Oxford.
  • Jeffreys, H. (1961). Theory of Probability, 3rd ed. Oxford Classic Texts in the Physical Sciences. Oxford Univ. Press, Oxford.
  • Kass, R. (1989). The geometry of asymptotic inference (with discussion). Statist. Sci. 4 188–234.
  • Kass, R. and Wasserman, L. (1996). Formal rules of selecting prior distributions: A review and annotated bibliography. J. Amer. Statist. Assoc. 91 1343–1370.
  • Koopman, B. (1936). On distributions admitting a sufficient statistic. Trans. Amer. Math. Soc. 39 399–409.
  • Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer, New York.
  • Lhoste, E. (1923). Le calcul des probabilités appliqué à l’artillerie. Revue D’Artillerie 91 405–423, 516–532, 58–82 and 152–179.
  • Lindley, D. (1953). Statistical inference (with discussion). J. Roy. Statist. Soc. Ser. B 15 30–76.
  • Lindley, D. (1957). A statistical paradox. Biometrika 44 187–192.
  • Lindley, D. (1962). Theory of Probability by Harold Jeffreys. J. Amer. Statist. Assoc. 57 922–924.
  • Lindley, D. (1971). Bayesian Statistics, A Review. SIAM, Philadelphia.
  • Lindley, D. (1980). Jeffreys’s contribution to modern statistical thought. In Bayesian Analysis in Econometrics and Statistics: Essays in Honor of Harold Jeffreys 35–39. North-Holland, Amsterdam.
  • Lindley, D. and Smith, A. (1972). Bayes estimates for the linear model. J. Roy. Statist. Soc. Ser. B 34 1–41.
  • MacKay, D. J. C. (2002). Information Theory, Inference & Learning Algorithms. Cambridge Univ. Press.
  • Marin, J.-M., Mengersen, K. and Robert, C. (2005). Bayesian modelling and inference on mixtures of distributions. In Handbook of Statistics (C. Rao and D. Dey, eds.) 25. Springer, New York.
  • Marin, J.-M. and Robert, C. (2007). Bayesian Core. Springer, New York.
  • Pitman, E. (1936). Sufficient statistics and intrinsic accuracy. Proc. Cambridge Philos. Soc. 32 567–579.
  • Popper, K. (1934). The Logic of Scientific Discovery. Hutchinson and Co., London. (English translation, 1959.)
  • Raiffa, H. (1968). Decision Analysis: Introductory Lectures on Choices Under Uncertainty. Addison-Wesley, Reading, MA.
  • Raiffa, H. and Schlaifer, R. (1961). Applied statistical decision theory. Technical report, Division of Research, Graduate School of Business Administration, Harvard Univ.
  • Rissanen, J. (1983). A universal prior for integers and estimation by minimum description length. Ann. Statist. 11 416–431.
  • Rissanen, J. (1990). Complexity of models. In Complexity, Entropy, and the Physics of Information (W. Zurek, ed.) 8. Addison-Wesley, Reading, MA.
  • Robert, C. (1996). Intrinsic loss functions. Theory and Decision 40 191–214.
  • Robert, C. (2001). The Bayesian Choice, 2nd ed. Springer, New York.
  • Robert, C. and Casella, G. (1994). Distance penalized losses for testing and confidence set evaluation. Test 3 163–182.
  • Robert, C. and Casella, G. (2004). Monte Carlo Statistical Methods, 2nd ed. Springer, New York.
  • Rubin, H. (1987). A weak system of axioms for rational behavior and the nonseparability of utility from prior. Statist. Decision 5 47–58.
  • Savage, L. (1954). The Foundations of Statistics. Wiley, New York.
  • Stigler, S. (1999). Statistics on the Table: The History of Statistical Concepts and Methods. Harvard Univ. Press, Cambridge, MA.
  • Tanner, M. and Wong, W. (1987). The calculation of posterior distributions by data augmentation. J. Amer. Statist. Assoc. 82 528–550.
  • Tibshirani, R. (1989). Noninformative priors for one parameter of many. Biometrika 76 604–608.
  • Wald, A. (1950). Statistical Decision Functions. Wiley, New York.
  • Welch, B. and Peers, H. (1963). On formulae for confidence points based on integrals of weighted likelihoods. J. Roy. Statist. Soc. Ser. B 25 318–329.
  • Zellner, A. (1980). Introduction. In Bayesian Analysis in Econometrics and Statistics: Essays in Honor of Harold Jeffreys 1–10. North-Holland, Amsterdam.