Statistical Science

The Epic Story of Maximum Likelihood

Stephen M. Stigler

Full-text: Open access

Abstract

At a superficial level, the idea of maximum likelihood must be prehistoric: early hunters and gatherers may not have used the words “method of maximum likelihood” to describe their choice of where and how to hunt and gather, but it is hard to believe they would have been surprised if their method had been described in those terms. It seems a simple, even unassailable idea: Who would rise to argue in favor of a method of minimum likelihood, or even mediocre likelihood? And yet the mathematical history of the topic shows this “simple idea” is really anything but simple. Joseph Louis Lagrange, Daniel Bernoulli, Leonhard Euler, Pierre Simon Laplace and Carl Friedrich Gauss are only some of those who explored the topic, not always in ways we would sanction today. In this article, that history is reviewed from well before Fisher to the time of Lucien Le Cam’s dissertation. In the process Fisher’s unpublished 1930 characterization of conditions for the consistency and efficiency of maximum likelihood estimates is presented, and the mathematical basis of his three proofs is discussed. In particular, Fisher’s derivation of the information inequality is seen to derive from his work on the analysis of variance, and his later approach via estimating functions from Euler’s Relation for homogeneous functions. The reaction to Fisher’s work is reviewed, and some lessons are drawn.

Article information

Source
Statist. Sci. Volume 22, Number 4 (2007), 598–620.

Dates
First available: 7 April 2008

Permanent link to this document
http://projecteuclid.org/euclid.ss/1207580174

Digital Object Identifier
doi:10.1214/07-STS249

Mathematical Reviews number (MathSciNet)
MR2410255

Zentralblatt MATH identifier
06075147

Citation

Stigler, Stephen M. The Epic Story of Maximum Likelihood. Statistical Science 22 (2007), no. 4, 598–620. doi:10.1214/07-STS249. http://projecteuclid.org/euclid.ss/1207580174.



References

  • Aldrich, J. (1997). R. A. Fisher and the making of maximum likelihood 1912–1922. Statist. Sci. 12 162–176.
  • Arrow, K. J. and Lehmann, E. L. (2005). Harold Hotelling 1895–1973. Biographical Memoirs of the National Academy of Sciences 87 3–15.
  • Bahadur, R. R. (1964). On Fisher’s bound for asymptotic variances. Ann. Math. Statist. 35 1545–1552.
  • Bahadur, R. R. (1983). Hodges superefficiency. In Encyclopedia of Statistical Sciences (S. Kotz and N. L. Johnson, eds.) 3 645–646.
  • Bennett, J. H., ed. (1990). Statistical Inference and Analysis: Selected Correspondence of R. A. Fisher. Clarendon Press, Oxford.
  • Bernoulli, D. (1769). Dijudicatio maxime probabilis plurium observationum discrepantium atque verisimillima inductio inde formanda. Manuscript; Bernoulli MSS f.299–305, University of Basel. English translation in Stigler (1997).
  • Bernoulli, D. (1778). Dijudicatio maxime probabilis plurium observationum discrepantium atque verisimillima inductio inde formanda. Acta Academiae Scientiarum Imperialis Petropolitanae for 1777, pars prior 3–23. Reprinted in Bernoulli (1982). English translation in Kendall (1961) 3–13, reprinted 1970 in Pearson, Egon S. and Kendall, M. G. (eds.), Studies in the History of Statistics and Probability, pp. 157–167. Charles Griffin, London.
  • Bernoulli, D. (1982). Die Werke von Daniel Bernoulli. Band 2. Analysis. Wahrscheinlichkeitsrechnung. Birkhäuser, Basel.
  • Bickel, P. J. and Doksum, K. (2001). Mathematical Statistics. Basic Ideas and Selected Topics, 2nd ed. 1. Prentice Hall, Upper Saddle River, NJ.
  • Biometrics (1951). News and Notes. Biometrics 7 449–450.
  • Bowley, A. L. (1928). F. Y. Edgeworth’s Contributions to Mathematical Statistics. Royal Statistical Society, London. (Reprinted 1972 by Augustus M. Kelley, Clifton, NJ.)
  • Box, J. F. (1978). R. A. Fisher. The Life of a Scientist. Wiley, New York.
  • Courant, R. (1936). Differential and Integral Calculus. Nordeman, New York.
  • Cox, D. R. (2006). Principles of Statistical Inference. Cambridge Univ. Press.
  • Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Univ. Press.
  • Cramér, H. (1946a). A contribution to the theory of statistical estimation. Skand. Aktuarietidskr. 29 85–94. Reprinted in H. Cramér, Collected Works 2 948–957. Springer, Berlin (1994).
  • Darnell, A. C. (1988). Harold Hotelling 1895–1973. Statist. Sci. 3 57–62.
  • Doob, J. L. (1934). Probability and statistics. Trans. Amer. Math. Soc. 36 759–775.
  • Doob, J. L. (1936). Statistical estimation. Trans. Amer. Math. Soc. 39 410–421.
  • Dugué, D. (1937). Application des propriétés de la limite au sens du calcul des probabilités à l’étude de diverses questions d’estimation. J. l’École Polytechnique 3e série (n. 4) 305–373.
  • Edwards, A. W. F. (1974). The history of likelihood. Internat. Statist. Rev. 42 9–15.
  • Edwards, A. W. F. (1997). Three early papers on efficient parametric estimation. Statist. Sci. 12 35–47.
  • Edwards, A. W. F. (1997a). What did Fisher mean by “inverse probability” in 1912–1922? Statist. Sci. 12 177–184.
  • Efron, B. (1975). Defining the curvature of a statistical problem (with applications to second order efficiency). Ann. Statist. 3 1189–1242.
  • Efron, B. (1978). The geometry of exponential families. Ann. Statist. 6 362–376.
  • Efron, B. (1982). Maximum likelihood and decision theory (The 1981 Wald Memorial Lectures). Ann. Statist. 10 340–356.
  • Efron, B. (1998). R. A. Fisher in the 21st century (with discussion). Statist. Sci. 13 95–122.
  • Efron, B. and Hinkley, D. V. (1978). Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information. Biometrika 65 457–482.
  • Fienberg, S. E. and Hinkley, D. V. eds. (1980). R. A. Fisher: An Appreciation. Springer, New York.
  • Fisher, R. A. (1912). On an absolute criterion for fitting frequency curves. Messenger of Mathematics 41 155–160; reprinted as Paper 1 in Fisher (1974); reprinted in Edwards (1997).
  • Fisher, R. A. (1915). Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10 507–521; reprinted as Paper 4 in Fisher (1974).
  • Fisher, R. A. (1920). A mathematical examination of the methods of determining the accuracy of an observation by the mean error, and by the mean square error. Mon. Notices Roy. Astron. Soc. 80 758–770; reprinted as Paper 12 in Fisher (1974).
  • Fisher, R. A. (1922). On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. London Ser. A 222 309–368; reprinted as Paper 18 in Fisher (1974).
  • Fisher, R. A. (1922a). On the interpretation of χ² from contingency tables, and the calculation of P. J. Roy. Statist. Soc. 85 87–94; reprinted as Paper 19 in Fisher (1974).
  • Fisher, R. A. (1924). The Influence of Rainfall on the Yield of Wheat at Rothamsted. Philos. Trans. Roy. Soc. London Ser. B 213 89–142; reprinted as Paper 37 in Fisher (1974).
  • Fisher, R. A. (1924a). Conditions under which χ² measures the discrepancy between observation and hypothesis. J. Roy. Statist. Soc. 87 442–450; reprinted as Paper 34 in Fisher (1974).
  • Fisher, R. A. (1925). Theory of statistical estimation. Proc. Cambridge Philos. Soc. 22 700–725; reprinted as Paper 42 in Fisher (1974).
  • Fisher, R. A. (1931). Letter to the Editor. Amer. Math. Monthly 38 335–338.
  • Fisher, R. A. (1935). The logic of inductive inference. J. Roy. Statist. Soc. 98 39–54; reprinted as Paper 124 in Fisher (1974).
  • Fisher, R. A. (1938). Statistical Theory of Estimation. Univ. Calcutta.
  • Fisher, R. A. (1938–1939). Review of “Lectures and Conferences on Mathematical Statistics” by J. Neyman. Science Progress 33 577.
  • Fisher, R. A. (1950). Contributions to Mathematical Statistics. Wiley, New York.
  • Fisher, R. A. (1956). Statistical Methods and Scientific Inference. Oliver and Boyd, Edinburgh.
  • Fisher, R. A. (1974). The Collected Papers of R. A. Fisher. Univ. Adelaide Press.
  • Galton, F. (1908). Memories of my Life. Methuen, London.
  • Gauss, C. F. (1809). Theoria Motus Corporum Coelestium. Perthes et Besser, Hamburg. Translated, 1857, as Theory of Motion of the Heavenly Bodies Moving about the Sun in Conic Sections, trans. C. H. Davis. Little, Brown; Boston. Reprinted, 1963, Dover, New York.
  • Grove, C. C. (1930). Review of “Statistical Methods for Research Workers.” Amer. Math. Monthly 37 547–550.
  • Hald, A. (1998). A History of Mathematical Statistics from 1750 to 1930. Wiley, New York.
  • Hald, A. (2007). A History of Parametric Statistical Inference from Bernoulli to Fisher, 1713 to 1935. Springer, New York.
  • Hinkley, D. V. (1980). Theory of statistical estimation: The 1925 paper. Pp. 85–94 in Fienberg and Hinkley (1980).
  • Hotelling, H. (1930). The consistency and ultimate distribution of optimum statistics. Trans. Amer. Math. Soc. 32 847–859.
  • Hotelling, H. (1930a). Spaces of statistical parameters (Abstract). Bull. Amer. Math. Soc. 36 191.
  • Hotelling, H. (1951). The impact of R. A. Fisher on statistics. J. Amer. Statist. Assoc. 46 35–46.
  • Hotelling, H. (1990). The Collected Economic Articles of Harold Hotelling. Springer, New York.
  • Jeffreys, H. (1946). An invariant form for the prior probability in estimation problems. Proc. Roy. Soc. London Ser. A 186 453–461.
  • Kass, R. E. (1989). The geometry of asymptotic inference. Statist. Sci. 4 188–219.
  • Kass, R. E. and Vos, P. W. (1997). Geometrical Foundations of Asymptotic Inference. Wiley, New York.
  • Kendall, M. G. (1961). Daniel Bernoulli on maximum likelihood. Biometrika 48 1–18. Reprinted in 1970 in Pearson, Egon S. and Kendall, M. G. (eds.), Studies in the History of Statistics and Probability. Charles Griffin, London, pages 155–172.
  • Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many parameters. Ann. Math. Statist. 27 887–906.
  • Kruskal, W. H. (1980). The significance of Fisher: A review of “R. A. Fisher: The Life of a Scientist” by Joan Fisher Box. J. Amer. Statist. Assoc. 75 1019–1030.
  • Lagrange, J.-L. (1776). Mémoire sur l’utilité de la méthode de prendre le milieu entre les résultats de plusieurs observations; dans lequel on examine les avantages de cette méthode par le calcul des probabilités, & ou l’on resoud differens problèmes relatifs à cette matière. Miscellanea Taurinensia 5 167–232. Reprinted in Lagrange (1868) 2 173–236.
  • Lagrange, J.-L. (1868). Oeuvres de Lagrange, 2. Gauthier-Villars, Paris.
  • Lambert, J. H. (1760). Photometria, sive de Mensura et Gradibus Luminis, Colorum et Umbrae. Detleffsen, Augsburg. (French translation 1997, L’Harmattan, Paris; English translation 2001, by David L. DiLaura, for The Illuminating Engineering Society of North America).
  • Laplace, P. S. (1774). Mémoire sur la probabilité des causes par les évènemens. Mémoires de mathématique et de physique, presentés à l’Académie Royale des Sciences, par divers savans, & lû dans ses assemblées 6 621–656. Translated in Stigler (1986a).
  • Lauritzen, S. L. (2002). Thiele: Pioneer in Statistics. Oxford Univ. Press.
  • Le Cam, L. (1953). On some asymptotic properties of maximum likelihood estimates and related Bayes estimates. University of California Publications in Statistics 1 277–330.
  • Le Cam, L. (1990). Maximum likelihood: An introduction. Internat. Statist. Rev. 58 153–171 [Previously issued in 1979 by the Statistics Branch of the Department of Mathematics, University of Maryland, as Lecture Notes No. 18].
  • Littauer, S. B. and Mode, E. B. (1952). Report of the Boston Meeting of the Institute. Ann. Math. Statist. 23 155–159.
  • Neyman, J. (1937). Outline of a theory of statistical estimation based upon the classical theory of probability. Philos. Trans. Roy. Soc. London Ser. A 236 333–380.
  • Neyman, J. (1938). Lectures and Conferences on Mathematical Statistics (edited by W. Edwards Deming). The Graduate School of the USDA, Washington DC.
  • Neyman, J. and Scott, E. L. (1948). Consistent estimates based on partially consistent observations. Econometrica 16 1–32.
  • Neyman, J. (1951). Review of R. A. Fisher “Contributions to Mathematical Statistics.” The Scientific Monthly 72 406–408.
  • Norden, R. H. (1972–1973). A survey of maximum likelihood estimation. Internat. Statist. Rev. 40 329–354, 41 39–58.
  • Pearson, K. (1896). Mathematical contributions to the theory of evolution, III: regression, heredity and panmixia. Philos. Trans. Roy. Soc. London Ser. A 187 253–318. Reprinted in Karl Pearson’s Early Statistical Papers, Cambridge: Cambridge University Press, 1956, pp. 113–178.
  • Pearson, K. and Filon, L. N. G. (1898). Mathematical contributions to the theory of evolution IV. On the probable errors of frequency constants and on the influence of random selection on variation and correlation. Philos. Trans. Roy. Soc. London Ser. A 191 229–311. Reprinted in Karl Pearson’s Early Statistical Papers, Cambridge: Cambridge University Press, 1956, pp. 179–261.
  • Porter, T. M. (2004). Karl Pearson: The Scientific Life in a Statistical Age. Princeton Univ. Press.
  • Pratt, J. W. (1976). F. Y. Edgeworth and R. A. Fisher on the efficiency of maximum likelihood estimation. Ann. Statist. 4 501–514.
  • Rao, C. R. (1961). Asymptotic efficiency and limiting information. Proc. Fourth Berkeley Symp. Math. Statist. Probab. 1 531–546. Univ. California Press, Berkeley.
  • Rao, C. R. (1962). Efficient estimates and optimum inference procedures in large samples, with discussion. J. Roy. Statist. Soc. Ser. B 24 46–72.
  • Savage, L. J. (1976). On rereading R. A. Fisher. Ann. Statist. 4 441–500.
  • Sheynin, O. B. (1971). J. H. Lambert’s work on probability. Archive for History of Exact Sciences 7 244–256.
  • Smith, K. (1916). On the ‘best’ values of the constants in frequency distributions. Biometrika 11 262–276.
  • Smith, W. L. (1978). Harold Hotelling 1895–1973. Ann. Statist. 6 1173–1183.
  • Stigler, S. M. (1973). Laplace, Fisher, and the discovery of the concept of sufficiency. Biometrika 60 439–445. Reprinted in 1977 in Kendall, Maurice G. and Robin L. Plackett, eds., Studies in the History of Statistics and Probability, Vol. 2. Griffin, London, pp. 271–277.
  • Stigler, S. M. (1986). The History of Statistics: The Measurement of Uncertainty Before 1900. Harvard Univ. Press, Cambridge, MA.
  • Stigler, S. M. (1986a). Laplace’s 1774 memoir on inverse probability. Statist. Sci. 1 359–378.
  • Stigler, S. M. (1997). Daniel Bernoulli, Leonhard Euler, and Maximum Likelihood. In Festschrift for Lucien Le Cam (D. Pollard, E. Torgersen and G. L. Yang, eds.) 345–367. Springer, New York. Extensively revised and reprinted as Chapter 16 of Stigler (1999).
  • Stigler, S. M. (1999). Statistics on the Table. Harvard Univ. Press, Cambridge, MA.
  • Stigler, S. M. (1999a). The Foundations of Statistics at Stanford. Amer. Statist. 53 263–266.
  • Stigler, S. M. (2001). Ancillary history. In State of the Art in Probability and Statistics (C. M. de Gunst, C. A. J. Klaassen and A. W. van der Vaart, eds.). IMS Lecture Notes Monogr. Ser. 36 555–567. IMS, Beachwood, OH.
  • Stigler, S. M. (2005). Fisher in 1921. Statist. Sci. 20 32–49.
  • Stigler, S. M. (2007). Karl Pearson’s theoretical errors and the advances they inspired. To appear.
  • van der Vaart, A. W. (1997). Superefficiency. In Festschrift for Lucien Le Cam (D. Pollard, E. Torgersen and G. L. Yang, eds.) 397–410. Springer, New York.
  • van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press.
  • Wald, A. (1940). The fitting of straight lines if both variables are subject to error. Ann. Math. Statist. 11 284–300. [A summary of the main results of this article, as presented in a talk July 6, 1939, was published pp. 25–28 in Report of the Fifth Annual Research Conference on Economics and Statistics Held at Colorado Springs July 3 to 28, 1939, Cowles Commission, University of Chicago, 1939.]
  • Wald, A. (1943). Tests of statistical hypotheses concerning several parameters when the number of observations is large. Trans. Amer. Math. Soc. 54 426–482.
  • Wald, A. (1949). Note on the consistency of the maximum likelihood estimate. Ann. Math. Statist. 20 595–601.
  • Yule, G. U. (1936). An Introduction to the Theory of Statistics, 10th ed. Charles Griffin, London. [This was the last edition revised by Yule himself; subsequent revisions from 1937 by M. G. Kendall were not greatly changed in emphasis.]
  • Zabell, S. L. (1992). R. A. Fisher and the fiducial argument. Statist. Sci. 7 369–387. Reprinted in 2005 in S. L. Zabell, Symmetry and its Discontents: Essays on the History of Inductive Philosophy. Cambridge Univ. Press.