The Epic Story of Maximum Likelihood



Statistical Science

Stephen M. Stigler

Source: Statist. Sci. Volume 22, Number 4 (2007), 598-620.

Abstract

At a superficial level, the idea of maximum likelihood must be prehistoric: early hunters and gatherers may not have used the words “method of maximum likelihood” to describe their choice of where and how to hunt and gather, but it is hard to believe they would have been surprised if their method had been described in those terms. It seems a simple, even unassailable idea: Who would rise to argue in favor of a method of minimum likelihood, or even mediocre likelihood? And yet the mathematical history of the topic shows this “simple idea” is really anything but simple. Joseph Louis Lagrange, Daniel Bernoulli, Leonhard Euler, Pierre Simon Laplace and Carl Friedrich Gauss are only some of those who explored the topic, not always in ways we would sanction today. In this article, that history is reviewed from well before Fisher to the time of Lucien Le Cam’s dissertation. In the process, Fisher’s unpublished 1930 characterization of conditions for the consistency and efficiency of maximum likelihood estimates is presented, and the mathematical basis of his three proofs is discussed. In particular, Fisher’s derivation of the information inequality is seen to stem from his work on the analysis of variance, and his later approach via estimating functions was derived from Euler’s Relation for homogeneous functions. The reaction to Fisher’s work is reviewed, and some lessons are drawn.

Keywords: R. A. Fisher; Karl Pearson; Jerzy Neyman; Harold Hotelling; Abraham Wald; maximum likelihood; sufficiency; efficiency; superefficiency; history of statistics

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1207580174
Digital Object Identifier: doi:10.1214/07-STS249
Mathematical Reviews number (MathSciNet): MR2410255

2008 © Institute of Mathematical Statistics