Statistical Science

On the history of maximum likelihood in relation to inverse probability and least squares

Anders Hald



It is shown that the method of maximum likelihood occurs in rudimentary forms before Fisher [Messenger of Mathematics 41 (1912) 155–160], but not under this name. Some of the estimates called “most probable” would today have been called “most likely.” Gauss [Z. Astronom. Verwandte Wiss. 1 (1816) 185–196] used invariance under parameter transformation when deriving his estimate of the standard deviation in the normal case. Hagen [Grundzüge der Wahrscheinlichkeits-Rechnung, Dümmler, Berlin (1837)] used the maximum likelihood argument to derive the frequentist version of the method of least squares for the linear normal model. Edgeworth [J. Roy. Statist. Soc. 72 (1909) 81–90] proved the asymptotic normality and optimality of the maximum likelihood estimate for a restricted class of distributions. Fisher had two aversions: noninvariance and unbiasedness. By replacing the posterior mode with the maximum likelihood estimate, he achieved invariance, and by using a two-stage method of maximum likelihood, he avoided appealing to unbiasedness for the linear normal model.
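The last sentence of the abstract compresses an argument about the linear normal model. What follows is a minimal sketch built on one reading of that argument. In the first stage, the full normal likelihood is maximized over the regression coefficients, and this gives least squares, as in Hagen's derivation. In the second stage, the likelihood of σ² is maximized using only the residual sum of squares SSR, since SSR/σ² has a χ² distribution with n − m degrees of freedom. Under this reading, the one-stage estimate is SSR/n and the two-stage estimate is SSR/(n − m). The straight-line model, the data values and all function names here are illustrative assumptions, not taken from the paper.

```python
import math

def fit_line(x, y):
    """Stage 1 (illustrative): for any fixed sigma, the normal log-likelihood of
    y_i = a + b*x_i + e_i is maximized over (a, b) by minimizing the sum of
    squared residuals, so the ML estimates coincide with least squares."""
    n = len(x)
    xbar = sum(x) / n
    ybar = sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = ybar - b * xbar
    ssr = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    return a, b, ssr

def loglik_sigma2(sigma2, ssr, k):
    """Log-likelihood in sigma^2, up to an additive constant.
    k = n:     full normal likelihood with (a, b) at their ML values.
    k = n - m: likelihood based only on SSR, since SSR / sigma^2 ~ chi^2(n - m)."""
    return -0.5 * k * math.log(sigma2) - ssr / (2.0 * sigma2)

def argmax_golden(f, lo, hi, tol=1e-12):
    """Golden-section search for the maximizer of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    while hi - lo > tol:
        c = hi - g * (hi - lo)
        d = lo + g * (hi - lo)
        if f(c) > f(d):
            hi = d  # maximizer lies in [lo, d]
        else:
            lo = c  # maximizer lies in [c, hi]
    return 0.5 * (lo + hi)

# Illustrative data only; these numbers are not from Hald's paper.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.1, 1.9, 3.2, 3.8, 5.1, 6.2]
n, m = len(x), 2  # m = number of regression coefficients (a, b)

a, b, ssr = fit_line(x, y)
upper = 10.0 * ssr  # bracket containing both maximizers

# One-stage ML: maximize the full likelihood, giving SSR / n (biased for sigma^2).
s2_one_stage = argmax_golden(lambda s2: loglik_sigma2(s2, ssr, n), 1e-12, upper)
# Two-stage ML: maximize the likelihood of SSR alone, giving SSR / (n - m),
# which happens to be unbiased although unbiasedness is never invoked.
s2_two_stage = argmax_golden(lambda s2: loglik_sigma2(s2, ssr, n - m), 1e-12, upper)
# Invariance: maximizing over sigma directly yields the square root of the
# sigma^2 maximizer (a posterior mode would not behave this way in general).
sigma_direct = argmax_golden(lambda s: loglik_sigma2(s * s, ssr, n), 1e-6, math.sqrt(upper))

print(f"a={a:.4f} b={b:.4f} SSR={ssr:.4f}")
print(f"one-stage {s2_one_stage:.6f} vs SSR/n {ssr / n:.6f}")
print(f"two-stage {s2_two_stage:.6f} vs SSR/(n-m) {ssr / (n - m):.6f}")
print(f"sigma by direct maximization {sigma_direct:.6f}")
```

The numerical search is deliberately generic. It checks that each estimate is a maximizer, rather than just restating the closed forms SSR/n and SSR/(n − m).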

Article information

Statist. Sci., Volume 14, Number 2 (1999), 214-222.

First available in Project Euclid: 24 December 2001


Keywords: Chauvenet; confidence limits; credible limits; Edgeworth; Encke; Fisher; Gauss; Gosset; Hagen; invariance; inverse probability; Laplace; least squares; likelihood limits; linear normal model; maximum likelihood; Merriman; posterior mode; reparameterization; t-distribution; two-stage maximum likelihood method; unbiasedness


Hald, Anders. On the history of maximum likelihood in relation to inverse probability and least squares. Statist. Sci. 14 (1999), no. 2, 214–222. doi:10.1214/ss/1009212248.



References

  • ALDRICH, J. 1997. R. A. Fisher and the making of maximum likelihood 1912–1922. Statist. Sci. 12 162–176.
  • CHAUVENET, W. 1863. On the method of least squares. In A Manual of Spherical and Practical Astronomy 2 469–566. Lippincott, Philadelphia. An appendix.
  • EDGEWORTH, F. Y. 1883. The method of least squares. Philos. Mag. (5) 16 360–375.
  • EDGEWORTH, F. Y. 1908. On the probable error of frequency constants. J. Roy. Statist. Soc. 71 381–397, 499–512, 651–678.
  • EDGEWORTH, F. Y. 1909. Addendum on “Probable errors of frequency constants.” J. Roy. Statist. Soc. 72 81–90.
  • EDWARDS, A. W. F. 1974. The history of likelihood. Internat. Statist. Rev. 42 9–15.
  • EDWARDS, A. W. F. 1997. What did Fisher mean by “inverse probability” in 1912–1922? Statist. Sci. 12 177–184.
  • ENCKE, J. F. 1832–1834. Über die Methode der kleinsten Quadrate. In Berliner Astronomisches Jahrbuch für 1834 249–312; für 1835 253–320; für 1836 253–308.
  • FISHER, R. A. 1912. On an absolute criterion for fitting frequency curves. Messenger of Mathematics 41 155–160. Reprinted in Statist. Sci. 12 (1997) 39–41.
  • FISHER, R. A. 1915. Frequency distribution of the values of the correlation coefficient in samples from an indefinitely large population. Biometrika 10 507–521.
  • FISHER, R. A. 1921. On the “probable error” of a coefficient of correlation deduced from a small sample. Metron 1 3–32.
  • FISHER, R. A. 1922a. On the mathematical foundations of theoretical statistics. Philos. Trans. Roy. Soc. London Ser. A 222 309–368.
  • FISHER, R. A. 1922b. The goodness of fit of regression formulæ, and the distribution of regression coefficients. J. Roy. Statist. Soc. 85 597–612.
  • GAUSS, C. F. 1809. Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium. Perthes et Besser, Hamburg.
  • GAUSS, C. F. 1816. Bestimmung der Genauigkeit der Beobachtungen. Z. Astronom. Verwandte Wiss. 1 185–196.
  • GAUSS, C. F. 1823. Theoria combinationis observationum erroribus minimis obnoxiae. Comm. Soc. Reg. Gottingensis Rec. 5 33–62, 63–90.
  • HAGEN, G. 1837. Grundzüge der Wahrscheinlichkeits-Rechnung. Dümmler, Berlin.
  • HALD, A. 1998. A History of Mathematical Statistics from 1750 to 1930. Wiley, New York.
  • HELMERT, F. R. 1876. Die Genauigkeit der Formel von Peters zur Berechnung des wahrscheinlichen Beobachtungsfehlers directer Beobachtungen gleicher Genauigkeit. Astronom. Nachr. 88 113–132.
  • LAPLACE, P. S. DE 1812. Théorie Analytique des Probabilités. Courcier, Paris.
  • MERRIMAN, M. 1877. A list of writings relating to the method of least squares, with historical and critical notes. Trans. Conn. Acad. Arts Sci. 4 151–232.
  • MERRIMAN, M. 1884. A Text-Book on the Method of Least Squares. Wiley, New York. References are to the 8th ed. (1915).
  • NEYMAN, J. AND SCOTT, E. L. 1948. Consistent estimates based on partially consistent observations. Econometrica 16 1–32.
  • PEARSON, E. S. 1968. Some early correspondence between W. S. Gosset, R. A. Fisher and Karl Pearson, with notes and comments. Biometrika 55 445–457.
  • PRATT, J. W. 1976. F. Y. Edgeworth and R. A. Fisher on the efficiency of maximum likelihood estimation. Ann. Statist. 4 501–514.
  • SAVAGE, L. J. 1976. On rereading R. A. Fisher. Ann. Statist. 4 441–483.
  • “STUDENT” 1908. The probable error of a mean. Biometrika 6 1–25.
  • TODHUNTER, I. 1865. A History of the Mathematical Theory of Probability from the Time of Pascal to That of Laplace. Macmillan, London.