It is shown that the method of maximum likelihood occurs in rudimentary forms before Fisher [Messenger of Mathematics 41 (1912) 155–160], but not under this name. Some of the estimates called “most probable” would today have been called “most likely.” Gauss [Z. Astronom. Verwandte Wiss. 1 (1816) 185–196] used invariance under parameter transformation when deriving his estimate of the standard deviation in the normal case. Hagen [Grundzüge der Wahrscheinlichkeits-Rechnung, Dümmler, Berlin (1837)] used the maximum likelihood argument for deriving the frequentist version of the method of least squares for the linear normal model. Edgeworth [J. Roy. Statist. Soc. 72 (1909) 81–90] proved the asymptotic normality and optimality of the maximum likelihood estimate for a restricted class of distributions. Fisher had two aversions: noninvariance and unbiasedness. Replacing the posterior mode by the maximum likelihood estimate, he achieved invariance, and using a two-stage method of maximum likelihood he avoided appealing to unbiasedness for the linear normal model.
"On the history of maximum likelihood in relation to inverse probability and least squares." Statist. Sci. 14 (2) 214–222, May 1999. https://doi.org/10.1214/ss/1009212248