Maximum likelihood characterization of distributions

Mitia Duerinckx, Christophe Ley, and Yvik Swan

Full-text: Open access


A famous characterization theorem due to C.F. Gauss states that the maximum likelihood estimator (MLE) of the parameter in a location family is the sample mean for all samples of all sample sizes if and only if the family is Gaussian. There exist many extensions of this result in diverse directions, most of them focussing on location and scale families. In this paper, we propose a unified treatment of this literature by providing general MLE characterization theorems for one-parameter group families (with particular attention to location and scale parameters). In doing so, we provide tools for determining whether a given family of this kind is MLE-characterizable and, if it is, we define the fundamental concept of the minimal necessary sample size at which a given characterization holds. Many of the cornerstone references on this topic are revisited and discussed in the light of our findings, and several new characterization theorems are provided. Of particular interest is that one part of our work, namely the introduction of so-called equivalence classes for MLE characterizations, is a modernized version of Daniel Bernoulli's viewpoint on maximum likelihood estimation.
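As a rough numerical illustration of the Gauss setting (not the authors' method), one can maximize the location log-likelihood θ ↦ Σ log f(x_i − θ) directly. Its maximizer solves the score equation Σ φ_f(x_i − θ) = 0 with φ_f = −f′/f: for the standard normal density φ_f(z) = z, which yields the sample mean, whereas for the Laplace density φ_f(z) = sign(z) (in the subgradient sense), which yields the sample median. The sketch below assumes a log-concave density, so that the log-likelihood is unimodal on [min x_i, max x_i], and uses golden-section search; the names `location_mle`, `normal_logpdf` and `laplace_logpdf` are hypothetical helpers introduced only for this example. It shows the "Gaussian implies mean" direction and a non-Gaussian counterexample; it does not prove the "only if" part of the characterization.

```python
import math
import statistics
from typing import Callable, Sequence


def normal_logpdf(z: float) -> float:
    """Log-density of the standard normal distribution."""
    return -0.5 * z * z - 0.5 * math.log(2.0 * math.pi)


def laplace_logpdf(z: float) -> float:
    """Log-density of the standard Laplace distribution, f(z) = exp(-|z|) / 2."""
    return -abs(z) - math.log(2.0)


def location_mle(
    xs: Sequence[float],
    logpdf: Callable[[float], float],
    tol: float = 1e-9,
) -> float:
    """Hypothetical helper: maximize theta -> sum(log f(x - theta)) by golden-section search.

    Assumes f is log-concave, so the log-likelihood is unimodal and its
    maximizer lies in [min(xs), max(xs)].
    """

    def loglik(theta: float) -> float:
        return sum(logpdf(x - theta) for x in xs)

    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0  # 1 / golden ratio, about 0.618
    a, b = min(xs), max(xs)
    while b - a > tol:
        # Two interior probe points with c < d.
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
        if loglik(c) > loglik(d):
            b = d  # maximizer lies in [a, d]
        else:
            a = c  # maximizer lies in [c, b]
    return (a + b) / 2.0


if __name__ == "__main__":
    sample = [0.0, 1.0, 5.0]
    # Normal location MLE agrees with the sample mean (2.0).
    print(location_mle(sample, normal_logpdf), statistics.mean(sample))
    # Laplace location MLE agrees with the sample median (1.0), not the mean.
    print(location_mle(sample, laplace_logpdf), statistics.median(sample))
```

The search interval [min x_i, max x_i] is a design choice justified only under the log-concavity assumption; for heavier-tailed families such as the Cauchy, the log-likelihood can be multimodal and a bracketing search like this one may miss the global maximum.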

Article information

Bernoulli, Volume 20, Number 2 (2014), 775-802.

First available in Project Euclid: 28 February 2014

Digital Object Identifier: doi:10.3150/13-BEJ506

Keywords: location parameter; maximum likelihood estimator; minimal necessary sample size; one-parameter group family; scale parameter; score function


Duerinckx, Mitia; Ley, Christophe; Swan, Yvik. Maximum likelihood characterization of distributions. Bernoulli 20 (2014), no. 2, 775–802. doi:10.3150/13-BEJ506.

References

  • [1] Aczél, J. and Dhombres, J. (1989). Functional Equations in Several Variables with Applications to Mathematics, Information Theory and to the Natural and Social Sciences. Encyclopedia of Mathematics and Its Applications 31. Cambridge: Cambridge Univ. Press.
  • [2] Akaike, H. (1977). On entropy maximization principle. In Applications of Statistics (P.R. Krishnaiah, ed.) 27–41. Amsterdam, The Netherlands: North-Holland.
  • [3] Akaike, H. (1978). A Bayesian analysis of the minimum AIC procedure. Ann. Inst. Statist. Math. 30 9–14.
  • [4] Azzalini, A. and Genton, M.G. (2007). On Gauss’s characterization of the normal distribution. Bernoulli 13 169–174.
  • [5] Bondesson, L. (1997). A generalization of Poincaré’s characterization of exponential families. J. Statist. Plann. Inference 63 147–155.
  • [6] Bourguin, S. and Tudor, C.A. (2011). Cramér theorem for gamma random variables. Electron. Commun. Probab. 16 365–378.
  • [7] Buczolich, Z. and Székely, G.J. (1989). When is a weighted average of ordered sample elements a maximum likelihood estimator of the location parameter? Adv. in Appl. Math. 10 439–456.
  • [8] Campbell, L.L. (1970). Equivalence of Gauss’s principle and minimum discrimination information estimation of probabilities. Ann. Math. Statist. 41 1011–1015.
  • [9] Chatterjee, S.K. (2003). Statistical Thought: A Perspective and History. Oxford: Oxford Univ. Press.
  • [10] Chen, L.H.Y. (1975). Poisson approximation for dependent trials. Ann. Probab. 3 534–545.
  • [11] Chen, L.H.Y., Goldstein, L. and Shao, Q.M. (2010). Normal Approximation by Stein’s Method. Springer Series in Probability and Its Applications. New York: Springer.
  • [12] Cover, T.M. and Thomas, J.A. (2006). Elements of Information Theory, 2nd ed. Hoboken, NJ: Wiley.
  • [13] Cramér, H. (1936). Über eine Eigenschaft der Normalen Verteilungsfunktion. Math. Z. 41 405–414.
  • [14] Cramér, H. (1946). A contribution to the theory of statistical estimation. Skand. Aktuarietidskr. 29 85–94.
  • [15] Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Mathematical Series 9. Princeton, NJ: Princeton Univ. Press.
  • [16] Duerinckx, M. and Ley, C. (2012). Maximum likelihood characterization of rotationally symmetric distributions on the sphere. Sankhyā Ser. A 74 249–262.
  • [17] Ferguson, T.S. (1962). Location and scale parameters in exponential families of distributions. Ann. Math. Statist. 33 986–1001.
  • [18] Findeisen, P. (1982). Die Charakterisierung der Normalverteilung nach Gauß. Metrika 29 55–63.
  • [19] Galambos, J. (1972). Characterization of certain populations by independence of order statistics. J. Appl. Probab. 9 224–230.
  • [20] Gauss, C.F. (1809). Theoria Motus Corporum Coelestium in Sectionibus Conicis Solem Ambientium. Cambridge Library Collection. Cambridge: Cambridge Univ. Press. Reprint of the 1809 original.
  • [21] Ghosh, J.K. and Rao, C.R. (1971). A note on some translation parameter families of densities for which the median is an m.l.e. Sankhyā Ser. A 33 91–93.
  • [22] Nagaraja, H.N. (2006). Characterizations of probability distributions. In Springer Handbook of Engineering Statistics, Part A. London: Springer.
  • [23] Hald, A. (1998). A History of Mathematical Statistics from 1750 to 1930. Wiley Series in Probability and Statistics: Texts and References Section. New York: Wiley.
  • [24] Hürlimann, W. (1998). On the characterization of maximum likelihood estimators for location-scale families. Comm. Statist. Theory Methods 27 495–508.
  • [25] Jaynes, E.T. (1957). Information theory and statistical mechanics. Phys. Rev. (2) 106 620–630.
  • [26] Jones, M.C. and Pewsey, A. (2009). Sinh–arcsinh distributions. Biometrika 96 761–780.
  • [27] Kagan, A.M., Linnik, Yu.V. and Rao, C.R. (1973). Characterization Problems in Mathematical Statistics. New York: Wiley.
  • [28] Kendall, M.G. (1961). Studies in the history of probability and statistics. XI. Daniel Bernoulli on maximum likelihood. Biometrika 48 1–2.
  • [29] Kotz, S. (1974). Characterizations of statistical distributions: A supplement to recent surveys. Int. Statist. Rev. 42 39–65.
  • [30] Lehmann, E.L. and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer Texts in Statistics. New York: Springer.
  • [31] Ley, C. and Paindaveine, D. (2010). Multivariate skewing mechanisms: A unified perspective based on the transformation approach. Statist. Probab. Lett. 80 1685–1694.
  • [32] Ley, C. and Paindaveine, D. (2010). On the singularity of multivariate skew-symmetric models. J. Multivariate Anal. 101 1434–1444.
  • [33] Lukacs, E. (1956). Characterization of populations by properties of suitable statistics. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954–1955, Vol. II 195–214. Berkeley, CA: Univ. California Press.
  • [34] Marshall, A.W. and Olkin, I. (1993). Maximum likelihood characterizations of distributions. Statist. Sinica 3 157–171.
  • [35] Norden, R.H. (1972). A survey of maximum likelihood estimation. Int. Statist. Rev. 40 329–354.
  • [36] Park, S.Y. and Bera, A.K. (2009). Maximum entropy autoregressive conditional heteroskedasticity model. J. Econometrics 150 219–230.
  • [37] Patil, G.P. and Seshadri, V. (1964). Characterization theorems for some univariate probability distributions. J. R. Stat. Soc. Ser. B Stat. Methodol. 26 286–292.
  • [38] Poincaré, H. (1912). Calcul des Probabilités. Paris: Carré-Naud.
  • [39] Puig, P. (2003). Characterizing additively closed discrete models by a property of their maximum likelihood estimators, with an application to generalized Hermite distributions. J. Amer. Statist. Assoc. 98 687–692.
  • [40] Puig, P. (2008). A note on the harmonic law: A two-parameter family of distributions for ratios. Statist. Probab. Lett. 78 320–326.
  • [41] Puig, P. and Valero, J. (2006). Count data distributions: Some characterizations with applications. J. Amer. Statist. Assoc. 101 332–340.
  • [42] Ross, N. (2011). Fundamentals of Stein’s method. Probab. Surv. 8 210–293.
  • [43] Stein, C. (1972). A bound for the error in the normal approximation to the distribution of a sum of dependent random variables. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. II: Probability Theory 583–602. Berkeley, CA: Univ. California Press.
  • [44] Stigler, S.M. (1999). Statistics on the Table: The History of Statistical Concepts and Methods. Cambridge, MA: Harvard Univ. Press.
  • [45] Stigler, S.M. (2007). The epic story of maximum likelihood. Statist. Sci. 22 598–620.
  • [46] Teicher, H. (1961). Maximum likelihood characterization of distributions. Ann. Math. Statist. 32 1214–1222.
  • [47] von Mises, R. (1918). Über die Ganzzahligkeit der Atomgewichte und verwandte Fragen. Physikalische Zeitschrift 19 490–500.
  • [48] Wu, X. (2003). Calculation of maximum entropy densities with application to income distribution. J. Econometrics 115 347–354.