The Annals of Statistics

Maximum Lq-likelihood estimation

Davide Ferrari and Yuhong Yang

Full-text: Open access


In this paper, the maximum Lq-likelihood estimator (MLqE), a new parameter estimator based on nonextensive entropy [Kibernetika 3 (1967) 30–35], is introduced. The properties of the MLqE are studied via asymptotic analysis and computer simulations. The behavior of the MLqE is characterized by the degree of distortion q applied to the assumed model. When q is properly chosen for small and moderate sample sizes, the MLqE can successfully trade bias for precision, resulting in a substantial reduction of the mean squared error. When the sample size is large and q tends to 1, a necessary and sufficient condition ensuring asymptotic normality and efficiency of the MLqE is established.
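To make the role of the distortion parameter q concrete, here is a minimal sketch of an Lq-likelihood fit. It assumes the usual form of the Lq function, L_q(u) = (u^(1-q) - 1)/(1-q) for q ≠ 1 and log u for q = 1, and maximizes the sum of L_q(f(x_i; θ)) over θ. The exponential model with mean θ, the golden-section search, the sample values, and all function names are illustrative choices for this sketch, not taken from the paper.

```python
import math


def lq(u, q):
    """Lq function: log(u) when q == 1, else (u**(1 - q) - 1) / (1 - q)."""
    if q == 1.0:
        return math.log(u)
    return (u ** (1.0 - q) - 1.0) / (1.0 - q)


def exp_density(x, theta):
    """Exponential density with mean theta (illustrative model choice)."""
    return math.exp(-x / theta) / theta


def lq_likelihood(theta, data, q):
    """Sum of Lq-transformed densities; the ordinary log-likelihood at q == 1."""
    return sum(lq(exp_density(x, theta), q) for x in data)


def mlqe_exponential(data, q, tol=1e-10):
    """Golden-section search for a maximizer on [min(data), max(data)].

    Assumes strictly positive, non-identical data and an objective that is
    unimodal on this bracket (an assumption of this sketch).
    """
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = min(data), max(data)
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if lq_likelihood(c, data, q) > lq_likelihood(d, data, q):
            b = d  # maximizer lies in [a, d]
        else:
            a = c  # maximizer lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0


# Hypothetical sample, used only to exercise the functions.
sample = [0.3, 1.2, 0.7, 2.5, 0.1, 4.0, 0.9, 1.6]
mle = mlqe_exponential(sample, q=1.0)   # q = 1 is ordinary maximum likelihood
mlqe = mlqe_exponential(sample, q=0.9)  # q < 1 distorts the assumed model
```

At q = 1 the search recovers the ordinary maximum likelihood estimate, which for this model is the sample mean. For q ≠ 1 the estimating equation weights each observation's score by f(x_i; θ)^(1-q), so with q < 1 low-density observations count for less.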

Article information

Ann. Statist., Volume 38, Number 2 (2010), 753-783.

First available in Project Euclid: 19 February 2010

Primary: 62F99 (None of the above, but in this section)
Secondary: 60F05 (Central limit and other weak theorems), 94A17 (Measures of information, entropy), 62G32 (Statistics of extreme values; tail inference)

Keywords: Maximum Lq-likelihood estimation; nonextensive entropy; asymptotic efficiency; exponential family; tail probability estimation


Ferrari, Davide; Yang, Yuhong. Maximum Lq-likelihood estimation. Ann. Statist. 38 (2010), no. 2, 753–783. doi:10.1214/09-AOS687.


  • [1] Abe, S. (2003). Geometry of escort distributions. Phys. Rev. E 68 031101.
  • [2] Aczél, J. D. and Daróczy, Z. (1975). On measures of information and their characterizations. Math. Sci. Eng. 115. Academic Press, New York–London.
  • [3] Akaike, H. (1973). Information theory and an extension of the likelihood principle. In 2nd International Symposium of Information Theory 267–281. Akad. Kiadó, Budapest.
  • [4] Altun, Y. and Smola, A. (2006). Unifying divergence minimization and statistical inference via convex duality. In Learning Theory. Lecture Notes in Computer Science 4005 139–153. Springer, Berlin.
  • [5] Barron, A., Rissanen, J. and Yu, B. (1998). The minimum description length principle in coding and modeling. IEEE Trans. Inform. Theory 44 2743–2760.
  • [6] Basu, A., Harris, I. R., Hjort, N. L. and Jones, M. C. (1998). Robust and efficient estimation by minimising a density power divergence. Biometrika 85 549–559.
  • [7] Beck, C. and Schlögl, F. (1993). Thermodynamics of Chaotic Systems: An Introduction. Cambridge Univ. Press, Cambridge.
  • [8] Choi, E., Hall, P. and Presnell, B. (2000). Rendering parametric procedures more robust by empirically tilting the model. Biometrika 87 453–465.
  • [9] Cover, T. M. and Thomas, J. A. (2006). Elements of Information Theory. Wiley, New York.
  • [10] Daniels, H. E. (1997). Saddlepoint approximations in statistics. In Breakthroughs in Statistics (S. Kotz and N. L. Johnson, eds.) 3 177–200. Springer, New York.
  • [11] Ferguson, T. S. (1996). A Course in Large Sample Theory. Chapman & Hall, London.
  • [12] Ferrari, D. and Paterlini, S. (2007). The maximum Lq-likelihood method: An application to extreme quantile estimation in finance. Methodol. Comput. Appl. Probab. 11 3–19.
  • [13] Field, C. and Ronchetti, E. (1990). Small Sample Asymptotics. IMS, Hayward, CA.
  • [14] Gell-Mann, M., ed. (2004). Nonextensive Entropy, Interdisciplinary Applications. Oxford Univ. Press, New York.
  • [15] Goldfarb, D. (1970). A family of variable-metric methods derived by variational means. Math. Comp. 24 23–26.
  • [16] Havrda, J. and Charvát, F. (1967). Quantification method of classification processes: Concept of structural entropy. Kibernetika 3 30–35.
  • [17] Huang, J. Z., Liu, N., Pourahmadi, M. and Liu, L. (2006). Covariance matrix selection and estimation via penalised normal likelihood. Biometrika 93 85–98.
  • [18] Huber, P. J. (1981). Robust Statistics. Wiley, New York.
  • [19] Jaynes, E. T. (1957). Information theory and statistical mechanics. Phys. Rev. 106 620.
  • [20] Jaynes, E. T. (1957). Information theory and statistical mechanics II. Phys. Rev. 108 171.
  • [21] Kullback, S. (1959). Information Theory and Statistics. Wiley, New York.
  • [22] Kullback, S. and Leibler, R. A. (1951). On information and sufficiency. Ann. Math. Statistics 22 79–86.
  • [23] Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation. Springer, New York.
  • [24] Magnus, J. R. and Neudecker, H. (1979). The commutation matrix: Some properties and applications. Ann. Statist. 7 381–394.
  • [25] McCulloch, C. E. (1982). Symmetric matrix derivatives with applications. J. Amer. Statist. Assoc. 77 679–682.
  • [26] Naudts, J. (2004). Estimators, escort probabilities, and phi-exponential families in statistical physics. J. Inequal. Pure Appl. Math. 5 102.
  • [27] Rényi, A. (1961). On measures of entropy and information. In Proc. 4th Berkeley Sympos. Math. Statist. and Prob. 1 547–561. Univ. California Press, Berkeley.
  • [28] Shannon, C. E. (1948). A mathematical theory of communication. Bell System Tech. J. 27 379–423.
  • [29] Tsallis, C. (1988). Possible generalization of Boltzmann–Gibbs statistics. J. Statist. Phys. 52 479–487.
  • [30] Tsallis, C., Mendes, R. S. and Plastino, A. R. (1998). The role of constraints within generalized nonextensive statistics. Physica A: Statistical and Theoretical Physics 261 534–554.
  • [31] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Univ. Press, Cambridge.
  • [32] Wang, X., van Eeden, C. and Zidek, J. V. (2004). Asymptotic properties of maximum weighted likelihood estimators. J. Statist. Plann. Inference 119 37–54.
  • [33] Windham, M. P. (1995). Robustifying model fitting. J. Roy. Statist. Soc. Ser. B 57 599–609.