Annals of Statistics

A class of Rényi information estimators for multidimensional densities

Nikolai Leonenko, Luc Pronzato, and Vippal Savani

Full-text: Open access


A class of estimators of the Rényi and Tsallis entropies of an unknown distribution f in ℝ^m is presented. These estimators are based on the kth nearest-neighbor distances computed from a sample of N i.i.d. vectors with distribution f. We show that entropies of any order q, including Shannon’s entropy, can be estimated consistently with minimal assumptions on f. Moreover, we show that it is straightforward to extend the nearest-neighbor method to estimate the statistical distance between two distributions using one i.i.d. sample from each.
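The abstract does not state the estimator itself. The following is a minimal Python sketch of one commonly quoted form of a kth nearest-neighbor estimator of the Rényi and Tsallis entropies. The normalizing constants (the unit-ball volume V_m, the factor C_k, and the digamma correction used when q = 1) are assumptions taken from the general nearest-neighbor entropy literature, not from the text above, and all function names are hypothetical. It uses a brute-force neighbor search and the standard library only, so it is suitable for small samples.

```python
import math
import random

EULER_GAMMA = 0.5772156649015329


def knn_distances(sample, k):
    """Euclidean distance from each point to its k-th nearest neighbor (brute force, O(N^2))."""
    result = []
    for i, x in enumerate(sample):
        d = sorted(math.dist(x, y) for j, y in enumerate(sample) if j != i)
        result.append(d[k - 1])
    return result


def unit_ball_volume(m):
    """Volume of the unit Euclidean ball in R^m."""
    return math.pi ** (m / 2) / math.gamma(m / 2 + 1)


def digamma_int(k):
    """Digamma function at a positive integer k."""
    return -EULER_GAMMA + sum(1.0 / j for j in range(1, k))


def knn_information(sample, q, k):
    """Estimate of the integral of f^q (q != 1), assuming the normalization
    zeta_i = (N - 1) * C_k * V_m * rho_i^m with
    C_k = [Gamma(k) / Gamma(k + 1 - q)]^(1 / (1 - q))."""
    n, m = len(sample), len(sample[0])
    if not 1 <= k < n:
        raise ValueError("k must satisfy 1 <= k < N")
    if q >= k + 1:
        raise ValueError("this sketch requires q < k + 1")
    c_k = math.exp((math.lgamma(k) - math.lgamma(k + 1 - q)) / (1 - q))
    v_m = unit_ball_volume(m)
    rho = knn_distances(sample, k)
    return sum(((n - 1) * c_k * v_m * r ** m) ** (1 - q) for r in rho) / n


def renyi_entropy_knn(sample, q, k=1):
    """Renyi entropy estimate of order q; q = 1 gives a Shannon entropy estimate."""
    if q == 1:
        n, m = len(sample), len(sample[0])
        v_m = unit_ball_volume(m)
        rho = knn_distances(sample, k)
        shift = math.log(n - 1) - digamma_int(k) + math.log(v_m)
        return sum(shift + m * math.log(r) for r in rho) / n
    return math.log(knn_information(sample, q, k)) / (1 - q)


def tsallis_entropy_knn(sample, q, k=1):
    """Tsallis (Havrda-Charvat) entropy estimate of order q != 1."""
    return (1 - knn_information(sample, q, k)) / (q - 1)


if __name__ == "__main__":
    random.seed(0)
    # Uniform density on the unit square: every entropy above equals 0 in theory.
    pts = [(random.random(), random.random()) for _ in range(500)]
    print(renyi_entropy_knn(pts, q=2, k=4))
    print(renyi_entropy_knn(pts, q=1, k=4))
    print(tsallis_entropy_knn(pts, q=2, k=4))
```

For the uniform density on the unit square, all three quantities are 0 in theory, so the printed values should be close to 0, up to sampling noise and boundary effects. The sketch assumes a continuous distribution with no repeated sample points, since a zero nearest-neighbor distance would break the logarithm and the negative powers.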

Article information

Ann. Statist., Volume 36, Number 5 (2008), 2153-2182.

First available in Project Euclid: 13 October 2008

Digital Object Identifier: doi:10.1214/07-AOS539

Primary: 94A15: Information theory, general [See also 62B10, 81P94]; 62G20: Asymptotic properties

Keywords: Entropy estimation; estimation of statistical distance; estimation of divergence; nearest-neighbor distances; Rényi entropy; Havrda–Charvát entropy; Tsallis entropy


Leonenko, Nikolai; Pronzato, Luc; Savani, Vippal. A class of Rényi information estimators for multidimensional densities. Ann. Statist. 36 (2008), no. 5, 2153–2182. doi:10.1214/07-AOS539.

