## The Annals of Statistics

### Efficient multivariate entropy estimation via $k$-nearest neighbour distances

#### Abstract

Many statistical procedures, including goodness-of-fit tests and methods for independent component analysis, rely critically on the estimation of the entropy of a distribution. In this paper, we seek entropy estimators that are efficient and achieve the local asymptotic minimax lower bound with respect to squared error loss. To this end, we study weighted averages of the estimators originally proposed by Kozachenko and Leonenko [Probl. Inform. Transm. 23 (1987), 95–101], based on the $k$-nearest neighbour distances of a sample of $n$ independent and identically distributed random vectors in $\mathbb{R}^{d}$. A careful choice of weights enables us to obtain an efficient estimator in arbitrary dimensions, given sufficient smoothness, while the original unweighted estimator is typically only efficient when $d\leq 3$. Besides the new estimator and the theoretical understanding it provides, our results facilitate the construction of asymptotically valid confidence intervals for the entropy that are of asymptotically minimal width.
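As a concrete illustration (not taken from the paper itself), the unweighted Kozachenko–Leonenko estimator underlying the weighted construction can be sketched in a few lines of Python. The sketch below uses SciPy's `cKDTree` for the nearest-neighbour search and the standard closed form $\widehat{H}_n = \log(n-1) - \Psi(k) + \log V_d + \frac{d}{n}\sum_{i=1}^n \log \rho_{k,i}$, where $\rho_{k,i}$ is the distance from $X_i$ to its $k$-th nearest neighbour, $V_d$ is the volume of the unit $d$-ball and $\Psi$ is the digamma function; the function name `kl_entropy` is our own.

```python
import numpy as np
from scipy.special import digamma, gammaln
from scipy.spatial import cKDTree

def kl_entropy(X, k=1):
    """Unweighted Kozachenko-Leonenko entropy estimate (in nats).

    X : (n, d) array of i.i.d. observations; k : nearest-neighbour order.
    """
    n, d = X.shape
    tree = cKDTree(X)
    # query returns distances to the k+1 closest points, the first being
    # the point itself at distance 0; column k is the k-th NN distance.
    rho = tree.query(X, k=k + 1)[0][:, k]
    # log-volume of the unit ball in R^d: V_d = pi^{d/2} / Gamma(d/2 + 1)
    log_Vd = (d / 2) * np.log(np.pi) - gammaln(d / 2 + 1)
    return np.log(n - 1) - digamma(k) + log_Vd + d * np.mean(np.log(rho))
```

The paper's efficient estimator replaces this single-$k$ statistic with a weighted average over several values of $k$, with weights chosen so that dominant bias terms cancel in higher dimensions.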

#### Article information

**Source**
Ann. Statist., Volume 47, Number 1 (2019), 288–318.

**Dates**
Revised: November 2017
First available in Project Euclid: 30 November 2018

https://projecteuclid.org/euclid.aos/1543568589

**Digital Object Identifier**
doi:10.1214/18-AOS1688

**Subjects**
Primary: 62G05 (Estimation); 62G20 (Asymptotic properties)

#### Citation

Berrett, Thomas B.; Samworth, Richard J.; Yuan, Ming. Efficient multivariate entropy estimation via $k$-nearest neighbour distances. Ann. Statist. 47 (2019), no. 1, 288–318. doi:10.1214/18-AOS1688. https://projecteuclid.org/euclid.aos/1543568589

#### References

• Barbour, A. D. and Chen, L. H. Y., eds. (2005). An Introduction to Stein’s Method. Lecture Notes Series. Institute for Mathematical Sciences. National University of Singapore 4. Singapore Univ. Press, Singapore.
• Beirlant, J., Dudewicz, E. J., Györfi, L. and van der Meulen, E. C. (1997). Nonparametric entropy estimation: An overview. Int. J. Math. Stat. Sci. 6 17–39.
• Berrett, T. B., Samworth, R. J. and Yuan, M. (2019). Supplement to “Efficient multivariate entropy estimation via $k$-nearest neighbour distances.” DOI:10.1214/18-AOS1688SUPP.
• Biau, G. and Devroye, L. (2015). Lectures on the Nearest Neighbor Method. Springer, Cham.
• Cai, T. T. and Low, M. G. (2011). Testing composite hypotheses, Hermite polynomials and optimal estimation of a nonsmooth functional. Ann. Statist. 39 1012–1041.
• Cressie, N. (1976). On the logarithms of high-order spacings. Biometrika 63 343–355.
• Delattre, S. and Fournier, N. (2017). On the Kozachenko–Leonenko entropy estimator. J. Statist. Plann. Inference 185 69–93.
• El Haje Hussein, F. and Golubev, Yu. (2009). On entropy estimation by $m$-spacing method. Zap. Nauchn. Sem. S.-Peterburg. Otdel. Mat. Inst. Steklov. (POMI) 363 151–181, 189.
• Gao, W., Oh, S. and Viswanath, P. (2016). Demystifying fixed $k$-nearest neighbor information estimators. Available at arXiv:1604.03006.
• Goria, M. N., Leonenko, N. N., Mergel, V. V. and Novi Inverardi, P. L. (2005). A new class of random vector entropy estimators and its applications in testing statistical hypotheses. J. Nonparametr. Stat. 17 277–297.
• Hall, P. and Morton, S. C. (1993). On the estimation of entropy. Ann. Inst. Statist. Math. 45 69–88.
• Kantorovič, L. V. and Rubinšteĭn, G. Š. (1958). On a space of completely additive functions. Vestnik Leningrad Univ. Math. 13 52–59.
• Kellerer, H. G. (1985). Duality theorems and probability metrics. In Proceedings of the Seventh Conference on Probability Theory (Braşov, 1982) 211–220. VNU Sci. Press, Utrecht.
• Kozachenko, L. F. and Leonenko, N. N. (1987). Sample estimate of the entropy of a random vector. Probl. Inf. Transm. 23 95–101.
• Kwak, N. and Choi, C. (2002). Input feature selection by mutual information based on Parzen window. IEEE Trans. Pattern Anal. Mach. Intell. 24 1667–1671.
• Laurent, B. (1996). Efficient estimation of integral functionals of a density. Ann. Statist. 24 659–681.
• Learned-Miller, E. G. and Fisher, J. W. III (2004). ICA using spacings estimates of entropy. J. Mach. Learn. Res. 4 1271–1295.
• Lepski, O., Nemirovski, A. and Spokoiny, V. (1999). On estimation of the $L_{r}$ norm of a regression function. Probab. Theory Related Fields 113 221–253.
• Mnatsakanov, R. M., Misra, N., Li, Sh. and Harner, E. J. (2008). $k_{n}$-nearest neighbor estimators of entropy. Math. Methods Statist. 17 261–277.
• Moon, K. R., Sricharan, K., Greenewald, K. and Hero, A. O. (2016). Nonparametric ensemble estimation of distributional functionals. Available at arXiv:1601.06884v2.
• Paninski, L. (2003). Estimation of entropy and mutual information. Neural Comput. 15 1191–1253.
• Paninski, L. and Yajima, M. (2008). Undersmoothed kernel entropy estimators. IEEE Trans. Inform. Theory 54 4384–4388.
• Shorack, G. R. and Wellner, J. A. (2009). Empirical Processes with Applications to Statistics. Classics in Applied Mathematics 59. Society for Industrial and Applied Mathematics (SIAM), Philadelphia, PA.
• Singh, S. and Póczos, B. (2016). Analysis of $k$ nearest neighbor distances with application to entropy estimation. In Advances in Neural Information Processing Systems 29 1217–1225.
• Singh, H., Misra, N., Hnizdo, V., Fedorowicz, A. and Demchuk, E. (2003). Nearest neighbor estimates of entropy. Amer. J. Math. Management Sci. 23 301–321.
• Sricharan, K., Wei, D. and Hero, A. O. III (2013). Ensemble estimators for multivariate entropy estimation. IEEE Trans. Inform. Theory 59 4374–4388.
• Tsybakov, A. B. and van der Meulen, E. C. (1996). Root-$n$ consistent estimators of entropy for densities with unbounded support. Scand. J. Stat. 23 75–83.
• van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.
• Vasicek, O. (1976). A test for normality based on sample entropy. J. Roy. Statist. Soc. Ser. B 38 54–59.

#### Supplemental materials

• Supplement to “Efficient multivariate entropy estimation via $k$-nearest neighbour distances”. Auxiliary results and remaining proofs.