## The Annals of Statistics

### From ε-entropy to KL-entropy: Analysis of minimum information complexity density estimation

Tong Zhang

#### Abstract

We consider an extension of ε-entropy to a KL-divergence based complexity measure for randomized density estimation methods. Based on this extension, we develop a general information-theoretical inequality that measures the statistical complexity of some deterministic and randomized density estimators. Consequences of the new inequality will be presented. In particular, we show that this technique can lead to improvements of some classical results concerning the convergence of minimum description length and Bayesian posterior distributions. Moreover, we are able to derive clean finite-sample convergence bounds that are not obtainable using previous approaches.

#### Article information

Source
Ann. Statist., Volume 34, Number 5 (2006), 2180-2210.

Dates
First available in Project Euclid: 23 January 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1169571794

Digital Object Identifier
doi:10.1214/009053606000000704

Mathematical Reviews number (MathSciNet)
MR2291497

Zentralblatt MATH identifier
1106.62005

#### Citation

Zhang, Tong. From ε-entropy to KL-entropy: Analysis of minimum information complexity density estimation. Ann. Statist. 34 (2006), no. 5, 2180-2210. doi:10.1214/009053606000000704. https://projecteuclid.org/euclid.aos/1169571794
