Electronic Journal of Statistics

A Bernstein-Von Mises Theorem for discrete probability distributions

S. Boucheron and E. Gassiat

Full-text: Open access


We investigate the asymptotic normality of the posterior distribution in the discrete setting, when model dimension increases with sample size. We consider a probability mass function $\theta_{0}$ on $\mathbb{N}\setminus\{0\}$ and a sequence of truncation levels $(k_{n})_{n}$ satisfying $k_{n}^{3}\leq n\inf_{i\leq k_{n}}\theta_{0}(i)$. Let $\hat{\theta}_{n}$ denote the maximum likelihood estimate of $(\theta_{0}(i))_{i\leq k_{n}}$ and let $\Delta_{n}(\theta_{0})$ denote the $k_{n}$-dimensional vector whose $i$-th coordinate is $\sqrt{n}(\hat{\theta}_{n}(i)-\theta_{0}(i))$ for $1\leq i\leq k_{n}$. We check that, under mild conditions on $\theta_{0}$ and on the sequence of prior probabilities on the $k_{n}$-dimensional simplices, the variation distance between the posterior distribution, recentered around $\hat{\theta}_{n}$ and rescaled by $\sqrt{n}$, and the $k_{n}$-dimensional Gaussian distribution $\mathcal{N}(\Delta_{n}(\theta_{0}),I^{-1}(\theta_{0}))$ converges in probability to 0. This theorem can be used to prove the asymptotic normality of Bayesian estimators of Shannon and Rényi entropies.
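As a rough numerical illustration of this statement, the sketch below takes the truncation condition to be $k_{n}^{3}\leq n\inf_{i\leq k_{n}}\theta_{0}(i)$ and checks that $n$ times the posterior covariance is close to the inverse Fisher information of a multinomial model evaluated at the maximum likelihood estimate. Several choices are assumptions made only for this example and do not come from the paper: the geometric $\theta_{0}$, the symmetric Dirichlet prior with parameter 1/2, the lumping of the tail mass beyond $k_{n}$ into one extra cell, and all function names. The sketch compares second moments only; it does not compute a variation distance.

```python
import numpy as np


def truncation_level(theta0, n):
    """Largest k with k**3 <= n * min(theta0[:k]); theta0[i - 1] stores theta0(i)."""
    k, running_min = 0, np.inf
    for i, mass in enumerate(theta0, start=1):
        running_min = min(running_min, mass)
        if i ** 3 > n * running_min:
            break  # left side grows and right side shrinks, so no larger k works
        k = i
    return k


def inverse_fisher(theta):
    """Inverse Fisher information of a multinomial in its first k cell probabilities."""
    theta = np.asarray(theta, dtype=float)
    return np.diag(theta) - np.outer(theta, theta)


def scaled_posterior_cov(counts, alpha=0.5):
    """n times the exact Dirichlet posterior covariance, with the tail cell dropped."""
    counts = np.asarray(counts, dtype=float)
    a = counts + alpha                      # posterior is Dirichlet(counts + alpha)
    total = a.sum()
    mean = a / total
    cov = (np.diag(mean) - np.outer(mean, mean)) / (total + 1.0)
    return counts.sum() * cov[:-1, :-1]


def demo(n=200_000, q=0.5, seed=0):
    """Return the truncation level and the largest entrywise covariance gap."""
    rng = np.random.default_rng(seed)
    theta0 = (1 - q) * q ** np.arange(60)   # geometric pmf on {1, 2, ...}, first 60 terms
    k = truncation_level(theta0, n)
    # Assumption: mass beyond k is lumped into one extra cell.
    cells = np.append(theta0[:k], 1.0 - theta0[:k].sum())
    counts = rng.multinomial(n, cells)
    theta_hat = counts[:k] / n              # maximum likelihood estimate on the first k cells
    gap = np.max(np.abs(scaled_posterior_cov(counts) - inverse_fisher(theta_hat)))
    return k, gap


if __name__ == "__main__":
    k, gap = demo()
    print(f"k_n = {k}, max |n * Cov_post - I^-1(theta_hat)| = {gap:.2e}")
```

With a Dirichlet prior the posterior covariance has a closed form, so the comparison needs no posterior sampling; the only randomness is in the simulated counts.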

The proofs are based on concentration inequalities for centered and non-centered Chi-square (Pearson) statistics. These inequalities make it possible to establish posterior concentration rates with respect to Fisher distance rather than with respect to the Hellinger distance, as is commonplace in non-parametric Bayesian statistics.
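For readers unfamiliar with the two quantities contrasted here, the following minimal sketch computes the classical Pearson chi-square statistic and the Hellinger distance between two probability vectors. It is illustrative only: the abstract does not define the paper's non-centered statistic or its Fisher distance, so neither is reproduced, and the function names and the 1/2 normalisation of the Hellinger distance are assumptions of this example.

```python
import numpy as np


def pearson_chi_square(counts, probs):
    """Classical Pearson statistic: sum over cells of (observed - expected)^2 / expected."""
    counts = np.asarray(counts, dtype=float)
    probs = np.asarray(probs, dtype=float)
    expected = counts.sum() * probs
    return float(np.sum((counts - expected) ** 2 / expected))


def hellinger_distance(p, q):
    """Hellinger distance with the 1/2 normalisation, so values lie in [0, 1]."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    return float(np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    probs = np.array([0.5, 0.3, 0.2])
    counts = rng.multinomial(1_000, probs)
    print("Pearson statistic:", pearson_chi_square(counts, probs))
    print("Hellinger distance to the empirical pmf:",
          hellinger_distance(counts / counts.sum(), probs))
```

Cell by cell, $(\sqrt{p}-\sqrt{q})^{2}=(p-q)^{2}/(\sqrt{p}+\sqrt{q})^{2}\leq (p-q)^{2}/q$, so the chi-square divergence $\sum_{i}(p_{i}-q_{i})^{2}/q_{i}$ dominates the squared Hellinger distance. A bound in a chi-square-type distance is therefore the stronger of the two.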

Article information

Electron. J. Statist., Volume 3 (2009), 114-148.

First available in Project Euclid: 28 January 2009


Primary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]

Keywords: Bernstein-Von Mises Theorem; Entropy estimation; non-parametric Bayesian statistics; Discrete models; Concentration inequalities


Boucheron, S.; Gassiat, E. A Bernstein-Von Mises Theorem for discrete probability distributions. Electron. J. Statist. 3 (2009), 114–148. doi:10.1214/08-EJS262. https://projecteuclid.org/euclid.ejs/1233176792


