The Annals of Statistics

The semiparametric Bernstein–von Mises theorem

P. J. Bickel and B. J. K. Kleijn

Full-text: Open access


In a smooth semiparametric estimation problem, the marginal posterior for the parameter of interest is expected to be asymptotically normal and satisfy frequentist criteria of optimality if the model is endowed with a suitable prior. It is shown that, under certain straightforward and interpretable conditions, the assertion of Le Cam’s acclaimed, but strictly parametric, Bernstein–von Mises theorem [Univ. California Publ. Statist. 1 (1953) 277–329] holds in the semiparametric situation as well. As a consequence, Bayesian point-estimators achieve efficiency, for example, in the sense of Hájek’s convolution theorem [Z. Wahrsch. Verw. Gebiete 14 (1970) 323–330]. The model is required to satisfy differentiability and metric entropy conditions, while the nuisance prior must assign nonzero mass to certain Kullback–Leibler neighborhoods [Ghosal, Ghosh and van der Vaart Ann. Statist. 28 (2000) 500–531]. In addition, the marginal posterior is required to converge at parametric rate, which appears to be the most stringent condition in examples. The results are applied to estimation of the linear coefficient in partial linear regression, with a Gaussian prior on a smoothness class for the nuisance.
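For orientation, the assertion described in the abstract can be written out in displayed form. This is a sketch in commonly used notation, not a quotation from the paper: the symbols $\theta_0$ (true value of the parameter of interest), $P_0$ (true distribution), $\tilde\ell$ (efficient score), $\tilde I$ (efficient Fisher information), $\tilde\Delta_n$, and the variable names $U$, $V$, $e$ in the regression model are assumptions introduced here for illustration.

```latex
% Semiparametric Bernstein--von Mises assertion (supremum over Borel sets B):
% the marginal posterior of sqrt(n)(theta - theta_0) merges in total variation
% with a normal law centred at Delta_n with covariance equal to the inverse
% efficient Fisher information.
\sup_{B} \Bigl| \Pi\bigl( \sqrt{n}\,(\theta - \theta_0) \in B \bigm| X_1, \ldots, X_n \bigr)
  - N_{\tilde\Delta_n,\, \tilde I^{-1}}(B) \Bigr| \xrightarrow{\;P_0\;} 0,
\qquad
\tilde\Delta_n = \frac{1}{\sqrt{n}} \sum_{i=1}^{n} \tilde I^{-1}\, \tilde\ell(X_i).

% Partial linear regression, the example mentioned in the abstract:
% theta_0 is the linear coefficient of interest, eta_0 the nuisance function.
Y = \theta_0 U + \eta_0(V) + e .
```

Read this way, efficiency of Bayesian point-estimators follows because the limiting normal law has the smallest covariance, $\tilde I^{-1}$, permitted for regular estimators by the convolution theorem.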

Article information

Ann. Statist. Volume 40, Number 1 (2012), 206–237.

First available in Project Euclid: 29 March 2012

Primary: 62G86: Nonparametric inference and fuzziness
Secondary: 62G20: Asymptotic properties; 62F15: Bayesian inference

Asymptotic posterior normality; posterior limit distribution; model differentiability; local asymptotic normality; semiparametric statistics; regular estimation; efficiency; Bernstein–Von Mises


Bickel, P. J.; Kleijn, B. J. K. The semiparametric Bernstein–von Mises theorem. Ann. Statist. 40 (2012), no. 1, 206–237. doi:10.1214/11-AOS921.


  • [1] Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1998). Efficient and Adaptive Estimation for Semiparametric Models, 2nd ed. Springer, New York.
  • [2] Birgé, L. (1983). Approximation dans les espaces métriques et théorie de l’estimation. Z. Wahrsch. Verw. Gebiete 65 181–237.
  • [3] Birgé, L. (1984). Sur un théorème de minimax et son application aux tests. Probab. Math. Statist. 3 259–282.
  • [4] Boucheron, S. and Gassiat, E. (2009). A Bernstein–von Mises theorem for discrete probability distributions. Electron. J. Stat. 3 114–148.
  • [5] Castillo, I. (2011). Semiparametric Bernstein–von Mises theorem and bias, illustrated with Gaussian process priors. Preprint, CNRS.
  • [6] Castillo, I. (2012). A semiparametric Bernstein–von Mises theorem for Gaussian process priors. Probab. Theory Related Fields 152 53–99.
  • [7] Chen, H. and Shiau, J. J. H. (1991). A two-stage spline smoothing method for partially linear models. J. Statist. Plann. Inference 27 187–201.
  • [8] Cheng, G. and Kosorok, M. R. (2008). General frequentist properties of the posterior profile distribution. Ann. Statist. 36 1819–1853.
  • [9] Cox, D. D. (1993). An analysis of Bayesian inference for nonparametric regression. Ann. Statist. 21 903–923.
  • [10] Cramér, H. (1946). Mathematical Methods of Statistics. Princeton Mathematical Series 9. Princeton Univ. Press, Princeton, NJ.
  • [11] Diaconis, P. and Freedman, D. (1986). On the consistency of Bayes estimates. Ann. Statist. 14 1–26.
  • [12] Diaconis, P. W. and Freedman, D. (1998). Consistency of Bayes estimates for nonparametric regression: Normal theory. Bernoulli 4 411–444.
  • [13] Fisher, R. A. (1959). Statistical Methods and Scientific Inference, 2nd ed. Oliver and Boyd, London.
  • [14] Freedman, D. (1999). On the Bernstein–von Mises theorem with infinite-dimensional parameters. Ann. Statist. 27 1119–1140.
  • [15] Freedman, D. A. (1963). On the asymptotic behavior of Bayes’ estimates in the discrete case. Ann. Math. Statist. 34 1386–1403.
  • [16] Ghosal, S., Ghosh, J. K. and van der Vaart, A. W. (2000). Convergence rates of posterior distributions. Ann. Statist. 28 500–531.
  • [17] Hájek, J. (1970). A characterization of limiting distributions of regular estimates. Z. Wahrsch. Verw. Gebiete 14 323–330.
  • [18] Hájek, J. (1972). Local asymptotic minimax and admissibility in estimation. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. I: Theory of Statistics 175–194. Univ. California Press, Berkeley, CA.
  • [19] Ibragimov, I. A. and Has’minskiĭ, R. Z. (1981). Statistical Estimation: Asymptotic Theory. Applications of Mathematics 16. Springer, New York.
  • [20] Johnstone, I. (2010). High dimensional Bernstein–von Mises: Simple examples. In Borrowing Strength: Theory Powering Applications—A Festschrift for Lawrence D. Brown (J. Berger, T. Cai and I. Johnstone, eds.) 87–98. IMS, Beachwood, OH.
  • [21] Kim, Y. (2006). The Bernstein–von Mises theorem for the proportional hazard model. Ann. Statist. 34 1678–1700.
  • [22] Kim, Y. and Lee, J. (2004). A Bernstein–von Mises theorem in the nonparametric right-censoring model. Ann. Statist. 32 1492–1512.
  • [23] Kimeldorf, G. S. and Wahba, G. (1970). A correspondence between Bayesian estimation on stochastic processes and smoothing by splines. Ann. Math. Statist. 41 495–502.
  • [24] Kleijn, B. (2003). Bayesian asymptotics under misspecification. Ph.D. thesis, Free Univ. Amsterdam.
  • [25] Kleijn, B. and Knapik, B. (2012). Semiparametric posterior limits under local asymptotic exponentiality. Preprint, Korteweg-de Vries Institute, Amsterdam.
  • [26] Kleijn, B. and van der Vaart, A. (2008). The Bernstein–von Mises theorem under misspecification. Preprint.
  • [27] Kleijn, B. J. K. and van der Vaart, A. W. (2006). Misspecification in infinite-dimensional Bayesian statistics. Ann. Statist. 34 837–877.
  • [28] Le Cam, L. (1953). On some asymptotic properties of maximum likelihood estimates and related Bayes’ estimates. Univ. California Publ. Statist. 1 277–329.
  • [29] Le Cam, L. (1972). Limits of experiments. In Proceedings of the Sixth Berkeley Symposium on Mathematical Statistics and Probability (Univ. California, Berkeley, Calif., 1970/1971), Vol. I: Theory of Statistics 245–261. Univ. California Press, Berkeley, CA.
  • [30] Le Cam, L. (1986). Asymptotic Methods in Statistical Decision Theory. Springer, New York.
  • [31] Le Cam, L. and Yang, G. L. (1990). Asymptotics in Statistics: Some Basic Concepts. Springer, New York.
  • [32] Lehmann, E. L. and Casella, G. (1998). Theory of Point Estimation, 2nd ed. Springer, New York.
  • [33] Mammen, E. and van de Geer, S. (1997). Penalized quasi-likelihood estimation in partial linear models. Ann. Statist. 25 1014–1035.
  • [34] Murphy, S. A. and van der Vaart, A. W. (2000). On profile likelihood. J. Amer. Statist. Assoc. 95 449–485.
  • [35] Rivoirard, V. and Rousseau, J. (2009). Bernstein–von Mises theorem for linear functionals of the density. Preprint. Available at arXiv:0908.4167v1.
  • [36] Robert, C. P. (2001). The Bayesian Choice: From Decision-Theoretic Foundations to Computational Implementation, 2nd ed. Springer, New York.
  • [37] Schwartz, L. (1965). On Bayes procedures. Z. Wahrsch. Verw. Gebiete 4 10–26.
  • [38] Severini, T. A. and Wong, W. H. (1992). Profile likelihood and conditionally parametric models. Ann. Statist. 20 1768–1802.
  • [39] Shen, X. (2002). Asymptotic normality of semiparametric and nonparametric posterior distributions. J. Amer. Statist. Assoc. 97 222–235.
  • [40] Shen, X. and Wasserman, L. (2001). Rates of convergence of posterior distributions. Ann. Statist. 29 687–714.
  • [41] Shively, T. S., Kohn, R. and Wood, S. (1999). Variable selection and function estimation in additive nonparametric regression using a data-based prior. J. Amer. Statist. Assoc. 94 777–804.
  • [42] Stein, C. (1956). Efficient nonparametric testing and estimation. In Proceedings of the Third Berkeley Symposium on Mathematical Statistics and Probability, 1954–1955, Vol. I 187–195. Univ. California Press, Berkeley.
  • [43] van der Vaart, A. W. (1998). Asymptotic Statistics. Cambridge Series in Statistical and Probabilistic Mathematics 3. Cambridge Univ. Press, Cambridge.
  • [44] van der Vaart, A. W. and van Zanten, J. H. (2008). Rates of contraction of posterior distributions based on Gaussian process priors. Ann. Statist. 36 1435–1463.
  • [45] van der Vaart, A. W. and Wellner, J. A. (1996). Weak Convergence and Empirical Processes. Springer, New York.
  • [46] Wahba, G. (1978). Improper priors, spline smoothing and the problem of guarding against model errors in regression. J. Roy. Statist. Soc. Ser. B 40 364–372.