The Annals of Statistics

Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth

A. W. van der Vaart and J. H. van Zanten

Full-text: Open access

Abstract

We consider nonparametric Bayesian estimation using a rescaled smooth Gaussian field as a prior for a multidimensional function. The rescaling is achieved using a Gamma variable, and the procedure can be viewed as choosing an inverse Gamma bandwidth. The procedure is studied from a frequentist perspective in three statistical settings involving replicated observations (density estimation, regression and classification). We prove that the resulting posterior distribution shrinks to the distribution that generates the data at a speed which is minimax-optimal up to a logarithmic factor, whatever the regularity level of the data-generating distribution. Thus the hierarchical Bayesian procedure, with a fixed prior, is shown to be fully adaptive.
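The prior construction described in the abstract can be illustrated with a minimal sketch in one dimension. It draws a Gamma-distributed scale A and then evaluates a smooth Gaussian process W at the rescaled points A·t, so that 1/A plays the role of an inverse Gamma bandwidth. The squared-exponential covariance exp(-(s - t)^2), the Gamma parameters, and the function name `sample_rescaled_gp_prior` are assumptions made for illustration; the abstract itself does not specify them.

```python
import numpy as np


def sample_rescaled_gp_prior(t, shape=2.0, rate=1.0, rng=None):
    """Draw one path t -> W(A * t) on the grid t (one-dimensional case).

    Assumptions (not taken from the abstract): W is a centered Gaussian
    process with covariance exp(-(s - t)^2), and A ~ Gamma(shape, rate).
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.asarray(t, dtype=float)

    # Random scale; numpy parameterizes the Gamma law by shape and scale = 1/rate.
    a = rng.gamma(shape, 1.0 / rate)

    # Covariance matrix of W evaluated at the rescaled grid a * t.
    s = a * t
    cov = np.exp(-((s[:, None] - s[None, :]) ** 2))

    # Sample via an eigendecomposition; clipping guards against tiny negative
    # eigenvalues caused by rounding in this nearly singular matrix.
    eigvals, eigvecs = np.linalg.eigh(cov)
    root = eigvecs * np.sqrt(np.clip(eigvals, 0.0, None))
    path = root @ rng.standard_normal(len(t))
    return a, path


if __name__ == "__main__":
    grid = np.linspace(0.0, 1.0, 50)
    a, path = sample_rescaled_gp_prior(grid, rng=np.random.default_rng(0))
    print("scale A:", a, "bandwidth 1/A:", 1.0 / a)
    print("first values of the path:", path[:3])
```

Larger draws of A give rougher-looking paths on [0, 1] and smaller draws give flatter ones, which is the sense in which a single hierarchical prior can accommodate different regularity levels.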

Article information

Source
Ann. Statist., Volume 37, Number 5B (2009), 2655-2675.

Dates
First available in Project Euclid: 17 July 2009

Permanent link to this document
https://projecteuclid.org/euclid.aos/1247836664

Digital Object Identifier
doi:10.1214/08-AOS678

Mathematical Reviews number (MathSciNet)
MR2541442

Zentralblatt MATH identifier
1173.62021

Subjects
Primary:
62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
62-07: Data analysis
Secondary:
65U05
68T05: Learning and adaptive systems [See also 68Q32, 91E40]

Keywords
Rate of convergence, posterior distribution, adaptation, Bayesian inference, nonparametric density estimation, nonparametric regression, classification, Gaussian process priors

Citation

van der Vaart, A. W.; van Zanten, J. H. Adaptive Bayesian estimation using a Gaussian random field with inverse Gamma bandwidth. Ann. Statist. 37 (2009), no. 5B, 2655–2675. doi:10.1214/08-AOS678. https://projecteuclid.org/euclid.aos/1247836664


