The Annals of Statistics

Anisotropic function estimation using multi-bandwidth Gaussian processes

Anirban Bhattacharya, Debdeep Pati, and David Dunson


Abstract

In nonparametric regression problems involving multiple predictors, there is typically interest in estimating an anisotropic multivariate regression surface in the important predictors while discarding the unimportant ones. Our focus is on defining a Bayesian procedure that leads to the minimax optimal rate of posterior contraction (up to a log factor) adapting to the unknown dimension and anisotropic smoothness of the true surface. We propose such an approach based on a Gaussian process prior with dimension-specific scalings, which are assigned carefully chosen hyperpriors. We additionally show that using a homogeneous Gaussian process with a single bandwidth leads to a sub-optimal rate in anisotropic cases. This contrasts with the isotropic setting, where van der Vaart and van Zanten (2009) showed that rescaling a homogeneous smooth Gaussian process by a single inverse gamma bandwidth yields rate-adaptive estimation.
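To fix ideas, the following minimal sketch (in Python, using only NumPy) shows one way to set up a squared-exponential Gaussian process prior with a separate bandwidth for each predictor, each bandwidth drawn from its own hyperprior. The gamma hyperprior and the helper names used here are illustrative assumptions, not the carefully chosen hyperpriors analyzed in the paper; over anisotropic Hölder classes with smoothness $(\alpha_1,\dots,\alpha_d)$, the minimax rate such a construction aims to attain (up to a log factor) is of the order $n^{-1/(2+\sum_j 1/\alpha_j)}$.

# Illustrative sketch (not the authors' exact construction): a squared-exponential
# Gaussian process with dimension-specific scalings a_1, ..., a_d, each assigned its
# own hyperprior, as opposed to a single shared bandwidth. The gamma hyperprior below
# is an assumed placeholder chosen only for illustration.
import numpy as np

rng = np.random.default_rng(0)

def multi_bandwidth_kernel(X1, X2, scales):
    """Anisotropic squared-exponential kernel: each coordinate has its own scaling."""
    # Rescale every input dimension by its bandwidth before computing squared distances.
    Z1 = X1 * scales            # shape (n1, d)
    Z2 = X2 * scales            # shape (n2, d)
    sq_dists = np.sum(Z1**2, 1)[:, None] + np.sum(Z2**2, 1)[None, :] - 2 * Z1 @ Z2.T
    return np.exp(-sq_dists)

def sample_prior_path(X, shape=1.0, rate=1.0):
    """Draw per-dimension scalings from a gamma hyperprior (illustrative choice),
    then draw one GP path f ~ N(0, K) at the design points X."""
    n, d = X.shape
    scales = rng.gamma(shape, 1.0 / rate, size=d)   # one bandwidth per predictor
    K = multi_bandwidth_kernel(X, X, scales) + 1e-8 * np.eye(n)  # jitter for stability
    f = rng.multivariate_normal(np.zeros(n), K)
    return scales, f

if __name__ == "__main__":
    # Two predictors: the prior can use a large scaling (wiggly) in one direction
    # and a small scaling (smooth) in the other, mimicking anisotropic smoothness.
    X = rng.uniform(size=(50, 2))
    scales, f = sample_prior_path(X)
    print("sampled per-dimension scalings:", scales)
    print("first few GP values:", f[:5])

Replacing the vector of scalings by a single shared scaling recovers the homogeneous, single-bandwidth prior that the paper shows to be sub-optimal in anisotropic cases.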

Article information

Source
Ann. Statist. Volume 42, Number 1 (2014), 352-381.

Dates
First available in Project Euclid: 19 March 2014

Permanent link to this document
https://projecteuclid.org/euclid.aos/1395234981

Digital Object Identifier
doi:10.1214/13-AOS1192

Mathematical Reviews number (MathSciNet)
MR3189489

Zentralblatt MATH identifier
1360.62168

Subjects
Primary: 62G07: Density estimation; 62G20: Asymptotic properties
Secondary: 60K35: Interacting random processes; statistical mechanics type models; percolation theory [See also 82B43, 82C43]

Keywords
Adaptive; anisotropic; Bayesian nonparametrics; function estimation; Gaussian process; rate of convergence

Citation

Bhattacharya, Anirban; Pati, Debdeep; Dunson, David. Anisotropic function estimation using multi-bandwidth Gaussian processes. Ann. Statist. 42 (2014), no. 1, 352--381. doi:10.1214/13-AOS1192. https://projecteuclid.org/euclid.aos/1395234981


References

  • Abramowitz, M. and Stegun, I. (1992). Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables. Dover, New York.
  • Barbieri, M. M. and Berger, J. O. (2004). Optimal predictive model selection. Ann. Statist. 32 870–897.
  • Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probab. Theory Related Fields 113 301–413.
  • Belitser, E. and Ghosal, S. (2003). Adaptive Bayesian inference on the mean of an infinite-dimensional normal distribution. Ann. Statist. 31 536–559.
  • Birgé, L. (1986). On estimating a density using Hellinger distance and some other strange facts. Probab. Theory Related Fields 71 271–291.
  • Birgé, L. (2001). An alternative point of view on Lepski’s method. In State of the Art in Probability and Statistics. IMS Lecture Notes Monogr. Ser. 36 113–133. IMS, Beachwood, OH.
  • Castillo, I. (2008). Lower bounds for posterior rates with Gaussian process priors. Electron. J. Stat. 2 1281–1299.
  • de Jonge, R. and van Zanten, J. H. (2010). Adaptive nonparametric Bayesian inference using location-scale mixture priors. Ann. Statist. 38 3300–3320.
  • Ghosal, S., Lember, J. and van der Vaart, A. (2003). On Bayesian adaptation. Acta Appl. Math. 79 165–175.
  • Ghosal, S., Lember, J. and van der Vaart, A. (2008). Nonparametric Bayesian model selection and averaging. Electron. J. Stat. 2 63–89.
  • Hoffmann, M. and Lepski, O. (2002). Random rates in anisotropic regression. Ann. Statist. 30 325–396.
  • Huang, T.-M. (2004). Convergence rates for posterior distributions and adaptive estimation. Ann. Statist. 32 1556–1593.
  • Ibragimov, I. A. and Hasminskiĭ, R. Z. (1981). Statistical Estimation. Springer, New York.
  • Johnson, S. G. (2007). Saddle-point integration of $C^\infty$ “bump” functions. Available at http://math.mit.edu/~Stevenj/bump-saddle.pdf.
  • Kerkyacharian, G., Lepski, O. and Picard, D. (2001). Nonlinear estimation in anisotropic multi-index denoising. Probab. Theory Related Fields 121 137–170.
  • Klutchnikoff, N. (2005). On the adaptive estimation of anisotropic functions. Ph.D. thesis, Univ. Aix–Marseille I.
  • Kruijer, W., Rousseau, J. and van der Vaart, A. (2010). Adaptive Bayesian density estimation with location-scale mixtures. Electron. J. Stat. 4 1225–1257.
  • Kuelbs, J. and Li, W. V. (1993). Metric entropy and the small ball problem for Gaussian measures. J. Funct. Anal. 116 133–157.
  • Lepski, O. V. and Levit, B. Y. (1999). Adaptive nonparametric estimation of smooth multivariate functions. Math. Methods Statist. 8 344–370.
  • Lepskiĭ, O. V. (1990). A problem of adaptive estimation in Gaussian white noise. Teor. Veroyatn. Primen. 35 459–470.
  • Lepskiĭ, O. V. (1991). Asymptotically minimax adaptive estimation. I. Upper bounds. Theory Probab. Appl. 36 645–659.
  • Lepskiĭ, O. V. (1992). Asymptotically minimax adaptive estimation. II. Schemes without optimal adaptation. Adaptive estimates. Theory Probab. Appl. 37 468–481.
  • Nussbaum, M. (1985). Spline smoothing in regression models and asymptotic efficiency in $L_2$. Ann. Statist. 13 984–997.
  • Rasmussen, C. E. (2004). Gaussian processes in machine learning. In Advanced Lectures on Machine Learning. Lect. Notes in Comput. Sci. 3176 63–71. Springer, Heidelberg.
  • Rousseau, J. (2010). Rates of convergence for the posterior distributions of mixtures of betas and adaptive nonparametric estimation of the density. Ann. Statist. 38 146–180.
  • Savitsky, T., Vannucci, M. and Sha, N. (2011). Variable selection for nonparametric Gaussian process priors: Models and computational strategies. Statist. Sci. 26 130–149.
  • Scricciolo, C. (2006). Convergence rates for Bayesian density estimation of infinite-dimensional exponential families. Ann. Statist. 34 2897–2920.
  • Shen, W., Tokdar, S. T. and Ghosal, S. (2013). Adaptive Bayesian multivariate density estimation with Dirichlet mixtures. Biometrika 100 623–640.
  • Stone, C. J. (1982). Optimal global rates of convergence for nonparametric regression. Ann. Statist. 10 1040–1053.
  • Tokdar, S. T. (2011). Dimension adaptability of Gaussian process models with variable selection and projection. Preprint. Available at arXiv:1112.0716.
  • Tu, L. W. (2011). An Introduction to Manifolds, 2nd ed. Springer, New York.
  • van der Vaart, A. W. and van Zanten, J. H. (2008a). Rates of contraction of posterior distributions based on Gaussian process priors. Ann. Statist. 36 1435–1463.
  • van der Vaart, A. W. and van Zanten, J. H. (2008b). Reproducing kernel Hilbert spaces of Gaussian priors. In Pushing the Limits of Contemporary Statistics: Contributions in Honor of Jayanta K. Ghosh. Inst. Math. Stat. Collect. 3 200–222. IMS, Beachwood, OH.
  • van der Vaart, A. W. and van Zanten, J. H. (2009). Adaptive Bayesian estimation using a Gaussian random field with inverse gamma bandwidth. Ann. Statist. 37 2655–2675.
  • van der Vaart, A. and van Zanten, H. (2011). Information rates of nonparametric Gaussian process methods. J. Mach. Learn. Res. 12 2095–2119.
  • Zou, F., Huang, H., Lee, S. and Hoeschele, I. (2010). Nonparametric Bayesian variable selection with applications to multiple quantitative trait loci mapping with epistasis and gene–environment interaction. Genetics 186 385.