## Bayesian Analysis

### Adaptive Bayesian Nonparametric Regression Using a Kernel Mixture of Polynomials with Application to Partial Linear Models

#### Abstract

We propose a kernel mixture of polynomials prior for Bayesian nonparametric regression. The regression function is modeled by local averages of polynomials with kernel mixture weights. We obtain the minimax-optimal contraction rate of the full posterior distribution up to a logarithmic factor by estimating metric entropies of certain function classes. Under the assumption that the degree of the polynomials is larger than the unknown smoothness level of the true function, the posterior contraction behavior can adapt to this smoothness level provided an upper bound is known. We also provide a frequentist sieve maximum likelihood estimator with a near-optimal convergence rate. We further investigate the application of the kernel mixture of polynomials to partial linear models and obtain both the near-optimal rate of contraction for the nonparametric component and the Bernstein-von Mises limit (i.e., asymptotic normality) of the parametric component. The proposed method is illustrated with numerical examples and shows superior performance in terms of computational efficiency, accuracy, and uncertainty quantification compared with local polynomial regression, DiceKriging, and the robust Gaussian stochastic process.
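
To make the phrase "local averages of polynomials with kernel mixture weights" concrete, the following is a minimal sketch of how such a regression function could be evaluated. It is an illustration under stated assumptions, not the authors' implementation: the Gaussian kernel, the bandwidth `h`, and the names `gaussian_kernel`, `kernel_mixture_of_polynomials`, `centers`, and `coeffs` are all hypothetical choices that the abstract does not specify. The sketch computes $f(x) = \sum_k w_k(x)\, p_k(x - \mu_k)$, where the weights $w_k(x)$ are normalized kernel values and each $p_k$ is a local polynomial centered at $\mu_k$.

```python
import math


def gaussian_kernel(u, h):
    """Unnormalized Gaussian kernel with bandwidth h (an assumed kernel choice)."""
    return math.exp(-0.5 * (u / h) ** 2)


def kernel_mixture_of_polynomials(x, centers, coeffs, h):
    """Evaluate f(x) = sum_k w_k(x) * p_k(x - mu_k).

    centers: kernel centers mu_k (hypothetical name)
    coeffs:  coeffs[k][s] is the coefficient of (x - mu_k)**s in polynomial k
    h:       kernel bandwidth
    """
    kernels = [gaussian_kernel(x - mu, h) for mu in centers]
    total = sum(kernels)  # normalizing constant so the weights sum to one
    value = 0.0
    for kernel_value, mu, beta in zip(kernels, centers, coeffs):
        # Local polynomial centered at mu, evaluated at x.
        local_poly = sum(b * (x - mu) ** s for s, b in enumerate(beta))
        value += (kernel_value / total) * local_poly
    return value


# Usage: three centers on [0, 1], each carrying a degree-1 local polynomial.
centers = [0.0, 0.5, 1.0]
# Each local polynomial is the same global line g(x) = 1 + 2x, re-centered at
# mu_k, so the weighted average should reproduce g exactly.
coeffs = [[1.0 + 2.0 * mu, 2.0] for mu in centers]
fitted = kernel_mixture_of_polynomials(0.3, centers, coeffs, h=0.2)
```

Because the weights are normalized to sum to one, the mixture reproduces any polynomial that all local pieces agree on; the flexibility comes from letting the coefficients differ across centers. In a Bayesian treatment the centers, coefficients, and bandwidth would carry priors, which this sketch does not attempt to model. The normalization also assumes that `x` lies close enough to at least one center for the kernel sum to be nonzero in floating point.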

#### Article information

Source
Bayesian Anal., Advance publication (2018), 28 pages.

Dates
First available in Project Euclid: 22 February 2019

Permanent link to this document
https://projecteuclid.org/euclid.ba/1550826222

Digital Object Identifier
doi:10.1214/19-BA1148

#### Citation

Xie, Fangzheng; Xu, Yanxun. Adaptive Bayesian Nonparametric Regression Using a Kernel Mixture of Polynomials with Application to Partial Linear Models. Bayesian Anal., advance publication, 22 February 2019. doi:10.1214/19-BA1148. https://projecteuclid.org/euclid.ba/1550826222

#### Supplemental materials

• Supplementary Material for “Adaptive Bayesian Nonparametric Regression Using a Kernel Mixture of Polynomials with Application to Partial Linear Models”. The supplementary material contains additional notation, proofs for Sections 3 and 4, the posterior contraction result for unknown $\sigma^2$ discussed in Section 6 together with its proof, and cited theorems and results.