Bernoulli, Volume 22, Number 1 (2016), 396-420.

Adaptive Bayesian density regression for high-dimensional data

Weining Shen and Subhashis Ghosal

Full-text: Open access


Density regression provides a flexible strategy for modeling the distribution of a response variable $Y$ given predictors $\mathbf{X}=(X_{1},\ldots,X_{p})$ by treating the conditional density of $Y$ given $\mathbf{X}$ as a completely unknown function and allowing its shape to change with the value of $\mathbf{X}$. The number of predictors $p$ may be very large, possibly much larger than the number of observations $n$, but the conditional density is assumed to depend only on a much smaller, unknown subset of predictors. In addition to estimation, the goal is to select the important predictors that actually affect the true conditional density. We consider a nonparametric Bayesian approach to density regression by constructing a random series prior based on tensor products of spline functions. The proposed prior also incorporates variable selection. We show that the posterior distribution of the conditional density contracts adaptively to the truth at nearly the optimal oracle rate, determined by the unknown sparsity and smoothness levels, even in ultra high-dimensional settings where $p$ increases exponentially with $n$. The result is also extended to the anisotropic case, where the degree of smoothness can vary in different directions, and both random and deterministic predictors are considered. We also propose a technique for calculating posterior moments of the conditional density function without requiring Markov chain Monte Carlo methods.
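The tensor-product spline construction in the abstract can be sketched concretely. Below is a minimal illustration, not the authors' implementation: a random series expansion $f(y\mid x)\propto\exp\{\sum_{j,k}\theta_{jk}B_j(y)B_k(x)\}$ in the response and one selected predictor. The knot placement, basis dimension, and Gaussian draw for the coefficients are illustrative assumptions.

```python
# Illustrative sketch only (not the paper's implementation): a tensor-product
# B-spline series for a conditional density f(y | x) in one selected predictor.
# Knot sequence, basis dimension, and the Gaussian coefficient draw are
# assumptions made for this example.
import numpy as np

def bspline_basis(x, n_basis=6, degree=3):
    """Evaluate n_basis B-splines of the given degree on [0, 1) at points x,
    using the Cox-de Boor recursion with an open-uniform knot sequence."""
    x = np.atleast_1d(x).astype(float)
    interior = np.linspace(0.0, 1.0, n_basis - degree + 1)[1:-1]
    knots = np.r_[[0.0] * (degree + 1), interior, [1.0] * (degree + 1)]
    # Degree-0 indicator functions, then raise the degree recursively.
    B = np.column_stack([(knots[i] <= x) & (x < knots[i + 1])
                         for i in range(len(knots) - 1)]).astype(float)
    for d in range(1, degree + 1):
        nxt = np.zeros((len(x), len(knots) - d - 1))
        for i in range(len(knots) - d - 1):
            left = knots[i + d] - knots[i]
            right = knots[i + d + 1] - knots[i + 1]
            if left > 0:
                nxt[:, i] += (x - knots[i]) / left * B[:, i]
            if right > 0:
                nxt[:, i] += (knots[i + d + 1] - x) / right * B[:, i + 1]
        B = nxt
    return B

rng = np.random.default_rng(0)
theta = rng.normal(size=(6, 6))       # random series coefficients theta_{jk}
y = np.linspace(0.005, 0.995, 200)    # response grid on (0, 1)
By = bspline_basis(y)                 # basis in the response direction
bx = bspline_basis(0.3)[0]            # basis at the conditioning value x = 0.3
# f(y | x) proportional to exp( sum_{j,k} theta_{jk} B_j(y) B_k(x) ),
# normalized numerically over the response grid.
f = np.exp(By @ theta @ bx)
f /= f.sum() * (y[1] - y[0])
```

Changing the conditioning value `bx` changes the shape of `f`, which is the sense in which the conditional density varies with the predictor; the paper's prior additionally randomizes the basis dimension and the set of included predictors.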

Article information


Received: July 2013
Revised: June 2014
First available in Project Euclid: 30 September 2015


Keywords: adaptive estimation; density regression; high-dimensional models; MCMC-free computation; nonparametric Bayesian inference; posterior contraction rate; variable selection


Shen, Weining; Ghosal, Subhashis. Adaptive Bayesian density regression for high-dimensional data. Bernoulli 22 (2016), no. 1, 396--420. doi:10.3150/14-BEJ663.


