## Electronic Journal of Statistics

### Recursive partitioning and multi-scale modeling on conditional densities

Li Ma

#### Abstract

We introduce a nonparametric prior on the conditional distribution of a (univariate or multivariate) response given a set of predictors. The prior is constructed as a two-stage generative procedure: the first stage recursively partitions the predictor space, and the second stage generates the conditional distribution through a multi-scale nonparametric density model on each block of the resulting predictor partition. This design allows adaptive smoothing on both the predictor space and the response space, and it makes the model fully conjugate a posteriori. Exact Bayesian inference can therefore be completed analytically through a forward-backward recursive algorithm, without the need for MCMC, with computation scaling linearly in the sample size. We show that the prior enjoys desirable theoretical properties such as full $L_{1}$ support and posterior consistency. We illustrate how to apply the model to a variety of inference problems, such as conditional density estimation, hypothesis testing, and model selection, in a manner similar to applying a parametric conjugate prior while remaining fully nonparametric. We also compare the model to two other state-of-the-art Bayesian nonparametric models for conditional densities in terms of both model fit and computational time. A real data example from flow cytometry containing 455,472 observations illustrates the substantial computational efficiency of our method and its application to multivariate problems.
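
To make the two-stage construction in the abstract concrete, the following is a minimal sketch of one draw from a prior of this general shape. It is an illustration under simplifying assumptions, not the paper's specification: a single predictor on $[0, 1)$, midpoint (dyadic) splits, a constant stopping probability `stop_prob`, and a finite Pólya tree with Beta$(ck^{2}, ck^{2})$ splits at level $k$ (a common Pólya tree convention) as the multi-scale density model. All function and parameter names are hypothetical.

```python
import random


def sample_partition(lo, hi, depth, max_depth, stop_prob, rng):
    """Stage 1 (assumed form): recursively split the predictor interval
    [lo, hi) at its midpoint, stopping at random or at max_depth.
    Returns the blocks as a left-to-right list of (lo, hi) pairs."""
    if depth >= max_depth or rng.random() < stop_prob:
        return [(lo, hi)]
    mid = (lo + hi) / 2.0
    left = sample_partition(lo, mid, depth + 1, max_depth, stop_prob, rng)
    right = sample_partition(mid, hi, depth + 1, max_depth, stop_prob, rng)
    return left + right


def sample_polya_tree_density(levels, rng, c=1.0):
    """Stage 2 (assumed form): draw a piecewise-constant density on [0, 1)
    with 2**levels equal-width bins from a finite Polya tree."""
    masses = [1.0]
    for k in range(1, levels + 1):
        a = c * k ** 2  # Beta(a, a) split; larger a at finer scales
        finer = []
        for m in masses:
            left_share = rng.betavariate(a, a)
            finer.extend([m * left_share, m * (1.0 - left_share)])
        masses = finer
    n_bins = len(masses)
    # Convert bin probabilities to density heights (bin width is 1 / n_bins).
    return [m * n_bins for m in masses]


def sample_conditional_density(max_depth=3, stop_prob=0.4, levels=4, seed=0):
    """One prior draw: a list of (predictor block, response density heights)."""
    rng = random.Random(seed)
    blocks = sample_partition(0.0, 1.0, 0, max_depth, stop_prob, rng)
    return [(block, sample_polya_tree_density(levels, rng)) for block in blocks]


def evaluate(model, x, y):
    """Evaluate the sampled conditional density f(y | x) for x, y in [0, 1)."""
    for (lo, hi), heights in model:
        if lo <= x < hi:
            return heights[int(y * len(heights))]
    raise ValueError("x must lie in [0, 1)")
```

Calling `sample_conditional_density()` returns one random conditional density that is constant in the predictor within each block and varies across blocks, and `evaluate(model, x, y)` reads off its value at a given point. The sketch covers only the generative side; the posterior recursion mentioned in the abstract is not shown.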

#### Article information

Source
Electron. J. Statist., Volume 11, Number 1 (2017), 1297-1325.

Dates
First available in Project Euclid: 14 April 2017

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1492135235

Digital Object Identifier
doi:10.1214/17-EJS1254

Mathematical Reviews number (MathSciNet)
MR3635914

Zentralblatt MATH identifier
1362.62117

Subjects
Primary: 62F15: Bayesian inference; 62G99: None of the above, but in this section
Secondary: 62G07: Density estimation

#### Citation

Ma, Li. Recursive partitioning and multi-scale modeling on conditional densities. Electron. J. Statist. 11 (2017), no. 1, 1297–1325. doi:10.1214/17-EJS1254. https://projecteuclid.org/euclid.ejs/1492135235
