BART: Bayesian additive regression trees

Hugh A. Chipman; Edward I. George; Robert E. McCulloch

doi:10.1214/09-AOAS285

Ann. Appl. Stat. 4(1): 266-298 (March 2010). DOI: 10.1214/09-AOAS285

Abstract

We develop a Bayesian “sum-of-trees” model where each tree is constrained by a regularization prior to be a weak learner, and fitting and inference are accomplished via an iterative Bayesian backfitting MCMC algorithm that generates samples from a posterior. Effectively, BART is a nonparametric Bayesian regression approach which uses dimensionally adaptive random basis elements. Motivated by ensemble methods in general, and boosting algorithms in particular, BART is defined by a statistical model: a prior and a likelihood. This approach enables full posterior inference including point and interval estimates of the unknown regression function as well as the marginal effects of potential predictors. By keeping track of predictor inclusion frequencies, BART can also be used for model-free variable selection. BART’s many features are illustrated with a bake-off against competing methods on 42 different data sets, with a simulation experiment and on a drug discovery classification problem.
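The variable-selection idea mentioned in the abstract can be illustrated with a minimal sketch. It assumes that each posterior draw has already been reduced to a flat list of the predictor indices used by the splitting rules across all trees in that draw. The function name `inclusion_proportions` and the input layout are hypothetical, chosen only for illustration, and are not taken from any BART software. For each draw the sketch computes the share of splitting rules that use each predictor, then averages those shares over draws.

```python
from collections import Counter


def inclusion_proportions(draws, n_predictors):
    """Average, over posterior draws, the share of splitting rules per predictor.

    draws: list of draws; each draw is a list of predictor indices, one entry
           per splitting rule found anywhere in that draw's sum-of-trees.
           (Hypothetical input layout, assumed for this sketch.)
    n_predictors: total number of candidate predictors.
    """
    totals = [0.0] * n_predictors
    used_draws = 0
    for split_vars in draws:
        if not split_vars:
            # A draw with no splitting rules carries no usage information.
            continue
        counts = Counter(split_vars)
        for k in range(n_predictors):
            totals[k] += counts[k] / len(split_vars)
        used_draws += 1
    if used_draws == 0:
        return totals
    return [t / used_draws for t in totals]


# Toy input: three draws over three predictors (made-up indices, not real output).
draws = [[0, 0, 2], [0, 1], [0, 2, 2, 2]]
props = inclusion_proportions(draws, 3)
```

Predictors with larger average shares are the ones the ensemble splits on most often. In practice these frequencies tend to be most informative when the number of trees is kept small, so that predictors must compete for splits.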

Citation

Hugh A. Chipman, Edward I. George, Robert E. McCulloch. "BART: Bayesian additive regression trees." Ann. Appl. Stat. 4 (1) 266-298, March 2010. https://doi.org/10.1214/09-AOAS285

Information

Published: March 2010

First available in Project Euclid: 11 May 2010

zbMATH: 1189.62066

MathSciNet: MR2758172

Digital Object Identifier: 10.1214/09-AOAS285

Keywords: Bayesian backfitting, boosting, CART, classification, ensemble, MCMC, nonparametric regression, probit model, random basis, regularization, sum-of-trees model, variable selection, weak learner

JOURNAL ARTICLE
33 PAGES