Bayesian Analysis

Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects

P. Richard Hahn, Jared S. Murray, and Carlos M. Carvalho

Advance publication


Abstract

This paper presents a novel nonlinear regression model for estimating heterogeneous treatment effects, geared specifically towards situations with small effect sizes, heterogeneous effects, and strong confounding by observables. Standard nonlinear regression models, which may work quite well for prediction, have two notable weaknesses when used to estimate heterogeneous treatment effects. First, they can yield badly biased estimates of treatment effects when fit to data with strong confounding. The Bayesian causal forest model presented in this paper avoids this problem by directly incorporating an estimate of the propensity function in the specification of the response model, implicitly inducing a covariate-dependent prior on the regression function. Second, standard approaches to response surface modeling do not provide adequate control over the strength of regularization over effect heterogeneity. The Bayesian causal forest model permits treatment effect heterogeneity to be regularized separately from the prognostic effect of control variables, making it possible to informatively “shrink to homogeneity”. While we focus on observational data, our methods are equally useful for inferring heterogeneous treatment effects from randomized controlled experiments, where careful regularization is somewhat less complicated but no less important. We illustrate these benefits via the reanalysis of an observational study assessing the causal effects of smoking on medical expenditures as well as extensive simulation studies.
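
The two design choices described in the abstract can be summarized in a single response-model equation. The block below is a sketch under assumed notation, none of which appears in the abstract itself: y_i is the outcome, z_i a binary treatment indicator, x_i the covariate vector, and \hat{\pi}(x_i) an estimate of the propensity function.

```latex
% Assumed notation (not defined in the abstract):
%   Y_i            outcome for unit i
%   Z_i \in {0,1}  treatment indicator
%   x_i            observed covariates
%   \hat{\pi}      estimated propensity function
\begin{aligned}
\mathbb{E}\left[ Y_i \mid x_i, Z_i = z_i \right]
  &= \mu\bigl(x_i, \hat{\pi}(x_i)\bigr) + \tau(x_i)\, z_i, \\
\hat{\pi}(x_i) &\approx \Pr\left( Z_i = 1 \mid x_i \right).
\end{aligned}
```

Read this way, \mu is the prognostic function and receives the propensity estimate as an extra input, which is what makes the prior on the regression function covariate-dependent. The treatment effect function \tau carries its own prior, so it can be shrunk toward a constant (homogeneous effects) without also shrinking \mu.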

Article information

Bayesian Anal., Advance publication (2020), 33 pages.

First available in Project Euclid: 31 January 2020

Digital Object Identifier: doi:10.1214/19-BA1195

Primary: 62-07: Data analysis; 62J02: General nonlinear regression
Secondary: 62F15: Bayesian inference

Keywords: Bayesian; causal inference; heterogeneous treatment effects; predictor-dependent priors; machine learning; regression trees; regularization; shrinkage

Creative Commons Attribution 4.0 International License.


Hahn, P. Richard; Murray, Jared S.; Carvalho, Carlos M. Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects. Bayesian Anal., advance publication, 31 January 2020. doi:10.1214/19-BA1195.
