Discussion on "Bayesian Regression Tree Models for Causal Inference: Regularization, Confounding, and Heterogeneous Effects" by Hahn, Murray and Carvalho

Hahn et al. (2020) offer an extensive study that explicates and evaluates the performance of the Bayesian causal forest (BCF) model in different settings, and provide a detailed discussion of its utility in causal inference. It is a welcome addition to the causal machine learning literature. I will emphasize the contribution of the BCF model to the field of causal inference through discussions of two topics: 1) the difference between the propensity score (PS) in the BCF model and the Bayesian PS in a Bayesian updating approach, and 2) an alternative exposition of the role of the PS in outcome-modeling-based methods for the estimation of causal effects. I will conclude with comments on avenues for future research involving BCF that will be important and much needed in the era of big data.


Distinction from the Bayesian propensity scores
It is necessary to distinguish between incorporating the estimated PS as a covariate in the BCF model and combining so-called "Bayesian propensity scores" with Bayesian inference for causal effects in a single Bayesian updating approach (Zigler et al., 2013; Zigler and Dominici, 2014). The Bayesian PS has received recent attention in the literature (Kaplan and Chen, 2012; Zigler, 2016; Liao and Zigler, 2020). In essence, a series of studies has demonstrated that model feedback, i.e., the propagation of information from the outcome model to the PS model, can distort inferences about the causal effect. In the BCF model, an independent BART prior is placed over f, f ∼ BART(X, Z, π̂), where π̂ is the estimated PS and is included as one of the splitting dimensions. Because π̂ enters the BCF model only as an additional covariate for splitting, it is neither updated nor contaminated by outcome information, which frees BCF from the inference issues caused by Bayesian model feedback.
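The absence of feedback can be made concrete with a minimal two-stage sketch. It is not the BCF implementation: the stratum-frequency PS estimator, the toy data, and the helper names `estimate_propensity` and `build_outcome_design` are all assumptions made for illustration. The point is only that π̂ is computed from (x, z) alone and then handed to the outcome model as a fixed column.

```python
from collections import defaultdict


def estimate_propensity(x, z):
    """Stage 1: pi_hat(x) as the treated fraction within each covariate stratum.

    Only covariates and treatment indicators are used; outcomes never enter.
    """
    counts = defaultdict(lambda: [0, 0])  # stratum -> [n_treated, n_total]
    for xi, zi in zip(x, z):
        counts[xi][0] += zi
        counts[xi][1] += 1
    return {s: treated / total for s, (treated, total) in counts.items()}


def build_outcome_design(x, z, ps_table):
    """Stage 2 input: rows (x, z, pi_hat) with pi_hat treated as a fixed covariate."""
    return [(xi, zi, ps_table[xi]) for xi, zi in zip(x, z)]


# Hypothetical toy data: one binary covariate, binary treatment, continuous outcome.
x = [0, 0, 0, 0, 1, 1, 1, 1]
z = [0, 0, 0, 1, 0, 1, 1, 1]
y = [1.0, 1.2, 0.9, 2.1, 2.0, 3.1, 2.9, 3.0]  # used only by the outcome model

ps_table = estimate_propensity(x, z)           # depends on (x, z) only
design = build_outcome_design(x, z, ps_table)  # outcome model would regress y on these rows
```

Any outcome learner fitted to `design` and `y` can split on the third column, but nothing in the second stage writes back to `ps_table`, which is the property that separates this plug-in use of the PS from a joint Bayesian update.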
The role of the propensity score: connections to the confounding function
I now turn to the role of the PS in reducing the bias in estimates of causal effects under targeted selection. The inclusion of the PS as a covariate in the response model is closely connected to the confounding function approach (Robins, 1999), which was designed to remove bias due to unmeasured confounding from treatment effect estimates. The targeted selection described in Hahn et al. (2020) suggests that the treatment assignment probability π(x) depends on µ(x) = E(Y | Z = 0, x).
Using Figure 4 as an illustration, individuals who would have higher outcomes without treatment are more likely to be treated. In other words, treated individuals would have higher potential outcomes under no treatment than untreated individuals. This is a violation of the ignorability assumption. To see why, define a confounding function as
$$c(z, x) = E(Y(z) \mid Z = 1, x) - E(Y(z) \mid Z = 0, x).$$
When ignorability holds, c(z, x) = 0 for z = 0 and z = 1. When ignorability is violated, such as in the presence of targeted selection, c(0, x) > 0. The violation of ignorability gives rise to biased estimates of the causal effects. An unbiased effect estimate can be obtained by "correcting" the observed outcome for unmeasured confounding, namely,
$$Y^{C} \equiv Y - \{E(Y \mid Z = z, x) - E(Y(z) \mid x)\}$$
(Robins, 1999; Brumback et al., 2004). Applying the law of total expectation to E(Y(z) | x) yields a "corrected" outcome,
$$Y^{C} = Y - c(z, x)\{z - \pi(x)\}.$$
Differencing the conditional expectations of Y^C between the treated and untreated individuals gives an unbiased effect estimate,
$$E(Y^{C} \mid Z = 1, x) - E(Y^{C} \mid Z = 0, x),$$
where c(z, x) is given a user-supplied prior distribution (a range of values in frequentist approaches) representing our beliefs about the degree of ignorability violation, or targeted selection (Hogan et al., 2014). We see that the PS, π(x), is an integral part of the outcome used for causal modeling when there is a need to remove the bias attributable to targeted selection. Relating this to the inclusion of π̂ in the BCF model, c(z, x) can be viewed as characterizing how large a role the PS plays in the estimation of causal effects. With strong targeted selection, c(z, x) deviates substantially from zero, and π̂ is important for bias reduction. In the absence of targeted selection, c(z, x) is close to zero, and the role of π̂ is diminished.
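A small numerical check may help fix ideas. The sketch below takes the confounding function to be c(z) = E(Y(z) | Z = 1) − E(Y(z) | Z = 0) and the corrected outcome to be Y − c(z){z − π}, with the covariates x dropped (a single stratum) for brevity. The toy population, the unmeasured binary trait `u`, and every number in it are assumptions chosen for illustration, not quantities from Hahn et al. (2020). Expectations are computed by exact enumeration rather than simulation.

```python
# Hypothetical toy population: u is an unmeasured binary trait. Higher u means
# both a higher untreated outcome and a higher chance of treatment, which is
# one way to produce targeted selection.
P_U = {0: 0.5, 1: 0.5}           # P(U = u)
P_Z1_GIVEN_U = {0: 0.2, 1: 0.8}  # P(Z = 1 | U = u)
TRUE_EFFECT = 1.0                # constant treatment effect, assumed


def potential_outcome(z, u):
    """Y(z) for a unit with trait u (deterministic, for illustration)."""
    return 2.0 * u + TRUE_EFFECT * z


def joint(z, u):
    """P(Z = z, U = u)."""
    p_z1 = P_Z1_GIVEN_U[u]
    return P_U[u] * (p_z1 if z == 1 else 1.0 - p_z1)


def cond_mean(z_outcome, z_group):
    """E[Y(z_outcome) | Z = z_group], by exact enumeration over u."""
    num = sum(potential_outcome(z_outcome, u) * joint(z_group, u) for u in P_U)
    den = sum(joint(z_group, u) for u in P_U)
    return num / den


def confounding_function(z):
    """c(z) = E[Y(z) | Z = 1] - E[Y(z) | Z = 0]."""
    return cond_mean(z, 1) - cond_mean(z, 0)


pi = sum(joint(1, u) for u in P_U)  # propensity score P(Z = 1)

# Naive contrast of observed group means: E[Y | Z = 1] - E[Y | Z = 0].
naive = cond_mean(1, 1) - cond_mean(0, 0)

# Group means of the corrected outcome Y - c(z) * (z - pi).
corrected_treated = cond_mean(1, 1) - confounding_function(1) * (1 - pi)
corrected_control = cond_mean(0, 0) - confounding_function(0) * (0 - pi)
corrected = corrected_treated - corrected_control

print(f"c(0) = {confounding_function(0):.2f}, pi = {pi:.2f}")
print(f"naive = {naive:.2f}, corrected = {corrected:.2f}, truth = {TRUE_EFFECT:.2f}")
```

In this toy setting c(0) is positive, the naive contrast overstates the effect, and the correction recovers it. The correction works here only because c(z) is computed from the known population; in practice c(z, x) is not identified from the data and has to be supplied by the analyst.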

Final thought on possible extensions
My final thought concerns extending the BCF model to meet emerging methodological needs, particularly in biostatistical research. First, a common causal estimand of interest is the average treatment effect on the treated (ATT). The BCF model does not seem to be readily implementable for ATT estimation, although the idea of ps-BART can easily be applied to estimate the ATT. Second, in the era of big data, given the wealth of information captured in large-scale data, treatment regimens are rarely defined in terms of only two treatments. Refined causal inference approaches are in great demand for multiple treatment settings. Hu et al. (2020) investigated the operating characteristics of several machine-learning-based causal inference techniques in the multiple treatment setting and found that BART-based methods generally had the best performance. It would be a useful addition to the causal machine learning literature if the BCF model could be extended to compare more than two active treatments simultaneously.