## The Annals of Applied Statistics

- Volume 13, Number 2 (2019), 1242-1267.

### Estimating population average causal effects in the presence of non-overlap: The effect of natural gas compressor station exposure on cancer mortality

Rachel C. Nethery, Fabrizia Mealli, and Francesca Dominici

#### Abstract

Most causal inference studies rely on the assumption of overlap to estimate population or sample average causal effects. When data suffer from non-overlap, estimation of these estimands requires reliance on model specifications due to poor data support. All existing methods to address non-overlap, such as trimming or down-weighting data in regions of poor data support, change the estimand so that inference cannot be made on the sample or the underlying population. In environmental health research settings where study results are often intended to influence policy, population-level inference may be critical and changes in the estimand can diminish the impact of the study results, because estimates may not be representative of effects in the population of interest to policymakers. Researchers may be willing to make additional, minimal modeling assumptions in order to preserve the ability to estimate population average causal effects. We seek to make two contributions on this topic. First, we propose a flexible, data-driven definition of propensity score overlap and non-overlap regions. Second, we develop a novel Bayesian framework to estimate population average causal effects with minor model dependence and appropriately large uncertainties in the presence of non-overlap and causal effect heterogeneity. In this approach the tasks of estimating causal effects in the overlap and non-overlap regions are delegated to two distinct models suited to the degree of data support in each region. Tree ensembles are used to nonparametrically estimate individual causal effects in the overlap region, where the data can speak for themselves. In the non-overlap region where insufficient data support means reliance on model specification is necessary, individual causal effects are estimated by extrapolating trends from the overlap region via a spline model. The promising performance of our method is demonstrated in simulations. 
Finally, we utilize our method to perform a novel investigation of the causal effect of natural gas compressor station exposure on cancer outcomes. Code and data to implement the method and reproduce all simulations and analyses are available on GitHub (https://github.com/rachelnethery/overlap).
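
The abstract does not spell out how the overlap and non-overlap regions are defined, so the following is only a minimal sketch of one possible data-driven rule, not the authors' definition. It assumes that propensity scores have already been estimated, and it uses two hypothetical tuning parameters: `a` (a half-width on the propensity score scale) and `b` (a minimum count). A unit is flagged as lying in the overlap region when at least `b` units from each treatment group have propensity scores within `a` of its own. The function name `overlap_flags` is likewise an illustrative assumption.

```python
from bisect import bisect_left, bisect_right


def overlap_flags(ps, treated, a=0.1, b=3):
    """Flag each unit as inside (True) or outside (False) the overlap region.

    Hypothetical rule (an assumption, not taken from the paper): a unit is
    in the overlap region when at least `b` units from EACH treatment group
    have an estimated propensity score within `a` of the unit's own score.
    The unit itself counts toward its own group's total.
    """
    # Sort the scores of each treatment group once so that window counts
    # can be obtained by binary search.
    treated_scores = sorted(p for p, t in zip(ps, treated) if t)
    control_scores = sorted(p for p, t in zip(ps, treated) if not t)

    def count_within(sorted_scores, centre):
        # Number of scores in the closed interval [centre - a, centre + a].
        lo = bisect_left(sorted_scores, centre - a)
        hi = bisect_right(sorted_scores, centre + a)
        return hi - lo

    return [
        count_within(treated_scores, p) >= b
        and count_within(control_scores, p) >= b
        for p in ps
    ]


# Toy data: controls cluster at low scores, treated units at high scores.
scores = [0.05, 0.10, 0.40, 0.45, 0.50, 0.55, 0.60, 0.90, 0.95]
is_treated = [False, False, False, True, False, True, True, True, True]
flags = overlap_flags(scores, is_treated, a=0.12, b=1)
```

Under a rule of this kind, units flagged `True` would be handled by the nonparametric tree ensemble, while units flagged `False` would receive effect estimates extrapolated by the spline model; larger `b` or smaller `a` makes the overlap region more conservative.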

Received: May 2018

Revised: November 2018

First available in Project Euclid: 17 June 2019

https://projecteuclid.org/euclid.aoas/1560758445

doi:10.1214/18-AOAS1231

MR3963570

Zbl 07094853

Keywords: Overlap; propensity score; Bayesian additive regression trees; splines; natural gas; cancer mortality

Nethery, Rachel C.; Mealli, Fabrizia; Dominici, Francesca. Estimating population average causal effects in the presence of non-overlap: The effect of natural gas compressor station exposure on cancer mortality. Ann. Appl. Stat. 13 (2019), no. 2, 1242-1267. doi:10.1214/18-AOAS1231. https://projecteuclid.org/euclid.aoas/1560758445

- Sampling details, additional simulations, and supplementary tables and figures. Section 1 of the Supplementary Materials contains a step-by-step description of the BART+SPL MCMC sampling scheme. Section 2 describes the data and results from simulations to test the performance of BART+SPL for binary outcomes. Section 3 provides the supplementary tables and figures referenced in the text. Digital Object Identifier: doi:10.1214/18-AOAS1231SUPP