Abstract
A standard assumption for causal inference from observational data is that one has measured a sufficiently rich set of covariates to ensure that within covariate strata, subjects are exchangeable across observed treatment values. Skepticism about the exchangeability assumption in observational studies is often warranted because it hinges on investigators’ ability to accurately measure covariates capturing all potential sources of confounding. Realistically, confounding mechanisms can rarely, if ever, be learned with certainty from measured covariates. One can therefore only ever hope that covariate measurements are at best proxies of the true underlying confounding mechanisms operating in an observational study, thus invalidating causal claims made on the basis of standard exchangeability conditions. Causal inference from proxies is a challenging inverse problem which has to date remained unresolved. In this paper, we introduce a formal potential outcome framework for proximal causal inference, which, while explicitly acknowledging covariate measurements as imperfect proxies of confounding mechanisms, offers an opportunity to learn about causal effects in settings where exchangeability on the basis of measured covariates fails. The proposed framework is closely related to the emerging literature on the use of proxies or negative control variables for nonparametric identification of causal effects in the presence of hidden confounding bias (Biometrika 105 (2018) 987–993). However, while prior literature has largely focused on point treatment settings, here we consider the more challenging setting of a complex longitudinal study with time-varying treatments and both measured and unmeasured time-varying confounding. Upon reviewing existing results for proximal identification in the point treatment setting, we provide new identification results for the time-varying setting, leading to the proximal g-formula and the corresponding proximal g-computation algorithm for estimation. These may be viewed as generalizations of Robins’ foundational g-formula and g-computation algorithm, which account explicitly for bias due to unmeasured confounding. Applications of proximal g-computation of causal effects are given for illustration in both point treatment and time-varying treatment settings.
Acknowledgments
The authors would like to thank the anonymous referees, an Associate Editor and the Editor for their constructive comments that improved the quality of this paper.
The authors thank Dr. Stephen R. Cole for helpful comments.
Citation
Eric J. Tchetgen Tchetgen, Andrew Ying, Yifan Cui, Xu Shi, Wang Miao. "An Introduction to Proximal Causal Inference." Statist. Sci. 39 (3) 375–390, August 2024. https://doi.org/10.1214/23-STS911
Information