## Statistical Science

### Causal Inference: A Missing Data Perspective

#### Abstract

Inferring causal effects of treatments is a central goal in many disciplines. The potential outcomes framework is a main statistical approach to causal inference, in which a causal effect is defined as a comparison of the potential outcomes of the same units under different treatment conditions. Because for each unit at most one of the potential outcomes is observed and the rest are missing, causal inference is inherently a missing data problem. Indeed, there is a close analogy in the terminology and the inferential framework between causal inference and missing data. Despite the intrinsic connection between the two subjects, statistical analyses of causal inference and missing data also have marked differences in aims, settings and methods. This article provides a systematic review of causal inference from the missing data perspective. Focusing on ignorable treatment assignment mechanisms, we discuss a wide range of causal inference methods that have analogues in missing data analysis, such as imputation, inverse probability weighting and doubly robust methods. Under each of the three modes of inference—Frequentist, Bayesian and Fisherian randomization—we present the general structure of inference for both finite-sample and super-population estimands, and illustrate via specific examples. We identify open questions to motivate more research to bridge the two fields.
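As a rough illustration of the missing data view described in the abstract, the sketch below builds a toy table of potential outcomes, masks the outcome that is not observed for each unit, and computes a simple inverse probability weighting (Horvitz-Thompson type) estimate of the average treatment effect. The function names `mask_potential_outcomes` and `ipw_ate`, the four-unit data set, and the known propensity score of 0.5 are all assumptions made for this example; none of them come from the article.

```python
import numpy as np


def mask_potential_outcomes(y0, y1, z):
    """Return (y0_obs, y1_obs, y_obs) given both potential outcomes and assignment z.

    The unobserved potential outcome of each unit is set to NaN, which makes the
    "missing data" structure of the problem explicit.
    """
    y0 = np.asarray(y0, dtype=float)
    y1 = np.asarray(y1, dtype=float)
    z = np.asarray(z, dtype=int)
    y0_obs = np.where(z == 0, y0, np.nan)  # control outcome seen only if z == 0
    y1_obs = np.where(z == 1, y1, np.nan)  # treated outcome seen only if z == 1
    y_obs = np.where(z == 1, y1, y0)       # the single outcome actually observed
    return y0_obs, y1_obs, y_obs


def ipw_ate(y_obs, z, e):
    """Inverse probability weighting estimate of the average treatment effect.

    e is the propensity score P(z = 1 | covariates), a scalar or one value per unit.
    It is assumed known here and strictly between 0 and 1.
    """
    y_obs = np.asarray(y_obs, dtype=float)
    z = np.asarray(z, dtype=float)
    e = np.asarray(e, dtype=float)
    treated_mean = np.mean(z * y_obs / e)
    control_mean = np.mean((1.0 - z) * y_obs / (1.0 - e))
    return treated_mean - control_mean


# Hypothetical toy data: every unit has an individual effect of exactly 1.
y0 = [1.0, 2.0, 3.0, 4.0]
y1 = [2.0, 3.0, 4.0, 5.0]
z = [1, 0, 0, 1]  # one possible assignment, each unit treated with probability 0.5

y0_obs, y1_obs, y_obs = mask_potential_outcomes(y0, y1, z)
estimate = ipw_ate(y_obs, z, e=0.5)
print(y0_obs, y1_obs, estimate)
```

For this particular assignment the estimate happens to equal the true average effect of 1; under other assignments it would vary around that value, which is the kind of variability the three modes of inference in the article are meant to quantify.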

#### Article information

Source
Statist. Sci., Volume 33, Number 2 (2018), 214-237.

Dates
First available in Project Euclid: 3 May 2018

Permanent link to this document
https://projecteuclid.org/euclid.ss/1525313143

Digital Object Identifier
doi:10.1214/18-STS645

Mathematical Reviews number (MathSciNet)
MR3797711

Zentralblatt MATH identifier
1397.62125

#### Citation

Ding, Peng; Li, Fan. Causal Inference: A Missing Data Perspective. Statist. Sci. 33 (2018), no. 2, 214–237. doi:10.1214/18-STS645. https://projecteuclid.org/euclid.ss/1525313143
