The Annals of Applied Statistics

Estimation of causal effects using instrumental variables with nonignorable missing covariates: Application to effect of type of delivery NICU on premature infants

Fan Yang, Scott A. Lorch, and Dylan S. Small

Full-text: Open access


Abstract

Understanding how effective high-level NICUs (neonatal intensive care units with the capacity for sustained mechanical assisted ventilation and high volume) are compared to low-level NICUs is important for both individual mothers and public policy decisions. The goal of this paper is to estimate the effect on mortality of premature babies being delivered in a high-level NICU vs. a low-level NICU through an observational study in which there are unmeasured confounders as well as nonignorable missing covariates. We consider the use of excess travel time as an instrumental variable (IV) to control for unmeasured confounders. For an IV to be valid, we must condition on confounders of the IV-outcome relationship; for example, the month prenatal care started must be conditioned on for excess travel time to be a valid IV. However, the month prenatal care started is sometimes missing, and the missingness may be nonignorable because it is related to the mother’s/infant’s risk of complications, which is not fully measured. We develop a method to estimate the causal effect of a treatment using an IV when there are nonignorable missing covariates, as in our data, where we allow the missingness to depend on the fully observed outcome as well as the partially observed compliance class, which is a proxy for the unmeasured risk of complications. A simulation study shows that under our nonignorable missingness assumption, the commonly used estimation methods, complete-case analysis and multiple imputation by chained equations assuming missingness at random, provide biased estimates, while our method provides approximately unbiased estimates. We apply our method to the NICU study and find evidence that high-level NICUs significantly reduce deaths for babies of small gestational age, whereas for almost mature babies, such as those born at 37 weeks, the level of the NICU makes little difference. A sensitivity analysis is conducted to assess the sensitivity of our conclusions to key assumptions about the missing covariates. The method we develop in this paper may be useful for many observational studies facing similar issues of unmeasured confounders and nonignorable missing data.
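
The complete-case bias described in the abstract can be illustrated with a toy simulation. The sketch below is not the authors' method or their simulation design. It uses the simple Wald ratio estimator with a binary instrument, and every probability in it (compliance-class shares, death rates, missingness rates) is a made-up assumption chosen only for illustration. A record is kept in the "complete-case" sample with a probability that depends on the outcome and on the compliance class, mimicking the kind of nonignorable missingness the abstract describes.

```python
import random


def wald(rows):
    """Wald IV estimate from (z, d, y) rows with a binary instrument z."""
    def mean(values):
        return sum(values) / len(values)

    y1 = mean([y for z, d, y in rows if z == 1])
    y0 = mean([y for z, d, y in rows if z == 0])
    d1 = mean([d for z, d, y in rows if z == 1])
    d0 = mean([d for z, d, y in rows if z == 0])
    return (y1 - y0) / (d1 - d0)


def simulate(n=200_000, seed=0):
    """Return (full-data Wald estimate, complete-case Wald estimate).

    All numeric settings are hypothetical and chosen for illustration only.
    """
    rng = random.Random(seed)
    full, complete = [], []
    for _ in range(n):
        z = int(rng.random() < 0.5)  # binary instrument (e.g. short vs. long excess travel time)
        u = rng.random()
        # Compliance class: complier, never-taker, always-taker.
        cls = "co" if u < 0.5 else ("nt" if u < 0.8 else "at")
        # Treatment received: compliers follow z, the others ignore it.
        d = z if cls == "co" else (0 if cls == "nt" else 1)
        # Death probability differs by class (proxy for unmeasured risk);
        # treatment lowers it from 0.10 to 0.05 for compliers only.
        if cls == "co":
            p_death = 0.05 if d == 1 else 0.10
        elif cls == "nt":
            p_death = 0.03
        else:
            p_death = 0.15
        y = int(rng.random() < p_death)
        # Nonignorable missingness: the chance that the covariate is observed
        # depends on the outcome and on the compliance class.
        p_obs = 0.9 * (0.5 if y == 1 else 1.0) * (0.8 if cls == "nt" else 1.0)
        full.append((z, d, y))
        if rng.random() < p_obs:
            complete.append((z, d, y))
    return wald(full), wald(complete)


if __name__ == "__main__":
    full_est, cc_est = simulate()
    print(f"full data: {full_est:.3f}  complete cases: {cc_est:.3f}")
```

Under these assumed settings the true complier effect on the death probability is -0.05, and the full-data Wald estimate should land close to it. The complete-case estimate is pulled toward zero (its large-sample value works out to roughly -0.03), because records are dropped at rates that depend on the outcome and the compliance class.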

Article information

Ann. Appl. Stat., Volume 8, Number 1 (2014), 48-73.

First available in Project Euclid: 8 April 2014

Keywords: instrumental variable; causal inference; sensitivity analysis; nonignorable missing data


Yang, Fan; Lorch, Scott A.; Small, Dylan S. Estimation of causal effects using instrumental variables with nonignorable missing covariates: Application to effect of type of delivery NICU on premature infants. Ann. Appl. Stat. 8 (2014), no. 1, 48–73. doi:10.1214/13-AOAS699.




Supplemental materials

  • Supplementary material: Supplement to “Estimation of causal effects using instrumental variables with nonignorable missing covariates: Application to effect of type of delivery NICU on premature infants”. The supplementary document includes the R code for the algorithm used to analyze our data, detailed results of our sensitivity analysis, and a discussion of identifiability in the simplest setup, where there is only one covariate and it is binary, under both our nonignorable missingness assumption and an alternative nonignorable missingness assumption.