The Annals of Applied Statistics

Addressing missing data mechanism uncertainty using multiple-model multiple imputation: Application to a longitudinal clinical trial

Juned Siddique, Ofer Harel, and Catherine M. Crespi


We present a framework for generating multiple imputations for continuous data when the missing data mechanism is unknown. Imputations are generated from more than one imputation model in order to incorporate uncertainty regarding the missing data mechanism. Parameter estimates based on the different imputation models are combined using rules for nested multiple imputation. Through the use of simulation, we investigate the impact of missing data mechanism uncertainty on post-imputation inferences and show that incorporating this uncertainty can increase the coverage of parameter estimates. We apply our method to a longitudinal clinical trial of low-income women with depression where nonignorably missing data were a concern. We show that different assumptions regarding the missing data mechanism can have a substantial impact on inferences. Our method provides a simple approach for formalizing subjective notions regarding nonresponse so that they can be easily stated, communicated and compared.

Article information

Ann. Appl. Stat., Volume 6, Number 4 (2012), 1814–1837.

First available in Project Euclid: 27 December 2012

Keywords: Nonignorable; NMAR; MNAR; not missing at random; missing not at random


Siddique, Juned; Harel, Ofer; Crespi, Catherine M. Addressing missing data mechanism uncertainty using multiple-model multiple imputation: Application to a longitudinal clinical trial. Ann. Appl. Stat. 6 (2012), no. 4, 1814–1837. doi:10.1214/12-AOAS555.


Supplemental materials

  • Supplementary material: CombineNestedImputations: An R function for combining inferences based on nested multiple imputations. This R function combines inferences based on nested multiply imputed data sets and calculates rates of missing information.
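
As a rough illustration of what combining inferences from nested multiple imputations involves, the following Python sketch applies the standard nested multiple imputation combining rules to a scalar parameter. It is not the supplement's R function: the names `combine_nested` and `NestedMIResult` are hypothetical, and the layout (M imputation models, N imputations per model, with `estimates[m][n]` and `variances[m][n]` holding each completed-data estimate and its squared standard error) is an assumption made for the example. Rates of missing information are not computed here.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class NestedMIResult:
    estimate: float        # overall point estimate
    total_variance: float  # total variance T
    df: float              # degrees of freedom for the t reference distribution


def combine_nested(estimates, variances):
    """Combine scalar estimates from M imputation models (nests),
    each with N imputations. Hypothetical helper for illustration."""
    M = len(estimates)
    N = len(estimates[0])
    if M < 2 or N < 2:
        raise ValueError("need at least 2 models and 2 imputations per model")
    if len(variances) != M or any(len(r) != N for r in estimates + variances):
        raise ValueError("estimates and variances must both be M x N")

    # Per-model means and the overall mean of the point estimates.
    nest_means = [fmean(row) for row in estimates]
    q_bar = fmean(nest_means)

    # Average completed-data (within-imputation) variance.
    u_bar = fmean([fmean(row) for row in variances])

    # Between-model variance: spread of the per-model means.
    b = sum((qm - q_bar) ** 2 for qm in nest_means) / (M - 1)

    # Within-model variance: spread of imputations around their model mean.
    w = sum(
        (q - qm) ** 2
        for row, qm in zip(estimates, nest_means)
        for q in row
    ) / (M * (N - 1))

    # Total variance and degrees of freedom.
    t = u_bar + (1 + 1 / M) * b + (1 - 1 / N) * w
    inv_df = (((1 + 1 / M) * b / t) ** 2) / (M - 1) \
        + (((1 - 1 / N) * w / t) ** 2) / (M * (N - 1))
    df = 1 / inv_df if inv_df > 0 else float("inf")

    return NestedMIResult(estimate=q_bar, total_variance=t, df=df)


if __name__ == "__main__":
    # Two models, two imputations each; the numbers are made up.
    result = combine_nested(
        estimates=[[1.0, 3.0], [5.0, 7.0]],
        variances=[[1.0, 1.0], [1.0, 1.0]],
    )
    print(result)
```

In this sketch the between-model term is where uncertainty about the missing data mechanism enters: when the imputation models disagree, the per-model means spread apart, the total variance grows, and the resulting intervals widen.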