Statistical Science

Multiple Imputation for Multilevel Data with Continuous and Binary Variables

Vincent Audigier, Ian R. White, Shahab Jolani, Thomas P. A. Debray, Matteo Quartagno, James Carpenter, Stef van Buuren, and Matthieu Resche-Rigon

Full-text: Open access


We present and compare multiple imputation methods for multilevel continuous and binary data where variables are systematically and sporadically missing. The methods are compared from a theoretical point of view and through an extensive simulation study motivated by a real dataset comprising multiple studies. The comparisons show why these multiple imputation methods are the most appropriate to handle missing values in a multilevel setting and why their relative performances can vary according to the missing data pattern, the multilevel structure and the type of missing variables. This study shows that valid inferences can only be obtained if the dataset includes a large number of clusters. In addition, it highlights that heteroscedastic multiple imputation methods provide more accurate inferences than homoscedastic methods, which should be reserved for data with few individuals per cluster. Finally, guidelines are given to choose the most suitable multiple imputation method according to the structure of the data.

Article information

Statist. Sci., Volume 33, Number 2 (2018), 160-183.

First available in Project Euclid: 3 May 2018

Digital Object Identifier: doi:10.1214/18-STS646

Keywords

Missing data; systematically missing values; multilevel data; mixed data; multiple imputation; joint modelling; fully conditional specification


Audigier, Vincent; White, Ian R.; Jolani, Shahab; Debray, Thomas P. A.; Quartagno, Matteo; Carpenter, James; van Buuren, Stef; Resche-Rigon, Matthieu. Multiple Imputation for Multilevel Data with Continuous and Binary Variables. Statist. Sci. 33 (2018), no. 2, 160-183. doi:10.1214/18-STS646.

Supplemental materials

  • Supplement to “Multiple Imputation for Multilevel Data with Continuous and Binary Variables”. Technical details on the posterior distributions of imputation model parameters and inference results for all configurations that have not been discussed in detail in the main text.
  • Supplement to “Multiple Imputation for Multilevel Data with Continuous and Binary Variables”. R code for the simulation study.