Open Access
November 2016
Multiple imputation of unordered categorical missing data: A comparison of the multivariate normal imputation and multiple imputation by chained equations
Innocent Karangwa, Danelle Kotze, Renette Blignaut
Braz. J. Probab. Stat. 30(4): 521-539 (November 2016). DOI: 10.1214/15-BJPS292

Abstract

Missing data are common in survey data sets: enrolled subjects often do not have data recorded for all variables of interest. Inappropriate handling of missing data may negatively affect the inferences drawn, so special attention is needed when analysing incomplete data. Multivariate normal imputation (MVNI) and multiple imputation by chained equations (MICE) have emerged as the best techniques to deal with missing data. The former assumes that the variables in the imputation model follow a normal distribution, whereas the latter fills in missing values taking into account the distributional form of each variable to be imputed. This study examines the performance of these methods when data are missing at random on unordered categorical variables treated as predictors in regression models. First, a survey data set with no missing values is used to generate a data set with missing-at-random observations on unordered categorical variables. Then, the two methods are used separately to impute the missing values of the generated data set. Their performance is compared in terms of the bias and standard errors of the estimates from regression models that determine the association between a woman’s contraceptive method use status and her marital status, controlling for region of origin. The baseline data are from the 2007 Demographic and Health Survey (DHS) of the Democratic Republic of Congo. The findings indicate that although MVNI relies on parametric statistical theory, it produces more accurate estimates than MICE for unordered categorical variables.
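
The first step of the design, generating missing-at-random (MAR) observations from a complete data set, can be illustrated with a minimal sketch. It is not the authors' procedure: the function `make_mar`, the toy records, the category labels and the missingness probabilities are all hypothetical. It assumes only that the chance of losing a value of the unordered categorical variable depends on a fully observed variable (here, region), which is what MAR requires.

```python
import random


def make_mar(records, target, driver, miss_prob, seed=0):
    """Return a copy of `records` in which `target` is set to None at random.

    The probability of deletion depends only on the fully observed
    `driver` value, never on the value being deleted, so the resulting
    missingness is missing at random (MAR).
    """
    rng = random.Random(seed)
    out = []
    for rec in records:
        rec = dict(rec)  # copy so the complete data set stays untouched
        if rng.random() < miss_prob[rec[driver]]:
            rec[target] = None
        out.append(rec)
    return out


def missing_rate(records, target, driver, level):
    """Share of records with `driver == level` whose `target` is missing."""
    rows = [r for r in records if r[driver] == level]
    return sum(r[target] is None for r in rows) / len(rows)


# Hypothetical complete toy data (not the DHS data used in the paper).
data_rng = random.Random(1)
regions = ["A", "B"]
statuses = ["single", "married", "widowed"]
complete = [
    {"region": data_rng.choice(regions),
     "marital_status": data_rng.choice(statuses)}
    for _ in range(2000)
]

# Assumed missingness probabilities by region (illustrative values only).
incomplete = make_mar(complete, "marital_status", "region",
                      {"A": 0.1, "B": 0.4})

rate_a = missing_rate(incomplete, "marital_status", "region", "A")
rate_b = missing_rate(incomplete, "marital_status", "region", "B")
print(f"missing in region A: {rate_a:.2f}, region B: {rate_b:.2f}")
```

Because the complete data are retained, estimates obtained after imputing `incomplete` with either method can be compared against those from `complete`, which is how bias is assessed in a design of this kind.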

Citation

Innocent Karangwa, Danelle Kotze, Renette Blignaut. "Multiple imputation of unordered categorical missing data: A comparison of the multivariate normal imputation and multiple imputation by chained equations." Braz. J. Probab. Stat. 30(4): 521-539, November 2016. https://doi.org/10.1214/15-BJPS292

Information

Received: 1 September 2014; Accepted: 1 April 2015; Published: November 2016
First available in Project Euclid: 13 December 2016

zbMATH: 1359.62480
MathSciNet: MR3582388
Digital Object Identifier: 10.1214/15-BJPS292

Keywords: categorical data, missing at random, missing data, multiple imputation, multiple imputation by chained equations, multivariate normal imputation

Rights: Copyright © 2016 Brazilian Statistical Association

Vol. 30 • No. 4 • November 2016