International Statistical Review

Multiple Imputation: Theory and Method

Paul Zhang
Source: Internat. Statist. Rev. Volume 71, Number 3 (2003), 581-592.

Abstract

In this review paper, we discuss the theoretical background of multiple imputation, describe how to build an imputation model and how to create proper imputations. We also present the rules for making repeated imputation inferences. Three widely used multiple imputation methods, the propensity score method, the predictive model method and the Markov chain Monte Carlo (MCMC) method, are presented and discussed.
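
As an illustration of the repeated-imputation inference rules mentioned in the abstract, the following is a minimal sketch of the standard combining rules (often called Rubin's rules) for a scalar estimand. It is not code from the paper: the function name `pool_rubin` and its arguments are hypothetical, and the degrees of freedom use the usual large-sample form rather than any small-sample adjustment.

```python
from statistics import mean, variance


def pool_rubin(estimates, variances):
    """Combine m completed-data point estimates and their variances.

    estimates: one point estimate per imputed data set
    variances: the matching completed-data variance estimates
    Returns (pooled estimate, total variance, degrees of freedom).
    """
    m = len(estimates)
    if m < 2 or len(variances) != m:
        raise ValueError("need m >= 2 estimates with matching variances")

    q_bar = mean(estimates)       # pooled point estimate
    u_bar = mean(variances)       # within-imputation variance
    b = variance(estimates)       # between-imputation variance (divisor m - 1)
    t = u_bar + (1 + 1 / m) * b   # total variance

    # Degrees of freedom of the t reference distribution (large-sample form);
    # with no between-imputation variability the reference is normal.
    if b == 0:
        df = float("inf")
    else:
        df = (m - 1) * (1 + u_bar / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df
```

For example, `pool_rubin([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])` pools three hypothetical analyses. An interval estimate would then use the pooled estimate plus or minus a t quantile (on `df` degrees of freedom) times the square root of the total variance.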

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.isr/1066768709
Zentralblatt MATH identifier: 02124743

2013 © International Statistical Institute/Bernoulli Society
