Statistical Science

Demystifying Double Robustness: A Comparison of Alternative Strategies for Estimating a Population Mean from Incomplete Data

Joseph D. Y. Kang and Joseph L. Schafer
Source: Statist. Sci. Volume 22, Number 4 (2007), 523-539.

Abstract

When outcomes are missing for reasons beyond an investigator’s control, there are two different ways to adjust a parameter estimate for covariates that may be related both to the outcome and to missingness. One approach is to model the relationships between the covariates and the outcome and use those relationships to predict the missing values. Another is to model the probabilities of missingness given the covariates and incorporate them into a weighted or stratified estimate. Doubly robust (DR) procedures apply both types of model simultaneously and produce a consistent estimate of the parameter if either of the two models has been correctly specified. In this article, we show that DR estimates can be constructed in many ways. We compare the performance of various DR and non-DR estimates of a population mean in a simulated example where both models are incorrect but neither is grossly misspecified. Methods that use inverse probabilities as weights, whether they are DR or not, are sensitive to misspecification of the propensity model when some estimated propensities are small. Many DR methods perform better than simple inverse-probability weighting. None of the DR methods we tried, however, improved upon the performance of simple regression-based prediction of the missing values. This study does not represent every missing-data problem that will arise in practice. But it does demonstrate that, in at least some settings, two wrong models are not better than one.
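The three strategies named in the abstract (regression-based prediction, inverse-probability weighting, and a doubly robust combination) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration rather than the article's simulation design: the data-generating model, the sample size, the function names `fit_logistic` and `estimate_means`, and the choice of the augmented inverse-probability-weighted form as one of the many possible DR constructions. Both working models are correctly specified here, so the sketch only shows the mechanics and does not reproduce the misspecification study.

```python
import numpy as np


def expit(t):
    """Logistic function."""
    return 1.0 / (1.0 + np.exp(-t))


def fit_logistic(X, r, n_iter=25):
    """Fit a logistic regression of r on X by Newton-Raphson (hypothetical helper)."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = expit(X @ beta)
        grad = X.T @ (r - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta


def estimate_means(n=5000, seed=0):
    """Compare four estimates of a population mean from incomplete data."""
    rng = np.random.default_rng(seed)

    # Assumed toy population: one covariate, true mean of y equal to 2.0.
    x = rng.normal(size=n)
    y = 2.0 + 1.5 * x + rng.normal(size=n)
    # r = 1 means y is observed; response is more likely when x is large.
    r = rng.binomial(1, expit(0.5 + x))
    obs = r == 1
    X = np.column_stack([np.ones(n), x])

    # Zero out the unobserved outcomes so they can never enter an estimate.
    y_obs = np.where(obs, y, 0.0)

    # Outcome model: least squares fitted to respondents only,
    # then used to predict y for every unit.
    coef, *_ = np.linalg.lstsq(X[obs], y_obs[obs], rcond=None)
    m_hat = X @ coef

    # Propensity model: estimated probability of response given x.
    pi_hat = expit(X @ fit_logistic(X, r))

    return {
        # Naive mean of the observed outcomes (ignores missingness).
        "complete_case": y_obs[obs].mean(),
        # Regression-based prediction of the missing values.
        "regression": m_hat.mean(),
        # Inverse-probability weighting with normalized weights.
        "ipw": np.sum(r * y_obs / pi_hat) / np.sum(r / pi_hat),
        # Doubly robust: prediction plus inverse-weighted residual correction.
        "dr": np.mean(m_hat + r * (y_obs - m_hat) / pi_hat),
    }


results = estimate_means()
for name, value in results.items():
    print(f"{name:>14s}: {value:.3f}")
```

In this toy setting the complete-case mean is biased upward because units with large covariate values respond more often, while the other three estimates should land near the true mean. The inverse weights `1 / pi_hat` grow quickly as estimated propensities approach zero, which is the source of the sensitivity the abstract describes.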

Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.ss/1207580167
Digital Object Identifier: doi:10.1214/07-STS227
Mathematical Reviews number (MathSciNet): MR2420458

2012 © Institute of Mathematical Statistics
