Statistical Science

Estimation of Causal Effects with Multiple Treatments: A Review and New Ideas

Michael J. Lopez and Roee Gutman

Abstract

The propensity score is a common tool for estimating the causal effect of a binary treatment in observational data. In this setting, matching, subclassification, imputation or inverse probability weighting on the propensity score can reduce the initial covariate bias between the treatment and control groups. With more than two treatment options, however, estimation of causal effects requires additional assumptions and techniques, the implementations of which have varied across disciplines. This paper reviews current methods, and it identifies and contrasts the treatment effects that each one estimates. Additionally, we propose possible matching techniques for use with multiple, nominal categorical treatments, and use simulations to show how such algorithms can yield improved covariate similarity between those in the matched sets, relative to the pre-matched cohort. In sum, this manuscript provides a synopsis of how to notate and use causal methods for categorical treatments.
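The abstract lists inverse probability weighting on the propensity score as one way to reduce covariate bias and notes that more than two treatments need additional machinery. A minimal sketch of one such extension follows, assuming a correctly specified multinomial-logit model for the generalized propensity score r(w, x) = P(T = w | X = x) (notation chosen here for illustration). The simulated data, coefficients, and the helper names `fit_gps` and `ipw_means` are hypothetical; the model is fitted by plain gradient descent only to keep the example dependency-light. This is not the matching algorithm the authors propose, just an illustration of weighting with three nominal treatments.

```python
import numpy as np

def fit_gps(X, t, n_classes, lr=0.5, n_iter=2000):
    """Estimate generalized propensity scores r(w, x) = P(T = w | X = x) with a
    multinomial logistic regression fitted by plain gradient descent."""
    n = X.shape[0]
    Xd = np.column_stack([np.ones(n), X])        # design matrix with intercept
    B = np.zeros((Xd.shape[1], n_classes))       # one coefficient column per treatment
    Y = np.eye(n_classes)[t]                     # one-hot treatment indicators

    def softmax(Z):
        Z = Z - Z.max(axis=1, keepdims=True)     # guard against overflow
        E = np.exp(Z)
        return E / E.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # gradient step on the mean negative log-likelihood
        B -= lr * Xd.T @ (softmax(Xd @ B) - Y) / n
    return softmax(Xd @ B)                       # n x n_classes matrix of estimated GPS

def ipw_means(v, t, gps, n_classes):
    """Normalized IPW estimate of E[V(w)] for each treatment w: unit i in group w
    gets weight 1 / r(w, X_i), and weights are rescaled to sum to one within group."""
    w = 1.0 / gps[np.arange(len(t)), t]
    return np.array([np.average(v[t == k], weights=w[t == k]) for k in range(n_classes)])

# Hypothetical confounded study with three nominal treatments (all values made up).
rng = np.random.default_rng(0)
n, K = 20_000, 3
X = rng.normal(size=(n, 2))
eta = np.column_stack([np.zeros(n), 0.5 * X[:, 0], X[:, 0] + 0.5 * X[:, 1]])
p_true = np.exp(eta) / np.exp(eta).sum(axis=1, keepdims=True)
# inverse-CDF draw of one treatment per unit from its true assignment probabilities
t = (rng.random((n, 1)) > p_true.cumsum(axis=1)[:, :-1]).sum(axis=1)
tau = np.array([0.0, 1.0, 3.0])                  # true E[Y(w)] - E[Y(0)]
y = 2 * X[:, 0] + X[:, 1] + tau[t] + rng.normal(size=n)

gps = fit_gps(X, t, K)
naive = np.array([y[t == k].mean() for k in range(K)])
ipw = ipw_means(y, t, gps, K)
x1_raw = np.array([X[t == k, 0].mean() for k in range(K)])
x1_ipw = ipw_means(X[:, 0], t, gps, K)

print("naive contrasts vs w=0:", np.round(naive - naive[0], 2))
print("IPW contrasts vs w=0:  ", np.round(ipw - ipw[0], 2))
print("X1 group means, raw:     ", np.round(x1_raw, 2))
print("X1 group means, weighted:", np.round(x1_ipw, 2))
```

Because treatment 2 is more likely for units with large X1, the raw group means of X1 differ and the naive contrasts are biased; after weighting, each group's X1 mean should sit near the population mean of zero. When some estimated r(w, x) values are close to zero the weights become extreme, so in practice one would inspect the weight distribution before trusting such estimates.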

Article information

Source
Statist. Sci., Volume 32, Number 3 (2017), 432-454.

Dates
First available in Project Euclid: 1 September 2017

Permanent link to this document
https://projecteuclid.org/euclid.ss/1504253125

Digital Object Identifier
doi:10.1214/17-STS612

Mathematical Reviews number (MathSciNet)
MR3696004

Zentralblatt MATH identifier
06870254

Keywords
Causal inference; propensity score; multiple treatments; matching; observational data

Citation

Lopez, Michael J.; Gutman, Roee. Estimation of Causal Effects with Multiple Treatments: A Review and New Ideas. Statist. Sci. 32 (2017), no. 3, 432–454. doi:10.1214/17-STS612. https://projecteuclid.org/euclid.ss/1504253125