## Statistical Science

- Statist. Sci.
- Volume 32, Number 3 (2017), 432-454.

### Estimation of Causal Effects with Multiple Treatments: A Review and New Ideas

Michael J. Lopez and Roee Gutman

#### Abstract

The propensity score is a common tool for estimating the causal effect of a binary treatment in observational data. In this setting, matching, subclassification, imputation or inverse probability weighting on the propensity score can reduce the initial covariate bias between the treatment and control groups. With more than two treatment options, however, estimation of causal effects requires additional assumptions and techniques, the implementations of which have varied across disciplines. This paper reviews current methods, and it identifies and contrasts the treatment effects that each one estimates. Additionally, we propose possible matching techniques for use with multiple, nominal categorical treatments, and use simulations to show how such algorithms can yield improved covariate similarity between those in the matched sets, relative to the pre-matched cohort. In sum, this manuscript provides a synopsis of how to notate and use causal methods for categorical treatments.
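As a rough illustration of one approach the abstract mentions, the sketch below applies inverse probability weighting with three nominal treatments. It is not the matching procedure proposed in the paper. The simulated data, the function names (`simulate`, `estimate_gps`, `naive_means`, `ipw_means`) and the frequency-based estimate of the generalized propensity score P(T = t | X = x) are all assumptions made for this example. The sketch also assumes a single binary covariate, no unmeasured confounding, and that every treatment has positive probability at every covariate value.

```python
import random
from collections import Counter, defaultdict


def simulate(n=20000, seed=0):
    """Simulate (x, t, y): binary covariate x, treatment t in {0, 1, 2},
    and an outcome with true mean t + 2*x (hypothetical data)."""
    rng = random.Random(seed)
    # Treatment assignment depends on x, so x confounds naive comparisons.
    assign_probs = {0: [0.6, 0.3, 0.1], 1: [0.2, 0.3, 0.5]}
    data = []
    for _ in range(n):
        x = 1 if rng.random() < 0.5 else 0
        t = rng.choices([0, 1, 2], weights=assign_probs[x])[0]
        y = t + 2 * x + rng.gauss(0.0, 1.0)
        data.append((x, t, y))
    return data


def estimate_gps(data):
    """Estimate P(T = t | X = x) by the observed frequency within each x."""
    n_x = Counter(x for x, _, _ in data)
    n_xt = Counter((x, t) for x, t, _ in data)
    return {(x, t): n_xt[(x, t)] / n_x[x] for (x, t) in n_xt}


def naive_means(data):
    """Unadjusted mean outcome within each treatment group."""
    total, count = defaultdict(float), Counter()
    for _, t, y in data:
        total[t] += y
        count[t] += 1
    return {t: total[t] / count[t] for t in sorted(count)}


def ipw_means(data, gps):
    """Normalized IPW estimate of E[Y(t)]: each unit is weighted by the
    inverse of the estimated probability of the treatment it received."""
    num, den = defaultdict(float), defaultdict(float)
    for x, t, y in data:
        w = 1.0 / gps[(x, t)]
        num[t] += w * y
        den[t] += w
    return {t: num[t] / den[t] for t in sorted(num)}


if __name__ == "__main__":
    data = simulate()
    gps = estimate_gps(data)
    # By construction the true values of E[Y(t)] are 1, 2 and 3.
    print("naive:", naive_means(data))
    print("ipw:  ", ipw_means(data, gps))
```

In this simulation the unadjusted group means are pulled away from the true values because the covariate drives both treatment choice and outcome, while the weighted estimates should land close to them. With continuous or high-dimensional covariates the frequency table would have to be replaced by a fitted model, for example a multinomial logistic regression.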

#### Article information

**Source**

Statist. Sci., Volume 32, Number 3 (2017), 432-454.

**Dates**

First available in Project Euclid: 1 September 2017

**Permanent link to this document**

https://projecteuclid.org/euclid.ss/1504253125

**Digital Object Identifier**

doi:10.1214/17-STS612

**Mathematical Reviews number (MathSciNet)**

MR3696004

**Zentralblatt MATH identifier**

06870254

**Keywords**

Causal inference; propensity score; multiple treatments; matching; observational data

#### Citation

Lopez, Michael J.; Gutman, Roee. Estimation of Causal Effects with Multiple Treatments: A Review and New Ideas. Statist. Sci. 32 (2017), no. 3, 432-454. doi:10.1214/17-STS612. https://projecteuclid.org/euclid.ss/1504253125
