The Annals of Applied Statistics

Estimating treatment effect heterogeneity in randomized program evaluation

Kosuke Imai and Marc Ratkovic

Full-text: Open access


When evaluating the efficacy of social programs and medical treatments using randomized experiments, the estimated overall average causal effect alone is often of limited value, and researchers must investigate when the treatments do and do not work. Indeed, the estimation of treatment effect heterogeneity plays an essential role in (1) selecting the most effective treatment from a large number of available treatments, (2) ascertaining subpopulations for which a treatment is effective or harmful, (3) designing individualized optimal treatment regimes, (4) testing for the existence or lack of heterogeneous treatment effects, and (5) generalizing causal effect estimates obtained from an experimental sample to a target population. In this paper, we formulate the estimation of heterogeneous treatment effects as a variable selection problem. We propose a method that adapts the Support Vector Machine classifier by placing separate sparsity constraints over the pre-treatment parameters and causal heterogeneity parameters of interest. The proposed method is motivated by and applied to two well-known randomized evaluation studies in the social sciences. Our method selects the most effective voter mobilization strategies from a large number of alternative strategies, and it also identifies the characteristics of workers who greatly benefit from (or are negatively affected by) a job training program. In our simulation studies, we find that the proposed method often outperforms some commonly used alternatives.
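The idea of placing separate sparsity constraints on two groups of parameters can be illustrated with a minimal sketch. The abstract does not specify the loss, the solver, or any notation, so everything below is an assumption made for illustration: a squared hinge loss, L1 penalties with hypothetical weights `lam_z` and `lam_v`, a proximal gradient solver, and simulated data. It is not the authors' implementation.

```python
import numpy as np


def soft_threshold(x, t):
    """Proximal operator of the L1 penalty: shrink each entry toward zero by t."""
    return np.sign(x) * np.maximum(np.abs(x) - t, 0.0)


def fit_sparse_svm(Z, V, y, lam_z, lam_v, n_iter=2000):
    """Squared-hinge-loss classifier with two separate L1 penalties.

    Z: pre-treatment covariates (n x p).
    V: treatment indicator and treatment-by-covariate terms (n x q).
    y: binary outcome coded as -1 / +1.
    lam_z, lam_v: penalty weights for the two parameter groups (hypothetical names).
    Returns (intercept, beta for Z, gamma for V).
    """
    n, p = Z.shape
    X = np.hstack([np.ones((n, 1)), Z, V])
    # Per-coefficient penalty: the intercept is left unpenalized.
    lam = np.concatenate([[0.0], np.full(p, lam_z), np.full(V.shape[1], lam_v)])
    # Step size 1/L, where L bounds the curvature of the mean squared hinge loss.
    step = n / (2.0 * np.linalg.norm(X, 2) ** 2)
    coef = np.zeros(X.shape[1])
    for _ in range(n_iter):
        slack = np.maximum(1.0 - y * (X @ coef), 0.0)   # hinge part of each margin
        grad = -2.0 * (X.T @ (y * slack)) / n           # gradient of the smooth loss
        coef = soft_threshold(coef - step * grad, step * lam)
    return coef[0], coef[1:1 + p], coef[1 + p:]


# Simulated randomized experiment (assumed data-generating process).
rng = np.random.default_rng(0)
n = 400
Z = rng.normal(size=(n, 3))                  # pre-treatment covariates
T = rng.integers(0, 2, size=n)               # randomized binary treatment
V = np.column_stack([T, T[:, None] * Z])     # treatment and its interactions with Z
latent = 0.5 * Z[:, 0] + T * Z[:, 1] + rng.normal(size=n)
y = np.where(latent > 0, 1.0, -1.0)

intercept, beta, gamma = fit_sparse_svm(Z, V, y, lam_z=0.05, lam_v=0.05)
print("pre-treatment coefficients:", beta)
print("heterogeneity coefficients:", gamma)
```

Keeping `lam_z` and `lam_v` separate lets the heterogeneity terms be shrunk more or less aggressively than the pre-treatment terms. In practice the two weights would need to be tuned, for example by cross-validation, which this sketch omits.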

Article information

Ann. Appl. Stat., Volume 7, Number 1 (2013), 443-470.

First available in Project Euclid: 9 April 2013

Digital Object Identifier

doi:10.1214/12-AOAS593

Keywords: Causal inference; individualized treatment rules; LASSO; moderation; variable selection


Imai, Kosuke; Ratkovic, Marc. Estimating treatment effect heterogeneity in randomized program evaluation. Ann. Appl. Stat. 7 (2013), no. 1, 443–470. doi:10.1214/12-AOAS593.


