The Annals of Applied Statistics

Influencing elections with statistics: Targeting voters with logistic regression trees

Thomas Rusch, Ilro Lee, Kurt Hornik, Wolfgang Jank, and Achim Zeileis

Full-text: Open access


In political campaigning substantial resources are spent on voter mobilization, that is, on identifying and influencing as many people as possible to vote. Campaigns use statistical tools for deciding whom to target (“microtargeting”). In this paper we describe a nonpartisan campaign that aims at increasing overall turnout using the example of the 2004 US presidential election. Based on a real data set of 19,634 eligible voters from Ohio, we introduce a modern statistical framework well suited for carrying out the main tasks of voter targeting in a single sweep: predicting an individual’s turnout (or support) likelihood for a particular cause, party or candidate as well as data-driven voter segmentation. Our framework, which we refer to as LORET (for LOgistic REgression Trees), contains standard methods such as logistic regression and classification trees as special cases and allows for a synthesis of both techniques. For our case study, we explore various LORET models with different regressors in the logistic model components and different partitioning variables in the tree components; we analyze them in terms of their predictive accuracy and compare the effect of using the full set of available variables against using only a limited amount of information. We find that augmenting a standard set of variables (such as age and voting history) with additional predictor variables (such as the household composition in terms of party affiliation) clearly improves predictive accuracy. We also find that LORET models based on tree induction beat the unpartitioned models. Furthermore, we illustrate how voter segmentation arises from our framework and discuss the resulting profiles from a targeting point of view.
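The idea of a logistic regression tree can be illustrated with a minimal sketch: split the voters on a partitioning variable, then fit a separate logistic regression in each segment. Everything in the block below is an assumption made for illustration and is not taken from the paper: the data are synthetic, the variable names (`age`, `past`) and helper functions are hypothetical, and the split is chosen by a plain search over candidate thresholds that maximizes the total log-likelihood, which is a stand-in and not necessarily the split-selection procedure the authors use.

```python
import math
import random

def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)

def fit_logit(xs, ys, steps=25):
    """Newton-Raphson fit of P(y=1 | x) = sigmoid(b0 + b1*x); returns (b0, b1)."""
    b0 = b1 = 0.0
    for _ in range(steps):
        g0 = g1 = 0.0        # gradient of the log-likelihood
        h00 = h11 = 1e-8     # negative Hessian, tiny ridge keeps it invertible
        h01 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

def log_lik(coef, xs, ys):
    # Bernoulli log-likelihood of the fitted model on (xs, ys).
    b0, b1 = coef
    total = 0.0
    for x, y in zip(xs, ys):
        p = min(max(sigmoid(b0 + b1 * x), 1e-12), 1.0 - 1e-12)
        total += math.log(p) if y else math.log(1.0 - p)
    return total

def best_split(zs, xs, ys, candidates, min_leaf=50):
    """Try each threshold c on the partitioning variable z, fit one logistic
    model per segment (z < c, z >= c), and keep the best total log-likelihood."""
    best = None
    for c in candidates:
        left = [i for i, z in enumerate(zs) if z < c]
        right = [i for i, z in enumerate(zs) if z >= c]
        if min(len(left), len(right)) < min_leaf:
            continue
        ll, models = 0.0, []
        for idx in (left, right):
            lx = [xs[i] for i in idx]
            ly = [ys[i] for i in idx]
            coef = fit_logit(lx, ly)
            models.append(coef)
            ll += log_lik(coef, lx, ly)
        if best is None or ll > best[0]:
            best = (ll, c, models)
    return best

# Synthetic electorate (invented numbers): below age 40, turnout depends
# strongly on past participation; from 40 on, turnout is high regardless.
rng = random.Random(0)
zs, xs, ys = [], [], []
for _ in range(1000):
    age = rng.uniform(18, 90)     # partitioning variable
    past = rng.randint(0, 4)      # regressor: past elections voted in
    eta = -3.0 + 1.5 * past if age < 40 else 2.0
    zs.append(age)
    xs.append(past)
    ys.append(1 if rng.random() < sigmoid(eta) else 0)

ll_global = log_lik(fit_logit(xs, ys), xs, ys)   # unpartitioned logistic model
ll_tree, threshold, leaf_models = best_split(zs, xs, ys, range(25, 85, 5))
print("unpartitioned log-likelihood:", round(ll_global, 1))
print("split at age", threshold, "log-likelihood:", round(ll_tree, 1))
print("per-segment (intercept, slope):", leaf_models)
```

In this sketch the two special cases mentioned in the abstract correspond to skipping the split (a single logistic regression for all voters) and to fitting intercept-only models in the segments (a classification tree with a constant turnout rate per segment). Each fitted segment doubles as a voter profile with its own predicted turnout probabilities.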

Article information

Ann. Appl. Stat., Volume 7, Number 3 (2013), 1612-1639.

First available in Project Euclid: 3 October 2013

Keywords: campaigning; classification tree; get-out-the-vote; model tree; political marketing; voter identification; voter segmentation; voter profile; microtargeting


Rusch, Thomas; Lee, Ilro; Hornik, Kurt; Jank, Wolfgang; Zeileis, Achim. Influencing elections with statistics: Targeting voters with logistic regression trees. Ann. Appl. Stat. 7 (2013), no. 3, 1612–1639. doi:10.1214/13-AOAS648.




Supplemental materials

  • Supplementary material A: Data and Code. A bundle containing the code used to produce the results of the paper and a snapshot of the data set. Unfortunately we are not at liberty to share the whole original data set, but were allowed to include an anonymized, random sample ($N=6544$) of the data.
  • Supplementary material B: Rejoinder. A rejoinder containing additional analyses of LORET models with a historic proxy variable and a comparison of LORET models to high-performance methods like Support Vector Machines, Bayesian Additive Regression Trees, Artificial Neural Networks, Logistic Model Trees and Random Forests.