The Annals of Applied Statistics

Optimal multilevel matching using network flows: An application to a summer reading intervention

Samuel D. Pimentel, Lindsay C. Page, Matthew Lenard, and Luke Keele



Many observational studies of causal effects occur in settings with clustered treatment assignment. In studies of this type, treatment is applied to entire clusters of units. For example, an educational intervention might be administered to all the students in a school. We develop a matching algorithm for multilevel data based on a network flow algorithm. Earlier work on multilevel matching relied on integer programming, which allows for balance targeting on specific covariates but can be slow with larger data sets. Although we cannot directly specify minimal levels of balance for individual covariates, our algorithm is fast and scales easily to larger data sets. We apply this algorithm to assess a school-based intervention through which students in treated schools were exposed to a new reading program during summer school. In one variant of the algorithm, where we match both schools and students, we change the causal estimand through optimal subset matching to better maintain common support. In a second variant, we relax the common support assumption to preserve the causal estimand by only matching on schools. We find that the summer intervention does not appear to increase reading test scores. In a sensitivity analysis, however, we determine that an unobserved confounder could easily mask a larger treatment effect.
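The abstract does not spell out the network flow formulation, so the following is only a minimal sketch of the general idea behind optimal pair matching as a minimum-cost flow, not the authors' algorithm. It assumes a single level (for example, schools only) and a precomputed distance matrix. The function name `min_cost_pair_match` and the toy distances are hypothetical, chosen for illustration. Each treated unit sends one unit of flow through exactly one control, and the flow with the smallest total distance defines the match.

```python
from collections import deque


def min_cost_pair_match(cost):
    """Pair each treated unit (row) with a distinct control (column).

    cost[i][j] is a covariate distance between treated unit i and control j.
    The total distance is minimized by solving a min-cost flow with
    successive shortest augmenting paths. Illustrative sketch only.
    """
    n_t, n_c = len(cost), len(cost[0])
    source, sink = n_t + n_c, n_t + n_c + 1
    n_nodes = n_t + n_c + 2
    # Each edge is [target, residual capacity, cost, index of reverse edge].
    graph = [[] for _ in range(n_nodes)]

    def add_edge(u, v, w):
        graph[u].append([v, 1, w, len(graph[v])])
        graph[v].append([u, 0, -w, len(graph[u]) - 1])

    for i in range(n_t):
        add_edge(source, i, 0)
        for j in range(n_c):
            add_edge(i, n_t + j, cost[i][j])
    for j in range(n_c):
        add_edge(n_t + j, sink, 0)

    for _ in range(n_t):  # one unit of flow per treated unit
        dist = [float("inf")] * n_nodes
        prev = [None] * n_nodes  # (previous node, edge index) on the best path
        in_queue = [False] * n_nodes
        dist[source] = 0
        queue = deque([source])
        # Queue-based Bellman-Ford: residual edges can have negative cost.
        while queue:
            u = queue.popleft()
            in_queue[u] = False
            for k, (v, cap, w, _) in enumerate(graph[u]):
                if cap > 0 and dist[u] + w < dist[v]:
                    dist[v] = dist[u] + w
                    prev[v] = (u, k)
                    if not in_queue[v]:
                        queue.append(v)
                        in_queue[v] = True
        if prev[sink] is None:
            raise ValueError("fewer controls than treated units")
        # Push one unit of flow back along the shortest path.
        v = sink
        while v != source:
            u, k = prev[v]
            graph[u][k][1] -= 1
            graph[v][graph[u][k][3]][1] += 1
            v = u

    # A saturated treated-to-control edge marks a matched pair.
    pairs = {}
    for i in range(n_t):
        for v, cap, _, _ in graph[i]:
            if n_t <= v < n_t + n_c and cap == 0:
                pairs[i] = v - n_t
    return pairs


# Hypothetical distances: 2 treated schools (rows), 2 control schools (columns).
toy_cost = [[1, 2],
            [1, 10]]
pairs = min_cost_pair_match(toy_cost)
total = sum(toy_cost[i][j] for i, j in pairs.items())
print(pairs, total)  # {0: 1, 1: 0} 3
```

In this toy case a greedy match would give treated school 0 its nearest control (column 0) and leave treated school 1 with a distance of 10, for a total of 11. The flow solution reroutes the first pairing and reaches a total of 3. A multilevel version would additionally need distances that reflect student-level comparisons within each candidate pair of schools, which this sketch does not attempt.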

Article information

Ann. Appl. Stat., Volume 12, Number 3 (2018), 1479-1505.

Received: November 2016
Revised: October 2017
First available in Project Euclid: 11 September 2018


Keywords

Causal inference; hierarchical/multilevel data; observational study; optimal matching


Pimentel, Samuel D.; Page, Lindsay C.; Lenard, Matthew; Keele, Luke. Optimal multilevel matching using network flows: An application to a summer reading intervention. Ann. Appl. Stat. 12 (2018), no. 3, 1479–1505. doi:10.1214/17-AOAS1118.


