The Annals of Applied Statistics

Matching for balance, pairing for heterogeneity in an observational study of the effectiveness of for-profit and not-for-profit high schools in Chile

José R. Zubizarreta, Ricardo D. Paredes, and Paul R. Rosenbaum

Full-text: Open access


Conventionally, the construction of a pair-matched sample selects treated and control units and pairs them in a single step, with a view to balancing observed covariates $\mathbf{x}$ and reducing the heterogeneity or dispersion of treated-minus-control response differences, $Y$. In contrast, the method of cardinality matching developed here first selects the maximum number of units subject to covariate balance constraints and, with a balanced sample for $\mathbf{x}$ in hand, then separately pairs the units to minimize heterogeneity in $Y$. Reduced heterogeneity of pair differences in responses $Y$ is known to reduce sensitivity to unmeasured biases, so one might hope that cardinality matching would succeed at both tasks: balancing $\mathbf{x}$ and stabilizing $Y$. We use cardinality matching in an observational study of the effectiveness of for-profit and not-for-profit private high schools in Chile, a controversial subject there, focusing on students who were in government-run primary schools in 2004 but then switched to private high schools. By pairing to minimize heterogeneity in a cardinality match that has balanced covariates, a meaningful reduction in sensitivity to unmeasured biases is obtained.
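The two-step idea in the abstract can be illustrated with a toy sketch. This is not the authors' implementation: the paper solves the selection step by integer programming, whereas the code below uses brute-force enumeration that is only workable for a handful of units. The function names, the single balance covariate `x`, the single pairing covariate `p`, the tolerance `tol`, and the data are all hypothetical, chosen only to show selection for balance followed by a separate pairing step.

```python
from itertools import combinations, permutations


def mean(values):
    return sum(values) / len(values)


def cardinality_select(treated, controls, tol):
    """Step 1 (toy version): find the largest m such that some m treated and
    m control units have mean x differing by at most tol.
    Returns the first feasible pair of index lists, or two empty lists."""
    for m in range(min(len(treated), len(controls)), 0, -1):
        for t_idx in combinations(range(len(treated)), m):
            t_mean = mean([treated[i]["x"] for i in t_idx])
            for c_idx in combinations(range(len(controls)), m):
                c_mean = mean([controls[j]["x"] for j in c_idx])
                if abs(t_mean - c_mean) <= tol:
                    return list(t_idx), list(c_idx)
    return [], []


def pair_min_distance(treated, controls, t_idx, c_idx):
    """Step 2 (toy version): pair the selected units so that the total
    absolute difference in p is as small as possible."""
    best_cost, best_pairs = None, []
    for perm in permutations(c_idx):
        cost = sum(abs(treated[i]["p"] - controls[j]["p"])
                   for i, j in zip(t_idx, perm))
        if best_cost is None or cost < best_cost:
            best_cost, best_pairs = cost, list(zip(t_idx, perm))
    return best_pairs, best_cost


# Hypothetical data: x is balanced in step 1, p drives the pairing in step 2.
treated = [{"x": 1.0, "p": 10}, {"x": 2.0, "p": 20}, {"x": 9.0, "p": 30}]
controls = [{"x": 1.5, "p": 21}, {"x": 2.5, "p": 9},
            {"x": 1.0, "p": 50}, {"x": 2.0, "p": 12}]

t_idx, c_idx = cardinality_select(treated, controls, tol=0.5)
pairs, cost = pair_min_distance(treated, controls, t_idx, c_idx)
print(pairs, cost)
```

In this toy data the third treated unit cannot be balanced within the tolerance, so the selection keeps two treated and two control units. The pairing step then changes only who is matched with whom, leaving the balanced sample from step 1 untouched.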

Article information

Ann. Appl. Stat., Volume 8, Number 1 (2014), 204–231.

First available in Project Euclid: 8 April 2014

Keywords

Design sensitivity; integer programming; testing twice


Zubizarreta, José R.; Paredes, Ricardo D.; Rosenbaum, Paul R. Matching for balance, pairing for heterogeneity in an observational study of the effectiveness of for-profit and not-for-profit high schools in Chile. Ann. Appl. Stat. 8 (2014), no. 1, 204–231. doi:10.1214/13-AOAS713.




Supplemental materials

  • Supplementary material: Supplement to “Matching for balance, pairing for heterogeneity in an observational study of the effectiveness of for-profit and not-for-profit high schools in Chile”. In an online supplement we provide additional summary tables for covariate balance.