The Annals of Statistics

Covariate balancing propensity score by tailored loss functions

Qingyuan Zhao


In observational studies, propensity scores are commonly estimated by maximum likelihood but may fail to balance high-dimensional pretreatment covariates even after specification search. We introduce a general framework that unifies and generalizes several recent proposals to improve covariate balance when designing an observational study. Instead of the likelihood function, we propose to optimize special loss functions—covariate balancing scoring rules (CBSR)—to estimate the propensity score. A CBSR is uniquely determined by the link function in the GLM and the estimand (a weighted average treatment effect). We show CBSR does not lose asymptotic efficiency in estimating the weighted average treatment effect compared to the Bernoulli likelihood, but CBSR is much more robust in finite samples. Borrowing tools developed in statistical learning, we propose practical strategies to balance covariate functions in rich function classes. This is useful to estimate the maximum bias of the inverse probability weighting (IPW) estimators and construct honest confidence intervals in finite samples. Lastly, we provide several numerical examples to demonstrate the tradeoff of bias and variance in the IPW-type estimators and the tradeoff in balancing different function classes of the covariates.
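As a rough illustration of estimating a propensity score by a tailored loss instead of the likelihood, the sketch below assumes a logistic link and the average treatment effect on the treated (ATT) as the estimand. The loss used, the sum of exp(x'theta) over controls minus the sum of x'theta over treated units, is one convex loss whose first-order condition is exact covariate balance; it is an assumed example in the spirit of the abstract, not code or notation taken from the paper. The function name `fit_att_balancing_score` and the simulated data are hypothetical.

```python
import numpy as np


def fit_att_balancing_score(X, t, max_iter=100, tol=1e-8):
    """Fit a logistic-link propensity model by minimizing a balancing loss.

    Loss (assumed ATT instance): sum over controls of exp(x'theta)
    minus sum over treated of x'theta. Its gradient is exactly the
    covariate imbalance between odds-weighted controls and the treated.
    """
    Z = np.column_stack([np.ones(len(t)), X])  # add an intercept column
    Zc, Zt = Z[t == 0], Z[t == 1]
    target = Zt.sum(axis=0)  # covariate totals in the treated group
    theta = np.zeros(Z.shape[1])

    def loss(th):
        return np.exp(Zc @ th).sum() - target @ th

    for _ in range(max_iter):
        w = np.exp(Zc @ theta)        # estimated odds p/(1-p) per control
        grad = Zc.T @ w - target      # imbalance = gradient of the loss
        if np.max(np.abs(grad)) < tol:
            break
        hess = Zc.T @ (Zc * w[:, None])
        step = np.linalg.solve(hess, grad)
        # Backtracking line search keeps the damped Newton step stable.
        s = 1.0
        while (s > 1e-12 and
               loss(theta - s * step) > loss(theta) - 1e-4 * s * (grad @ step)):
            s *= 0.5
        theta = theta - s * step
    return theta


# Simulated data; every number below is an illustrative assumption.
rng = np.random.default_rng(0)
n = 500
X = rng.normal(size=(n, 3))
p_true = 1.0 / (1.0 + np.exp(-(0.5 * X[:, 0] - 0.25 * X[:, 1])))
t = rng.binomial(1, p_true)

theta = fit_att_balancing_score(X, t)
Zc = np.column_stack([np.ones(n), X])[t == 0]
w = np.exp(Zc @ theta)  # ATT weights for the control units
weighted_control_mean = (w[:, None] * X[t == 0]).sum(axis=0) / w.sum()
treated_mean = X[t == 1].mean(axis=0)
print(np.abs(weighted_control_mean - treated_mean).max())
```

At the minimizer the gradient vanishes, so the odds-weighted control covariate totals match the treated totals, which is the balance property that maximum likelihood does not guarantee in finite samples. The weights `w` could then be used in an IPW-type estimator of the ATT.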

Article information

Ann. Statist., Volume 47, Number 2 (2019), 965-993.

Received: March 2017
Revised: November 2017
First available in Project Euclid: 11 January 2019

Digital Object Identifier: doi:10.1214/18-AOS1698

Primary: 62P10: Applications to biology and medical sciences
Secondary: 62C99: None of the above, but in this section

Keywords: Convex optimization; kernel method; inverse probability weighting; proper scoring rule; regularized regression; statistical decision theory


Zhao, Qingyuan. Covariate balancing propensity score by tailored loss functions. Ann. Statist. 47 (2019), no. 2, 965–993. doi:10.1214/18-AOS1698.


Supplemental materials

  • Supplement to “Covariate balancing propensity score by tailored loss functions”. This supplement provides detailed proofs of the theoretical results and graphical illustrations of the Beta family of scoring rules.