Open Access
March 2024
What makes forest-based heterogeneous treatment effect estimators work?
Susanne Dandl, Christian Haslinger, Torsten Hothorn, Heidi Seibold, Erik Sverdrup, Stefan Wager, Achim Zeileis
Ann. Appl. Stat. 18(1): 506-528 (March 2024). DOI: 10.1214/23-AOAS1799

Abstract

Estimation of heterogeneous treatment effects (HTE) is of prime importance in many disciplines, from personalized medicine to economics. Random forests have been shown to be a flexible and powerful approach to HTE estimation in both randomized trials and observational studies. In particular, the “causal forests” introduced by Athey, Tibshirani and Wager (Ann. Statist. 47 (2019) 1148–1178), along with their R implementation in the package grf, were rapidly adopted. A related approach, called “model-based forests”, which is geared toward randomized trials and simultaneously captures the effects of both prognostic and predictive variables, was introduced by Seibold, Zeileis and Hothorn (Stat. Methods Med. Res. 27 (2018) 3104–3125) along with a modular implementation in the R package model4you.

Neither procedure is directly applicable to the estimation of individualized predictions of the excess postpartum blood loss caused by a cesarean section compared with vaginal delivery. Randomization is hardly possible in this setting, so no clinical trial data are available to which model-based forests could be applied. On the other hand, the skewed and interval-censored postpartum blood loss observations violate assumptions made by causal forests. Here we present a tailored model-based forest for skewed and interval-censored data to infer possible predictive prepartum characteristics and their impact on the excess postpartum blood loss caused by a cesarean section.

As a methodological basis, we propose a unifying view on causal and model-based forests that goes beyond the theoretical motivations and investigates which computational elements make causal forests so successful and how these can be blended with the strengths of model-based forests. To do so, we show that both methods can be understood in terms of the same parameters and model assumptions for an additive model under L2 loss. This theoretical insight allows us to implement several flavors of “model-based causal forests” and dissect their different elements in silico.
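For orientation, the additive model under L2 loss can be written in the standard partially linear form below. The symbols are our own shorthand and may differ from the paper's notation: μ(x) is the prognostic part, τ(x) the treatment effect, e(x) the propensity and m(x) the conditional mean of the outcome. Under L2 loss, μ and τ minimize the expected squared error E[(Y − μ(X) − τ(X)W)²]. Subtracting m(x) from both sides gives the locally centered form used by causal forests, in which the treatment indicator is centered by its propensity.

```latex
\begin{aligned}
Y &= \mu(\mathbf{x}) + \tau(\mathbf{x})\, w + \varepsilon,
  \qquad \mathbb{E}(\varepsilon \mid \mathbf{X} = \mathbf{x}, W = w) = 0, \\
e(\mathbf{x}) &= \mathbb{P}(W = 1 \mid \mathbf{X} = \mathbf{x}), \qquad
m(\mathbf{x}) = \mathbb{E}(Y \mid \mathbf{X} = \mathbf{x})
  = \mu(\mathbf{x}) + e(\mathbf{x})\, \tau(\mathbf{x}), \\
Y - m(\mathbf{x}) &= \tau(\mathbf{x})\, \bigl(w - e(\mathbf{x})\bigr) + \varepsilon .
\end{aligned}
```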

The original causal forests and model-based forests are compared with the new blended versions in a benchmark study covering both randomized trials and observational settings. In the randomized setting, both approaches performed similarly. When confounding was present in the data-generating process, we found local centering of the treatment indicator with the corresponding propensities to be the main driver of good performance. Local centering of the outcome was less important and might be replaced or enhanced by simultaneous split selection with respect to both prognostic and predictive effects. This lays the foundation for future research combining random forests for HTE estimation with other types of models.
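The intuition behind the finding about centering the treatment indicator can be sketched with a toy, non-forest example. The sketch below is an illustration only, not the authors' benchmark or the grf implementation: it assumes a hypothetical confounded design with a constant effect τ, known ("oracle") propensities e(x) and outcome mean m(x), and a single global residual-on-residual slope instead of local forest weights. The function names `simulate`, `naive_estimate` and `centered_estimate` are hypothetical.

```python
import numpy as np


def simulate(n, tau=1.0, seed=0):
    """Hypothetical confounded design: X drives both treatment and outcome."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(-1.0, 1.0, size=n)
    e = 1.0 / (1.0 + np.exp(-2.0 * x))  # propensity e(x) = P(W = 1 | X = x)
    w = rng.binomial(1, e)               # treatment is more likely for large x
    mu = 2.0 * x                         # prognostic effect mu(x)
    y = mu + tau * w + rng.normal(0.0, 1.0, size=n)
    return x, w, y, e, mu


def naive_estimate(w, y):
    """Difference in means; ignores confounding by X."""
    return y[w == 1].mean() - y[w == 0].mean()


def centered_estimate(w, y, e, m=None):
    """Residual-on-residual slope: center W by e(x) and, optionally, Y by m(x)."""
    w_res = w - e
    y_res = y if m is None else y - m
    return np.sum(w_res * y_res) / np.sum(w_res ** 2)


if __name__ == "__main__":
    tau = 1.0
    x, w, y, e, mu = simulate(100_000, tau=tau)
    m = mu + tau * e  # oracle m(x) = mu(x) + e(x) * tau
    print("naive:         ", naive_estimate(w, y))
    print("center W only: ", centered_estimate(w, y, e))
    print("center W and Y:", centered_estimate(w, y, e, m))
```

In this design the naive difference in means is pulled upward, away from τ = 1, because treated units tend to have larger prognostic values. Centering W alone already removes that bias, since W − e(X) is uncorrelated with any function of X. Additionally centering Y mainly reduces the variance of the estimate.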

Funding Statement

The work of Susanne Dandl was funded by the Munich Center for Machine Learning (MCML). Torsten Hothorn received funding from the Swiss National Science Foundation (Grant No. 200021_184603) and from the Horizon 2020 Research and Innovation Programme of the European Union (grant agreement No. 681094), and is supported by the Swiss State Secretariat for Education, Research and Innovation (SERI) under contract number 15.0137.

Acknowledgments

We thank the Handling Editor, Dr. Griffin, and two anonymous referees for helpful criticism of the initial manuscript.

Citation


Susanne Dandl, Christian Haslinger, Torsten Hothorn, Heidi Seibold, Erik Sverdrup, Stefan Wager, Achim Zeileis. "What makes forest-based heterogeneous treatment effect estimators work?" Ann. Appl. Stat. 18 (1) 506–528, March 2024. https://doi.org/10.1214/23-AOAS1799

Information

Received: 1 September 2022; Revised: 1 June 2023; Published: March 2024
First available in Project Euclid: 31 January 2024

MathSciNet: MR4698618
Digital Object Identifier: 10.1214/23-AOAS1799

Keywords: causal forests, heterogeneous treatment effects, observational data, personalized medicine, postpartum hemorrhage, random forest

Rights: Copyright © 2024 Institute of Mathematical Statistics
