Open Access
Tree-based reinforcement learning for estimating optimal dynamic treatment regimes
September 2018
Yebin Tao, Lu Wang, Daniel Almirall
Ann. Appl. Stat. 12(3): 1914-1938 (September 2018). DOI: 10.1214/18-AOAS1137


Dynamic treatment regimes (DTRs) are sequences of treatment decision rules in which treatment may be adapted over time in response to the changing course of an individual. Motivated by a substance use disorder (SUD) study, we propose a tree-based reinforcement learning (T-RL) method to directly estimate optimal DTRs in a multi-stage, multi-treatment setting. At each stage, T-RL builds an unsupervised decision tree that directly handles optimization with multiple treatment comparisons, using a purity measure constructed from augmented inverse probability weighted (AIPW) estimators. Across multiple stages, the algorithm is applied recursively using backward induction. By combining semiparametric regression with flexible tree-based learning, T-RL is robust, efficient and easy to interpret when identifying optimal DTRs, as the simulation studies show. Applying the proposed method, we identify dynamic SUD treatment regimes for adolescents.
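To make the single-stage step more concrete, here is a minimal NumPy sketch. It assumes a larger-is-better outcome, with estimated propensity scores and outcome-model predictions supplied by the user. It computes standard AIPW pseudo-outcomes for each treatment and then scores a candidate split by the mean pseudo-outcome obtained when each child node receives its own best treatment. The split with the highest score is kept. The function names (`aipw_pseudo_outcomes`, `split_purity`, `best_split`) and this particular purity formula are hypothetical illustrations in the spirit of T-RL, not the authors' implementation. The paper's stopping rules, tree growth beyond one split, and multi-stage logic are omitted.

```python
import numpy as np

def aipw_pseudo_outcomes(y, a, propensity, mu_hat):
    """AIPW pseudo-outcomes for every treatment level (illustrative sketch).

    y: (n,) observed outcomes, larger assumed better
    a: (n,) observed treatment labels in {0, ..., K-1}
    propensity: (n, K) estimated P(A = k | H)
    mu_hat: (n, K) outcome-model predictions of E[Y | H, A = k]
    Column k of the result estimates each subject's outcome under treatment k.
    """
    _, k = propensity.shape
    indicator = (a[:, None] == np.arange(k)[None, :]).astype(float)
    weight = indicator / propensity
    # Inverse-probability-weighted outcome plus an outcome-model augmentation term.
    return weight * y[:, None] + (1.0 - weight) * mu_hat

def split_purity(pseudo, left_mask):
    """Mean pseudo-outcome when each child node gets its own best single treatment."""
    # Maximizing over the treatment pair (a_left, a_right) decouples into
    # picking the best treatment separately within each child.
    left_best = pseudo[left_mask].sum(axis=0).max() if left_mask.any() else 0.0
    right_best = pseudo[~left_mask].sum(axis=0).max() if (~left_mask).any() else 0.0
    return (left_best + right_best) / len(pseudo)

def best_split(x, pseudo, min_node=5):
    """Search every covariate threshold; return (purity, feature, threshold)."""
    # Baseline: no split, one treatment for everyone in the node.
    best = (pseudo.sum(axis=0).max() / len(pseudo), None, None)
    for j in range(x.shape[1]):
        for c in np.unique(x[:, j])[:-1]:
            left = x[:, j] <= c
            if left.sum() < min_node or (~left).sum() < min_node:
                continue
            p = split_purity(pseudo, left)
            if p > best[0]:
                best = (p, j, c)
    return best

# Simulated example: treatment 1 helps when x0 > 0.5 and hurts otherwise.
rng = np.random.default_rng(0)
n = 2000
x = rng.uniform(size=(n, 2))
a = rng.integers(0, 2, size=n)  # randomized treatment, P(A = 1) = 0.5
y = np.where(x[:, 0] > 0.5, 1.0, -1.0) * (a == 1) + rng.normal(scale=0.5, size=n)
propensity = np.full((n, 2), 0.5)  # known randomization probabilities
mu_hat = np.zeros((n, 2))          # deliberately crude outcome model
pseudo = aipw_pseudo_outcomes(y, a, propensity, mu_hat)
purity, feature, threshold = best_split(x, pseudo)
print(feature, round(float(threshold), 2))
```

Here the outcome model is deliberately poor, yet the split should still land on `x0` near 0.5, because the AIPW pseudo-outcomes remain unbiased when the propensity scores are correct. For the multi-stage case described in the abstract, one would, under the same assumptions, run such a search at the last stage first, then carry each subject's estimated outcome under the fitted rule back as the outcome for the preceding stage (backward induction).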



Yebin Tao, Lu Wang, Daniel Almirall. "Tree-based reinforcement learning for estimating optimal dynamic treatment regimes." Ann. Appl. Stat. 12(3): 1914-1938, September 2018.


Received: 1 October 2016; Revised: 1 August 2017; Published: September 2018
First available in Project Euclid: 11 September 2018

zbMATH: 06979657
MathSciNet: MR3852703
Digital Object Identifier: 10.1214/18-AOAS1137

Keywords: backward induction, classification, decision tree, multi-stage decision-making, personalized medicine

Rights: Copyright © 2018 Institute of Mathematical Statistics


Vol.12 • No. 3 • September 2018