A dynamic treatment regime (DTR) is a sequence of decision rules, one per stage of intervention, that maps up-to-date patient information to a recommended treatment. Discovering an appropriate DTR for a given disease is challenging, especially when a large set of prognostic variables is observed. To address this problem, we propose penalized regression-based learning methods with penalty to estimate the optimal DTR, that is, the regime that would maximize the expected outcome if implemented. We also provide generalization error bounds for the estimated DTR in the setting of a finite number of stages with multiple treatment options. We first examine the relationship between value functions and Q-functions and derive a finite-sample upper bound on the difference in value between the optimal and the estimated DTRs. For practical implementation, we develop an algorithm with partial regularization via orthogonality to construct the optimal DTR. The advantages of the proposed methods are demonstrated through extensive simulation studies and an analysis of data from depression clinical trials.
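To make the idea of regression-based DTR estimation concrete, the following is a minimal sketch of generic two-stage Q-learning by backward induction, with a penalized regression fitted at each stage. It is not the algorithm proposed in the paper and does not implement partial regularization via orthogonality. The lasso-type penalty, the binary treatments coded as -1 and +1, the linear working models, the simulated data, and all function names are assumptions made purely for illustration.

```python
import random

ACTIONS = (-1, 1)  # assumed binary treatment coding


def soft_threshold(z, lam):
    """Soft-thresholding operator used by the lasso coordinate update."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0


def lasso_cd(X, y, lam, n_sweeps=100):
    """Coordinate descent for (1/2n)||y - Xb||^2 + lam * sum_{j>=1} |b_j|.

    Column 0 is treated as an unpenalized intercept.
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    resid = list(y)  # residuals for beta = 0
    col_ms = [sum(row[j] ** 2 for row in X) / n for j in range(p)]
    for _ in range(n_sweeps):
        for j in range(p):
            # Covariance of column j with the partial residual.
            rho = sum(X[i][j] * resid[i] for i in range(n)) / n + col_ms[j] * beta[j]
            new = rho / col_ms[j] if j == 0 else soft_threshold(rho, lam) / col_ms[j]
            delta = new - beta[j]
            if delta != 0.0:
                for i in range(n):
                    resid[i] -= delta * X[i][j]
                beta[j] = new
    return beta


def phi1(x1, a1):
    """Hypothetical stage-1 working model features."""
    return [1.0, x1, a1, a1 * x1]


def phi2(x1, a1, x2, a2):
    """Hypothetical stage-2 working model features."""
    return [1.0, x1, a1, a1 * x1, x2, a2, a2 * x2]


def dot(beta, feats):
    return sum(b * f for b, f in zip(beta, feats))


def simulate(n, rng):
    """Toy two-stage trial with randomized treatments (illustrative only)."""
    data = []
    for _ in range(n):
        x1 = rng.gauss(0, 1)
        a1 = rng.choice(ACTIONS)
        x2 = 0.5 * x1 + rng.gauss(0, 1)
        a2 = rng.choice(ACTIONS)
        y = x1 + 0.5 * a1 * x1 + a2 * x2 + rng.gauss(0, 0.5)
        data.append((x1, a1, x2, a2, y))
    return data


def fit_q_learning(data, lam):
    """Backward induction: fit the stage-2 Q-function, then stage 1."""
    X2 = [phi2(x1, a1, x2, a2) for x1, a1, x2, a2, _ in data]
    beta2 = lasso_cd(X2, [row[4] for row in data], lam)
    # Pseudo-outcome: fitted stage-2 Q-function maximized over a2.
    pseudo = [
        max(dot(beta2, phi2(x1, a1, x2, a)) for a in ACTIONS)
        for x1, a1, x2, _, _ in data
    ]
    X1 = [phi1(x1, a1) for x1, a1, _, _, _ in data]
    beta1 = lasso_cd(X1, pseudo, lam)
    return beta1, beta2


def rule1(beta1, x1):
    """Estimated stage-1 decision rule."""
    return max(ACTIONS, key=lambda a: dot(beta1, phi1(x1, a)))


def rule2(beta2, x1, a1, x2):
    """Estimated stage-2 decision rule."""
    return max(ACTIONS, key=lambda a: dot(beta2, phi2(x1, a1, x2, a)))


if __name__ == "__main__":
    rng = random.Random(0)
    beta1, beta2 = fit_q_learning(simulate(500, rng), lam=0.05)
    print("stage-1 rule at x1 = 1:", rule1(beta1, 1.0))
    print("stage-2 rule at x2 = -1:", rule2(beta2, 0.0, 1, -1.0))
```

In this toy data-generating model the best stage-2 treatment has the sign of `x2` and the best stage-1 treatment has the sign of `x1`, so the fitted rules can be checked against a known answer. The penalty level `lam` is fixed by hand here. In practice it would be tuned, for example by cross-validation.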
This work is partially funded by NIH Grants R01MH109496, R21MH108999 and NSF Grant DMS-2112938.
The authors thank the Associate Editor and referees for their helpful comments.
"Generalization error bounds of dynamic treatment regimes in penalized regression-based learning." Ann. Statist. 50 (4) 2047 - 2071, August 2022. https://doi.org/10.1214/22-AOS2171