Electronic Journal of Statistics

Dynamic treatment regimes: Technical challenges and applications

Eric B. Laber, Daniel J. Lizotte, Min Qian, William E. Pelham, and Susan A. Murphy

Full-text: Open access

Abstract

Dynamic treatment regimes are of growing interest across the clinical sciences because these regimes provide one way to operationalize and thus inform sequential personalized clinical decision making. Formally, a dynamic treatment regime is a sequence of decision rules, one per stage of clinical intervention. Each decision rule maps up-to-date patient information to a recommended treatment. We briefly review a variety of approaches for using data to construct the decision rules. We then review a critical inferential challenge that results from nonregularity, which often arises in this area. In particular, nonregularity arises in inference for parameters in the optimal dynamic treatment regime; the asymptotic (limiting) distribution of estimators is sensitive to local perturbations. We propose and evaluate a locally consistent Adaptive Confidence Interval (ACI) for the parameters of the optimal dynamic treatment regime. We use data from the Adaptive Pharmacological and Behavioral Treatments for Children with ADHD Trial as an illustrative example. We conclude by highlighting and discussing emerging theoretical problems in this area.

Article information

Source
Electron. J. Statist., Volume 8, Number 1 (2014), 1225-1272.

Dates
First available in Project Euclid: 20 August 2014

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1408540283

Digital Object Identifier
doi:10.1214/14-EJS920

Mathematical Reviews number (MathSciNet)
MR3263118

Zentralblatt MATH identifier
1298.62189

Keywords
Personalized medicine; data-driven decision making; nonregular inference; adaptive confidence intervals

Citation

Laber, Eric B.; Lizotte, Daniel J.; Qian, Min; Pelham, William E.; Murphy, Susan A. Dynamic treatment regimes: Technical challenges and applications. Electron. J. Statist. 8 (2014), no. 1, 1225–1272. doi:10.1214/14-EJS920. https://projecteuclid.org/euclid.ejs/1408540283

