Statistical Science

Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges

Sofía S. Villar, Jack Bowden, and James Wason



Multi-armed bandit problems (MABPs) are a special type of optimal control problem well suited to modeling resource allocation under uncertainty in a wide variety of contexts. Since the first publication of the optimal solution of the classic MABP by a dynamic index rule, the bandit literature has diversified quickly and become an active research topic. Across this literature, the use of bandit models to optimally design clinical trials became a typical motivating application, yet little of the resulting theory has ever been used in the actual design and analysis of clinical trials. To help bridge this gap, we review two MABP decision-theoretic approaches to the optimal allocation of treatments in a clinical trial: the infinite-horizon Bayesian Bernoulli MABP and its finite-horizon variant. These models possess distinct theoretical properties and lead to different allocation rules in a clinical trial design context. We evaluate their performance against other allocation rules, including fixed randomization. Our results indicate that bandit approaches offer significant advantages, in terms of assigning more patients to better treatments, but also severe limitations, in terms of their resulting statistical power. We propose a novel bandit-based patient allocation rule that overcomes the issue of low power, thus removing a potential barrier to the use of bandit models in practice.
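To make the contrast between a bandit-based allocation rule and fixed randomization concrete, here is a minimal sketch of one natural finite-horizon Bayesian Bernoulli formulation, solved exactly by backward induction. It assumes two arms, independent Beta(a, b) priors on each arm's success probability (uniform Beta(1, 1) by default), and the goal of maximizing the expected number of successes among `horizon` patients with no discounting. The helper `solve_two_arm_bernoulli` and its interface are hypothetical illustrations. They are not the authors' code, they do not compute the Gittins or Whittle index, and they do not implement the proposed rule. Exact dynamic programming like this is only practical for small trials.

```python
from functools import lru_cache


def solve_two_arm_bernoulli(horizon, a=1.0, b=1.0):
    """Backward-induction sketch of a finite-horizon Bayesian Bernoulli
    two-armed bandit (hypothetical helper, not the paper's method).

    A state (s1, f1, s2, f2) counts the successes and failures observed so
    far on arms 0 and 1. Each arm has an independent Beta(a, b) prior, so its
    posterior mean is (a + s) / (a + b + s + f). The objective is the expected
    number of successes over `horizon` patients, with no discounting.
    """

    def posterior_mean(s, f):
        return (a + s) / (a + b + s + f)

    @lru_cache(maxsize=None)
    def value(s1, f1, s2, f2):
        # Expected successes among the remaining patients under the optimal rule.
        if s1 + f1 + s2 + f2 >= horizon:
            return 0.0
        return max(q_value(s1, f1, s2, f2, 0), q_value(s1, f1, s2, f2, 1))

    def q_value(s1, f1, s2, f2, arm):
        # Expected successes if the next patient gets `arm`, then act optimally.
        if arm == 0:
            p = posterior_mean(s1, f1)
            win = value(s1 + 1, f1, s2, f2)
            lose = value(s1, f1 + 1, s2, f2)
        else:
            p = posterior_mean(s2, f2)
            win = value(s1, f1, s2 + 1, f2)
            lose = value(s1, f1, s2, f2 + 1)
        return p * (1.0 + win) + (1.0 - p) * lose

    def best_arm(s1, f1, s2, f2):
        # Optimal arm for the next patient; ties are broken towards arm 0.
        q0 = q_value(s1, f1, s2, f2, 0)
        q1 = q_value(s1, f1, s2, f2, 1)
        return 0 if q0 >= q1 else 1

    return value, best_arm


if __name__ == "__main__":
    T = 20
    value, best_arm = solve_two_arm_bernoulli(T)
    print("Bayes-optimal expected successes:", value(0, 0, 0, 0))
    # Under Beta(1, 1) priors every non-adaptive rule, fixed randomization
    # included, has Bayes-expected successes T * a / (a + b) = T / 2.
    print("Fixed randomization:", T / 2)
    print("Next arm after 3/3 successes on arm 0, 0/3 on arm 1:", best_arm(3, 0, 0, 3))
```

Because a non-adaptive rule ignores the accumulating responses, each patient's Bayes-expected success probability is just the prior mean, so `value(0, 0, 0, 0) - T / 2` measures the expected patient benefit of adapting. The sketch says nothing about statistical power, which the abstract identifies as the main limitation of such rules.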

Article information

Statist. Sci. Volume 30, Number 2 (2015), 199–215.

First available in Project Euclid: 3 June 2015

Keywords: Multi-armed bandit; Gittins index; Whittle index; patient allocation; response adaptive procedures


Villar, Sofía S.; Bowden, Jack; Wason, James. Multi-armed Bandit Models for the Optimal Design of Clinical Trials: Benefits and Challenges. Statist. Sci. 30 (2015), no. 2, 199–215. doi:10.1214/14-STS504.
