The Annals of Statistics

Batched bandit problems

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, and Erik Snowberg

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy and show that a very small number of batches suffices to achieve regret bounds close to minimax optimal. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
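The abstract does not spell out the policy, so the following is only a hedged illustration of what "splitting trials into a small number of batches" means in practice, not the authors' procedure. It is a minimal sketch of generic batched successive elimination on Bernoulli arms under several assumptions of our own. The names `minimax_grid` and `batched_elimination` are hypothetical. The batch end-points use an assumed spacing t_1 = a, t_k = ⌊a√t_{k−1}⌋ with a = T^{1/(2−2^{1−M})}, so that, ignoring rounding, t_k = a^{2−2^{1−k}} and t_M = T. The elimination radius √(log T / n) is a Hoeffding-style choice, not taken from the article. The policy revises which arms are active only at batch ends. Within a batch, it pulls each active arm in one contiguous block. This is one simple way to see why batching also limits switching: there are at most K switches per batch, so at most K·M overall.

```python
import math
import random


def minimax_grid(T, M):
    """Hypothetical batch end-points t_1 < ... < t_M = T (assumed spacing).

    t_1 = a, t_k = floor(a * sqrt(t_{k-1})), with a = T ** (1 / (2 - 2 ** (1 - M))),
    so that, ignoring rounding, t_M equals T.
    """
    a = T ** (1.0 / (2.0 - 2.0 ** (1 - M)))
    grid = [math.floor(a)]
    for _ in range(M - 1):
        grid.append(math.floor(a * math.sqrt(grid[-1])))
    grid[-1] = T  # rounding can leave the last point slightly below T
    return grid


def batched_elimination(means, T, grid, seed=0):
    """Illustrative batched successive elimination on Bernoulli arms.

    Returns (pseudo-regret, number of arm switches). The set of active arms
    changes only at the end-points listed in `grid`.
    """
    rng = random.Random(seed)
    best = max(means)
    active = list(range(len(means)))
    sums = [0.0] * len(means)
    counts = [0] * len(means)
    regret, switches, last_arm, t = 0.0, 0, None, 0
    for end in grid:
        length, m = end - t, len(active)
        # Split the batch evenly among active arms; each arm gets one contiguous block.
        for j, arm in enumerate(active):
            pulls = length // m + (1 if j < length % m else 0)
            for _ in range(pulls):
                if last_arm is not None and arm != last_arm:
                    switches += 1
                last_arm = arm
                sums[arm] += 1.0 if rng.random() < means[arm] else 0.0
                counts[arm] += 1
                regret += best - means[arm]
            t += pulls
        # Decisions are revised only here, at the end of the batch.
        if m > 1 and all(counts[a] > 0 for a in active):
            est = {a: sums[a] / counts[a] for a in active}
            top = max(est.values())
            # Hypothetical Hoeffding-style elimination radius sqrt(log T / n).
            active = [a for a in active
                      if est[a] >= top - math.sqrt(math.log(T) / counts[a])]
    return regret, switches


if __name__ == "__main__":
    T = 10_000
    for M in (1, 2, 3, 4):
        grid = minimax_grid(T, M)
        print(M, grid, batched_elimination([0.5, 0.6], T, grid))
```

With M = 1 there is no feedback at all, so both arms are pulled equally and regret grows linearly in T. Adding even a few batches lets the sketch discard the worse arm early, which is the qualitative effect the abstract describes.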

Article information

Ann. Statist. Volume 44, Number 2 (2016), 660–681.

Received: May 2015
Revised: August 2015
First available in Project Euclid: 17 March 2016

Digital Object Identifier: doi:10.1214/15-AOS1381

Primary: 62L05: Sequential design
Secondary: 62C20: Minimax procedures

Keywords: multi-armed bandit problems; regret bounds; batches; multi-phase allocation; grouped clinical trials; sample size determination; switching cost


Perchet, Vianney; Rigollet, Philippe; Chassang, Sylvain; Snowberg, Erik. Batched bandit problems. Ann. Statist. 44 (2016), no. 2, 660–681. doi:10.1214/15-AOS1381.

References

  • [1] Audibert, J.-Y. and Bubeck, S. (2010). Regret bounds and minimax policies under partial monitoring. J. Mach. Learn. Res. 11 2785–2836.
  • [2] Auer, P., Cesa-Bianchi, N. and Fischer, P. (2002). Finite-time analysis of the multiarmed bandit problem. Mach. Learn. 47 235–256.
  • [3] Auer, P. and Ortner, R. (2010). UCB revisited: Improved regret bounds for the stochastic multi-armed bandit problem. Period. Math. Hungar. 61 55–65.
  • [4] Bartroff, J. (2007). Asymptotically optimal multistage tests of simple hypotheses. Ann. Statist. 35 2075–2105.
  • [5] Bartroff, J., Lai, T. L. and Shih, M.-C. (2013). Sequential Experimentation in Clinical Trials: Design and Analysis. Springer, New York.
  • [6] Bather, J. A. (1981). Randomized allocation of treatments in sequential experiments. J. Roy. Statist. Soc. Ser. B 43 265–292.
  • [7] Berry, D. A. and Fristedt, B. (1985). Bandit Problems: Sequential Allocation of Experiments. Chapman & Hall, London.
  • [8] Bertsimas, D. and Mersereau, A. J. (2007). A learning approach for interactive marketing to a customer segment. Oper. Res. 55 1120–1135.
  • [9] Bubeck, S., Perchet, V. and Rigollet, P. (2013). Bounded regret in stochastic multi-armed bandits. COLT 2013, JMLR W&CP 30 122–134.
  • [10] Cappé, O., Garivier, A., Maillard, O.-A., Munos, R. and Stoltz, G. (2013). Kullback–Leibler upper confidence bounds for optimal sequential allocation. Ann. Statist. 41 1516–1541.
  • [11] Cesa-Bianchi, N., Dekel, O. and Shamir, O. (2013). Online learning with switching costs and other adaptive adversaries. Adv. Neural Inf. Process. Syst. 26 1160–1168.
  • [12] Cesa-Bianchi, N., Gentile, C. and Mansour, Y. (2012). Regret minimization for reserve prices in second-price auctions. In Proceedings of the Twenty-Fourth Annual ACM-SIAM Symposium on Discrete Algorithms 1190–1204. SIAM, Philadelphia, PA.
  • [13] Cheng, Y. (1996). Multistage bandit problems. J. Statist. Plann. Inference 53 153–170.
  • [14] Chick, S. E. and Gans, N. (2009). Economic analysis of simulation selection problems. Manage. Sci. 55 421–437.
  • [15] Colton, T. (1963). A model for selecting one of two medical treatments. J. Amer. Statist. Assoc. 58 388–400.
  • [16] Colton, T. (1965). A two-stage model for selecting one of two treatments. Biometrics 21 169–180.
  • [17] Cottle, R., Johnson, E. and Wets, R. (2007). George B. Dantzig (1914–2005). Notices Amer. Math. Soc. 54 344–362.
  • [18] Dantzig, G. B. (1940). On the non-existence of tests of “Student’s” hypothesis having power functions independent of σ. Ann. Math. Statist. 11 186–192.
  • [19] Doob, J. L. (1990). Stochastic Processes. Wiley, New York.
  • [20] Fabius, J. and van Zwet, W. R. (1970). Some remarks on the two-armed bandit. Ann. Math. Statist. 41 1906–1916.
  • [21] Ghurye, S. G. and Robbins, H. (1954). Two-stage procedures for estimating the difference between means. Biometrika 41 146–152.
  • [22] Hardwick, J. and Stout, Q. F. (2002). Optimal few-stage designs. J. Statist. Plann. Inference 104 121–145.
  • [23] Jennison, C. and Turnbull, B. W. (2000). Group Sequential Methods with Applications to Clinical Trials. Chapman & Hall/CRC, Boca Raton, FL.
  • [24] Jun, T. (2004). A survey on the bandit problem with switching costs. Economist 152 513–541.
  • [25] Lai, T. L. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Adv. in Appl. Math. 6 4–22.
  • [26] Maurice, R. J. (1957). A minimax procedure for choosing between two populations using sequential sampling. J. Roy. Statist. Soc. Ser. B 19 255–261.
  • [27] Metsch, L. R., Feaster, D. J., Gooden, L. et al. (2013). Effect of risk-reduction counseling with rapid HIV testing on risk of acquiring sexually transmitted infections: The AWARE randomized clinical trial. JAMA 310 1701–1710.
  • [28] Perchet, V. and Rigollet, P. (2013). The multi-armed bandit problem with covariates. Ann. Statist. 41 693–721.
  • [29] Perchet, V., Rigollet, P., Chassang, S. and Snowberg, E. (2015). Supplement to “Batched bandit problems.” DOI:10.1214/15-AOS1381SUPP.
  • [30] Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58 527–535.
  • [31] Schwartz, E. M., Bradlow, E. and Fader, P. (2013). Customer acquisition via display advertising using multi-armed bandit experiments. Technical report, Univ. Michigan.
  • [32] Somerville, P. N. (1954). Some problems of optimum sampling. Biometrika 41 420–429.
  • [33] Stein, C. (1945). A two-sample test for a linear hypothesis whose power is independent of the variance. Ann. Math. Statist. 16 243–258.
  • [34] Thompson, W. R. (1933). On the likelihood that one unknown probability exceeds another in view of the evidence of two samples. Biometrika 25 285–294.
  • [35] Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation. Springer, New York.
  • [36] Vogel, W. (1960). An asymptotic minimax theorem for the two armed bandit problem. Ann. Math. Statist. 31 444–451.
  • [37] Vogel, W. (1960). A sequential design for the two armed bandit. Ann. Math. Statist. 31 430–443.

Supplemental materials

  • Supplement to “Batched bandit problems”. The supplementary material [29] contains additional simulations, including some using real data.