The Annals of Statistics

Batched bandit problems

Vianney Perchet, Philippe Rigollet, Sylvain Chassang, and Erik Snowberg

Abstract

Motivated by practical applications, chiefly clinical trials, we study the regret achievable for stochastic bandits under the constraint that the employed policy must split trials into a small number of batches. We propose a simple policy, and show that a very small number of batches gives close to minimax optimal regret bounds. As a byproduct, we derive optimal policies with low switching cost for stochastic bandits.
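
As an illustration of the batching constraint described in the abstract, the following is a minimal sketch of a generic explore-then-commit rule for a two-armed Bernoulli bandit in which the allocation can change only at batch boundaries. It is not taken from the paper and should not be read as the authors' policy: the function name `batched_etc`, the batch grid `batch_ends`, and the Hoeffding-style confidence radius are all assumptions made for the example.

```python
import math
import random


def batched_etc(means, horizon, batch_ends, seed=0):
    """Hypothetical batched explore-then-commit for two Bernoulli arms.

    means      -- true success probabilities (used only to simulate rewards)
    horizon    -- total number of rounds T
    batch_ends -- increasing round indices at which batches end; last == horizon
    Returns (pseudo_regret, pulls_per_arm).
    """
    assert batch_ends[-1] == horizon
    rng = random.Random(seed)
    pulls = [0, 0]
    sums = [0.0, 0.0]
    committed = None  # index of the arm kept after the other is dropped
    start = 0
    for b, end in enumerate(batch_ends):
        is_last = b == len(batch_ends) - 1
        # In the final batch no further feedback can be used,
        # so commit to the empirical leader if any data exists.
        if committed is None and is_last and min(pulls) > 0:
            committed = max((0, 1), key=lambda a: sums[a] / pulls[a])
        # The allocation inside a batch is fixed before the batch starts.
        for t in range(start, end):
            arm = committed if committed is not None else t % 2
            reward = 1.0 if rng.random() < means[arm] else 0.0
            pulls[arm] += 1
            sums[arm] += reward
        start = end
        # Rewards are only acted upon at the end of the batch.
        if committed is None and min(pulls) > 0:
            gap = sums[0] / pulls[0] - sums[1] / pulls[1]
            # Assumed Hoeffding-style radius; the constant is illustrative.
            radius = math.sqrt(2.0 * math.log(horizon) / min(pulls))
            if abs(gap) > radius:
                committed = 0 if gap > 0 else 1
    best = max(means)
    regret = sum((best - means[a]) * pulls[a] for a in (0, 1))
    return regret, pulls


if __name__ == "__main__":
    regret, pulls = batched_etc((0.9, 0.1), horizon=1000, batch_ends=[100, 1000])
    print(regret, pulls)
```

With only two batches, the sketch spends the first batch alternating between the arms and the second playing whichever arm looks better. Adding more entries to `batch_ends` gives the rule more opportunities to stop exploring early.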

Article information

Source
Ann. Statist., Volume 44, Number 2 (2016), 660-681.

Dates
Received: May 2015
Revised: August 2015
First available in Project Euclid: 17 March 2016

Permanent link to this document
https://projecteuclid.org/euclid.aos/1458245731

Digital Object Identifier
doi:10.1214/15-AOS1381

Mathematical Reviews number (MathSciNet)
MR3476613

Zentralblatt MATH identifier
1338.62180

Subjects
Primary: 62L05: Sequential design
Secondary: 62C20: Minimax procedures

Keywords
Multi-armed bandit problems; regret bounds; batches; multi-phase allocation; grouped clinical trials; sample size determination; switching cost

Citation

Perchet, Vianney; Rigollet, Philippe; Chassang, Sylvain; Snowberg, Erik. Batched bandit problems. Ann. Statist. 44 (2016), no. 2, 660–681. doi:10.1214/15-AOS1381. https://projecteuclid.org/euclid.aos/1458245731

