The Annals of Applied Probability

Woodroofe’s one-armed bandit problem revisited

Alexander Goldenshluger and Assaf Zeevi

Full-text: Open access

Abstract

We consider the one-armed bandit problem of Woodroofe [J. Amer. Statist. Assoc. 74 (1979) 799–806], which involves sequential sampling from two populations: one whose characteristics are known, and one which depends on an unknown parameter and incorporates a covariate. The goal is to maximize cumulative expected reward. We study this problem in a minimax setting, and develop rate-optimal policies that involve suitable modifications of the myopic rule. It is shown that the regret, as well as the rate of sampling from the inferior population, can be finite or grow at various rates with the time horizon of the problem, depending on “local” properties of the covariate distribution. Proofs rely on martingale methods and information theoretic arguments.
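As a rough illustration of the terms used in the abstract (myopic rule, regret, sampling from the inferior population), the following is a minimal simulation sketch. It is not the policy or the model analyzed in the paper. It assumes, purely for illustration, that the known population pays a reward of 0, that the unknown population pays `theta * x` plus Gaussian noise for covariate value `x`, and that covariates are uniform on [-1, 1]. The function name `simulate_myopic` and all of its parameters are hypothetical.

```python
import random


def simulate_myopic(theta, horizon, noise_sd=1.0, seed=0):
    """Simulate a plain myopic rule on a toy one-armed bandit with a covariate.

    Illustrative assumptions (not taken from the paper):
      * the known arm always pays 0;
      * the unknown arm pays theta * x + Gaussian noise;
      * covariates x are i.i.d. uniform on [-1, 1].
    Returns (cumulative regret, number of rounds the inferior arm was chosen).
    """
    rng = random.Random(seed)
    sum_xy = 0.0  # running sum of x * y over pulls of the unknown arm
    sum_xx = 0.0  # running sum of x * x over pulls of the unknown arm
    regret = 0.0
    inferior_pulls = 0

    for _ in range(horizon):
        x = rng.uniform(-1.0, 1.0)

        if sum_xx == 0.0:
            # No usable data yet: force a pull of the unknown arm.
            pull = True
        else:
            # Myopic rule: act as if the least-squares estimate were the truth.
            theta_hat = sum_xy / sum_xx
            pull = theta_hat * x > 0.0

        mean_unknown = theta * x
        if pull:
            y = mean_unknown + rng.gauss(0.0, noise_sd)
            sum_xy += x * y
            sum_xx += x * x

        # Regret is measured against an oracle that knows theta.
        chosen_mean = mean_unknown if pull else 0.0
        step_regret = max(mean_unknown, 0.0) - chosen_mean
        regret += step_regret
        if step_regret > 0.0:
            inferior_pulls += 1

    return regret, inferior_pulls


if __name__ == "__main__":
    total_regret, n_inferior = simulate_myopic(theta=0.5, horizon=2000, seed=1)
    print(total_regret, n_inferior)
```

In this toy setup the covariate density is positive near the decision boundary `x = 0`, which is the kind of "local" property of the covariate distribution that the abstract says governs how the regret grows. Changing the covariate distribution in the sketch is one way to explore that dependence informally.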

Article information

Source
Ann. Appl. Probab., Volume 19, Number 4 (2009), 1603–1633.

Dates
First available in Project Euclid: 27 July 2009

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1248700629

Digital Object Identifier
doi:10.1214/08-AAP589

Mathematical Reviews number (MathSciNet)
MR2538082

Zentralblatt MATH identifier
1168.62071

Subjects
Primary: 62L05: Sequential design
Secondary: 60G40: Stopping times; optimal stopping problems; gambling theory [See also 62L15, 91A60]
Secondary: 62C20: Minimax procedures

Keywords
Sequential allocation; online learning; estimation; bandit problems; regret; inferior sampling rate; minimax; rate-optimal policy

Citation

Goldenshluger, Alexander; Zeevi, Assaf. Woodroofe’s one-armed bandit problem revisited. Ann. Appl. Probab. 19 (2009), no. 4, 1603–1633. doi:10.1214/08-AAP589. https://projecteuclid.org/euclid.aoap/1248700629


References

  • Berry, D. A. and Fristedt, B. (1985). Bandit Problems: Sequential Allocation of Experiments. Chapman & Hall, London.
  • Borovkov, A. A. and Sakhanenko, A. I. (1980). Estimates for averaged quadratic risk. Probab. Math. Statist. 1 185–195.
  • Brown, L. D. and Gajek, L. (1990). Information inequalities for the Bayes risk. Ann. Statist. 18 1578–1594.
  • Cesa-Bianchi, N. and Lugosi, G. (2006). Prediction, Learning, and Games. Cambridge Univ. Press, Cambridge.
  • Clayton, M. K. (1989). Covariate models for Bernoulli bandits. Sequential Anal. 8 405–426.
  • de la Peña, V. H., Klass, M. J. and Lai, T. L. (2004). Self-normalized processes: Exponential inequalities, moment bounds and iterated logarithm laws. Ann. Probab. 32 1902–1933.
  • de la Peña, V. H., Klass, M. J. and Lai, T. L. (2007). Pseudo-maximization and self-normalized processes. Probab. Surv. 4 172–192.
  • Devroye, L. (1987). A Course in Density Estimation. Progress in Probability and Statistics 14. Birkhäuser, Boston.
  • Gill, R. D. and Levit, B. Y. (1995). Applications of the Van Trees inequality: A Bayesian Cramér–Rao bound. Bernoulli 1 59–79.
  • Gittins, J. C. (1989). Multi-armed Bandit Allocation Indices. Wiley, Chichester.
  • Goldenshluger, A. and Zeevi, A. (2008). Performance limitations in bandit problems with side observations. Unpublished manuscript.
  • Lai, T. L. (2001). Sequential analysis: Some classical problems and new challenges. Statist. Sinica 11 303–408.
  • Lai, T. L. and Robbins, H. (1985). Asymptotically efficient adaptive allocation rules. Adv. in Appl. Math. 6 4–22.
  • Liptser, R. and Spokoiny, V. (2000). Deviation probability bound for martingales with applications to statistical estimation. Statist. Probab. Lett. 46 347–357.
  • Mammen, E. and Tsybakov, A. B. (1999). Smooth discrimination analysis. Ann. Statist. 27 1808–1829.
  • Robbins, H. (1952). Some aspects of the sequential design of experiments. Bull. Amer. Math. Soc. 58 527–535.
  • Sarkar, J. (1991). One-armed bandit problems with covariates. Ann. Statist. 19 1978–2002.
  • Tsybakov, A. B. (2004a). Optimal aggregation of classifiers in statistical learning. Ann. Statist. 32 135–166.
  • Tsybakov, A. B. (2004b). Introduction à l’estimation nonparamétrique. Springer, Berlin.
  • Wang, C.-C., Kulkarni, S. R. and Poor, H. V. (2005). Bandit problems with side observations. IEEE Trans. Automat. Control 50 338–355.
  • Woodroofe, M. (1979). A one-armed bandit problem with a concomitant variable. J. Amer. Statist. Assoc. 74 799–806.
  • Woodroofe, M. (1982). Sequential allocation with covariates. Sankhyā Ser. A 44 403–414.
  • Yang, Y. and Zhu, D. (2002). Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. Ann. Statist. 30 100–121.