The Annals of Statistics

Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates

Yuhong Yang and Dan Zhu



We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and appropriate randomization are used to select a good arm to play for a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm with further exploration of other arms. It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best arm (which depends on the covariates) almost surely.
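The strategy described in the abstract can be illustrated with a small simulation. What follows is a minimal sketch under stated assumptions, not the authors' exact procedure: the covariate is taken to be uniform on [0, 1], each arm's mean reward is estimated by a fixed-width histogram, and the randomization plays the currently best-looking arm except with a decaying exploration probability. The names `HistogramArm`, `choose_arm` and `simulate`, the `n ** -0.5` schedule, and the Gaussian noise are all hypothetical choices made for illustration.

```python
import random


class HistogramArm:
    """Histogram estimate of one arm's mean reward for a covariate in [0, 1]."""

    def __init__(self, n_bins):
        self.n_bins = n_bins
        self.sums = [0.0] * n_bins
        self.counts = [0] * n_bins

    def _bin(self, x):
        # Clamp so that x == 1.0 falls in the last bin.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def update(self, x, reward):
        b = self._bin(x)
        self.sums[b] += reward
        self.counts[b] += 1

    def predict(self, x):
        b = self._bin(x)
        # Empty bin: fall back to 0.0 (an arbitrary choice for this sketch).
        return self.sums[b] / self.counts[b] if self.counts[b] else 0.0


def choose_arm(arms, x, explore_prob, rng):
    """Play the estimated best arm at x; with probability explore_prob,
    play one of the other arms chosen uniformly at random instead."""
    estimates = [arm.predict(x) for arm in arms]
    best = max(range(len(arms)), key=lambda i: estimates[i])
    if len(arms) > 1 and rng.random() < explore_prob:
        others = [i for i in range(len(arms)) if i != best]
        return rng.choice(others)
    return best


def simulate(mean_fns, n_rounds, n_bins=10, noise=0.1, seed=0):
    """Return the ratio of accumulated mean reward to that of the best arm."""
    rng = random.Random(seed)
    arms = [HistogramArm(n_bins) for _ in mean_fns]
    total = 0.0
    best_total = 0.0
    for n in range(1, n_rounds + 1):
        x = rng.random()                    # covariate observed before playing
        explore_prob = min(1.0, n ** -0.5)  # hypothetical decaying schedule
        i = choose_arm(arms, x, explore_prob, rng)
        reward = mean_fns[i](x) + rng.gauss(0.0, noise)
        arms[i].update(x, reward)
        total += mean_fns[i](x)
        best_total += max(f(x) for f in mean_fns)
    return total / best_total
```

Calling `simulate([lambda x: x, lambda x: 1 - x], 5000)` runs two arms whose best choice switches at x = 0.5. Under the assumptions above, the returned ratio would be expected to move toward 1 as the number of rounds grows, which is the kind of behavior the consistency result describes.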

Article information

Ann. Statist., Volume 30, Number 1 (2002), 100-121.

First available in Project Euclid: 5 March 2002

Primary: 62L05: Sequential design; 62C25: Compound decision problems

Multi-armed bandits; sequential allocation; randomized allocation; concomitant variable; nonparametric regression


Yang, Yuhong; Zhu, Dan. Randomized allocation with nonparametric estimation for a multi-armed bandit problem with covariates. Ann. Statist. 30 (2002), no. 1, 100-121. doi:10.1214/aos/1015362186.
