Open Access
February 2002
Randomized Allocation with nonparametric estimation for a multi-armed bandit problem with covariates
Yuhong Yang, Dan Zhu
Ann. Statist. 30(1): 100-121 (February 2002). DOI: 10.1214/aos/1015362186

Abstract

We study a multi-armed bandit problem in a setting where covariates are available. We take a nonparametric approach to estimate the functional relationship between the response (reward) and the covariates. The estimated relationships and appropriate randomization are used to select a good arm to play for a greater expected reward. Randomization helps balance the tendency to trust the currently most promising arm with further exploration of other arms. It is shown that, with some familiar nonparametric methods (e.g., histogram), the proposed strategy is strongly consistent in the sense that the accumulated reward is asymptotically equivalent to that based on the best arm (which depends on the covariates) almost surely.
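The idea in the abstract can be illustrated with a small simulation. The sketch below is not the authors' algorithm or their exact randomization scheme. It assumes a single covariate in [0, 1), equal-width histogram bins, two hypothetical arms with mean rewards x and 1 - x, Gaussian noise, and an exploration probability decaying as t^(-1/4). The class name `HistogramBandit`, the function `simulate`, and all parameter values are illustrative choices, not taken from the paper.

```python
import random


class HistogramBandit:
    """Per-arm histogram regression estimates plus randomized arm choice (illustrative)."""

    def __init__(self, n_arms, n_bins, seed=0):
        self.n_arms = n_arms
        self.n_bins = n_bins
        self.rng = random.Random(seed)
        # Running reward sums and play counts for each (arm, bin) cell.
        self.sums = [[0.0] * n_bins for _ in range(n_arms)]
        self.counts = [[0] * n_bins for _ in range(n_arms)]

    def _bin(self, x):
        # Map a covariate in [0, 1] to an equal-width bin index.
        return min(int(x * self.n_bins), self.n_bins - 1)

    def estimate(self, arm, x):
        # Histogram estimate: average observed reward in the bin containing x.
        # Assumption: a cell with no data gets the estimate 0.0.
        b = self._bin(x)
        c = self.counts[arm][b]
        return self.sums[arm][b] / c if c else 0.0

    def choose(self, x, explore_prob):
        # With probability explore_prob pick an arm uniformly at random;
        # otherwise play the arm with the largest estimated reward at x.
        if self.rng.random() < explore_prob:
            return self.rng.randrange(self.n_arms)
        return max(range(self.n_arms), key=lambda a: self.estimate(a, x))

    def update(self, arm, x, reward):
        b = self._bin(x)
        self.sums[arm][b] += reward
        self.counts[arm][b] += 1


def simulate(n_rounds=5000, seed=0):
    """Return (expected reward collected) / (expected reward of the best arm per covariate)."""
    rng = random.Random(seed)
    # Hypothetical mean reward functions; the best arm switches at x = 0.5.
    mean_rewards = [lambda x: x, lambda x: 1.0 - x]
    bandit = HistogramBandit(n_arms=2, n_bins=10, seed=seed)
    total = 0.0
    best_total = 0.0
    for t in range(1, n_rounds + 1):
        x = rng.random()
        explore_prob = t ** -0.25  # assumed decay schedule
        arm = bandit.choose(x, explore_prob)
        reward = mean_rewards[arm](x) + rng.gauss(0.0, 0.1)
        bandit.update(arm, x, reward)
        total += mean_rewards[arm](x)
        best_total += max(f(x) for f in mean_rewards)
    return total / best_total


if __name__ == "__main__":
    print(simulate())
```

The returned ratio is the quantity the consistency statement concerns: under a strongly consistent strategy it tends to 1 as the number of rounds grows. In this toy setup the decaying exploration probability plays the role of the randomization that keeps every arm sampled often enough for its histogram estimate to improve.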

Citation

Yuhong Yang, Dan Zhu. "Randomized Allocation with nonparametric estimation for a multi-armed bandit problem with covariates." Ann. Statist. 30(1): 100-121, February 2002. https://doi.org/10.1214/aos/1015362186

Information

Published: February 2002
First available in Project Euclid: 5 March 2002

zbMATH: 1012.62088
MathSciNet: MR1892657
Digital Object Identifier: 10.1214/aos/1015362186

Subjects:
Primary: 62C25, 62L05

Keywords: concomitant variable, Multi-armed bandits, Nonparametric regression, randomized allocation, Sequential allocation

Rights: Copyright © 2002 Institute of Mathematical Statistics

Vol. 30 • No. 1 • February 2002