Abstract
Motivated by applications in personalized web services and clinical research, we consider a multi-armed bandit problem in which the mean reward of each arm depends on covariates. We propose a multi-stage randomized allocation algorithm with arm elimination that combines flexibility in reward function modeling with a theoretical guarantee of the minimax rate for cumulative regret. When the smoothness parameter of the reward function is unknown, the algorithm is equipped with a histogram-based smoothness parameter selector using Lepski’s method, and is shown to maintain the minimax regret rate up to a logarithmic factor under a “self-similarity” condition.
Citation
Wei Qian, Yuhong Yang. "Randomized allocation with arm elimination in a bandit problem with covariates." Electron. J. Statist. 10 (1) 242-270, 2016. https://doi.org/10.1214/15-EJS1104
Information