Open Access
Randomized allocation with arm elimination in a bandit problem with covariates
Wei Qian, Yuhong Yang
Electron. J. Statist. 10(1): 242-270 (2016). DOI: 10.1214/15-EJS1104

Abstract

Motivated by applications in personalized web services and clinical research, we consider a multi-armed bandit problem in a setting where the mean reward of each arm is associated with some covariates. A multi-stage randomized allocation with arm elimination algorithm is proposed to combine flexibility in reward function modeling with a theoretical guarantee of a minimax rate for the cumulative regret. When the function smoothness parameter is unknown, the algorithm is equipped with a histogram-estimation-based smoothness parameter selector using Lepski’s method, and is shown to maintain the regret minimax rate up to a logarithmic factor under a “self-similarity” condition.
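To make the setting concrete, the following is a minimal Python sketch of a generic histogram-style bandit with covariates: the covariate space [0, 1) is split into equal bins, arms are chosen uniformly at random among those still active in the current bin, and an arm is eliminated once its upper confidence bound falls below the best lower confidence bound in that bin. This is an illustration under simplifying assumptions, not the authors' algorithm: it omits the multi-stage structure and the Lepski-type smoothness selector, it ignores within-bin bias, and the function name `run_binned_elimination`, the confidence radius, the uniform covariate distribution, and the Gaussian noise model are all hypothetical choices made here.

```python
import math
import random


def run_binned_elimination(reward_fns, horizon, n_bins, noise_sd=0.1, seed=0):
    """Generic binned bandit with per-bin arm elimination (illustrative only).

    reward_fns: list of functions x -> mean reward (hypothetical inputs).
    Returns (cumulative_regret, active), where active[b] is the set of arms
    still active in bin b at the end of the run.
    """
    rng = random.Random(seed)
    n_arms = len(reward_fns)
    counts = [[0] * n_arms for _ in range(n_bins)]
    sums = [[0.0] * n_arms for _ in range(n_bins)]
    active = [set(range(n_arms)) for _ in range(n_bins)]
    regret = 0.0

    for _ in range(horizon):
        x = rng.random()                      # covariate, assumed uniform on [0, 1)
        b = min(int(x * n_bins), n_bins - 1)  # histogram bin containing x
        arm = rng.choice(sorted(active[b]))   # randomize among surviving arms
        means = [f(x) for f in reward_fns]
        reward = means[arm] + rng.gauss(0.0, noise_sd)
        regret += max(means) - means[arm]
        counts[b][arm] += 1
        sums[b][arm] += reward

        # Elimination step: only once every active arm in the bin has data.
        if all(counts[b][a] > 0 for a in active[b]):
            def radius(a):
                # Assumed confidence radius; not taken from the paper.
                return math.sqrt(2.0 * math.log(horizon) / counts[b][a])

            def mean(a):
                return sums[b][a] / counts[b][a]

            best_lower = max(mean(a) - radius(a) for a in active[b])
            # The arm attaining best_lower always survives, so the set stays nonempty.
            active[b] = {a for a in active[b] if mean(a) + radius(a) >= best_lower}

    return regret, active
```

For example, calling `run_binned_elimination([lambda x: 0.9, lambda x: 0.1], horizon=5000, n_bins=4)` runs two constant-reward arms with a large gap; in this toy case the worse arm should be dropped in every bin well before the horizon. With covariate-dependent reward functions, the number of bins trades approximation error within a bin against estimation error per bin, which is the role the smoothness parameter plays in the abstract above.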

Citation

Wei Qian, Yuhong Yang. "Randomized allocation with arm elimination in a bandit problem with covariates." Electron. J. Statist. 10 (1) 242-270, 2016. https://doi.org/10.1214/15-EJS1104

Information

Received: 1 October 2014; Published: 2016
First available in Project Euclid: 17 February 2016

zbMATH: 1332.62138
MathSciNet: MR3466182
Digital Object Identifier: 10.1214/15-EJS1104

Subjects:
Primary: 62G08
Secondary: 62L05

Keywords: adaptive estimation, contextual bandit problem, MABC, nonparametric bandit, regret bound

Rights: Copyright © 2016 The Institute of Mathematical Statistics and the Bernoulli Society
