Open Access
April 2013
The multi-armed bandit problem with covariates
Vianney Perchet, Philippe Rigollet
Ann. Statist. 41(2): 693-721 (April 2013). DOI: 10.1214/13-AOS1101

Abstract

We consider a multi-armed bandit problem in a setting where each arm produces a noisy reward realization which depends on an observable random covariate. As opposed to the traditional static multi-armed bandit problem, this setting allows for dynamically changing rewards that better describe applications where side information is available. We adopt a nonparametric model where the expected rewards are smooth functions of the covariate and where the hardness of the problem is captured by a margin parameter. To maximize the expected cumulative reward, we introduce a policy called Adaptively Binned Successive Elimination (ABSE) that adaptively decomposes the global problem into suitably “localized” static bandit problems. This policy constructs an adaptive partition using a variant of the Successive Elimination (SE) policy. Our results include sharper regret bounds for the SE policy in a static bandit problem and minimax optimal regret bounds for the ABSE policy in the dynamic problem.
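The Successive Elimination (SE) policy that ABSE builds on is the standard elimination routine for static bandits: pull every surviving arm in rounds and discard an arm once its empirical mean falls a confidence width below the current leader. Below is a minimal Python sketch under illustrative assumptions (rewards in [0, 1], a Hoeffding-style confidence radius, a failure probability delta); it is not the paper's exact algorithm and omits the adaptive covariate binning that distinguishes ABSE.

    import math
    import random

    def successive_elimination(arms, horizon, delta=0.05):
        """Sketch of Successive Elimination for a static bandit.

        `arms` is a list of callables returning noisy rewards in [0, 1].
        Each round pulls every surviving arm once; an arm is dropped when
        its empirical mean falls two confidence radii below the leader.
        """
        k = len(arms)
        active = list(range(k))
        means, counts, pulls = [0.0] * k, [0] * k, 0
        while pulls < horizon and len(active) > 1:
            for i in active:
                counts[i] += 1
                means[i] += (arms[i]() - means[i]) / counts[i]
                pulls += 1
            # All active arms share the same pull count after a round.
            t = counts[active[0]]
            radius = math.sqrt(math.log(k * horizon / delta) / (2 * t))
            best = max(means[i] for i in active)
            active = [i for i in active if means[i] >= best - 2 * radius]
        # Spend any remaining budget on the surviving leader.
        leader = max(active, key=lambda i: means[i])
        while pulls < horizon:
            counts[leader] += 1
            means[leader] += (arms[leader]() - means[leader]) / counts[leader]
            pulls += 1
        return leader, means

    # Example: three Bernoulli arms with means 0.3, 0.5 and 0.7.
    random.seed(0)
    bandit = [lambda p=p: float(random.random() < p) for p in (0.3, 0.5, 0.7)]
    print(successive_elimination(bandit, horizon=5000)[0])  # typically 2

Roughly speaking, ABSE runs a localized variant of such a routine on each cell of a recursively refined partition of the covariate space, so that elimination decisions adapt to the smooth, covariate-dependent mean rewards.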

Citation

Vianney Perchet, Philippe Rigollet. "The multi-armed bandit problem with covariates." Ann. Statist. 41(2): 693-721, April 2013. https://doi.org/10.1214/13-AOS1101

Information

Published: April 2013
First available in Project Euclid: 26 April 2013

zbMATH: 1360.62436
MathSciNet: MR3099118
Digital Object Identifier: 10.1214/13-AOS1101

Subjects:
Primary: 62G08
Secondary: 62L12

Keywords: adaptive partition, contextual bandit, multi-armed bandit, nonparametric bandit, regret bounds, sequential allocation, successive elimination

Rights: Copyright © 2013 Institute of Mathematical Statistics
