Open Access
February 2020
The multi-armed bandit problem: An efficient nonparametric solution
Hock Peng Chan
Ann. Statist. 48(1): 346-373 (February 2020). DOI: 10.1214/19-AOS1809

Abstract

Lai and Robbins (Adv. in Appl. Math. 6 (1985) 4–22) and Lai (Ann. Statist. 15 (1987) 1091–1114) provided efficient parametric solutions to the multi-armed bandit problem, showing that arm allocation via upper confidence bounds (UCB) achieves minimum regret. These bounds are constructed from the Kullback–Leibler information of the reward distributions, estimated from specified parametric families. In recent years, there has been renewed interest in the multi-armed bandit problem due to new applications in machine learning algorithms and data analytics. Nonparametric arm allocation procedures like $\epsilon$-greedy, Boltzmann exploration and BESA have been studied, and modified versions of the UCB procedure have been analyzed under nonparametric settings. However, unlike UCB, these nonparametric procedures are not efficient under general parametric settings. In this paper, we propose efficient nonparametric procedures.
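To make the UCB-style allocation referenced in the abstract concrete, the sketch below implements the classical UCB index (empirical mean plus an exploration bonus) for Bernoulli rewards. This is a generic illustration only, not the paper's proposed nonparametric procedure or its KL-based confidence bounds; the arm count, horizon and reward probabilities are assumptions for the example.

```python
import math
import random


def ucb_allocate(n_arms, horizon, pull):
    """Classical UCB index allocation (a generic sketch, not the paper's procedure).

    pull(arm) should return a reward in [0, 1] for the chosen arm.
    """
    counts = [0] * n_arms      # number of times each arm has been played
    means = [0.0] * n_arms     # running mean reward of each arm

    for t in range(1, horizon + 1):
        if t <= n_arms:
            arm = t - 1        # play each arm once to initialize its estimate
        else:
            # UCB index = empirical mean + sqrt(2 log t / n_k) exploration bonus
            arm = max(range(n_arms),
                      key=lambda k: means[k] + math.sqrt(2 * math.log(t) / counts[k]))
        reward = pull(arm)
        counts[arm] += 1
        means[arm] += (reward - means[arm]) / counts[arm]  # incremental mean update
    return counts, means


# Illustrative usage with hypothetical Bernoulli arms (success probabilities assumed).
if __name__ == "__main__":
    probs = [0.3, 0.5, 0.7]
    counts, means = ucb_allocate(len(probs), 10000,
                                 lambda a: 1.0 if random.random() < probs[a] else 0.0)
    print(counts, [round(m, 3) for m in means])
```

With this index, the arm with the highest reward probability is pulled increasingly often as the horizon grows, while suboptimal arms receive only logarithmically many pulls, which is the regret behavior the parametric UCB results of Lai and Robbins formalize.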

Citation

Hock Peng Chan. "The multi-armed bandit problem: An efficient nonparametric solution." Ann. Statist. 48 (1) 346–373, February 2020. https://doi.org/10.1214/19-AOS1809

Information

Received: 1 September 2017; Revised: 1 December 2018; Published: February 2020
First available in Project Euclid: 17 February 2020

zbMATH: 07196542
MathSciNet: MR4065165
Digital Object Identifier: 10.1214/19-AOS1809

Subjects:
Primary: 62L05

Keywords: efficiency, KL-UCB, subsampling, Thompson sampling, UCB

Rights: Copyright © 2020 Institute of Mathematical Statistics
