Open Access
Adaptive Treatment Allocation and the Multi-Armed Bandit Problem
Tze Leung Lai
Ann. Statist. 15(3): 1091-1114 (September, 1987). DOI: 10.1214/aos/1176350495

Abstract

A class of simple adaptive allocation rules is proposed for the problem (often called the "multi-armed bandit problem") of sampling $x_1, \cdots, x_N$ sequentially from $k$ populations with densities belonging to an exponential family, in order to maximize the expected value of the sum $S_N = x_1 + \cdots + x_N$. These allocation rules are based on certain upper confidence bounds, which are developed from boundary crossing theory, for the $k$ population parameters. The rules are shown to be asymptotically optimal as $N \rightarrow \infty$ from both Bayesian and frequentist points of view. Monte Carlo studies show that they also perform very well for moderate values of the horizon $N$.
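The rules the abstract describes sample, at each stage, from the population whose upper confidence bound for its parameter is currently largest. As a minimal sketch of this flavor of rule, assuming Bernoulli populations (a one-parameter exponential family), a KL-divergence confidence bound with a log t threshold, and bisection to invert it; the function names and the threshold are illustrative choices, not the paper's boundary-crossing construction:

import math
import random

def kl_bernoulli(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), with clipping
    # to avoid log(0).
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def upper_confidence_bound(mean, n, t):
    # Largest q >= mean with n * KL(mean, q) <= log(t), found by bisection.
    # The log(t) inflation term is an illustrative choice, not the paper's
    # exact boundary.
    target = math.log(max(t, 2)) / n
    lo, hi = mean, 1.0
    for _ in range(50):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def allocate(true_means, horizon, seed=0):
    # Draw x_1, ..., x_N one at a time, always sampling the population
    # with the largest upper confidence bound for its success probability.
    rng = random.Random(seed)
    k = len(true_means)
    counts = [0] * k
    sums = [0.0] * k
    s_n = 0.0
    for t in range(1, horizon + 1):
        if t <= k:
            arm = t - 1  # sample each population once to initialize
        else:
            arm = max(range(k),
                      key=lambda j: upper_confidence_bound(sums[j] / counts[j], counts[j], t))
        reward = 1.0 if rng.random() < true_means[arm] else 0.0
        counts[arm] += 1
        sums[arm] += reward
        s_n += reward
    return s_n, counts

if __name__ == "__main__":
    s_n, counts = allocate([0.3, 0.5, 0.7], horizon=10000)
    print("S_N =", s_n, "samples per population:", counts)

In a run like the one above, almost all $N$ samples concentrate on the best population while the inferior ones are sampled only on the order of $\log N$ times, which is the logarithmic-regret behavior underlying the asymptotic optimality the abstract refers to.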

Citation

Tze Leung Lai. "Adaptive Treatment Allocation and the Multi-Armed Bandit Problem." Ann. Statist. 15(3): 1091-1114, September, 1987. https://doi.org/10.1214/aos/1176350495

Information

Published: September, 1987
First available in Project Euclid: 12 April 2007

zbMATH: 0643.62054
MathSciNet: MR902248
Digital Object Identifier: 10.1214/aos/1176350495

Subjects:
Primary: 62L05
Secondary: 60G40, 62L12

Keywords: adaptive control, boundary crossings, dynamic allocation, sequential experimentation, upper confidence bounds

Rights: Copyright © 1987 Institute of Mathematical Statistics
