The Annals of Statistics

Adaptive Treatment Allocation and the Multi-Armed Bandit Problem

Tze Leung Lai

Full-text: Open access

Abstract

A class of simple adaptive allocation rules is proposed for the problem (often called the "multi-armed bandit problem") of sampling $x_1, \cdots, x_N$ sequentially from $k$ populations with densities belonging to an exponential family, in order to maximize the expected value of the sum $S_N = x_1 + \cdots + x_N$. These allocation rules are based on certain upper confidence bounds, which are developed from boundary crossing theory, for the $k$ population parameters. The rules are shown to be asymptotically optimal as $N \rightarrow \infty$ from both Bayesian and frequentist points of view. Monte Carlo studies show that they also perform very well for moderate values of the horizon $N$.
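The upper-confidence-bound idea in the abstract can be illustrated with a short sketch. The code below is a hypothetical KL-UCB-style rule for Bernoulli arms (a member of the exponential family), not the paper's exact allocation rule, whose bounds are derived from boundary crossing theory and depend on the horizon $N$: each arm's index is the largest mean $q$ whose KL divergence from the empirical mean stays within a $\log t$ budget, and the arm with the largest index is sampled next. All function names here are illustrative.

```python
import math
import random

def kl_bernoulli(p, q):
    # KL divergence between Bernoulli(p) and Bernoulli(q), clipped for stability.
    eps = 1e-12
    p = min(max(p, eps), 1 - eps)
    q = min(max(q, eps), 1 - eps)
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

def ucb_index(mean, pulls, t):
    # Upper confidence bound for the arm's mean: the largest q >= mean with
    # pulls * KL(mean, q) <= log(t), found by bisection.
    target = math.log(max(t, 2)) / pulls
    lo, hi = mean, 1.0
    for _ in range(30):
        mid = (lo + hi) / 2
        if kl_bernoulli(mean, mid) <= target:
            lo = mid
        else:
            hi = mid
    return lo

def run_bandit(probs, horizon, rng):
    # Sample x_1, ..., x_N sequentially, always pulling the arm with the
    # highest upper confidence bound; returns (total reward, pull counts).
    k = len(probs)
    pulls = [0] * k
    sums = [0.0] * k
    total = 0.0
    for t in range(horizon):
        if t < k:
            arm = t  # initialize by sampling each arm once
        else:
            arm = max(range(k),
                      key=lambda i: ucb_index(sums[i] / pulls[i], pulls[i], t))
        reward = 1.0 if rng.random() < probs[arm] else 0.0
        pulls[arm] += 1
        sums[arm] += reward
        total += reward
    return total, pulls
```

Run on two arms with success probabilities 0.3 and 0.7, e.g. `run_bandit([0.3, 0.7], 2000, random.Random(0))`: the rule concentrates most pulls on the better arm while still sampling the inferior one at a slowly growing (logarithmic) rate, which is the behavior the asymptotic optimality results formalize.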

Article information

Source
Ann. Statist., Volume 15, Number 3 (1987), 1091-1114.

Dates
First available in Project Euclid: 12 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1176350495

Digital Object Identifier
doi:10.1214/aos/1176350495

Mathematical Reviews number (MathSciNet)
MR902248

Zentralblatt MATH identifier
0643.62054

JSTOR
links.jstor.org

Subjects
Primary: 62L05: Sequential design
Secondary: 60G40: Stopping times; optimal stopping problems; gambling theory [See also 62L15, 91A60] 62L12: Sequential estimation

Keywords
Sequential experimentation; adaptive control; dynamic allocation; boundary crossings; upper confidence bounds

Citation

Lai, Tze Leung. Adaptive Treatment Allocation and the Multi-Armed Bandit Problem. Ann. Statist. 15 (1987), no. 3, 1091--1114. doi:10.1214/aos/1176350495. https://projecteuclid.org/euclid.aos/1176350495
