The Annals of Statistics

Bernoulli One-Armed Bandits--Arbitrary Discount Sequences

Donald A. Berry and Bert Fristedt

Full-text: Open access

Abstract

Each of two arms generates an infinite sequence of Bernoulli random variables. At each stage we choose which arm to observe based on past observations. The parameter of the left arm is known; that of the right arm is a random variable. There are two conflicting desiderata: to observe a success at the present stage and to obtain information useful for making future decisions. The payoff is $\alpha_m$ for a success at stage $m$. The objective is to maximize the expected total payoff. If the sequence $(\alpha_1, \alpha_2, \cdots)$ is regular, an observation of the left arm should always be followed by another of the left arm. A rather explicit characterization of optimal strategies for regular sequences follows from this result. This characterization generalizes results of Bradt, Johnson, and Karlin (1956), who considered $\alpha_m$ equal to 1 for $m \leqslant n$ and 0 for $m > n$, and of Bellman (1956), who considered $\alpha_m = \alpha^{m-1}$ for $0 \leqslant \alpha < 1$.
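The structural result above implies that, for regular discount sequences, an optimal strategy takes a stopping-rule form: observe the unknown right arm up to some (data-dependent) stopping time, then switch to the known left arm and stay there forever. The following Monte Carlo sketch illustrates a strategy of that form under Bellman's geometric discounting $\alpha_m = \alpha^{m-1}$ with a Beta prior on the right arm's parameter. The myopic switching rule used here (switch when the posterior mean drops below the known parameter) is an illustrative assumption, not the paper's optimal rule; all function and parameter names are hypothetical.

```python
import random

def simulate(p_left, alpha, horizon, a=1, b=1, seed=0, n_runs=2000):
    """Estimate the expected discounted payoff of a switch-and-stay
    strategy for a Bernoulli one-armed bandit (illustrative sketch).

    p_left  -- known success probability of the left arm
    alpha   -- geometric discount factor, payoff alpha**(m-1) at stage m
    horizon -- number of stages simulated (truncates the infinite sum)
    a, b    -- Beta(a, b) prior on the right arm's unknown parameter
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_runs):
        theta = rng.betavariate(a, b)  # draw the right arm's true parameter
        sa, sb = a, b                  # Beta posterior parameters
        on_left = False
        payoff = 0.0
        for m in range(horizon):
            discount = alpha ** m      # stage m+1 gets weight alpha**m
            # Myopic stopping rule (hypothetical): switch to the known arm
            # once the posterior mean falls below p_left; never switch back,
            # consistent with the stay-on-left structural result.
            if not on_left and sa / (sa + sb) < p_left:
                on_left = True
            if on_left:
                payoff += discount * p_left  # expected payoff on the known arm
            else:
                x = 1 if rng.random() < theta else 0
                payoff += discount * x
                sa += x
                sb += 1 - x
        total += payoff
    return total / n_runs
```

With `alpha = 0.9` the total discounted payoff is bounded by the geometric sum $\sum_m \alpha^{m-1} = 1/(1-\alpha) = 10$, so any estimate should fall in $(0, 10)$; a larger known parameter `p_left` should yield a larger value.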

Article information

Source
Ann. Statist., Volume 7, Number 5 (1979), 1086-1105.

Dates
First available in Project Euclid: 12 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1176344792

Digital Object Identifier
doi:10.1214/aos/1176344792

Mathematical Reviews number (MathSciNet)
MR536512

Zentralblatt MATH identifier
0415.62056

JSTOR
links.jstor.org

Subjects
Primary: 62L05: Sequential design
Secondary: 62L15: Optimal stopping [See also 60G40, 91A60]

Keywords
One-armed bandit; sequential decisions; optimal stopping; two-armed bandit; regular discounting; Bernoulli bandit

Citation

Berry, Donald A.; Fristedt, Bert. Bernoulli One-Armed Bandits--Arbitrary Discount Sequences. Ann. Statist. 7 (1979), no. 5, 1086--1105. doi:10.1214/aos/1176344792. https://projecteuclid.org/euclid.aos/1176344792
