Open Access
September, 1979
Bernoulli One-Armed Bandits--Arbitrary Discount Sequences
Donald A. Berry, Bert Fristedt
Ann. Statist. 7(5): 1086-1105 (September, 1979). DOI: 10.1214/aos/1176344792

Abstract

Each of two arms generates an infinite sequence of Bernoulli random variables. At each stage we choose which arm to observe based on past observations. The parameter of the left arm is known; that of the right arm is a random variable. There are two conflicting desiderata: to observe a success at the present stage and to obtain information useful for making future decisions. The payoff is $\alpha_m$ for a success at stage $m$. The objective is to maximize the expected total payoff. If the sequence $(\alpha_1, \alpha_2, \cdots)$ is regular, an observation of the left arm should always be followed by another of the left arm. A rather explicit characterization of optimal strategies for regular sequences follows from this result. This characterization generalizes results of Bradt, Johnson, and Karlin (1956), who considered $\alpha_m$ equal to 1 for $m \leqslant n$ and 0 for $m > n$, and of Bellman (1956), who considered $\alpha_m = \alpha^{m-1}$ for $0 \leqslant \alpha < 1$.
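The structural result above makes the optimal strategy computable by a one-sided dynamic program: since (for a regular discount sequence) an optimal strategy never returns to the right arm after using the left arm, switching to the left arm at stage $m$ is worth exactly $\lambda \sum_{k \geqslant m} \alpha_k$, where $\lambda$ is the known left-arm parameter. The sketch below illustrates this for a finite discount sequence, additionally assuming a Beta prior on the right arm's parameter so that the posterior is conjugate; the function name `optimal_value` and the prior parameters `a0`, `b0` are illustrative choices, not from the paper.

```python
from functools import lru_cache

def optimal_value(alpha, lam, a0=1.0, b0=1.0):
    """Expected total payoff of an optimal strategy for a Bernoulli
    one-armed bandit with finite discount sequence alpha, known
    left-arm success probability lam, and a Beta(a0, b0) prior on
    the right arm (an illustrative assumption, not from the paper).

    Relies on the stay-with-the-known-arm property for regular
    sequences: committing to the left arm at stage m is worth
    lam * (alpha[m] + alpha[m+1] + ...).
    """
    n = len(alpha)
    # tail[m] = alpha[m] + alpha[m+1] + ... + alpha[n-1]
    tail = [0.0] * (n + 1)
    for m in range(n - 1, -1, -1):
        tail[m] = alpha[m] + tail[m + 1]

    @lru_cache(maxsize=None)
    def V(m, s, f):
        # State: stage m, with s successes and f failures observed
        # so far on the right arm.
        if m == n:
            return 0.0
        left = lam * tail[m]                # commit to the known arm
        p = (a0 + s) / (a0 + s + b0 + f)    # posterior mean of right arm
        right = p * (alpha[m] + V(m + 1, s + 1, f)) \
            + (1 - p) * V(m + 1, s, f)
        return max(left, right)

    return V(0, 0, 0)
```

With a single stage and a uniform prior, `optimal_value([1.0], 0.6)` returns 0.6 (the known arm beats the prior mean 0.5), while `optimal_value([1.0], 0.4)` returns 0.5. With two uniformly discounted stages and `lam = 0.5`, the value exceeds 1.0, reflecting the option value of experimenting on the unknown arm; this is the $\alpha_m = 1$ for $m \leqslant n$ case of Bradt, Johnson, and Karlin (1956).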

Citation


Donald A. Berry, Bert Fristedt. "Bernoulli One-Armed Bandits--Arbitrary Discount Sequences." Ann. Statist. 7 (5) 1086-1105, September, 1979. https://doi.org/10.1214/aos/1176344792

Information

Published: September, 1979
First available in Project Euclid: 12 April 2007

zbMATH: 0415.62056
MathSciNet: MR536512
Digital Object Identifier: 10.1214/aos/1176344792

Subjects:
Primary: 62L05
Secondary: 62L15

Keywords: Bernoulli bandit, one-armed bandit, optimal stopping, regular discounting, sequential decisions, two-armed bandit

Rights: Copyright © 1979 Institute of Mathematical Statistics
