A Note on the Bernoulli Two-Armed Bandit Problem
Thomas A. Kelley
Ann. Statist. 2(5): 1056-1062 (September, 1974). DOI: 10.1214/aos/1176342827


Suppose the arms of a two-armed bandit generate i.i.d. Bernoulli random variables with success probabilities $\rho$ and $\lambda$, respectively. The goal is to maximize the expected sum of the outcomes of $N$ trials, where $N$ is fixed. If the prior distribution of $(\rho, \lambda)$ is concentrated at two points $(a, b)$ and $(c, d)$ in the unit square, a characterization of the optimal policy is given. Necessary and sufficient conditions for the optimality of the myopic policy are given in terms of $a, b, c$, and $d$.
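
The setting above can be explored numerically for small $N$. The following is a minimal sketch, not the paper's characterization: it computes the optimal expected sum by brute-force backward induction and compares it with the value of the myopic policy (always pull the arm with the larger current posterior mean). The function name `bandit_values` and the prior weight `p` on the point $(a, b)$ are assumptions introduced for illustration, since the abstract does not specify the prior weights. The sketch also assumes $a, b, c, d$ lie strictly between 0 and 1, so every outcome history has positive probability.

```python
from functools import lru_cache


def bandit_values(a, b, c, d, p, N):
    """Return (optimal value, myopic value) for N pulls.

    Assumed prior (hypothetical weight p): (rho, lambda) = (a, b) with
    probability p, and (c, d) with probability 1 - p.
    A state is (s1, f1, s2, f2): successes/failures seen on arms 1 and 2.
    """

    def arm_means(s1, f1, s2, f2):
        # Unnormalized posterior weight of each support point given the counts.
        w_ab = p * a**s1 * (1 - a)**f1 * b**s2 * (1 - b)**f2
        w_cd = (1 - p) * c**s1 * (1 - c)**f1 * d**s2 * (1 - d)**f2
        q = w_ab / (w_ab + w_cd)  # posterior probability of (a, b)
        # Posterior mean success probability of arm 1 and arm 2.
        return q * a + (1 - q) * c, q * b + (1 - q) * d

    def pull(value, state, arm):
        # Expected total reward from pulling `arm` (0 or 1) now,
        # then continuing according to the value function `value`.
        m = arm_means(*state)[arm]
        win = list(state)
        win[2 * arm] += 1
        lose = list(state)
        lose[2 * arm + 1] += 1
        return m + m * value(tuple(win)) + (1 - m) * value(tuple(lose))

    @lru_cache(maxsize=None)
    def optimal(state):
        if sum(state) == N:  # no trials left
            return 0.0
        return max(pull(optimal, state, 0), pull(optimal, state, 1))

    @lru_cache(maxsize=None)
    def myopic(state):
        if sum(state) == N:
            return 0.0
        m1, m2 = arm_means(*state)
        # Myopic rule: maximize the immediate expected reward only.
        return pull(myopic, state, 0 if m1 >= m2 else 1)

    start = (0, 0, 0, 0)
    return optimal(start), myopic(start)


if __name__ == "__main__":
    opt, myo = bandit_values(0.8, 0.2, 0.3, 0.6, p=0.5, N=6)
    print("optimal:", opt, "myopic:", myo, "gap:", opt - myo)
```

Because the posterior depends on the history only through the four counts, memoizing on the count tuple keeps the number of states polynomial in $N$. A zero gap for a given $(a, b, c, d)$ and `p` is consistent with the myopic policy being optimal in that instance, but it does not replace the paper's necessary and sufficient conditions.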



Thomas A. Kelley. "A Note on the Bernoulli Two-Armed Bandit Problem." Ann. Statist. 2 (5) 1056 - 1062, September, 1974.


Published: September, 1974
First available in Project Euclid: 12 April 2007

zbMATH: 0299.62048
MathSciNet: MR426324
Digital Object Identifier: 10.1214/aos/1176342827

Keywords: 62.35, 62.45, Bernoulli random variable, myopic, optimal, posterior distribution, relative advantage, sequential, strategy, two-armed bandit problem, two-point prior distribution

Rights: Copyright © 1974 Institute of Mathematical Statistics

