Open Access
A Bernoulli Two-armed Bandit
Donald A. Berry
Ann. Math. Statist. 43(3): 871-897 (June, 1972). DOI: 10.1214/aoms/1177692553

Abstract

One of two independent Bernoulli processes (arms) with unknown expectations $\rho$ and $\lambda$ is selected and observed at each of $n$ stages. The selection problem is sequential in that the process which is selected at a particular stage is a function of the results of previous selections as well as of prior information about $\rho$ and $\lambda$. The variables $\rho$ and $\lambda$ are assumed to be independent under the (prior) probability distribution. The objective is to maximize the expected number of successes from the $n$ selections. Sufficient conditions for the optimality of selecting one or the other of the arms are given and illustrated for example distributions. The stay-on-a-winner rule is proved.
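As a concrete illustration of the setup described above (this sketch is not code from the paper): when the independent priors on $\rho$ and $\lambda$ are Beta distributions, the Bayes-optimal sequential policy can be computed by backward induction over the posterior parameters. The function names and the choice of Beta priors here are illustrative assumptions.

```python
from fractions import Fraction
from functools import lru_cache

# Illustrative sketch (not from the paper): Bayes-optimal play for a
# Bernoulli two-armed bandit with independent Beta(a1,b1), Beta(a2,b2)
# priors on rho and lambda, computed by backward induction.
# State: current Beta parameters and the number of stages remaining.

@lru_cache(maxsize=None)
def value(a1, b1, a2, b2, n):
    """Maximal expected number of successes over n remaining stages."""
    if n == 0:
        return Fraction(0)
    p1 = Fraction(a1, a1 + b1)   # posterior mean of rho
    p2 = Fraction(a2, a2 + b2)   # posterior mean of lambda
    # Pull arm 1 now, then continue optimally (success bumps a1, failure b1):
    v1 = (p1 * (1 + value(a1 + 1, b1, a2, b2, n - 1))
          + (1 - p1) * value(a1, b1 + 1, a2, b2, n - 1))
    # Same for arm 2:
    v2 = (p2 * (1 + value(a1, b1, a2 + 1, b2, n - 1))
          + (1 - p2) * value(a1, b1, a2, b2 + 1, n - 1))
    return max(v1, v2)

def best_arm(a1, b1, a2, b2, n):
    """Arm (1 or 2) attaining the maximum in value(); ties go to arm 1."""
    p1 = Fraction(a1, a1 + b1)
    p2 = Fraction(a2, a2 + b2)
    v1 = (p1 * (1 + value(a1 + 1, b1, a2, b2, n - 1))
          + (1 - p1) * value(a1, b1 + 1, a2, b2, n - 1))
    v2 = (p2 * (1 + value(a1, b1, a2 + 1, b2, n - 1))
          + (1 - p2) * value(a1, b1, a2, b2 + 1, n - 1))
    return 1 if v1 >= v2 else 2
```

For example, with uniform Beta(1,1) priors on both arms and $n = 2$, backward induction gives an optimal expected total of $\tfrac{1}{2}(1 + \tfrac{2}{3}) + \tfrac{1}{2}\cdot\tfrac{1}{2} = \tfrac{13}{12}$ successes. The recursion also exhibits the stay-on-a-winner rule proved in the paper: after a success on the arm just selected, selecting that arm again remains optimal.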

Citation
Donald A. Berry. "A Bernoulli Two-armed Bandit." Ann. Math. Statist. 43 (3) 871 - 897, June, 1972. https://doi.org/10.1214/aoms/1177692553

Information

Published: June, 1972
First available in Project Euclid: 27 April 2007

zbMATH: 0258.62013
MathSciNet: MR305531
Digital Object Identifier: 10.1214/aoms/1177692553

Rights: Copyright © 1972 Institute of Mathematical Statistics