## The Annals of Mathematical Statistics

- Ann. Math. Statist.
- Volume 43, Number 3 (1972), 871-897.

### A Bernoulli Two-armed Bandit

#### Abstract

One of two independent Bernoulli processes (arms) with unknown expectations $\rho$ and $\lambda$ is selected and observed at each of $n$ stages. The selection problem is sequential in that the process which is selected at a particular stage is a function of the results of previous selections as well as of prior information about $\rho$ and $\lambda$. The variables $\rho$ and $\lambda$ are assumed to be independent under the (prior) probability distribution. The objective is to maximize the expected number of successes from the $n$ selections. Sufficient conditions for the optimality of selecting one or the other of the arms are given and illustrated for example distributions. The stay-on-a-winner rule is proved.
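The sequential problem in the abstract can be illustrated with a small backward-induction sketch. It is not the paper's method: it assumes independent Beta priors on $\rho$ and $\lambda$ (the paper treats general prior distributions), and the names `arm_values`, `prior_rho`, and `prior_lam` are hypothetical. Under that assumption, the posterior after each observation is again a Beta distribution, so the state is just the remaining number of stages plus four counts.

```python
from functools import lru_cache


def arm_values(n, prior_rho=(1, 1), prior_lam=(1, 1)):
    """Expected successes over n stages when the first pull is arm rho or arm
    lambda and all later pulls are optimal.

    Assumption (not from the paper): rho ~ Beta(a, b) and lambda ~ Beta(c, d),
    independent, given as the tuples prior_rho and prior_lam.
    """
    if n < 1:
        raise ValueError("n must be at least 1")

    @lru_cache(maxsize=None)
    def best(m, a, b, c, d):
        # Maximal expected number of successes with m stages remaining.
        if m <= 0:
            return 0.0
        return max(pull_values(m, a, b, c, d))

    def pull_values(m, a, b, c, d):
        p = a / (a + b)  # posterior mean of rho
        q = c / (c + d)  # posterior mean of lambda
        # A success adds 1 and increments the first Beta parameter;
        # a failure increments the second.
        v_rho = p * (1 + best(m - 1, a + 1, b, c, d)) \
            + (1 - p) * best(m - 1, a, b + 1, c, d)
        v_lam = q * (1 + best(m - 1, a, b, c + 1, d)) \
            + (1 - q) * best(m - 1, a, b, c, d + 1)
        return v_rho, v_lam

    return pull_values(n, *prior_rho, *prior_lam)


if __name__ == "__main__":
    v_rho, v_lam = arm_values(10, prior_rho=(2, 1), prior_lam=(1, 1))
    print("start with rho:", v_rho, "start with lambda:", v_lam)
```

With uniform priors on both arms and $n = 2$, the two values coincide at $13/12$: the first pull succeeds with probability $1/2$, after which staying on that arm is worth $2/3$, while after a failure switching is worth $1/2$. The sketch stores one cached value per reachable state, so it is only practical for small $n$.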

#### Article information

**Dates**

First available in Project Euclid: 27 April 2007

**Permanent link to this document**

https://projecteuclid.org/euclid.aoms/1177692553

**Digital Object Identifier**

doi:10.1214/aoms/1177692553

**Mathematical Reviews number (MathSciNet)**

MR305531

**Zentralblatt MATH identifier**

0258.62013

#### Citation

Berry, Donald A. A Bernoulli Two-armed Bandit. Ann. Math. Statist. 43 (1972), no. 3, 871-897. doi:10.1214/aoms/1177692553. https://projecteuclid.org/euclid.aoms/1177692553