Abstract
This paper studies the sequential decision model known as the two-armed-bandit with finite memory. It was introduced by Robbins [8] in 1956 and studied further by Isbell [5] in 1959. In this paper, rules are defined that are uniformly better than those given in [5] and [8]. A much larger class of rules is then defined, one member of which is conjectured to be a uniformly best rule.
Citation
Carter Vincent Smith, Ronald Pyke. "The Robbins-Isbell Two-Armed-Bandit Problem with Finite Memory." Ann. Math. Statist. 36 (5) 1375-1386, October 1965. https://doi.org/10.1214/aoms/1177699897