This paper studies the sequential decision model known as the two-armed bandit with finite memory. It was introduced by Robbins in 1956 and studied further by Isbell in 1959. In this paper, a set of rules is defined which is uniformly better than those given by Robbins and by Isbell. A much larger class of rules is then defined, one member of which is conjectured to be a uniformly best rule.
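To illustrate the kind of model the paper treats, the following sketch simulates a two-armed Bernoulli bandit played under a simple finite-memory rule: only the last m rewards of the currently played arm are remembered, and the player switches arms when that window is mostly failures. This rule, the window size, and all parameter names are illustrative assumptions for exposition; they are not the rules proposed by Robbins, Isbell, or this paper.

```python
import random

def finite_memory_bandit(p1, p2, m, horizon, seed=0):
    """Simulate a two-armed Bernoulli bandit under a toy finite-memory rule.

    Only the last m rewards of the current arm are stored; the player
    switches arms when the window is full and fewer than half of its
    entries are successes. (Illustrative only; not the paper's rules.)
    """
    rng = random.Random(seed)
    probs = [p1, p2]
    arm = 0
    memory = []          # at most m most recent rewards from the current arm
    total = 0
    for _ in range(horizon):
        reward = 1 if rng.random() < probs[arm] else 0
        total += reward
        memory.append(reward)
        if len(memory) > m:
            memory.pop(0)
        # switch when the full window shows mostly failures
        if len(memory) == m and sum(memory) < m / 2:
            arm = 1 - arm
            memory = []
    return total / horizon

# With one clearly better arm, the rule should settle on it most of the time,
# so the long-run success rate should exceed 1/2.
rate = finite_memory_bandit(0.8, 0.2, m=5, horizon=10000)
```

A finite-memory rule of this type is severely constrained: the decision at each stage may depend only on the bounded window, not on the full history, which is what makes finding a uniformly best rule nontrivial.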
"The Robbins-Isbell Two-Armed-Bandit Problem with Finite Memory." Ann. Math. Statist. 36 (5) 1375 - 1386, October, 1965. https://doi.org/10.1214/aoms/1177699897