This paper considers the problem of finding Bayes sequential designs for choosing successively between two given Bernoulli variables so as to maximize the expected total discounted sum of the observations. Simple hypotheses about the success probabilities are assumed, and dynamic programming methods are used to characterize optimal designs. Explicit solutions are described for certain special cases.
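As a rough illustration of the dynamic programming approach mentioned above, the following sketch runs value iteration on the posterior probability of one of two simple hypotheses. It is not the paper's construction: the hypothesis structure (each hypothesis fixes both success probabilities), the discount factor `beta`, the grid with linear interpolation, and every function and parameter name are assumptions made for the example.

```python
import numpy as np

def solve_two_armed_bandit(p_h0, p_h1, beta=0.9, grid_size=201,
                           tol=1e-10, max_iter=10_000):
    """Approximate value iteration for a two-armed Bernoulli bandit.

    Assumed setup (hypothetical, for illustration only):
      p_h0 = (arm 1, arm 2) success probabilities under hypothesis H0
      p_h1 = (arm 1, arm 2) success probabilities under hypothesis H1
      state = posterior probability that H1 is true
      reward = discounted sum of successes, first pull undiscounted
    """
    grid = np.linspace(0.0, 1.0, grid_size)   # posterior P(H1) on a grid
    value = np.zeros(grid_size)
    q = np.zeros((2, grid_size))
    for _ in range(max_iter):
        for arm in range(2):
            a0, a1 = p_h0[arm], p_h1[arm]
            p_succ = (1.0 - grid) * a0 + grid * a1   # predictive success prob.
            p_fail = 1.0 - p_succ
            # Bayes updates of P(H1) after a success / a failure; where the
            # outcome has probability zero, keep the prior (its weight is 0).
            post_s = np.divide(grid * a1, p_succ,
                               out=grid.copy(), where=p_succ > 0)
            post_f = np.divide(grid * (1.0 - a1), p_fail,
                               out=grid.copy(), where=p_fail > 0)
            cont = (p_succ * np.interp(post_s, grid, value)
                    + p_fail * np.interp(post_f, grid, value))
            q[arm] = p_succ + beta * cont            # Bellman right-hand side
        new_value = q.max(axis=0)
        converged = np.max(np.abs(new_value - value)) < tol
        value = new_value
        if converged:
            break
    policy = q.argmax(axis=0)   # 0 = pull arm 1, 1 = pull arm 2
    return grid, value, policy

grid, value, policy = solve_two_armed_bandit((0.8, 0.4), (0.3, 0.6))
```

When the posterior is degenerate the problem reduces to always pulling the better arm under the known hypothesis, so the value at the endpoints of the grid should approach the best success probability divided by `1 - beta`. This gives a simple sanity check on the sketch.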
"A Note on Discounted Future Two-Armed Bandits." Ann. Statist. 11 (2), 707-711, June 1983. https://doi.org/10.1214/aos/1176346176