Abstract
We express Gittins indices for multi-armed bandit problems as Laurent expansions around discount factor 1. The coefficients of these expansions are then used to characterize stationary optimal policies when the optimality criteria are sensitive-discount optimality (otherwise known as Blackwell optimality), average-reward optimality and average-overtaking optimality. We also obtain bounds and derive optimality conditions for policies of a type that continue playing the same bandit as long as the state of that bandit remains in prescribed sets.
Citation
Michael N. Katehakis, Uriel G. Rothblum. "Finite state multi-armed bandit problems: sensitive-discount, average-reward and average-overtaking optimality." Ann. Appl. Probab. 6 (3) 1024-1034, August 1996. https://doi.org/10.1214/aoap/1034968239
Information