The Annals of Applied Probability
- Ann. Appl. Probab.
- Volume 6, Number 3 (1996), 1024-1034.
Finite state multi-armed bandit problems: sensitive-discount, average-reward and average-overtaking optimality
We express Gittins indices for multi-armed bandit problems as Laurent expansions around discount factor 1. The coefficients of these expansions are then used to characterize stationary optimal policies when the optimality criteria are sensitive-discount optimality (otherwise known as Blackwell optimality), average-reward optimality and average-overtaking optimality. We also obtain bounds and derive optimality conditions for policies of a type that continue playing the same bandit as long as the state of that bandit remains in prescribed sets.
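The discounted Gittins index that underlies the paper's Laurent expansions can be computed numerically for a single finite-state bandit. The sketch below does not implement the paper's expansion around discount factor 1; it computes the ordinary Gittins index of each state at a fixed discount factor beta, using the well-known restart-in-state formulation (in each state, either continue playing the bandit or restart it in the reference state), solved by value iteration. The function name and signature are illustrative, not from the paper.

```python
import numpy as np

def gittins_indices(P, r, beta=0.9, tol=1e-10, max_iter=100_000):
    """Gittins index of each state of a single finite-state bandit.

    P : (n, n) transition matrix of the bandit
    r : (n,) reward vector
    beta : discount factor in (0, 1)

    Uses the restart-in-state formulation: the index of state i is
    (1 - beta) times the value of an auxiliary MDP where each step one
    may either continue from the current state or restart in state i.
    """
    n = len(r)
    idx = np.empty(n)
    for i in range(n):
        V = np.zeros(n)
        for _ in range(max_iter):
            cont = r + beta * (P @ V)        # value of continuing from each state
            restart = cont[i]                # value of restarting the bandit in state i
            V_new = np.maximum(cont, restart)
            if np.max(np.abs(V_new - V)) < tol:
                V = V_new
                break
            V = V_new
        idx[i] = (1.0 - beta) * V[i]
    return idx
```

For example, a two-state bandit with absorbing states and rewards (1, 0) has indices (1, 0): the state paying 1 forever is always worth playing, the state paying 0 never is.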
First available in Project Euclid: 18 October 2002
Subject classifications:
Primary: 90C47: Minimax problems [See also 49K35]
90C31: Sensitivity, stability, parametric optimization
90C39: Dynamic programming [See also 49L20]
60G40: Stopping times; optimal stopping problems; gambling theory [See also 62L15, 91A60]
Katehakis, Michael N.; Rothblum, Uriel G. Finite state multi-armed bandit problems: sensitive-discount, average-reward and average-overtaking optimality. Ann. Appl. Probab. 6 (1996), no. 3, 1024--1034. doi:10.1214/aoap/1034968239. https://projecteuclid.org/euclid.aoap/1034968239