Open Access
Finite state multi-armed bandit problems: sensitive-discount, average-reward and average-overtaking optimality
Michael N. Katehakis, Uriel G. Rothblum
Ann. Appl. Probab. 6(3): 1024-1034 (August 1996). DOI: 10.1214/aoap/1034968239

Abstract

We express Gittins indices for multi-armed bandit problems as Laurent expansions around discount factor 1. The coefficients of these expansions are then used to characterize stationary optimal policies when the optimality criteria are sensitive-discount optimality (otherwise known as Blackwell optimality), average-reward optimality and average-overtaking optimality. We also obtain bounds and derive optimality conditions for policies of a type that continue playing the same bandit as long as the state of that bandit remains in prescribed sets.
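As a hedged illustration of the object studied above, the Gittins index of a state of a finite-state bandit can be computed numerically via the restart-in-state formulation of Katehakis and Veinott (1987), in which the index equals (1 - beta) times the value, at that state, of an MDP allowed to restart there. The function name, the value-iteration scheme, and the toy two-state chain below are illustrative assumptions, not taken from the paper; evaluating the indices at discount factors approaching 1 is what the Laurent expansions of the abstract make precise.

```python
def gittins_indices(P, r, beta, tol=1e-10, max_iter=100000):
    """Gittins index nu_beta(x) for every state x of one bandit arm.

    P    : row-stochastic transition matrix (list of lists)
    r    : reward vector
    beta : discount factor in (0, 1)

    Uses the restart-in-state characterization: nu_beta(x) = (1 - beta) * V_x(x),
    where V_x solves the MDP that may, in any state, restart from state x.
    """
    n = len(r)
    indices = []
    for x in range(n):
        V = [0.0] * n
        for _ in range(max_iter):
            # continuation value from each state y
            cont = [r[y] + beta * sum(P[y][z] * V[z] for z in range(n))
                    for y in range(n)]
            restart = cont[x]  # value of restarting from state x
            V_new = [max(cont[y], restart) for y in range(n)]
            if max(abs(V_new[y] - V[y]) for y in range(n)) < tol:
                V = V_new
                break
            V = V_new
        indices.append((1.0 - beta) * V[x])
    return indices

# Toy two-state chain: state 0 pays 1 and may fall into the absorbing
# zero-reward state 1; the index of state 0 is 1, that of state 1 is 0.
P = [[0.5, 0.5],
     [0.0, 1.0]]
r = [1.0, 0.0]
print(gittins_indices(P, r, beta=0.9))
```

Since state 0 can always be restarted from, it is worth exactly its one-step reward per period (index 1), while the absorbing zero-reward state has index 0, independently of the discount factor.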

Citation


Michael N. Katehakis, Uriel G. Rothblum. "Finite state multi-armed bandit problems: sensitive-discount, average-reward and average-overtaking optimality." Ann. Appl. Probab. 6(3): 1024-1034, August 1996. https://doi.org/10.1214/aoap/1034968239

Information

Published: August 1996
First available in Project Euclid: 18 October 2002

zbMATH: 0862.90127
MathSciNet: MR1410127
Digital Object Identifier: 10.1214/aoap/1034968239

Subjects:
Primary: 60G40 , 90C31 , 90C39 , 90C47

Keywords: bandit problems , Gittins index , Laurent expansions , Markov decision chains , optimality criteria

Rights: Copyright © 1996 Institute of Mathematical Statistics
