Open Access
November, 1992
On the Gittins Index for Multiarmed Bandits
Richard Weber
Ann. Appl. Probab. 2(4): 1024-1033 (November, 1992). DOI: 10.1214/aoap/1177005588

Abstract

This paper considers the multiarmed bandit problem and presents a new proof of the optimality of the Gittins index policy. The proof is intuitive and does not require an interchange argument. The insight it affords is used to give a streamlined summary of previous research and to prove a new result: The optimal value function is a submodular set function of the available projects.

Citation

Richard Weber. "On the Gittins Index for Multiarmed Bandits." Ann. Appl. Probab. 2 (4) 1024 - 1033, November, 1992. https://doi.org/10.1214/aoap/1177005588

Information

Published: November, 1992
First available in Project Euclid: 19 April 2007

zbMATH: 0763.60021
MathSciNet: MR1189430
Digital Object Identifier: 10.1214/aoap/1177005588

Subjects:
Primary: 60G40
Secondary: 62L05, 90B35, 90C40

Keywords: Markov decision processes, Multiarmed bandit problem, Optimal stopping, Sequential methods, Stochastic scheduling

Rights: Copyright © 1992 Institute of Mathematical Statistics
