The Annals of Applied Probability

On the Gittins Index for Multiarmed Bandits

Richard Weber

Full-text: Open access

Abstract

This paper considers the multiarmed bandit problem and presents a new proof of the optimality of the Gittins index policy. The proof is intuitive and does not require an interchange argument. The insight it affords is used to give a streamlined summary of previous research and to prove a new result: The optimal value function is a submodular set function of the available projects.
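
For orientation, the two notions named in the abstract can be written in their standard textbook form. The notation is an assumption made here for illustration and is not taken from the paper: $x$ is the current state of project $i$, $r_i$ its reward function, $\beta \in (0,1)$ a discount factor, $\tau$ a stopping time, and $V(S)$ the optimal expected discounted reward when the set $S$ of projects is available.

\[
G_i(x) \;=\; \sup_{\tau \ge 1}
\frac{\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t}\, r_i\bigl(x_i(t)\bigr) \,\middle|\, x_i(0) = x\right]}
     {\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t} \,\middle|\, x_i(0) = x\right]}
\]

\[
V(S \cup T) + V(S \cap T) \;\le\; V(S) + V(T) \qquad \text{for all sets of projects } S,\, T
\]

The first display is the usual definition of the Gittins index; the index policy continues, at each step, a project whose current index is largest. The second is the usual definition of a submodular set function, which is the property the abstract asserts for the optimal value function.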

Article information

Source
Ann. Appl. Probab. Volume 2, Number 4 (1992), 1024-1033.

Dates
First available in Project Euclid: 19 April 2007

Permanent link to this document
http://projecteuclid.org/euclid.aoap/1177005588

Digital Object Identifier
doi:10.1214/aoap/1177005588

Mathematical Reviews number (MathSciNet)
MR1189430

Zentralblatt MATH identifier
0763.60021

Subjects
Primary: 60G40: Stopping times; optimal stopping problems; gambling theory [See also 62L15, 91A60]
Secondary: 90B35: Scheduling theory, deterministic [See also 68M20]; 62L05: Sequential design; 90C40: Markov and semi-Markov decision processes

Keywords
Multiarmed bandit problem; stochastic scheduling; Markov decision processes; optimal stopping; sequential methods

Citation

Weber, Richard. On the Gittins Index for Multiarmed Bandits. Ann. Appl. Probab. 2 (1992), no. 4, 1024--1033. doi:10.1214/aoap/1177005588. http://projecteuclid.org/euclid.aoap/1177005588.
