The Annals of Applied Probability

On the Gittins Index for Multiarmed Bandits

Richard Weber

Full-text: Open access

Abstract

This paper considers the multiarmed bandit problem and presents a new proof of the optimality of the Gittins index policy. The proof is intuitive and does not require an interchange argument. The insight it affords is used to give a streamlined summary of previous research and to prove a new result: The optimal value function is a submodular set function of the available projects.
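
For readers unfamiliar with the two terms used in the abstract, the standard textbook definitions can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the paper: $\beta \in (0,1)$ is a discount factor, $x_i(t)$ is the state of project $i$ after $t$ plays, $R_i$ is its reward function, $\tau$ is a stopping time, and $V(S)$ is the optimal expected discounted reward when only the projects in the set $S$ are available.

```latex
% Gittins index of project i in state x (standard definition; notation assumed)
G_i(x) = \sup_{\tau \ge 1}
  \frac{\mathbb{E}\left[\sum_{t=0}^{\tau-1} \beta^{t} R_i(x_i(t)) \,\middle|\, x_i(0)=x\right]}
       {\mathbb{E}\left[\sum_{t=0}^{\tau-1} \beta^{t} \,\middle|\, x_i(0)=x\right]}

% Submodularity of the value function over sets of projects S_1, S_2
V(S_1 \cup S_2) + V(S_1 \cap S_2) \le V(S_1) + V(S_2)
```

Under the index policy, the project with the largest current index is played at each step.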

Article information

Source
Ann. Appl. Probab. Volume 2, Number 4 (1992), 1024-1033.

Dates
First available: 19 April 2007

Permanent link to this document
http://projecteuclid.org/euclid.aoap/1177005588

Digital Object Identifier
doi:10.1214/aoap/1177005588

Mathematical Reviews number (MathSciNet)
MR1189430

Zentralblatt MATH identifier
0763.60021

Subjects
Primary: 60G40: Stopping times; optimal stopping problems; gambling theory [See also 62L15, 91A60]
Secondary: 90B35: Scheduling theory, deterministic [See also 68M20]
62L05: Sequential design
90C40: Markov and semi-Markov decision processes

Keywords
Multiarmed bandit problem; stochastic scheduling; Markov decision processes; optimal stopping; sequential methods

Citation

Weber, Richard. On the Gittins Index for Multiarmed Bandits. The Annals of Applied Probability 2 (1992), no. 4, 1024-1033. doi:10.1214/aoap/1177005588. http://projecteuclid.org/euclid.aoap/1177005588.

