The Annals of Probability

Arm-Acquiring Bandits

P. Whittle

Abstract

We consider the problem of allocating effort between projects at different stages of development when new projects are also continually appearing. An expression (14) is derived for the expected reward yielded by the Gittins index policy. This is shown to satisfy the dynamic programming equation for the problem, so confirming optimality of the policy.
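As an illustration of what an index policy involves, the following minimal sketch computes Gittins indices for a single project. It is not the derivation in the paper. It assumes a finite-state Markov project with transition matrix P, per-state rewards r, and a discount factor beta, none of which are specified in the abstract. It uses the restart-in-state characterization of the index (due to Katehakis and Veinott) rather than expression (14). The function names gittins_indices and choose_project are hypothetical.

```python
def gittins_indices(P, r, beta, tol=1e-10, max_iter=100_000):
    """Gittins index of every state of one finite-state Markov project.

    Assumptions (not from the source): P is a row-stochastic matrix given
    as a list of lists, r is a list of per-state rewards, 0 < beta < 1.
    For each state x we solve the "restart in x" problem by value iteration:
        V(y) = max(continue from y, restart from x)
    and the index of x is (1 - beta) * V(x).
    """
    n = len(r)
    indices = []
    for x in range(n):
        V = [0.0] * n
        for _ in range(max_iter):
            # One-step value of continuing from each state y.
            cont = [
                r[y] + beta * sum(P[y][z] * V[z] for z in range(n))
                for y in range(n)
            ]
            # In every state, either continue or restart from state x.
            new_V = [max(cont[y], cont[x]) for y in range(n)]
            diff = max(abs(a - b) for a, b in zip(new_V, V))
            V = new_V
            if diff < tol:
                break
        indices.append((1.0 - beta) * V[x])
    return indices


def choose_project(project_states, indices):
    """Return the position of the project whose current state has the
    largest index. Newly arrived projects are simply appended to
    project_states before the next call."""
    return max(range(len(project_states)),
               key=lambda i: indices[project_states[i]])


# Toy project: state 0 pays 0 and moves to state 1; state 1 pays 1 forever.
P = [[0.0, 1.0], [0.0, 1.0]]
r = [0.0, 1.0]
idx = gittins_indices(P, r, beta=0.5)
```

For this toy project the indices come out at approximately 0.5 for state 0 and 1.0 for state 1, so choose_project([0, 1, 0], idx) selects the project in position 1. In the arm-acquiring setting, the pool of projects grows over time while the selection rule stays the same.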

Article information

Source
Ann. Probab. Volume 9, Number 2 (1981), 284-292.

Dates
First available in Project Euclid: 19 April 2007

Permanent link to this document
http://projecteuclid.org/euclid.aop/1176994469

Digital Object Identifier
doi:10.1214/aop/1176994469

Mathematical Reviews number (MathSciNet)
MR606990

Zentralblatt MATH identifier
0464.90081

Subjects
Primary: 42C99: None of the above, but in this section
Secondary: 62C99: None of the above, but in this section

Keywords
Multiarmed bandit; dynamic programming; allocation index

Citation

Whittle, P. Arm-Acquiring Bandits. The Annals of Probability 9 (1981), no. 2, 284-292. doi:10.1214/aop/1176994469. http://projecteuclid.org/euclid.aop/1176994469.

