The Annals of Probability

Arm-Acquiring Bandits

P. Whittle
Source: Ann. Probab. Volume 9, Number 2 (1981), 284-292.

Abstract

We consider the problem of allocating effort between projects at different stages of development when new projects are also continually appearing. An expression (14) is derived for the expected reward yielded by the Gittins index policy. This is shown to satisfy the dynamic programming equation for the problem, so confirming optimality of the policy.

Primary Subjects: 42C99
Secondary Subjects: 62C99
Full-text: Open access

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aop/1176994469
JSTOR: links.jstor.org
Digital Object Identifier: doi:10.1214/aop/1176994469
Mathematical Reviews number (MathSciNet): MR606990
Zentralblatt MATH identifier: 0464.90081


2013 © Institute of Mathematical Statistics
