Arm-Acquiring Bandits
P. Whittle
Source: Ann. Probab. Volume 9, Number 2 (1981), 284-292.
Abstract
We consider the problem of allocating effort between projects at different stages of development when new projects are also continually appearing. An expression (14) is derived for the expected reward yielded by the Gittins index policy. This is shown to satisfy the dynamic programming equation for the problem, so confirming optimality of the policy.
Full-text: Open access
Links and Identifiers
Permanent link to this document: http://projecteuclid.org/euclid.aop/1176994469
JSTOR: links.jstor.org
Digital Object Identifier: doi:10.1214/aop/1176994469
Mathematical Reviews number (MathSciNet): MR606990
Zentralblatt MATH identifier: 0464.90081