Continuous Multi-Armed Bandits and Multiparameter Processes

Avi Mandelbaum

doi:10.1214/aop/1176991992


Ann. Probab. 15(4): 1527-1556 (October, 1987). DOI: 10.1214/aop/1176991992

Abstract

A general framework is proposed for continuous-time dynamic allocation models of a scarce resource among competing projects. The allocation model is formulated as a multi-armed bandit model and solved as a control problem of a multiparameter process. In contrast to discrete-time bandits, where only one arm can be pulled at a time, the continuous-time bandit must allow simultaneous pulls. The multiparameter approach allows a strong solution of diffusion-type bandits. Here the main problem is to define precisely how to switch among arms, and the solution involves local times.
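The abstract's contrast between discrete-time and continuous-time bandits can be illustrated with a small sketch. It assumes a simplification of our own, not the paper's construction: on a time grid, an allocation policy returns nonnegative rates summing to one, each arm i accumulates its own clock T_i(t), and the clocks therefore always add up to the elapsed time t. A discrete-time-style policy uses only unit vectors (one arm at a time), while a continuous-time policy may split effort among arms simultaneously. The names `allocation_path`, `one_at_a_time` and `equal_split` are hypothetical.

```python
def allocation_path(rates_fn, n_arms, horizon, dt=0.01):
    """Accumulate each arm's own clock T_i(t) on a grid (illustrative sketch).

    rates_fn(t, clocks) is a hypothetical policy returning nonnegative
    rates that sum to 1; arm i then runs for rates[i] * dt in each step.
    """
    clocks = [0.0] * n_arms
    t = 0.0
    for _ in range(round(horizon / dt)):
        rates = rates_fn(t, clocks)
        # Only one unit of effort is available per unit of time.
        if any(r < 0 for r in rates) or abs(sum(rates) - 1.0) > 1e-9:
            raise ValueError("rates must be nonnegative and sum to 1")
        for i, r in enumerate(rates):
            clocks[i] += r * dt
        t += dt
    return clocks


def one_at_a_time(t, clocks):
    # Discrete-time style: pull a single arm (the one with the least time so far).
    i = min(range(len(clocks)), key=lambda k: clocks[k])
    return [1.0 if k == i else 0.0 for k in range(len(clocks))]


def equal_split(t, clocks):
    # Continuous-time style: all arms advance simultaneously at equal rates.
    return [1.0 / len(clocks)] * len(clocks)


if __name__ == "__main__":
    print(allocation_path(one_at_a_time, 2, 1.0))
    print(allocation_path(equal_split, 2, 1.0))
```

Both policies reach the same clocks in this toy case. The difference lies in how each gets there: only the second uses simultaneous pulls, which the abstract says a continuous-time bandit must allow.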

Citation


Avi Mandelbaum. "Continuous Multi-Armed Bandits and Multiparameter Processes." Ann. Probab. 15 (4) 1527–1556, October, 1987. https://doi.org/10.1214/aop/1176991992

Information

Published: October, 1987

First available in Project Euclid: 19 April 2007

zbMATH: 0657.62098

MathSciNet: MR905347

Digital Object Identifier: 10.1214/aop/1176991992

Subjects:

Primary: 62L99

Secondary: 60G17, 60J55, 60J60, 60K10, 93E20

Keywords: Diffusions, Dynamic allocation, Gittins' index, Local time, Multi-armed bandits, Multiparameter processes, Optional increasing path, Stochastic control


Journal article, 30 pages
