Open Access
August 2000
Index-based policies for discounted multi-armed bandits on parallel machines
K. D. Glazebrook, D. J. Wilkinson
Ann. Appl. Probab. 10(3): 877-896 (August 2000). DOI: 10.1214/aoap/1019487512

Abstract

We utilize and develop elements of the recent achievable region account of Gittins indexation by Bertsimas and Niño-Mora to design index-based policies for discounted multi-armed bandits on parallel machines. The policies analyzed have expected rewards which come within an $O(\alpha)$ quantity of optimality, where $\alpha > 0$ is a discount rate. In the main, the policies make an initial once-for-all allocation of bandits to machines, with each machine then handling its own workload optimally. This allocation must take careful account of the index structure of the bandits. The corresponding limit policies are average-overtaking optimal.

Citation

K. D. Glazebrook, D. J. Wilkinson. "Index-based policies for discounted multi-armed bandits on parallel machines." Ann. Appl. Probab. 10(3): 877-896, August 2000. https://doi.org/10.1214/aoap/1019487512

Information

Published: August 2000
First available in Project Euclid: 22 April 2002

zbMATH: 1073.90568
MathSciNet: MR1789982
Digital Object Identifier: 10.1214/aoap/1019487512

Subjects:
Primary: 90B36
Secondary: 90C40

Keywords: average-overtaking optimal, average-reward optimal, Gittins index, multi-armed bandit problem, parallel machines, suboptimality bound

Rights: Copyright © 2000 Institute of Mathematical Statistics
