The Annals of Applied Probability
- Ann. Appl. Probab.
- Volume 10, Number 3 (2000), 877-896.
Index-based policies for discounted multi-armed bandits on parallel machines
We utilize and develop elements of the recent achievable region account of Gittins indexation by Bertsimas and Niño-Mora to design index-based policies for discounted multi-armed bandits on parallel machines. The policies analyzed have expected rewards which come within an $O(\alpha)$ quantity of optimality, where $\alpha > 0$ is a discount rate. In the main, the policies make an initial once-for-all allocation of bandits to machines, with each machine then handling its own workload optimally. This allocation must take careful account of the index structure of the bandits. The corresponding limit policies are average-overtaking optimal.
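The two-stage shape of such a policy (allocate once, then let each machine work through its own queue) can be illustrated with a toy sketch. Everything in it is an assumption made for illustration and is not taken from the paper: each bandit is reduced to a deterministic job with a known processing time and a reward paid at completion, `toy_index` (reward per unit time) is only a stand-in for a Gittins index, discounting is taken as $e^{-\alpha t}$ in continuous time, and the round-robin rule in `allocate_once` is a hypothetical placeholder rather than the allocation rule the authors analyze.

```python
import math
from typing import List, Tuple

# Toy stand-in for a bandit: (processing_time, reward paid at completion).
Job = Tuple[float, float]


def toy_index(job: Job) -> float:
    """Reward per unit processing time; a placeholder, not a Gittins index."""
    processing_time, reward = job
    return reward / processing_time


def allocate_once(jobs: List[Job], machines: int) -> List[List[Job]]:
    """Hypothetical allocation: deal jobs to machines in decreasing index order."""
    ranked = sorted(jobs, key=toy_index, reverse=True)
    return [ranked[m::machines] for m in range(machines)]


def discounted_reward(queue: List[Job], alpha: float) -> float:
    """Discounted reward of one machine serving its queue in the given order."""
    elapsed, total = 0.0, 0.0
    for processing_time, reward in queue:
        elapsed += processing_time
        total += reward * math.exp(-alpha * elapsed)
    return total


def policy_value(jobs: List[Job], machines: int, alpha: float) -> float:
    """Total discounted reward when no job moves after the initial allocation."""
    return sum(discounted_reward(q, alpha) for q in allocate_once(jobs, machines))


if __name__ == "__main__":
    jobs = [(1.0, 4.0), (2.0, 6.0), (1.0, 1.0)]
    for alpha in (0.0, 0.1):
        print(alpha, policy_value(jobs, machines=2, alpha=alpha))
```

Because each queue is built in decreasing order of the toy index, every machine serves its own jobs in that order. The sketch only shows the structure of a static-allocation policy and makes no claim about the $O(\alpha)$ guarantee.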
First available in Project Euclid: 22 April 2002
Permanent link to this document: https://projecteuclid.org/euclid.aoap/1019487512
Digital Object Identifier: doi:10.1214/aoap/1019487512
Primary: 90B36: Scheduling theory, stochastic [See also 68M20]
Secondary: 90C40: Markov and semi-Markov decision processes
Glazebrook, K. D.; Wilkinson, D. J. Index-based policies for discounted multi-armed bandits on parallel machines. Ann. Appl. Probab. 10 (2000), no. 3, 877--896. doi:10.1214/aoap/1019487512. https://projecteuclid.org/euclid.aoap/1019487512