The Annals of Applied Probability

Index-based policies for discounted multi-armed bandits on parallel machines

K. D. Glazebrook and D. J. Wilkinson

Full-text: Open access


We utilize and develop elements of the recent achievable region account of Gittins indexation by Bertsimas and Niño-Mora to design index-based policies for discounted multi-armed bandits on parallel machines. The policies analyzed have expected rewards which come within an $O(\alpha)$ quantity of optimality, where $\alpha > 0$ is a discount rate. In the main, the policies make an initial once-for-all allocation of bandits to machines, with each machine then handling its own workload optimally. This allocation must take careful account of the index structure of the bandits. The corresponding limit policies are average-overtaking optimal.
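The policies described above combine two ingredients: a Gittins index for each bandit, and a fixed assignment of bandits to machines, after which each machine runs its own index policy. Because bandits on different machines never interact, each machine then faces a single-machine discounted bandit problem, for which playing the bandit with the highest current Gittins index is optimal (Gittins and Jones, 1974). The Python sketch below illustrates both steps under stated assumptions and is not the authors' allocation rule, which the abstract does not specify. It assumes discrete time, finite state spaces, known transition matrices `P` and reward vectors `r`, and a discount factor `beta` in (0, 1) (for a discount rate $\alpha$ one may take `beta = exp(-alpha)` per period). Indices are computed with the largest-remaining-index algorithm of Varaiya, Walrand and Buyukkoc (1985), a standard method swapped in here for illustration. The function names and the `allocation` argument are hypothetical.

```python
import numpy as np


def gittins_indices(P, r, beta):
    """Gittins indices of a finite-state, discrete-time discounted bandit.

    Largest-remaining-index algorithm: states are ranked one at a time, from the
    highest index downward. P is an (n, n) transition matrix, r a length-n reward
    vector and beta a discount factor in (0, 1).
    """
    P = np.asarray(P, dtype=float)
    r = np.asarray(r, dtype=float)
    n = len(r)
    index = np.empty(n)
    ranked = []                       # states already indexed, highest index first
    remaining = list(range(n))
    while remaining:
        C = ranked                    # continuation set: states with a higher index
        if C:
            # Discounted reward and discounted time accumulated while staying in C.
            A = np.eye(len(C)) - beta * P[np.ix_(C, C)]
            num_C = np.linalg.solve(A, r[C])
            den_C = np.linalg.solve(A, np.ones(len(C)))
        best, best_ratio = None, -np.inf
        for y in remaining:
            # Play y once, then continue while the state stays in C.
            num, den = r[y], 1.0
            if C:
                num += beta * P[y, C] @ num_C
                den += beta * P[y, C] @ den_C
            if num / den > best_ratio:
                best, best_ratio = y, num / den
        index[best] = best_ratio
        ranked.append(best)
        remaining.remove(best)
    return index


def per_machine_actions(indices, states, allocation, n_machines):
    """Bandit each machine plays next under a fixed (hypothetical) allocation.

    indices[j]: Gittins index vector of bandit j; states[j]: its current state;
    allocation[j]: machine assigned to bandit j. Each machine independently
    plays its bandit with the highest current index; None marks an idle machine.
    """
    choice = [None] * n_machines
    best = [-np.inf] * n_machines
    for j, m in enumerate(allocation):
        value = indices[j][states[j]]
        if value > best[m]:
            best[m], choice[m] = value, j
    return choice
```

For example, with `indices = [np.array([1.0, 3.0]), np.array([2.0]), np.array([0.5, 4.0])]`, `states = [0, 0, 1]` and the allocation `[0, 0, 1]` on two machines, machine 0 plays bandit 1 (index 2.0 beats 1.0) and machine 1 plays bandit 2. Keeping the allocation as an input separates the index computation, which is standard, from the allocation step, which is where the paper's analysis lies.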

Article information

Ann. Appl. Probab., Volume 10, Number 3 (2000), 877-896.

First available in Project Euclid: 22 April 2002

Digital Object Identifier: doi:10.1214/aoap/1019487512

Primary: 90B36: Scheduling theory, stochastic [See also 68M20]
Secondary: 90C40: Markov and semi-Markov decision processes

Keywords: average-overtaking optimal; average-reward optimal; Gittins index; multi-armed bandit problem; parallel machines; suboptimality bound


Glazebrook, K. D.; Wilkinson, D. J. Index-based policies for discounted multi-armed bandits on parallel machines. Ann. Appl. Probab. 10 (2000), no. 3, 877-896. doi:10.1214/aoap/1019487512.


  • Bertsimas, D. and Niño-Mora, J. (1996). Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math. Oper. Res. 21 257-306.
  • Blackwell, D. (1962). Discrete dynamic programming. Ann. Math. Stat. 33 719-726.
  • Coffman, E. and Mitrani, I. (1980). A characterization of waiting time performance realizable by single server queues. Oper. Res. 28 810-821.
  • Denardo, E. V. and Miller, B. L. (1968). An optimality criterion for discrete dynamic programming with no discounting. Ann. Math. Stat. 39 1220-1227.
  • Gittins, J. C. and Jones, D. M. (1974). A dynamic allocation index for the sequential design of experiments. In Progress in Statistics: European Meeting of Statisticians, Budapest, 1972 (J. Gani, K. Sarkadi and I. Vince, eds.) 241-266. North-Holland, Amsterdam.
  • Glazebrook, K. D. (1976). Stochastic scheduling. Ph.D. thesis, Cambridge Univ.
  • Glazebrook, K. D. and Garbe, R. (1999). Almost optimal policies for stochastic systems which almost satisfy conservation laws. Ann. Oper. Res. 92 19-43.
  • Shanthikumar, J. G. and Yao, D. D. (1992). Multi-class queueing systems: polymatroidal structure and optimal scheduling control. Oper. Res. 40 293-299.
  • Veinott, A. F., Jr. (1966). On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Stat. 37 1284-1294.
  • Weber, R. R. (1982). Scheduling jobs with stochastic processing requirements on parallel machines to minimize makespan or flowtime. J. Appl. Probab. 19 167-182.
  • Weber, R. R., Varaiya, P. and Walrand, J. (1986). Scheduling jobs with stochastically ordered processing times on parallel machines to minimize expected flowtime. J. Appl. Probab. 23 841-847.
  • Weiss, G. (1990). Approximation results in parallel machines stochastic scheduling. Ann. Oper. Res. Special Volume on Production Planning and Scheduling (M. Queyranne, ed.) 26 195-242.
  • Weiss, G. (1992). Turnpike optimality of Smith's rule in parallel machines stochastic scheduling. Math. Oper. Res. 17 255-270.
  • Weiss, G. (1995). On almost optimal priority rules for preemptive scheduling of stochastic jobs on parallel machines. Adv. Appl. Probab. 27 821-839.