## The Annals of Applied Probability

### Index-based policies for discounted multi-armed bandits on parallel machines

#### Abstract

We utilize and develop elements of the recent achievable region account of Gittins indexation by Bertsimas and Niño-Mora to design index-based policies for discounted multi-armed bandits on parallel machines. The policies analyzed have expected rewards which come within an $O(\alpha)$ quantity of optimality, where $\alpha > 0$ is a discount rate. In the main, the policies make an initial once for all allocation of bandits to machines, with each machine then handling its own workload optimally. This allocation must take careful account of the index structure of the bandits. The corresponding limit policies are average-overtaking optimal.

#### Article information

Source
Ann. Appl. Probab., Volume 10, Number 3 (2000), 877-896.

Dates
First available in Project Euclid: 22 April 2002

https://projecteuclid.org/euclid.aoap/1019487512

Digital Object Identifier
doi:10.1214/aoap/1019487512

Mathematical Reviews number (MathSciNet)
MR1789982

Zentralblatt MATH identifier
1073.90568

Subjects
Secondary: 90C40: Markov and semi-Markov decision processes

#### Citation

Glazebrook, K. D.; Wilkinson, D. J. Index-based policies for discounted multi-armed bandits on parallel machines. Ann. Appl. Probab. 10 (2000), no. 3, 877--896. doi:10.1214/aoap/1019487512. https://projecteuclid.org/euclid.aoap/1019487512

#### References

• Bertsimas, D. and Ni no-Mora J. (1996). Conservation laws, extended polymatroids and multiarmed bandit problems; a polyhedral approach to indexable systems. Math. Oper. Res. 21 257-306.
• Blackwell D. (1962). Discrete dynamic programming. Ann. Math. Stat. 33 719-726.
• Coffman, E. and Mitrani, I. (1980). A characterization of waiting time performance realizable by single server queues. Oper. Res. 28 810-821.
• Denardo, E. V. and Miller, B. L. (1968). An optimality criterion for discrete dynamic programming with no discounting, Ann. Math. Stat. 39 1220-1227.
• Gittins, J. C. and Jones, D. M. (1974). A dynamicallocation index for the sequential design of experiments. In Progress in Statistics: European Meeting of Statisticians, Budapest, 1972 (J. Gani, K. Sarkadi and I. Vince, eds.) 241-266. North-Holland, Amsterdam. Glazebrook, K. D. (1976) Stochastic scheduling. Ph.D. thesis, Cambridge Univ.
• Glazebrook, K. D. and Garbe, R. (1999). Almost optimal policies for stochastic systems which almost satisfy conservation laws. Ann. Oper. Res. 92 19-43.
• Shanthikumar, J. G. and Yao, D. D. (1992). Multi-class queueing systems: polymatroidal structure and optimal scheduling control. Oper. Res. 40 293-299.
• Veinott, Jr, A. F. (1966). On finding optimal policies in discrete dynamic programming with no discounting. Ann. Math. Stat. 37 1284-1294. Weber, R. R. (1982) Scheduling jobs with stochastic processing requirements on parallel machines to minimize makespan or flowtime. J. Appl. Probab. 19 167-182. Weber, R. R., Varaiya, P. and Walrand, J. (1986) Scheduling jobs with stochastically ordered processing times on parallel machines to minimize expected flowtime. J. Appl. Probab. 23 841-847. Weiss, G. (1990) Approximation results in parallel machines stochastic scheduling. Ann. Oper. Res. Special Volume on Production Planning and Scheduling (M. Queyranne, ed.) 26 195-242. Weiss, G. (1992) Turnpike optimality of Smith's rule in parallel machines stochastic scheduling. Math. Oper. Res. 17 255-270. Weiss, G. (1995) On almost optimal priority rules for preemptive scheduling of stochastic jobs on parallel machines. Adv. Appl. Probab. 27 821-839.