Advances in Applied Probability

On the asymptotic optimality of greedy index heuristics for multi-action restless bandits

D. J. Hodge and K. D. Glazebrook

Abstract

The class of restless bandits proposed by Whittle (1988) has long been known to be intractable. This paper presents an optimality result that extends that of Weber and Weiss (1990) for restless bandits to a more general setting in which individual bandits have multiple levels of activation but are subject to an overall resource constraint. The contribution is motivated by the recent works of Glazebrook et al. (2011a), (2011b), who discussed the performance of index heuristics for resource allocation in such systems. Hitherto, index heuristics have been shown, under a condition of full indexability, to be optimal for a natural Lagrangian relaxation of such problems in which a resource is purchased rather than constrained. We find that, under key assumptions about the nature of solutions to a deterministic differential equation, the index heuristics above are asymptotically optimal in a sense described by Whittle. We then demonstrate that these assumptions always hold for three-state bandits.
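To make the idea of a greedy index heuristic under a shared resource constraint concrete, here is a minimal sketch. It is not the authors' construction. It assumes, purely for illustration, that each bandit has activation levels 0, 1, ..., max_level, that each step up in level consumes one unit of a shared resource with a total of `budget` units, that the resource is always used if any bandit can absorb it, and that a hypothetical function `index(x, a)` gives the index for raising a bandit in state x from level a-1 to level a, nonincreasing in a (the kind of structure full indexability is meant to deliver). The function name `greedy_index_allocation` and all parameters are hypothetical.

```python
import heapq
from typing import Callable, List, Sequence


def greedy_index_allocation(
    states: Sequence[int],
    index: Callable[[int, int], float],
    max_level: int,
    budget: int,
) -> List[int]:
    """Greedily assign activation levels to bandits under a resource budget.

    Hypothetical model: index(x, a) is the index for raising a bandit in
    state x from level a - 1 to level a, and each level step costs one unit
    of resource. Greedy allocation is only sensible if index(x, a) is
    nonincreasing in a, which this sketch assumes rather than checks.
    """
    levels = [0] * len(states)
    # Max-heap via negated indices: each bandit contributes its next increment.
    # Ties are broken by the smaller bandit number i.
    heap = [(-index(x, 1), i) for i, x in enumerate(states) if max_level >= 1]
    heapq.heapify(heap)
    remaining = budget
    while remaining > 0 and heap:
        _, i = heapq.heappop(heap)  # bandit whose next increment has the largest index
        levels[i] += 1
        remaining -= 1
        if levels[i] < max_level:
            # Offer this bandit's following increment for consideration.
            heapq.heappush(heap, (-index(states[i], levels[i] + 1), i))
    return levels
```

For example, with the made-up index `lambda x, a: x / a`, states `[3, 1, 2]`, `max_level=2` and `budget=3`, the three units go to increments with indices 3, 2 and 1.5, giving levels `[2, 0, 1]`. Using a heap keeps each greedy step at logarithmic cost in the number of bandits.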

Article information

Source
Adv. in Appl. Probab., Volume 47, Number 3 (2015), 652-667.

Dates
First available in Project Euclid: 8 October 2015

Permanent link to this document
https://projecteuclid.org/euclid.aap/1444308876

Digital Object Identifier
doi:10.1239/aap/1444308876

Mathematical Reviews number (MathSciNet)
MR3406602

Zentralblatt MATH identifier
1326.90102

Subjects
Primary: 90C40: Markov and semi-Markov decision processes
Secondary: 49L20: Dynamic programming method; 49M20: Methods of relaxation type; 93E20: Optimal stochastic control

Keywords
Index heuristic; asymptotic optimality; multi-action restless bandit; stochastic resource allocation

Citation

Hodge, D. J.; Glazebrook, K. D. On the asymptotic optimality of greedy index heuristics for multi-action restless bandits. Adv. in Appl. Probab. 47 (2015), no. 3, 652–667. doi:10.1239/aap/1444308876. https://projecteuclid.org/euclid.aap/1444308876


References

  • Archibald, T. W., Black, D. P. and Glazebrook, K. D. (2009). Indexability and index heuristics for a simple class of inventory routing problems. Operat. Res. 57, 314–326.
  • Ayesta, U., Jacko, P. and Novak, V. (2011). A nearly-optimal index rule for scheduling of users with abandonment. In Proc. IEEE INFOCOM, IEEE, New York, pp. 2849–2857.
  • Caro, F. and Gallien, J. (2007). Dynamic assortment with demand learning for seasonal consumer goods. Manag. Sci. 53, 276–292.
  • Gittins, J. C. (1979). Bandit processes and dynamic allocation indices. J. R. Statist. Soc. B 41, 148–177.
  • Gittins, J. C., Glazebrook, K. D. and Weber, R. R. (2011). Multi-Armed Bandit Allocation Indices. John Wiley, Oxford.
  • Glazebrook, K. D., Hodge, D. J. and Kirkbride, C. (2011a). General notions of indexability for queueing control and asset management. Ann. Appl. Prob. 21, 876–907.
  • Glazebrook, K. D., Kirkbride, C. and Ouenniche, J. (2009). Index policies for the admission control and routing of impatient customers to heterogeneous service stations. Operat. Res. 57, 975–989.
  • Glazebrook, K. D., Mitchell, H. M. and Ansell, P. S. (2005). Index policies for the maintenance of a collection of machines by a set of repairmen. Europ. J. Operat. Res. 165, 267–284.
  • Glazebrook, K. D., Niño-Mora, J. and Ansell, P. S. (2002). Index policies for a class of discounted restless bandits. Adv. Appl. Prob. 34, 754–774.
  • Glazebrook, K. D., Ansell, P. S., Dunn, R. T. and Lumley, R. R. (2004). On the optimal allocation of service to impatient tasks. J. Appl. Prob. 41, 51–72.
  • Hodge, D. J. and Glazebrook, K. D. (2011b). Dynamic resource allocation in a multi-product make-to-stock production system. Queueing Systems 67, 333–364.
  • Mitra, D. and Weiss, A. (1988). A transient analysis of a data network with a processor-sharing switch. AT&T Tech. J. 67, 4–16.
  • Niño-Mora, J. (2007). Dynamic priority allocation via restless bandit marginal productivity indices. TOP 15, 161–198.
  • Opp, M., Glazebrook, K. and Kulkarni, V. G. (2005). Outsourcing warranty repairs: dynamic allocation. Naval Res. Logistics 52, 381–398.
  • Papadimitriou, C. H. and Tsitsiklis, J. N. (1999). The complexity of optimal queuing network control. Math. Operat. Res. 24, 293–305.
  • Puterman, M. L. (2005). Markov Decision Processes: Discrete Stochastic Dynamic Programming. John Wiley, New York.
  • Veatch, M. H. and Wein, L. M. (1996). Scheduling a make-to-stock queue: index policies and hedging points. Operat. Res. 44, 634–647.
  • Weber, R. (2007). Comments on: 'Dynamic priority allocation via restless bandit marginal productivity indices'. TOP 15, 211–216.
  • Weber, R. R. and Weiss, G. (1990). On an index policy for restless bandits. J. Appl. Prob. 27, 637–648.
  • Weber, R. R. and Weiss, G. (1991). Addendum to: 'On an index policy for restless bandits'. Adv. Appl. Prob. 23, 429–430.
  • Whittle, P. (1988). Restless bandits: activity allocation in a changing world. In A Celebration of Applied Probability (J. Appl. Prob. Spec. Vol. 25), Applied Probability Trust, Sheffield, pp. 287–298.