Advances in Applied Probability

Monotone policies and indexability for bidirectional restless bandits

K. D. Glazebrook, D. J. Hodge, and C. Kirkbride


Motivated by a wide range of applications, we consider a development of Whittle's restless bandit model in which project activation requires a state-dependent amount of a key resource, which is assumed to be available at a constant rate. As many projects may be activated at each decision epoch as resource availability allows. We seek a policy for project activation within resource constraints which minimises an aggregate cost rate for the system. Project indices derived from a Lagrangian relaxation of the original problem exist provided the structural requirement of indexability is met. Verification of this property and derivation of the related indices is greatly simplified when the solution of the Lagrangian relaxation has a state monotone structure for each constituent project. We demonstrate that this is indeed the case for a wide range of bidirectional projects in which the project state tends to move in a different direction when it is activated from that in which it moves when passive. This is natural in many application domains in which activation of a project ameliorates its condition, which otherwise tends to deteriorate or deplete. In some cases the state monotonicity required is related to the structure of state transitions, while in others it is also related to the nature of costs. Two numerical studies demonstrate the value of the ideas for the construction of policies for dynamic resource allocation, most especially in contexts which involve a large number of projects.
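As a rough illustration of the index idea summarised above, the following Python sketch treats a single toy bidirectional project: the state tends to rise (deteriorate) when passive and to fall when active. It attaches a charge per unit of resource to activation, solves the resulting one-project problem by value iteration, checks numerically whether the active set has threshold (state monotone) form, and finds by bisection the charge at which the two actions cost the same in each state. Everything in it is an assumption made for illustration and none of it is taken from the paper: the state space, the transition probabilities, the cost and resource functions, the use of discounting in place of a cost rate, and all function names.

```python
# Toy single-project sketch; every number and name here is an illustrative
# assumption, not taken from the paper.
K = 5          # states 0..K; a larger state means a worse condition
BETA = 0.9     # discount factor (assumed; the paper minimises a cost rate)
P_UP = 0.6     # passive: deteriorate one step with this probability
P_DOWN = 0.7   # active: improve one step with this probability


def cost(x):
    """Cost per period in state x (assumed convex and increasing)."""
    return float(x * x)


def resource(x):
    """Resource consumed by activation in state x (assumed form)."""
    return 1.0 + 0.5 * x


def q_values(charge, sweeps=300):
    """Value iteration for the one-project problem with a resource charge.

    Returns (q_passive, q_active), the discounted cost of each action per state.
    """
    v = [0.0] * (K + 1)
    for _ in range(sweeps):
        q_pas, q_act = [], []
        for x in range(K + 1):
            up, down = min(x + 1, K), max(x - 1, 0)
            q_pas.append(cost(x) + BETA * (P_UP * v[up] + (1 - P_UP) * v[x]))
            q_act.append(cost(x) + charge * resource(x)
                         + BETA * (P_DOWN * v[down] + (1 - P_DOWN) * v[x]))
        v = [min(p, a) for p, a in zip(q_pas, q_act)]
    return q_pas, q_act


def gap(charge, x):
    """Passive minus active cost in state x; positive means activation is better."""
    q_pas, q_act = q_values(charge)
    return q_pas[x] - q_act[x]


def active_set(charge):
    """States in which activation is strictly cheaper at the given charge."""
    q_pas, q_act = q_values(charge)
    return [x for x in range(K + 1) if q_act[x] < q_pas[x]]


def is_threshold(states):
    """True if the set is empty or has the form {t, t+1, ..., K}."""
    return states == list(range(K + 1 - len(states), K + 1))


def index_by_bisection(x, lo=0.0, hi=1000.0, steps=50):
    """Charge at which both actions cost the same in state x."""
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if gap(mid, x) > 0:
            lo = mid   # activation still preferred: raise the charge
        else:
            hi = mid   # passivity preferred: lower the charge
    return 0.5 * (lo + hi)


indices = [index_by_bisection(x) for x in range(K + 1)]
print("index per state:", [round(w, 3) for w in indices])

# Numerical check only: report whether the active set is a threshold set
# at a few sample charges. This does not prove indexability.
for charge in (0.0, 5.0, 20.0, 80.0):
    states = active_set(charge)
    print(charge, states, is_threshold(states))
```

In a multi-project setting, an index heuristic of the kind the abstract describes would rank projects by their current index and activate them in that order while the resource lasts. The sketch covers only the single-project calculation, and the threshold check is numerical evidence for this toy model rather than a proof.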

Article information

Adv. in Appl. Probab., Volume 45, Number 1 (2013), 51-85.

First available in Project Euclid: 15 March 2013

Primary: 90C40: Markov and semi-Markov decision processes
Secondary: 49L20: Dynamic programming method; 90C39: Dynamic programming [See also 49L20]; 49M20: Methods of relaxation type

Keywords: asset management; Gittins index; indexability; inventory management; Lagrangian relaxation; machine maintenance; monotone policy; stochastic dynamic programming; restless bandit; Whittle index


Glazebrook, K. D.; Hodge, D. J.; Kirkbride, C. Monotone policies and indexability for bidirectional restless bandits. Adv. in Appl. Probab. 45 (2013), no. 1, 51–85. doi:10.1239/aap/1363354103.


