Journal of Applied Probability

Open bandit processes with uncountable states and time-backward effects

Xianyi Wu and Xian Zhou



Bandit processes and the Gittins index have provided a powerful and elegant theory, together with practical tools, for optimizing the allocation of limited resources among competing demands. In this paper we extend the Gittins theory to more general branching bandit processes, also referred to as open bandit processes, that allow uncountable state spaces and backward times. We establish the optimality of the Gittins index policy with uncountably many states, which is useful in problems such as dynamic scheduling with continuous random processing times. We also allow negative time durations when discounting a reward, to account for the present value of a reward received before the present time; we refer to this as a time-backward effect. It can model, for example, the offering of bonus rewards for completing jobs ahead of expectation. Moreover, we show that a common belief in the optimality of the Gittins index for the generalized bandit problem does not always hold without additional conditions, and we provide a counterexample. We then apply our theory of open bandit processes with time-backward effects to prove the optimality of the Gittins index in the generalized bandit problem under a sufficient condition.
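The Gittins index at the heart of the abstract can be illustrated in the simplest setting the paper generalises beyond: a finite-state, discrete-time Markov bandit with discount factor beta < 1. There the index of a state admits a stopping-time characterisation (expected discounted reward up to a stopping time, divided by expected discounted time), and the optimal stopping time is the exit time of some continuation set containing the current state. The sketch below is illustrative only and is not from the paper: the function name `gittins_index` and the brute-force enumeration over continuation sets are our own choices, workable only for small chains.

```python
import itertools
import numpy as np

def gittins_index(P, r, beta, s):
    """Brute-force Gittins index of state s for a finite Markov chain.

    P    : (n, n) transition matrix
    r    : (n,) one-step rewards
    beta : discount factor in (0, 1)

    Uses the ratio characterisation
        G(s) = sup_tau E[sum_{t<tau} beta^t r(X_t)] / E[sum_{t<tau} beta^t],
    where the optimal tau is the exit time of some continuation set C
    containing s; we simply enumerate all such sets.
    """
    n = len(r)
    best = -np.inf
    for k in range(1, n + 1):
        for C in itertools.combinations(range(n), k):
            if s not in C:
                continue
            C = list(C)
            # Dynamics restricted to C: continue while in C, stop on exit.
            Pc = P[np.ix_(C, C)]
            I = np.eye(len(C))
            # Expected discounted reward and expected discounted time
            # accumulated before leaving C.
            v = np.linalg.solve(I - beta * Pc, r[C])
            w = np.linalg.solve(I - beta * Pc, np.ones(len(C)))
            i = C.index(s)
            best = max(best, v[i] / w[i])
    return best
```

For an absorbing state paying a constant reward c, every continuation set yields the ratio c, so the index is c; for a state paying 1 once and then moving to a zero-reward absorbing state, the index is 1 (it is optimal to stop after one step). The paper's contribution is precisely to extend this index optimality to uncountable state spaces and to discounting over negative (time-backward) durations, which this finite sketch cannot capture.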

Article information

J. Appl. Probab., Volume 50, Number 2 (2013), 388-402.

First available in Project Euclid: 19 June 2013


Primary:
90B36: Scheduling theory, stochastic [See also 68M20]
60G40: Stopping times; optimal stopping problems; gambling theory [See also 62L15, 91A60]
90C40: Markov and semi-Markov decision processes

Keywords: open bandit process; generalized bandit process; Gittins index; priority scheduling


Wu, Xianyi; Zhou, Xian. Open bandit processes with uncountable states and time-backward effects. J. Appl. Probab. 50 (2013), no. 2, 388--402. doi:10.1239/jap/1371648948.


