Journal of Applied Probability
- J. Appl. Probab.
- Volume 50, Number 2 (2013), 388-402.
Open bandit processes with uncountable states and time-backward effects
Bandit processes and the Gittins index have provided powerful and elegant theory and tools for optimally allocating limited resources to competing demands. In this paper we extend the Gittins theory to more general branching bandit processes, also referred to as open bandit processes, that allow uncountable state spaces and backward times. We establish the optimality of the Gittins index policy with uncountably many states, which is useful in problems such as dynamic scheduling with continuous random processing times. We also allow negative time durations for discounting a reward, to account for the present value of a reward that was received before the present time; we refer to these as time-backward effects. This could model the situation of offering bonus rewards for completing jobs above expectation. Moreover, we find that a common belief about the optimality of the Gittins index in the generalized bandit problem is not always true without additional conditions, and we provide a counterexample. We further apply our theory of open bandit processes with time-backward effects to prove the optimality of the Gittins index in the generalized bandit problem under a sufficient condition.
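For orientation only, the two notions named in the abstract can be sketched in their textbook form. This is not the paper's own formulation, which works with general (uncountable) state spaces and random durations; the symbols $X_t$, $r$, $\beta$, $\tau$, $\delta$, $R$ and $t$ below are assumptions introduced for illustration.

```latex
% Classical discrete-time Gittins index of a single arm in state x,
% with discount factor 0 < \beta < 1 and stopping times \tau \ge 1
% (standard background, not the paper's general definition):
\[
  G(x) \;=\; \sup_{\tau \ge 1}
  \frac{\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t}\, r(X_t) \,\middle|\, X_0 = x\right]}
       {\mathbb{E}\!\left[\sum_{t=0}^{\tau-1} \beta^{t} \,\middle|\, X_0 = x\right]}
\]

% Present value of a reward R at time t under continuous discounting
% at an assumed rate \delta > 0:
\[
  \mathrm{PV}(R, t) \;=\; e^{-\delta t}\, R ,
  \qquad\text{so that } t < 0 \;\Longrightarrow\; e^{-\delta t} > 1 .
\]
```

Read this way, a negative time duration multiplies the reward by a factor greater than one, which is consistent with the abstract's description of a reward received before the present time and of bonus rewards.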
First available in Project Euclid: 19 June 2013
Primary:
- 90B36: Scheduling theory, stochastic [See also 68M20]
- 60G40: Stopping times; optimal stopping problems; gambling theory [See also 62L15, 91A60]
- 90C40: Markov and semi-Markov decision processes
Wu, Xianyi; Zhou, Xian. Open bandit processes with uncountable states and time-backward effects. J. Appl. Probab. 50 (2013), no. 2, 388--402. doi:10.1239/jap/1371648948. https://projecteuclid.org/euclid.jap/1371648948