Advances in Applied Probability

Index policies for discounted bandit problems with availability constraints

Savas Dayanik, Warren Powell, and Kazutoshi Yamazaki


Abstract

A multiarmed bandit problem is studied when the arms are not always available. The arms are first assumed to be intermittently available with some state/action-dependent probabilities. It is proven that no index policy can attain the maximum expected total discounted reward in every instance of that problem. The Whittle index policy is derived, and its properties are studied. Then it is assumed that the arms may break down, but repair is an option at some cost, and the new Whittle index policy is derived. Both problems are indexable. The proposed index policies cannot be dominated by any other index policy over all multiarmed bandit problems considered here. Whittle indices are evaluated for Bernoulli arms with unknown success probabilities.
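The setting in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the authors' method: each arm is available independently in every period with a fixed probability (the paper allows state/action-dependent probabilities), and the index used is a placeholder posterior mean rather than the Whittle index derived in the article. The names `simulate`, `posterior_mean_index`, `true_probs`, and `avail_probs` are hypothetical.

```python
import random


def posterior_mean_index(successes, failures):
    """Placeholder index: posterior mean under a Beta(1, 1) prior.

    This is NOT the Whittle index from the paper; it only stands in
    for "some index computed from an arm's own state".
    """
    return (successes + 1) / (successes + failures + 2)


def simulate(true_probs, avail_probs, discount=0.9, horizon=200,
             seed=0, index=posterior_mean_index):
    """Play Bernoulli arms that are only intermittently available.

    In each period every arm is available independently with its
    (assumed constant) probability; the policy plays the available
    arm with the largest index. Returns the discounted total reward
    and the per-arm success and failure counts.
    """
    rng = random.Random(seed)
    n = len(true_probs)
    wins = [0] * n
    losses = [0] * n
    total = 0.0
    weight = 1.0  # discount ** t for the current period t
    for _ in range(horizon):
        available = [i for i in range(n) if rng.random() < avail_probs[i]]
        if available:
            arm = max(available, key=lambda i: index(wins[i], losses[i]))
            if rng.random() < true_probs[arm]:
                wins[arm] += 1
                total += weight
            else:
                losses[arm] += 1
        weight *= discount
    return total, wins, losses


if __name__ == "__main__":
    reward, wins, losses = simulate([0.3, 0.7], [0.9, 0.5])
    print(f"discounted reward: {reward:.3f}, wins: {wins}, losses: {losses}")
```

Passing a different `index` function makes it possible to compare index rules on the same random stream, which is the kind of comparison the abstract's non-dominance statement concerns.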

Article information

Source
Adv. in Appl. Probab. Volume 40, Number 2 (2008), 377-400.

Dates
First available: 1 July 2008

Permanent link to this document
http://projecteuclid.org/euclid.aap/1214950209

Digital Object Identifier
doi:10.1239/aap/1214950209

Mathematical Reviews number (MathSciNet)
MR2431302

Zentralblatt MATH identifier
1140.93047

Subjects
Primary: 93E20: Optimal stochastic control
Secondary: 90B36: Scheduling theory, stochastic [See also 68M20]

Keywords
Optimal resource allocation; multiarmed bandit problem; Gittins index; Whittle index; restart-in problem

Citation

Dayanik, Savas; Powell, Warren; Yamazaki, Kazutoshi. Index policies for discounted bandit problems with availability constraints. Advances in Applied Probability 40 (2008), no. 2, 377--400. doi:10.1239/aap/1214950209. http://projecteuclid.org/euclid.aap/1214950209.


