The Annals of Applied Probability (Ann. Appl. Probab.), Volume 3, Number 4 (1993), 1013-1032.
Indices for Families of Competing Markov Decision Processes with Influence
Nash obtained an important extension to the classical theory of Gittins indexation when he demonstrated that index policies were optimal for a class of multiarmed bandit problems with a multiplicatively separable reward structure. We characterise the relevant indices (herein referred to as Nash indices) as equivalent retirement rewards/penalties for appropriately defined maximisation/minimisation problems. We also give a condition which is sufficient to guarantee the optimality of index policies for a Nash-type model in which each constituent bandit has its own decision structure.
First available in Project Euclid: 19 April 2007
Primary: 90C40: Markov and semi-Markov decision processes
Glazebrook, K. D. Indices for Families of Competing Markov Decision Processes with Influence. Ann. Appl. Probab. 3 (1993), no. 4, 1013-1032. doi:10.1214/aoap/1177005270. https://projecteuclid.org/euclid.aoap/1177005270