Open Access
Indices for Families of Competing Markov Decision Processes with Influence
K. D. Glazebrook
Ann. Appl. Probab. 3(4): 1013-1032 (November, 1993). DOI: 10.1214/aoap/1177005270

Abstract

Nash obtained an important extension to the classical theory of Gittins indexation when he demonstrated that index policies were optimal for a class of multiarmed bandit problems with a multiplicatively separable reward structure. We characterise the relevant indices (herein referred to as Nash indices) as equivalent retirement rewards/penalties for appropriately defined maximisation/minimisation problems. We also give a condition which is sufficient to guarantee the optimality of index policies for a Nash-type model in which each constituent bandit has its own decision structure.

Citation

K. D. Glazebrook. "Indices for Families of Competing Markov Decision Processes with Influence." Ann. Appl. Probab. 3(4): 1013-1032, November, 1993. https://doi.org/10.1214/aoap/1177005270

Information

Published: November, 1993
First available in Project Euclid: 19 April 2007

zbMATH: 0795.90084
MathSciNet: MR1241032
Digital Object Identifier: 10.1214/aoap/1177005270

Subjects:
Primary: 90C40

Keywords: Gittins index, Markov decision process, optimal policy, stopping time

Rights: Copyright © 1993 Institute of Mathematical Statistics
