Abstract
Nash obtained an important extension to the classical theory of Gittins indexation when he demonstrated that index policies were optimal for a class of multiarmed bandit problems with a multiplicatively separable reward structure. We characterise the relevant indices (herein referred to as Nash indices) as equivalent retirement rewards/penalties for appropriately defined maximisation/minimisation problems. We also give a condition which is sufficient to guarantee the optimality of index policies for a Nash-type model in which each constituent bandit has its own decision structure.
Citation
K. D. Glazebrook. "Indices for Families of Competing Markov Decision Processes with Influence." Ann. Appl. Probab. 3 (4) 1013-1032, November 1993. https://doi.org/10.1214/aoap/1177005270