The Annals of Applied Probability

Indices for Families of Competing Markov Decision Processes with Influence

K. D. Glazebrook

Full-text: Open access

Abstract

Nash obtained an important extension to the classical theory of Gittins indexation when he demonstrated that index policies were optimal for a class of multiarmed bandit problems with a multiplicatively separable reward structure. We characterise the relevant indices (herein referred to as Nash indices) as equivalent retirement rewards/penalties for appropriately defined maximisation/minimisation problems. We also give a condition which is sufficient to guarantee the optimality of index policies for a Nash-type model in which each constituent bandit has its own decision structure.
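The idea of an index as an equivalent retirement reward can be illustrated in the classical (additive, discounted) Gittins setting. The following is a minimal sketch for that classical case only, not the Nash indices or the model with influence treated in the paper. It assumes a single finite-state bandit with transition matrix `P`, reward vector `r` and discount factor `beta`; the function names `retirement_values` and `gittins_index` are hypothetical and chosen for illustration. For a state `x`, the code searches by bisection for the smallest lump-sum retirement reward `M` at which retiring immediately is as good as continuing, and reports `(1 - beta) * M` as the index.

```python
def retirement_values(P, r, beta, M, tol=1e-12, max_iter=100_000):
    """Value iteration for the retirement problem
    V(x) = max(M, r[x] + beta * sum_y P[x][y] * V(y)),
    where M is a lump sum received on retiring (illustrative set-up)."""
    n = len(r)
    V = [M] * n  # retiring at once is always feasible, so V >= M
    for _ in range(max_iter):
        new_V = [
            max(M, r[x] + beta * sum(P[x][y] * V[y] for y in range(n)))
            for x in range(n)
        ]
        if max(abs(a - b) for a, b in zip(new_V, V)) < tol:
            return new_V
        V = new_V
    return V


def gittins_index(P, r, beta, x, eps=1e-10, steps=60):
    """Bisection for the smallest M with V(x, M) = M, returned on a
    per-period scale as (1 - beta) * M. Classical Gittins case only."""
    lo = min(r) / (1.0 - beta)  # retiring is never better below this
    hi = max(r) / (1.0 - beta)  # retiring is always optimal above this
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if retirement_values(P, r, beta, mid)[x] - mid > eps:
            lo = mid  # continuing still strictly beats retiring
        else:
            hi = mid  # retiring is (numerically) optimal
    return (1.0 - beta) * hi
```

As a check on the sketch, take a two-state chain in which state 0 pays 0 and moves to state 1, and state 1 pays 1 and is absorbing. With `beta = 0.9`, a direct calculation gives index 1 for state 1 and index `beta = 0.9` for state 0, and `gittins_index([[0.0, 1.0], [0.0, 1.0]], [0.0, 1.0], 0.9, 0)` should agree with the latter up to the numerical tolerance.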

Article information

Source
Ann. Appl. Probab., Volume 3, Number 4 (1993), 1013-1032.

Dates
First available in Project Euclid: 19 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aoap/1177005270

Digital Object Identifier
doi:10.1214/aoap/1177005270

Mathematical Reviews number (MathSciNet)
MR1241032

Zentralblatt MATH identifier
0795.90084

JSTOR
links.jstor.org

Subjects
Primary: 90C40: Markov and semi-Markov decision processes

Keywords
Gittins index; Markov decision process; optimal policy; stopping time

Citation

Glazebrook, K. D. Indices for Families of Competing Markov Decision Processes with Influence. Ann. Appl. Probab. 3 (1993), no. 4, 1013-1032. doi:10.1214/aoap/1177005270. https://projecteuclid.org/euclid.aoap/1177005270

