The Annals of Mathematical Statistics

Maximal Average-Reward Policies for Semi-Markov Decision Processes With Arbitrary State and Action Space

Steven A. Lippman


Abstract

We consider the problem of maximizing the long-run average (also the long-run average expected) reward per unit time in a semi-Markov decision process with arbitrary state and action space. Our main result states that we need only consider the set of stationary policies: for each $\varepsilon > 0$ there is a stationary policy which is $\varepsilon$-optimal. This result is derived under the assumptions that (roughly) (i) expected rewards and expected transition times are uniformly bounded over all states and actions, and that (ii) there is a state such that the expected length of time until the system returns to this state is uniformly bounded over all policies. The existence of an optimal stationary policy is established under the additional assumption of countable state and finite action space. Applications to queueing reward systems are given.
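For concreteness, the average-reward criterion and the $\varepsilon$-optimality notion described above can be written as follows; the notation here is illustrative and not taken verbatim from the paper. Writing $Z(t)$ for the total reward earned by time $t$ under a policy $\pi$ starting in state $s$, the long-run average expected reward is

$$\phi(s, \pi) \;=\; \liminf_{t \to \infty} \frac{1}{t}\, E_\pi\!\left[\, Z(t) \mid X_0 = s \,\right],$$

and a stationary policy $f$ is $\varepsilon$-optimal if

$$\phi(s, f) \;\ge\; \sup_\pi \phi(s, \pi) - \varepsilon \quad \text{for all states } s.$$

The main result then says that the supremum over all policies is approached arbitrarily closely within the class of stationary policies.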

Article information

Source
Ann. Math. Statist., Volume 42, Number 5 (1971), 1717-1726.

Dates
First available in Project Euclid: 27 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aoms/1177693170

Digital Object Identifier
doi:10.1214/aoms/1177693170

Mathematical Reviews number (MathSciNet)
MR368793

Zentralblatt MATH identifier
0231.90057


Citation

Lippman, Steven A. Maximal Average-Reward Policies for Semi-Markov Decision Processes With Arbitrary State and Action Space. Ann. Math. Statist. 42 (1971), no. 5, 1717-1726. doi:10.1214/aoms/1177693170. https://projecteuclid.org/euclid.aoms/1177693170
