The Annals of Mathematical Statistics

Discrete Dynamic Programming with Unbounded Rewards

J. Michael Harrison

Abstract

Countable state and action Markov decision processes are investigated, the objective being to maximize expected discounted reward. Well-known results of Maitra and Blackwell are generalized, their assumption of bounded rewards being replaced by weaker conditions, the most important of which is the following: the expected reward to be received at time $n + 1$ minus the actual reward received at time $n$, viewed as a function of the state at time $n$, the action at time $n$, and the decision rule to be followed at time $n + 1$, is bounded. It is shown that an $\varepsilon$-optimal stationary policy exists for every $\varepsilon > 0$, and that an optimal stationary policy exists in the finite action case.
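For readers who want a concrete handle on the objects in the abstract, the following is a minimal value-iteration sketch for a discounted MDP. It is not the paper's construction: Harrison's results concern countable state and action spaces with unbounded rewards, while this toy example has finitely many states and bounded rewards, and all data in it (the random rewards and transition kernel) are hypothetical. It illustrates the standard fact that stopping value iteration at sup-norm tolerance $\varepsilon(1-\beta)/(2\beta)$ makes the greedy policy, which is stationary, $\varepsilon$-optimal.

```python
import numpy as np

# Hypothetical finite MDP: 3 states, 2 actions, discount factor beta.
n_states, n_actions = 3, 2
beta = 0.9

rng = np.random.default_rng(0)
# r[s, a]: immediate reward for action a in state s (bounded here,
# unlike the unbounded rewards treated in the paper).
r = rng.uniform(0.0, 1.0, size=(n_states, n_actions))
# P[a, s, t]: probability of moving from state s to state t under action a.
P = rng.uniform(size=(n_actions, n_states, n_states))
P /= P.sum(axis=2, keepdims=True)  # normalize rows into distributions


def value_iteration(r, P, beta, eps):
    """Return a stationary policy within eps of optimal, plus its value estimate.

    Standard bound for bounded rewards: if successive iterates satisfy
    ||V_{k+1} - V_k|| < eps * (1 - beta) / (2 * beta) in sup-norm, the
    policy that is greedy with respect to V_{k+1} is eps-optimal.
    """
    V = np.zeros(r.shape[0])
    tol = eps * (1.0 - beta) / (2.0 * beta)
    while True:
        # Q[s, a] = r[s, a] + beta * sum_t P[a, s, t] * V[t]
        Q = r + beta * np.einsum("ast,t->sa", P, V)
        V_new = Q.max(axis=1)
        if np.abs(V_new - V).max() < tol:
            return Q.argmax(axis=1), V_new
        V = V_new


policy, V = value_iteration(r, P, beta, eps=1e-3)
print("eps-optimal stationary policy:", policy)
print("value estimate:", V)
```

The sketch returns a single decision rule used at every stage, i.e. a stationary policy in the abstract's sense; the paper's contribution is showing that such $\varepsilon$-optimal stationary policies still exist when the boundedness assumed above is relaxed.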

Article information

Source
Ann. Math. Statist., Volume 43, Number 2 (1972), 636-644.

Dates
First available in Project Euclid: 27 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aoms/1177692643

Digital Object Identifier
doi:10.1214/aoms/1177692643

Mathematical Reviews number (MathSciNet)
MR354023

Zentralblatt MATH identifier
0262.90064

Citation

Harrison, J. Michael. Discrete Dynamic Programming with Unbounded Rewards. Ann. Math. Statist. 43 (1972), no. 2, 636--644. doi:10.1214/aoms/1177692643. https://projecteuclid.org/euclid.aoms/1177692643

