Open Access
January, 1974
On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains
Mark R. Lembersky
Ann. Statist. 2(1): 159-169 (January, 1974). DOI: 10.1214/aos/1176342621

Abstract

For continuous time Markov decision chains of finite duration, we show that the vector of maximal total rewards, less a linear average-return term, converges as the duration $t \rightarrow \infty$. We then show that there are policies that are simultaneously $\varepsilon$-optimal for all durations $t$ and stationary except possibly for a final, finite segment. Further, the length of this final segment depends on $\varepsilon$, but not on $t$ for large enough $t$, while the initial stationary part of the policy is independent of both $\varepsilon$ and $t$.
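In standard dynamic-programming notation (the symbols below are assumed for illustration, not taken from this page), the convergence result can be sketched as: if $V(t)$ denotes the vector of maximal total rewards over a horizon of length $t$ and $g$ the long-run average-return (gain) vector, then the abstract asserts the existence of a limit vector $v$ with
$$\lim_{t \to \infty} \bigl( V(t) - t\,g \bigr) = v,$$
so that $V(t) \approx t\,g + v$ for large durations, with the linear term $t\,g$ capturing the average return and $v$ the transient correction.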

Citation


Mark R. Lembersky. "On Maximal Rewards and $\varepsilon$-Optimal Policies in Continuous Time Markov Decision Chains." Ann. Statist. 2(1): 159-169, January, 1974. https://doi.org/10.1214/aos/1176342621

Information

Published: January, 1974
First available in Project Euclid: 12 April 2007

zbMATH: 0272.90083
MathSciNet: MR349239
Digital Object Identifier: 10.1214/aos/1176342621

Subjects:
Primary: 90C40
Secondary: 90B99 , 93E20

Keywords: $\varepsilon$-optimal policies , dynamic programming , initially stationary policies , Markov decision chains , maximal rewards

Rights: Copyright © 1974 Institute of Mathematical Statistics
