Sample-path optimal stationary policies in stable Markov decision chains with the average reward criterion
Rolando Cavazos-Cadena, Raúl Montes-de-Oca, Karel Sladký
J. Appl. Probab. 52(2): 419-440 (June 2015). DOI: 10.1239/jap/1437658607

Abstract

This paper concerns discrete-time Markov decision chains with denumerable state and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under any policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
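For context, the following sketch records the standard route from the optimality equation to a stationary policy referred to in the abstract; the notation (state space S, admissible action sets A(x), reward r(x,a), transition law p_{xy}(a)) is the usual one for such models and is an assumption here, not taken verbatim from the paper. The average reward optimality equation determines a constant g (the optimal gain) and a relative value function h, and a stationary policy f is obtained by selecting maximizing actions:

\[
g + h(x) = \max_{a \in A(x)} \Big[ r(x,a) + \sum_{y \in S} p_{xy}(a)\, h(y) \Big], \qquad x \in S,
\]
\[
f(x) \in \operatorname*{arg\,max}_{a \in A(x)} \Big[ r(x,a) + \sum_{y \in S} p_{xy}(a)\, h(y) \Big].
\]

Sample-path average optimality then concerns the pathwise long-run average \(\liminf_{n \to \infty} n^{-1} \sum_{t=0}^{n-1} r(X_t, A_t)\), rather than the expected average reward.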

Citation


Rolando Cavazos-Cadena, Raúl Montes-de-Oca, Karel Sladký. "Sample-path optimal stationary policies in stable Markov decision chains with the average reward criterion." J. Appl. Probab. 52(2): 419-440, June 2015. https://doi.org/10.1239/jap/1437658607

Information

Published: June 2015
First available in Project Euclid: 23 July 2015

zbMATH: 1327.90366
MathSciNet: MR3372084
Digital Object Identifier: 10.1239/jap/1437658607

Subjects:
Primary: 90C40
Secondary: 60J05, 93E20

Keywords: discrepancy function, dominated convergence theorem for the expected average criterion, innovations, Kolmogorov inequality, strong sample-path optimality

Rights: Copyright © 2015 Applied Probability Trust

Journal article, 22 pages

