Sample-path optimal stationary policies in stable Markov decision chains with the average reward criterion
Rolando Cavazos-Cadena, Raúl Montes-de-Oca, Karel Sladký
J. Appl. Probab. 52(2): 419-440 (June 2015). DOI: 10.1239/jap/1437658607

Abstract

This paper concerns discrete-time Markov decision chains with denumerable state and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under any policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
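For context, the following sketch records the standard route from the optimality equation to a stationary policy referred to in the abstract; the notation (state space S, admissible action sets A(x), reward r(x,a), transition law p_{xy}(a)) is the usual one for such models and is an assumption here, not taken verbatim from the paper. The average reward optimality equation determines a constant g (the optimal gain) and a relative value function h, and a stationary policy f is obtained by selecting maximizing actions:

\[
g + h(x) = \max_{a \in A(x)} \Big[ r(x,a) + \sum_{y \in S} p_{xy}(a)\, h(y) \Big], \qquad x \in S,
\]
\[
f(x) \in \operatorname*{arg\,max}_{a \in A(x)} \Big[ r(x,a) + \sum_{y \in S} p_{xy}(a)\, h(y) \Big].
\]

Sample-path average optimality then concerns the pathwise long-run average \(\liminf_{n \to \infty} n^{-1} \sum_{t=0}^{n-1} r(X_t, A_t)\), rather than the expected average reward.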

Citation


Rolando Cavazos-Cadena, Raúl Montes-de-Oca, Karel Sladký. "Sample-path optimal stationary policies in stable Markov decision chains with the average reward criterion." J. Appl. Probab. 52(2): 419-440, June 2015. https://doi.org/10.1239/jap/1437658607

Information

Published: June 2015
First available in Project Euclid: 23 July 2015

zbMATH: 1327.90366
MathSciNet: MR3372084
Digital Object Identifier: 10.1239/jap/1437658607

Subjects:
Primary: 90C40
Secondary: 60J05, 93E20

Keywords: discrepancy function, dominated convergence theorem for the expected average criterion, innovations, Kolmogorov inequality, strong sample-path optimality

Rights: Copyright © 2015 Applied Probability Trust

Journal article, 22 pages

