Abstract
This paper concerns discrete-time Markov decision chains with a denumerable state space and compact action sets. Besides standard continuity requirements, the main assumption on the model is that it admits a Lyapunov function ℓ. In this context the average reward criterion is analyzed from the sample-path point of view. The main conclusion is that if the expected average reward associated with ℓ² is finite under every policy, then a stationary policy obtained from the optimality equation in the standard way is sample-path average optimal in a strong sense.
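For context, the "optimality equation" referred to above is, in the standard average-reward setting, the average cost optimality equation (ACOE). A sketch of its usual form, with generic symbols g (optimal gain), h (relative value function), r (reward), and p (transition law) that are not taken from this paper:

```latex
% Average reward optimality equation (standard form, notation assumed):
% g : optimal average reward (gain)
% h : relative value (bias) function
% r(x,a) : one-step reward;  p(y | x, a) : transition probability
g + h(x) \;=\; \max_{a \in A(x)} \Bigl[\, r(x,a) \;+\; \sum_{y \in S} p(y \mid x, a)\, h(y) \Bigr],
\qquad x \in S.
```

A stationary policy selecting, at each state x, an action attaining the maximum on the right-hand side is the "policy obtained from the optimality equation in the standard way" mentioned in the abstract.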
Citation
Rolando Cavazos-Cadena, Raúl Montes-de-Oca, and Karel Sladký. "Sample-path optimal stationary policies in stable Markov decision chains with the average reward criterion." J. Appl. Probab. 52 (2), 419-440, June 2015. https://doi.org/10.1239/jap/1437658607