The expected total cost criterion for Markov decision processes under constraints: a convex analytic approach
François Dufour, M. Horiguchi, A. B. Piunovskiy
Adv. in Appl. Probab. 44(3): 774-793 (September 2012). DOI: 10.1239/aap/1346955264

Abstract

This paper deals with discrete-time Markov decision processes (MDPs) under constraints where all the objectives have the same form of expected total cost over the infinite time horizon. The existence of an optimal control policy is discussed by using the convex analytic approach. We work under the assumptions that the state and action spaces are general Borel spaces, and that the model is nonnegative, semicontinuous, and there exists an admissible solution with finite cost for the associated linear program. It is worth noting that, in contrast to the classical results in the literature, our hypotheses do not require the MDP to be transient or absorbing. Our first result ensures the existence of an optimal solution to the linear program given by an occupation measure of the process generated by a randomized stationary policy. Moreover, it is shown that this randomized stationary policy provides an optimal solution to this Markov control problem. As a consequence, these results imply that the set of randomized stationary policies is a sufficient set for this optimal control problem. Finally, our last main result states that all optimal solutions of the linear program coincide on a special set with an optimal occupation measure generated by a randomized stationary policy. Several examples are presented to illustrate some theoretical issues and the possible applications of the results developed in the paper.
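
For orientation, the constrained problem and its linear program can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the abstract: $X$ and $A$ are the state and action spaces, $Q$ the transition kernel, $\nu$ the initial distribution, $c_0,\dots,c_q \ge 0$ the cost functions, and $d_1,\dots,d_q$ the constraint bounds. The paper's own formulation may differ in detail.

```latex
% Constrained control problem over policies \pi (notation assumed, not from the abstract)
\[
\begin{aligned}
\text{minimize}\quad & \mathbb{E}^{\pi}_{\nu}\Big[\sum_{t=0}^{\infty} c_0(x_t,a_t)\Big] \\
\text{subject to}\quad & \mathbb{E}^{\pi}_{\nu}\Big[\sum_{t=0}^{\infty} c_i(x_t,a_t)\Big] \le d_i,
  \qquad i=1,\dots,q.
\end{aligned}
\]

% Occupation measure generated by the policy \pi
\[
\mu^{\pi}(\Gamma) = \sum_{t=0}^{\infty} \mathbb{P}^{\pi}_{\nu}\big((x_t,a_t)\in\Gamma\big),
  \qquad \Gamma \in \mathcal{B}(X\times A).
\]

% Associated linear program over measures \mu on X \times A
\[
\begin{aligned}
\text{minimize}\quad & \int_{X\times A} c_0 \, d\mu \\
\text{subject to}\quad & \mu(B\times A) = \nu(B) + \int_{X\times A} Q(B\mid x,a)\,\mu(dx,da),
  \qquad B\in\mathcal{B}(X), \\
& \int_{X\times A} c_i \, d\mu \le d_i, \qquad i=1,\dots,q.
\end{aligned}
\]
```

Because the costs are nonnegative, monotone convergence gives $\mathbb{E}^{\pi}_{\nu}\big[\sum_{t} c_i(x_t,a_t)\big] = \int c_i \, d\mu^{\pi}$, so each objective is linear in the occupation measure. Without a transience or absorption assumption, $\mu^{\pi}$ need not be a finite measure, which is the setting the abstract emphasizes.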

Citation

François Dufour, M. Horiguchi, A. B. Piunovskiy. "The expected total cost criterion for Markov decision processes under constraints: a convex analytic approach." Adv. in Appl. Probab. 44 (3) 774-793, September 2012. https://doi.org/10.1239/aap/1346955264

Information

Published: September 2012
First available in Project Euclid: 6 September 2012

zbMATH: 1286.90161
MathSciNet: MR3024609
Digital Object Identifier: 10.1239/aap/1346955264

Subjects:
Primary: 90C40
Secondary: 60J10, 90C90

Keywords: constraint, expected total cost criterion, linear programming, Markov decision process, occupation measure

Rights: Copyright © 2012 Applied Probability Trust

JOURNAL ARTICLE
20 PAGES

