Advances in Applied Probability

The expected total cost criterion for Markov decision processes under constraints

François Dufour and A. B. Piunovskiy

Abstract

In this work, we study discrete-time Markov decision processes (MDPs) with constraints in which all the objectives take the same form: an expected total cost over the infinite time horizon. We analyze this problem using the linear programming approach. Under some technical hypotheses, it is shown that if the associated linear program has an optimal solution, then there exists a randomized stationary policy that is optimal for the MDP, and that the optimal value of the linear program coincides with the optimal value of the constrained control problem. A second important result states that the set of randomized stationary policies is a sufficient class for solving this MDP. In contrast with the classical results in the literature, we do not assume the MDP to be transient or absorbing. More importantly, we do not require the cost functions to be nonnegative or bounded below. Several examples are presented to illustrate our results.
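
For orientation, the linear programming approach mentioned in the abstract can be sketched in its usual textbook form. The notation below is an assumption and does not come from the abstract: state space X, action space A, transition kernel Q, initial distribution \nu, cost functions c_0, ..., c_q, and constraint constants d_1, ..., d_q. The article's own formulation and technical hypotheses may differ, in particular because occupation measures need not be finite when the MDP is neither transient nor absorbing, so the integrals below require care.

```latex
% Constrained control problem (assumed notation, not taken from the abstract):
\begin{align*}
  \text{minimize over policies } \pi:\quad
    & \mathbb{E}^{\pi}_{\nu}\Big[\sum_{t=0}^{\infty} c_0(x_t,a_t)\Big] \\
  \text{subject to}\quad
    & \mathbb{E}^{\pi}_{\nu}\Big[\sum_{t=0}^{\infty} c_i(x_t,a_t)\Big] \le d_i,
      \qquad i=1,\dots,q.
\end{align*}

% Occupation measure of a policy \pi on X \times A:
\[
  \mu^{\pi}(\Gamma) \;=\; \sum_{t=0}^{\infty}
    \mathbb{P}^{\pi}_{\nu}\big((x_t,a_t)\in\Gamma\big),
  \qquad \Gamma \subseteq X \times A \text{ measurable}.
\]

% Associated linear program over measures \mu on X \times A:
\begin{align*}
  \text{minimize}\quad
    & \int_{X\times A} c_0 \, d\mu \\
  \text{subject to}\quad
    & \mu(B \times A) \;=\; \nu(B) + \int_{X\times A} Q(B \mid x,a)\,\mu(dx,da)
      \quad \text{for all measurable } B \subseteq X, \\
    & \int_{X\times A} c_i \, d\mu \;\le\; d_i, \qquad i=1,\dots,q.
\end{align*}

% A randomized stationary policy \varphi is typically read off from a
% feasible \mu by disintegration with respect to its state marginal:
\[
  \mu(dx,da) \;=\; \varphi(da \mid x)\,\mu(dx \times A).
\]
```

In this sketch, each total expected cost is linear in the occupation measure, which is what turns the constrained control problem into a linear program. The disintegration in the last display is the standard way to recover a randomized stationary policy from a solution of that program.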

Article information

Source
Adv. in Appl. Probab., Volume 45, Number 3 (2013), 837-859.

Dates
First available in Project Euclid: 30 August 2013

Permanent link to this document
https://projecteuclid.org/euclid.aap/1377868541

Digital Object Identifier
doi:10.1239/aap/1377868541

Mathematical Reviews number (MathSciNet)
MR3102474

Zentralblatt MATH identifier
1298.90126

Subjects
Primary: 90C40: Markov and semi-Markov decision processes
Secondary: 60J10: Markov chains (discrete-time Markov processes on discrete state spaces); 90C90: Applications of mathematical programming

Keywords
Markov decision process; expected total cost criterion; constraints; linear programming; occupation measure

Citation

Dufour, François; Piunovskiy, A. B. The expected total cost criterion for Markov decision processes under constraints. Adv. in Appl. Probab. 45 (2013), no. 3, 837–859. doi:10.1239/aap/1377868541. https://projecteuclid.org/euclid.aap/1377868541

