Advances in Applied Probability

Absorbing continuous-time Markov decision processes with total cost criteria

Xianping Guo, Mantas Vykertas, and Yi Zhang


Abstract

In this paper we study absorbing continuous-time Markov decision processes in Polish state spaces with unbounded transition and cost rates, and history-dependent policies. The performance measure is the expected total undiscounted cost. For the unconstrained problem, we show the existence of a deterministic stationary optimal policy, whereas, for the constrained problem with N constraints, we show the existence of an optimal mixed stationary policy, where the mixture is over no more than N+1 deterministic stationary policies. Furthermore, we obtain the strong duality result for the associated linear programs.
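The constrained problem described in the abstract can be sketched as follows. This is an illustrative formulation only: the symbols (cost rates c_j, constraint constants d_j, state and action processes X_t and A_t) are assumed notation, not necessarily that of the paper.

```latex
% Hedged sketch of a constrained total undiscounted cost problem with
% N constraints (notation assumed for illustration):
\[
  \text{minimize over policies } \pi:\qquad
  V_0(\pi) \;=\; \mathbb{E}^{\pi}\!\left[\int_0^{\infty} c_0(X_t, A_t)\,dt\right],
\]
\[
  \text{subject to}\qquad
  V_j(\pi) \;=\; \mathbb{E}^{\pi}\!\left[\int_0^{\infty} c_j(X_t, A_t)\,dt\right] \;\le\; d_j,
  \qquad j = 1, \dots, N.
\]
% Per the abstract, an optimal policy exists in the form of a mixture
% (convex combination) of at most N+1 deterministic stationary policies.
```

The N+1 bound mirrors the familiar finite-dimensional picture: with N constraints, an optimal solution of the associated linear program can be taken at an extreme point involving at most N+1 "pure" policies.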

Article information

Source
Adv. in Appl. Probab., Volume 45, Number 2 (2013), 490-519.

Dates
First available in Project Euclid: 10 June 2013

Permanent link to this document
https://projecteuclid.org/euclid.aap/1370870127

Digital Object Identifier
doi:10.1239/aap/1370870127

Mathematical Reviews number (MathSciNet)
MR3102460

Zentralblatt MATH identifier
1282.90229

Subjects
Primary: 90C40: Markov and semi-Markov decision processes
Secondary: 60J25: Continuous-time Markov processes on general state spaces; 60J75: Jump processes

Keywords
CTMDP; total cost; constrained optimality; linear program

Citation

Guo, Xianping; Vykertas, Mantas; Zhang, Yi. Absorbing continuous-time Markov decision processes with total cost criteria. Adv. in Appl. Probab. 45 (2013), no. 2, 490–519. doi:10.1239/aap/1370870127. https://projecteuclid.org/euclid.aap/1370870127

