## Annals of Statistics

### Blackwell optimality in Markov decision processes with partial observation

#### Abstract

A Blackwell $\epsilon$-optimal strategy in a Markov Decision Process is a strategy that is $\epsilon$-optimal for every discount factor sufficiently close to 1.

We prove the existence of Blackwell $\epsilon$-optimal strategies in finite Markov Decision Processes with partial observation.
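The definition above can be stated symbolically. As a sketch only (the notation $v_\lambda$ for the value of the $\lambda$-discounted problem and $\gamma_\lambda(\sigma)$ for the $\lambda$-discounted payoff of a strategy $\sigma$ is assumed here, not taken from this page):

```latex
% A strategy \sigma is Blackwell \epsilon-optimal if there exists \lambda_0 < 1 such that
\gamma_\lambda(\sigma) \;\ge\; v_\lambda - \epsilon
\qquad \text{for every discount factor } \lambda \in (\lambda_0, 1).
```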

#### Article information

Source
Ann. Statist., Volume 30, Number 4 (2002), 1178-1193.

Dates
First available in Project Euclid: 10 September 2002

https://projecteuclid.org/euclid.aos/1031689022

Digital Object Identifier
doi:10.1214/aos/1031689022

Mathematical Reviews number (MathSciNet)
MR1926173

Zentralblatt MATH identifier
1103.90402

#### Citation

Rosenberg, Dinah; Solan, Eilon; Vieille, Nicolas. Blackwell optimality in Markov decision processes with partial observation. Ann. Statist. 30 (2002), no. 4, 1178--1193. doi:10.1214/aos/1031689022. https://projecteuclid.org/euclid.aos/1031689022

#### References

• [1] ALTMAN, E. (2001). Applications of Markov decision processes in communication networks: a survey. In Handbook of Markov Decision Processes: Methods and Applications (E. Feinberg and A. Shwartz, eds.). Kluwer, Boston.
• [2] ARAPOSTATHIS, A., BORKAR, V. S., FERNÁNDEZ-GAUCHERAND, E., GHOSH, M. K. and MARCUS, S. I. (1993). Discrete-time controlled Markov processes with average cost criterion: a survey. SIAM J. Control Optim. 31 282-344.
• [3] BLACKWELL, D. (1962). Discrete dynamic programming. Ann. Math. Statist. 33 719-726.
• [4] BORKAR, V. S. (1988). Control of Markov chains with long-run average cost criterion. In Stochastic Differential Systems, Stochastic Control Theory and Applications (W. Fleming and P. L. Lions, eds.) 57-77. Springer, Berlin.
• [5] BORKAR, V. S. (1991). Topics in Controlled Markov Chains. Longman, Essex.
• [6] FERNÁNDEZ-GAUCHERAND, E., ARAPOSTATHIS, A. and MARCUS, S. I. (1989). On partially observable Markov decision processes with an average cost criterion. In Proceedings of the 28th IEEE Conference on Decision and Control 1267-1272. IEEE Press, New York.
• [7] KALLENBERG, L. (2001). Finite state and action MDPs. In Handbook of Markov Decision Processes: Methods and Applications (E. Feinberg and A. Shwartz, eds.) 21-30. Kluwer, Boston.
• [8] KUHN, H. W. (1953). Extensive games and the problem of information. In Contributions to the Theory of Games II (H. W. Kuhn and A. W. Tucker, eds.) 193-216. Princeton Univ. Press.
• [9] LANE, D. E. (1989). A partially observable model of decision making by fishermen. Oper. Res. 37 240-254.
• [10] LEHRER, E. and SORIN, S. (1992). A uniform Tauberian theorem in dynamic programming. Math. Oper. Res. 17 303-307.
• [11] MITRA, T., RAY, D. and ROY, R. (1991). The economics of orchards: an exercise in point-input, flow-output capital theory. J. Econom. Theory 53 12-50.
• [12] MONAHAN, G. E. (1982). A survey of partially observable Markov decision processes: theory, models, and algorithms. Management Sci. 28 1-16.
• [13] PUTERMAN, M. L. (1994). Markov Decision Processes: Discrete Stochastic Dynamic Programming. Wiley, New York.
• [14] RHENIUS, D. (1974). Incomplete information in Markovian decision models. Ann. Statist. 2 1327-1334.
• [15] SAWARAGI, Y. and YOSHIKAWA, T. (1970). Discrete time Markovian decision processes with incomplete state observation. Ann. Math. Statist. 41 78-86.
• [16] SENNOTT, L. I. (1999). Stochastic Dynamic Programming and the Control of Queueing Systems. Wiley, New York.
• [17] YUSHKEVICH, A. A. (1976). Reduction of a controlled Markov model with incomplete data to a problem with complete information in the case of Borel state and control spaces. Theory Probab. Appl. 21 153-158.
EVANSTON, ILLINOIS 60208
AND
SCHOOL OF MATHEMATICAL SCIENCES
TEL AVIV UNIVERSITY
TEL AVIV 69978
ISRAEL
E-MAIL: eilons@post.tau.ac.il

N. VIEILLE
ÉCOLE POLYTECHNIQUE
AND
DÉPARTEMENT FINANCE ET ÉCONOMIE
HEC
1, RUE DE LA LIBÉRATION
78351 JOUY-EN-JOSAS
FRANCE
E-MAIL: vieille@hec.fr