The Annals of Statistics

Averaging vs. Discounting in Dynamic Programming: a Counterexample

James Flynn

Abstract

We consider countable-state, finite-action dynamic programming problems with bounded rewards. Under Blackwell's optimality criterion, a policy is optimal if it maximizes the expected discounted total return for all values of the discount factor sufficiently close to 1. We give an example in which a policy meets that criterion but is not optimal with respect to Derman's average cost criterion. We also give conditions under which this pathology cannot occur.
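
In standard MDP notation (ours, not the paper's), the two criteria can be written out as follows. For a policy $\pi$, initial state $s$, bounded reward stream $r_0, r_1, \dots$, and discount factor $\beta \in (0,1)$, the expected discounted total return is

\[
V_\beta^\pi(s) = \mathbb{E}_s^\pi\!\left[\sum_{t=0}^{\infty} \beta^t r_t\right],
\]

and $\pi^*$ is Blackwell optimal if there is some $\beta_0 < 1$ such that $V_\beta^{\pi^*}(s) \ge V_\beta^\pi(s)$ for every policy $\pi$, every state $s$, and every $\beta \in (\beta_0, 1)$. One common formulation of the average reward criterion evaluates $\pi$ by

\[
\phi^\pi(s) = \liminf_{N \to \infty} \frac{1}{N}\, \mathbb{E}_s^\pi\!\left[\sum_{t=0}^{N-1} r_t\right],
\]

with $\pi^*$ average optimal if $\phi^{\pi^*}(s) \ge \phi^\pi(s)$ for all $\pi$ and $s$; Derman's criterion is of this type (stated for costs rather than rewards). The counterexample exhibits a policy that is optimal in the first sense but not in the second.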

Article information

Source
Ann. Statist., Volume 2, Number 2 (1974), 411-413.

Dates
First available in Project Euclid: 12 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1176342678

Digital Object Identifier
doi:10.1214/aos/1176342678

Mathematical Reviews number (MathSciNet)
MR368791

Zentralblatt MATH identifier
0276.49019

JSTOR
links.jstor.org

Subjects
Primary: 49C15
Secondary: 62L99: None of the above, but in this section
90C40: Markov and semi-Markov decision processes
93C55: Discrete-time systems
60J10: Markov chains (discrete-time Markov processes on discrete state spaces)
60J20: Applications of Markov chains and discrete-time Markov processes on general state spaces (social mobility, learning theory, industrial processes, etc.) [See also 90B30, 91D10, 91D35, 91E40]

Keywords
Dynamic programming; average cost criteria; discounting; Markov decision process

Citation

Flynn, James. Averaging vs. Discounting in Dynamic Programming: a Counterexample. Ann. Statist. 2 (1974), no. 2, 411–413. doi:10.1214/aos/1176342678. https://projecteuclid.org/euclid.aos/1176342678

