Averaging vs. Discounting in Dynamic Programming: a Counterexample
We consider countable-state, finite-action dynamic programming problems with bounded rewards. Under Blackwell's optimality criterion, a policy is optimal if it maximizes the expected discounted total return for all values of the discount factor sufficiently close to 1. We give an example in which a policy meets that optimality criterion but is not optimal with respect to Derman's average cost criterion. We also give conditions under which this pathology cannot occur.
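For orientation, the two criteria contrasted in the abstract can be written as follows. These are the standard formulations; the notation $V_\beta$, $\phi$, and $r_t$ is ours for illustration and is not necessarily that of the paper.

```latex
% Expected discounted total return of a policy $\pi$ from initial state $s$,
% with bounded rewards $r_t$ and discount factor $0 \le \beta < 1$:
V_\beta(\pi, s) \;=\; \mathbb{E}_\pi\!\left[\,\sum_{t=0}^{\infty} \beta^t r_t \,\Big|\, s_0 = s\right].

% Blackwell's criterion: $\pi^*$ is optimal if there exists $\beta_0 \in [0,1)$
% such that for every policy $\pi$, every state $s$, and every $\beta \in (\beta_0, 1)$,
V_\beta(\pi^*, s) \;\ge\; V_\beta(\pi, s).

% Derman's average cost (long-run average reward) criterion compares policies by
\phi(\pi, s) \;=\; \liminf_{N \to \infty} \frac{1}{N}\,
\mathbb{E}_\pi\!\left[\,\sum_{t=0}^{N-1} r_t \,\Big|\, s_0 = s\right].
```

With finitely many states both criteria agree, so the counterexample necessarily exploits the countably infinite state space.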
Permanent link to this document: http://projecteuclid.org/euclid.aos/1176342678
Digital Object Identifier: doi:10.1214/aos/1176342678
Mathematical Reviews number (MathSciNet): MR368791
Zentralblatt MATH identifier: 0276.49019