The Annals of Statistics

Averaging vs. Discounting in Dynamic Programming: a Counterexample

James Flynn
Source: Ann. Statist. Volume 2, Number 2 (1974), 411-413.

Abstract

We consider countable-state, finite-action dynamic programming problems with bounded rewards. Under Blackwell's optimality criterion, a policy is optimal if it maximizes the expected discounted total return for all values of the discount factor sufficiently close to 1. We give an example where a policy meets that optimality criterion but is not optimal with respect to Derman's average cost criterion. We also give conditions under which this pathology cannot occur.
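
The two criteria compared in the abstract can be written out as follows. This is only a sketch in standard notation that does not appear on this page: the symbols $\pi$ (policy), $s$ (initial state), $r_t$ (reward at time $t$), $\beta$ (discount factor), $V_\beta$ and $\phi$ are assumptions. The average criterion is stated here with a lim inf of expected average rewards, which is one common formalization and may differ in detail from the one used in the paper.

```latex
% Expected discounted total return of policy \pi from initial state s
% (notation assumed for illustration, not taken from the paper)
\[
  V_\beta(\pi, s) = \mathbb{E}_\pi\!\left[ \sum_{t=0}^{\infty} \beta^{t} r_t \,\middle|\, s_0 = s \right],
  \qquad 0 \le \beta < 1 .
\]

% Optimality for all discount factors sufficiently close to 1
\[
  \exists\, \beta_0 < 1 : \quad
  V_\beta(\pi^*, s) \ge V_\beta(\pi, s)
  \quad \text{for all } \pi,\ s,\ \text{and } \beta \in (\beta_0, 1) .
\]

% Long-run average return (lim inf form, an assumed formalization)
\[
  \phi(\pi, s) = \liminf_{n \to \infty} \frac{1}{n}\,
  \mathbb{E}_\pi\!\left[ \sum_{t=0}^{n-1} r_t \,\middle|\, s_0 = s \right] .
\]
```

Because the rewards are bounded, both $V_\beta$ and $\phi$ are finite. In this notation, the abstract's claim is that a policy can satisfy the second condition and still fail to be optimal for the average criterion.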

Primary Subjects: 49C15
Secondary Subjects: 62L99, 90C40, 93C55, 60J10, 60J20
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1176342678
JSTOR: links.jstor.org
Digital Object Identifier: doi:10.1214/aos/1176342678
Mathematical Reviews number (MathSciNet): MR368791
Zentralblatt MATH identifier: 0276.49019


2013 © Institute of Mathematical Statistics
