This paper examines the relationships between optimality criteria that are commonly used for undiscounted, discrete-time, countable-state Markovian decision models. One approach, due to Blackwell, is to maximize the expected discounted total return as the discount factor approaches 1. Another, due to Veinott, is to maximize the Cesàro means of the finite-horizon expected returns as the horizon tends to infinity. A third, due to Derman, is to maximize the long-run average gain. Denardo, Miller and Lippman showed that Blackwell's and Veinott's approaches are equivalent for finite state and action spaces. As shown here, that equivalence breaks down when the state space is countable. Also, policies that are optimal according to Blackwell's or Veinott's approach need not be optimal according to Derman's. On the positive side, fairly weak conditions are given under which Blackwell's and Veinott's criteria imply Derman's, and somewhat stronger conditions are given under which Blackwell's and Veinott's criteria are equivalent.
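For orientation, the three criteria can be sketched in symbols. The notation below is an assumption introduced here for illustration and is not taken from the paper: r_t denotes the reward received at time t, E_\pi^s the expectation under policy \pi from initial state s, and \pi^* a candidate optimal policy. The inequalities follow the formulations usually given in the literature, each required to hold for every policy \pi and every state s; the paper's exact definitions may differ in detail.

```latex
% Assumed notation (illustrative only): r_t = reward at time t,
% E_\pi^s = expectation under policy \pi starting from state s.
% Requires the amsmath package.
\begin{align*}
V_\beta(\pi, s) &= E_\pi^s \sum_{t=0}^{\infty} \beta^t r_t, \qquad 0 < \beta < 1, \\
V_N(\pi, s) &= E_\pi^s \sum_{t=0}^{N-1} r_t . \\[4pt]
\text{Blackwell:}\quad & \liminf_{\beta \uparrow 1} \bigl[ V_\beta(\pi^*, s) - V_\beta(\pi, s) \bigr] \ge 0, \\
\text{Veinott:}\quad & \liminf_{N \to \infty} \frac{1}{N} \sum_{n=1}^{N} \bigl[ V_n(\pi^*, s) - V_n(\pi, s) \bigr] \ge 0, \\
\text{Derman:}\quad & \liminf_{N \to \infty} \frac{1}{N} V_N(\pi^*, s) \ge \liminf_{N \to \infty} \frac{1}{N} V_N(\pi, s).
\end{align*}
```

Written this way, the first two criteria compare policies through differences of returns, while the third compares only the long-run averages, which is a coarser comparison.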
"Conditions for the Equivalence of Optimality Criteria in Dynamic Programming." Ann. Statist. 4 (5) 936 - 953, September, 1976. https://doi.org/10.1214/aos/1176343590