## The Annals of Mathematical Statistics

### Discrete Dynamic Programming

David Blackwell

#### Abstract

We consider a system with a finite number $S$ of states $s$, labeled by the integers $1, 2, \cdots, S$. Periodically, say once a day, we observe the current state of the system, and then choose an action $a$ from a finite set $A$ of possible actions. As a joint result of the current state $s$ and the chosen action $a$, two things happen: (1) we receive an immediate income $i(s, a)$ and (2) the system moves to a new state $s'$ with the probability of a particular new state $s'$ given by a function $q = q(s' \mid s, a)$. Finally there is specified a discount factor $\beta, 0 \leqq \beta < 1$, so that the value of unit income $n$ days in the future is $\beta^n$. Our problem is to choose a policy which maximizes our total expected income. This problem, which is an interesting special case of the general dynamic programming problem, has been solved by Howard in his excellent book [3]. The case $\beta = 1$, also studied by Howard, is substantially more difficult. We shall obtain in this case results slightly beyond those of Howard, though still not complete. Our method, which treats $\beta = 1$ as a limiting case of $\beta < 1$, seems rather simpler than Howard's.
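As an illustration of the discounted case $\beta < 1$ described in the abstract, the following is a minimal sketch of successive approximation (value iteration) for a finite-state, finite-action problem. It is not the method of the paper or of Howard's book, which the abstract does not spell out; it is only a standard technique for the same model. The function name `value_iteration`, the nested-list layout `income[s][a]` for $i(s, a)$ and `q[s][a][t]` for $q(s' \mid s, a)$, and the tolerance parameters are all assumptions made for this example, and states are indexed from 0 rather than 1.

```python
def value_iteration(income, q, beta, tol=1e-10, max_iter=100_000):
    """Approximate the optimal discounted income by successive approximation.

    income[s][a] -- immediate income i(s, a)            (hypothetical layout)
    q[s][a][t]   -- probability of moving from s to t under action a
    beta         -- discount factor, 0 <= beta < 1
    Returns (values, policy): values[s] approximates the optimal expected
    discounted income starting from s; policy[s] is a maximizing action.
    """
    n_states = len(income)

    def action_value(s, a, v):
        # Immediate income plus discounted expected value of the next state.
        return income[s][a] + beta * sum(q[s][a][t] * v[t] for t in range(n_states))

    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = [
            max(action_value(s, a, v) for a in range(len(income[s])))
            for s in range(n_states)
        ]
        delta = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if delta < tol:
            break

    # Read off an action that attains the maximum in each state.
    policy = [
        max(range(len(income[s])), key=lambda a, s=s: action_value(s, a, v))
        for s in range(n_states)
    ]
    return v, policy


if __name__ == "__main__":
    # Toy two-state example (invented for illustration, not from the paper).
    # Action 0 stays in the current state; action 1 moves to the other state.
    income = [[1.0, 0.0], [2.0, 0.0]]
    q = [
        [[1.0, 0.0], [0.0, 1.0]],
        [[0.0, 1.0], [1.0, 0.0]],
    ]
    values, policy = value_iteration(income, q, beta=0.9)
    print(values, policy)
```

In this toy example, staying in the second state forever earns $2 / (1 - 0.9) = 20$, and moving there from the first state earns $0.9 \times 20 = 18$, which beats staying in the first state for $1 / (1 - 0.9) = 10$. The sketch relies on $\beta < 1$ for convergence; it does not address the harder case $\beta = 1$ that the abstract singles out.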

#### Article information

Source
Ann. Math. Statist. Volume 33, Number 2 (1962), 719-726.

Dates
First available in Project Euclid: 27 April 2007

Permanent link to this document
http://projecteuclid.org/euclid.aoms/1177704593

Digital Object Identifier
doi:10.1214/aoms/1177704593

Mathematical Reviews number (MathSciNet)
MR149965

Zentralblatt MATH identifier
0133.12906
