The Annals of Mathematical Statistics

Discrete Dynamic Programming

David Blackwell

We consider a system with a finite number $S$ of states $s$, labeled by the integers $1, 2, \ldots, S$. Periodically, say once a day, we observe the current state of the system and then choose an action $a$ from a finite set $A$ of possible actions. As a joint result of the current state $s$ and the chosen action $a$, two things happen: (1) we receive an immediate income $i(s, a)$, and (2) the system moves to a new state $s'$, with the probability of a particular new state $s'$ given by a function $q = q(s' \mid s, a)$. Finally, there is specified a discount factor $\beta$, $0 \leq \beta < 1$, so that the value of a unit income $n$ days in the future is $\beta^n$. Our problem is to choose a policy which maximizes our total expected discounted income. This problem, which is an interesting special case of the general dynamic programming problem, has been solved by Howard in his excellent book [3]. The case $\beta = 1$, also studied by Howard, is substantially more difficult. We shall obtain in this case results slightly beyond those of Howard, though still not complete. Our method, which treats $\beta = 1$ as a limiting case of $\beta < 1$, seems rather simpler than Howard's.
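
To make the discounted model concrete, the following is a minimal Python sketch of successive approximation (value iteration) for the case $0 \leq \beta < 1$. It is an illustration only, not Howard's policy-improvement routine and not the method of this paper. The function names `q_value` and `value_iteration`, the tolerance, and the two-state example data are all assumptions introduced here; `income[s][a]` stands in for $i(s, a)$ and `trans[s][a][t]` for $q(t \mid s, a)$, with states indexed from 0 rather than 1.

```python
from typing import List, Tuple


def q_value(income, trans, beta, v, s, a) -> float:
    """Immediate income plus discounted expected value of the next state."""
    return income[s][a] + beta * sum(p * v[t] for t, p in enumerate(trans[s][a]))


def value_iteration(
    income: List[List[float]],
    trans: List[List[List[float]]],
    beta: float,
    tol: float = 1e-10,
    max_iter: int = 100_000,
) -> Tuple[List[float], List[int]]:
    """Approximate the optimal discounted income and a greedy policy.

    Assumes 0 <= beta < 1, so the update is a contraction and converges.
    """
    if not 0.0 <= beta < 1.0:
        raise ValueError("this sketch only covers 0 <= beta < 1")
    n_states = len(income)
    n_actions = len(income[0])
    v = [0.0] * n_states
    for _ in range(max_iter):
        # One sweep: best one-step lookahead value in every state.
        new_v = [
            max(q_value(income, trans, beta, v, s, a) for a in range(n_actions))
            for s in range(n_states)
        ]
        gap = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if gap < tol:
            break
    # Greedy action in each state with respect to the final values
    # (ties go to the lowest-numbered action).
    policy = [
        max(range(n_actions), key=lambda a, s=s: q_value(income, trans, beta, v, s, a))
        for s in range(n_states)
    ]
    return v, policy


# Hypothetical example: in state 0, action 0 pays 1 and stays put,
# action 1 pays 0 and moves to state 1; state 1 pays 3 and is absorbing.
income = [[1.0, 0.0], [3.0, 3.0]]
trans = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
values, policy = value_iteration(income, trans, beta=0.5)
print(values, policy)
```

In this illustrative example, forgoing the immediate income in state 0 is worthwhile because the discounted stream from state 1 is larger. The sketch deliberately rejects $\beta = 1$, where the total income may be infinite and the iteration above need not converge; that is the harder case the abstract refers to.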

Article information

Ann. Math. Statist. Volume 33, Number 2 (1962), 719-726.

First available: 27 April 2007

Blackwell, David. Discrete Dynamic Programming. The Annals of Mathematical Statistics 33 (1962), no. 2, 719--726. doi:10.1214/aoms/1177704593.
