The Annals of Mathematical Statistics
- Ann. Math. Statist.
- Volume 33, Number 2 (1962), 719-726.
Discrete Dynamic Programming
We consider a system with a finite number $S$ of states $s$, labeled by the integers $1, 2, \cdots, S$. Periodically, say once a day, we observe the current state of the system, and then choose an action $a$ from a finite set $A$ of possible actions. As a joint result of the current state $s$ and the chosen action $a$, two things happen: (1) we receive an immediate income $i(s, a)$ and (2) the system moves to a new state $s'$ with the probability of a particular new state $s'$ given by a function $q = q(s' \mid s, a)$. Finally there is specified a discount factor $\beta$, $0 \leq \beta < 1$, so that the value of unit income $n$ days in the future is $\beta^n$. Our problem is to choose a policy which maximizes our total expected income. This problem, which is an interesting special case of the general dynamic programming problem, has been solved by Howard in his excellent book. The case $\beta = 1$, also studied by Howard, is substantially more difficult. We shall obtain in this case results slightly beyond those of Howard, though still not complete. Our method, which treats $\beta = 1$ as a limiting case of $\beta < 1$, seems rather simpler than Howard's.
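The discounted problem described above can be illustrated numerically. The following is a minimal sketch using value iteration, a standard technique for the case $\beta < 1$; it is not the method of this paper, nor Howard's policy-improvement routine. The function name `value_iteration`, the nested-list layout of `income` and `q`, the 0-based state labels, and the two-state example are all assumptions made for illustration.

```python
def value_iteration(income, q, beta, tol=1e-10, max_iter=100_000):
    """Approximate the optimal discounted income for each starting state.

    income[s][a]  -- immediate income i(s, a)
    q[s][a][t]    -- probability q(t | s, a) of moving from s to t under a
    beta          -- discount factor, assumed to satisfy 0 <= beta < 1
    States and actions are indexed from 0 here (the text labels states 1..S).
    """
    n_states = len(income)
    n_actions = len(income[0])

    def action_value(s, a, v):
        # Immediate income plus discounted expected value of the next state.
        return income[s][a] + beta * sum(q[s][a][t] * v[t] for t in range(n_states))

    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = [
            max(action_value(s, a, v) for a in range(n_actions))
            for s in range(n_states)
        ]
        gap = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if gap < tol:
            break

    # A stationary policy that is greedy with respect to the final values.
    policy = [
        max(range(n_actions), key=lambda a: action_value(s, a, v))
        for s in range(n_states)
    ]
    return v, policy


# Hypothetical two-state, two-action example (not taken from the paper).
# State 0: action 0 earns 1 and stays put; action 1 earns 0 and moves to state 1.
# State 1: either action earns 2 and stays in state 1.
income = [[1.0, 0.0], [2.0, 2.0]]
q = [
    [[1.0, 0.0], [0.0, 1.0]],
    [[0.0, 1.0], [0.0, 1.0]],
]
values, policy = value_iteration(income, q, beta=0.9)
```

In this toy example, remaining in state 1 forever is worth $2 / (1 - 0.9) = 20$, so moving there from state 0 is worth $0.9 \times 20 = 18$, which beats the $1 / (1 - 0.9) = 10$ obtained by staying in state 0. The sketch relies on $\beta < 1$ for convergence and says nothing about the harder case $\beta = 1$.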
First available in Project Euclid: 27 April 2007
Blackwell, David. Discrete Dynamic Programming. Ann. Math. Statist. 33 (1962), no. 2, 719--726. doi:10.1214/aoms/1177704593. https://projecteuclid.org/euclid.aoms/1177704593