## The Annals of Mathematical Statistics

- Ann. Math. Statist.
- Volume 33, Number 2 (1962), 719-726.

### Discrete Dynamic Programming

#### Abstract

We consider a system with a finite number $S$ of states $s$, labeled by the integers $1, 2, \cdots, S$. Periodically, say once a day, we observe the current state of the system, and then choose an action $a$ from a finite set $A$ of possible actions. As a joint result of the current state $s$ and the chosen action $a$, two things happen: (1) we receive an immediate income $i(s, a)$ and (2) the system moves to a new state $s'$ with the probability of a particular new state $s'$ given by a function $q = q(s' \mid s, a)$. Finally there is specified a discount factor $\beta, 0 \leqq \beta < 1$, so that the value of unit income $n$ days in the future is $\beta^n$. Our problem is to choose a policy which maximizes our total expected income. This problem, which is an interesting special case of the general dynamic programming problem, has been solved by Howard in his excellent book [3]. The case $\beta = 1$, also studied by Howard, is substantially more difficult. We shall obtain in this case results slightly beyond those of Howard, though still not complete. Our method, which treats $\beta = 1$ as a limiting case of $\beta < 1$, seems rather simpler than Howard's.
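For concreteness, the discounted problem ($0 \leqq \beta < 1$) described in the abstract can be sketched numerically. The block below uses plain value iteration, a standard successive-approximation technique; it is not Howard's policy-improvement routine, nor the method of this paper, and it does not address the harder case $\beta = 1$. The function name `value_iteration`, the nested-list layout of `income` and `q`, the tolerance, and the two-state example are all assumptions made for illustration.

```python
def value_iteration(income, q, beta, tol=1e-10, max_iter=100_000):
    """Approximate the optimal discounted income by value iteration.

    income[s][a] -- immediate income i(s, a)            (hypothetical layout)
    q[s][a][t]   -- probability q(t | s, a) of moving to state t
    beta         -- discount factor, 0 <= beta < 1
    Returns (v, policy): v[s] is the approximate optimal expected
    discounted income from state s, policy[s] a maximizing action index.
    """
    if not 0 <= beta < 1:
        raise ValueError("this sketch only covers 0 <= beta < 1")
    n_states = len(income)

    def action_value(s, a, v):
        # Immediate income plus discounted expected value of the next state.
        return income[s][a] + beta * sum(q[s][a][t] * v[t] for t in range(n_states))

    v = [0.0] * n_states
    for _ in range(max_iter):
        new_v = [
            max(action_value(s, a, v) for a in range(len(income[s])))
            for s in range(n_states)
        ]
        gap = max(abs(x - y) for x, y in zip(new_v, v))
        v = new_v
        if gap < tol:
            break

    # Read off a stationary policy that is greedy with respect to v.
    policy = [
        max(range(len(income[s])), key=lambda a: action_value(s, a, v))
        for s in range(n_states)
    ]
    return v, policy


# Illustrative two-state, two-action system (made-up numbers).
# Action 0 stays in the current state; action 1 moves to the other state.
income = [[1.0, 0.0],   # state 0: staying pays 1, moving pays 0
          [2.0, 0.0]]   # state 1: staying pays 2, moving pays 0
q = [[[1.0, 0.0], [0.0, 1.0]],
     [[0.0, 1.0], [1.0, 0.0]]]

v, policy = value_iteration(income, q, beta=0.9)
```

In this toy system, staying in state 1 forever is worth $2 / (1 - 0.9) = 20$, so the best choice in state 0 is to move there, which is worth $0.9 \times 20 = 18$. The sketch should therefore return values close to `[18, 20]` and the policy `[1, 0]`.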

#### Article information

**Dates**

First available in Project Euclid: 27 April 2007

**Permanent link to this document**

https://projecteuclid.org/euclid.aoms/1177704593

**Digital Object Identifier**

doi:10.1214/aoms/1177704593

**Mathematical Reviews number (MathSciNet)**

MR149965

**Zentralblatt MATH identifier**

0133.12906

#### Citation

Blackwell, David. Discrete Dynamic Programming. Ann. Math. Statist. 33 (1962), no. 2, 719-726. doi:10.1214/aoms/1177704593. https://projecteuclid.org/euclid.aoms/1177704593