## Abstract

A sequential control process is a dynamic system which is observed periodically and classified into one of a number of possible states. After each observation one of a number of possible decisions is made. These decisions are the "control"; they determine the chance laws of the system. A replacement process is a control process with an additional special action, called replacement, which instantaneously returns the system to some initial state. Let $\mathbf{X}$ denote the state space of the system, assumed to be a Borel subset of finite dimensional Euclidean space. The case where $\mathbf{X}$ is finite has been treated by Derman [8], and thus $\mathbf{X}$ is considered infinite here. Let $\mathscr{B}$ be the $\sigma$-algebra of Borel sets in $\mathbf{X}$. Let $\{X_t; t = 0, 1, 2, \cdots\}$ be the sequence of states and $\{\Delta_t; t = 0, 1, 2, \cdots\}$ be the sequence of decisions. In a replacement problem it is assumed that there is a distinguished state $x_0 \epsilon \mathbf{X}$ with $X_0 = x_0$ with probability one. For any time $t$ let $S_t$ be the history of states and decisions up to and including time $t$. Let $\mathbf{A}$ be the set of possible actions, excluding replacement, where: $A1^\circ$ It is assumed that the action space $\mathbf{A}$ is a finite set with $n((\mathbf{A})$ elements. Since $\mathbf{A}$ is finite, assume $\mathbf{A} = \{1, 2, \cdots, n(\mathbf{A})\}$. Let $k_0 \not\varepsilon \mathbf{A}$ denote the replacement action. The action $k_0$ instantaneously returns the system to state $x_0$, and it may be followed by some action $k \varepsilon \mathbf{A}$ which "acts on" the state $x_0$. The pair $(k_0, k)$ itself constitutes a possible action. A decision at time $t$ is either a choice of an element $k \varepsilon \mathbf{A}$ or a choice of a pair $(k_0, k)$ with $k \varepsilon \mathbf{A}$. Let $\mathbf{A}_0$ be the total action space, where: $\mathbf{A}_0 = \mathbf{A} \cup \{ (k_0, k); k \varepsilon \mathbf{A}\}.$ There are $2n(\mathbf{A})$ elements in $\mathbf{A}_0$. Let $\Xi = \{\xi; \xi = \langle\xi_1, \cdots, \xi_{2n(\mathbf{A})}\rangle, \xi_j \geqq 0, \sum\xi_j = 1\}$ be the simplex of all probability distributions on $\mathbf{A}_0$. A sequential control rule is a function $D(s_{t - 1}, x) = \langle D_1(s_{t - 1}, x), \cdots, D_{2n(\mathbf{A})} (s_{t - 1}, x)\rangle$ of histories $s_{t - 1}$ and present states $x$ with values in $\Xi$. The interpretation is: At a history of $S_{t - 1} = s_{t - 1}$ and a present state $X_t = x$, decision $j \varepsilon \mathbf{A}_0$ is taken with probability $D_j(s_{t - 1}, x)$. In order that the integrals later to be written have meaning it is necessary to restrict attention to control rules $D(s_{t - 1}, x)$ which are Biare functions of their arguments. Let $\mathbf{R}$ be the space of all such control rules. A sequential control process is not specified until a "law of motion" is given. $A2^\circ$ It is assumed that for every $x \varepsilon \mathbf{X}$ and $k \varepsilon \mathbf{A}$ there exists a probability measure $Q(\cdot; x, k)$ on $\mathscr{B}$ such that for some version $\mathrm{Pr}\{X_{t + 1} \varepsilon B \mid S_{t - 1}, X_t = x, \Delta_t = k\} = Q(B; x, k)$; for every $B \varepsilon \mathscr{B}$ and history $S_{t - 1}$. For every $B \varepsilon \mathscr{B}$ and $k \varepsilon \mathbf{A}, Q(B; \cdot, k)$ is assumed to be a Baire function on $\mathbf{X}$. It is assumed that $Q(\cdot, x, k)$ is absolutely continuous with respect to some $\sigma$-finite measure $\mu$ on $\mathscr{B}$, and possessing a density $q(\cdot, x, k)$, also assumed to be a Baire function in $x$. Since $X_0 = x_0$ a.s., once a rule $R \varepsilon \mathbf{R}$ is specified, the sequences $\{X_t, t = 0, 1, 2, \cdots\}$ and $\{(X_t, \Delta_t); t = 0, 1, 2, \cdots\}$ are stochastic processes. The previous Assumption $A2^\circ$ imposes a structure similar to that of a Markov process in that the law of motion does not depend on the past history, but only on the present state. In a manner similar to Derman [9], the process $\{(X_t, \Delta_t); t = 0, 1, 2, \cdots\}$ will be called a Markovian sequential replacement process. It is not true that $\{X_t; t = 0, 1, \cdots\}$ nor even $\{(X_t, \Delta_t); t = 0, 1, \cdots\}$ will always be Markov processes; whether they are or not will depend on the rule $R$. Two assumptions particular to the development in this paper and insuring the ergodicity of the process are: $A3^\circ$ For every $x \varepsilon \mathbf{X}$ and $k \varepsilon \mathbf{A}$ it is assumed that $\lim_{x' \rightarrow x} \int |q(y; x, k) - q(y; x', k)| \mu(dy) = 0.$ $A4^\circ$ For every compact set $G \subset \mathbf{X}$ it is assumed that $\sup_{x \varepsilon G}\int_G q(y; x, k) \mu (dy) < 1$ for all $k \varepsilon \mathbf{A}$. The last assumption, $A4^\circ$, is stronger than needed, as may be seen in the examples in Section 4. However, it is easily verified and seems natural in many applications of the theory. Let $w(x, k)$ be the immediate cost whenever the system is in state $x \varepsilon \mathbf{X}$ and decision $k \varepsilon \mathbf{A}$ is made. It often occurs that the cost in an actual situation is a random variable whose distribution is determined by knowledge of the state and decision. In such a case, with some loss in generality, attention is restricted to $w(x, k)$ representing the expected one stage cost under the appropriate distribution. Let $K(x)$ be the cost of replacing a system in state $x$. If $w_0(\cdot, \cdot)$ is the cost function defined on $\mathbf{X} \times \mathbf{A}_0$ then the relationship is: $w_0(x, k) = w(x, k)\quad\text{for} k \neq k_0$ and $w_0(x, (k_0, k)) = K(x) + w(x_0, k)\quad\text{for} k \varepsilon \mathbf{A}.$ $A5^\circ$ Assume that $K(\cdot)$ is bounded and continuous with $0 \leqq K(x) \leqq M$ for all $x \varepsilon \mathbf{X}$. For every $k \varepsilon \mathbf{A}$ assume that $w(\cdot, k)$ is a non-negative continuous function on $\mathbf{X}$ with $\lim \inf_{x \rightarrow \infty} w(x, k) \gg 0$ (For the limiting operation here, a neighborhood of $\infty$ is the complement of a compact set.). The notation $a \gg 0$ means that $a$ is much greater than zero, but not necessarily infinite. One needs $\lim \inf_{x \rightarrow \infty} w(x, k)$ large enough so that Lemmas 3.2, 3.3 and 3.4 will hold. Intuitively, one needs the cost of continuing sufficiently large for some states so as to ensure that the expected time to a replacement action is finite. It should be noted that $\sup_{x \varepsilon X} \min_{a \varepsilon \mathbf{A}_0} w_0(x, a) \leqq M_0$ where $M_0 = M + \min_{k \varepsilon \mathbf{A}} w(x_0, k)$. Let $P_t(B, a \mid x, R) = \mathrm{Pr}\{X_t \varepsilon B, \Delta_t = a \mid X_0 = x, R\}$ for $B \varepsilon \mathscr{B}, x \varepsilon X$ and $a \varepsilon \mathbf{A}_0$. Let the appropriate density be labeled $p_t(\cdot, \cdot \mid x, R)$ where $p_t(y, a \mid x, R)\mu(dy) = \mathrm{Pr}\{X_t \varepsilon dy, \Delta_t = a \mid X_0 = x, R\}.$ Two common measures of effectiveness of a Markovian sequential decision process are the expected total discounted future cost and the average cost per unit time. The first, abbreviated to "discounted cost" assumes a discount factor $\alpha \varepsilon (0, 1)$, with the interpretation that a unit of value $n$ periods hence has a present value of $\alpha^n$. For a starting state of $X_0 = x_0$ the objective is to choose a rule $R$ so as to minimize $\psi(x_0, \alpha, R) = \sum^\infty_{t = 0} \alpha^t \int x\sum_{a \varepsilon \mathbf{A}_0} w_0(x, a)p_t(x, a \mid x_0, R)\mu(dx).$ The second criterion, abbreviated to "average cost" examines the function $\varphi(x_0, R) = \lim \inf_{T \rightarrow \infty} T^{-1} \sum^{T - 1}_{t = 0} \int_{\mathbf{X}} \sum_{a \varepsilon \mathbf{A}_0} w_0(x, a)p_t(x, a \mid x_0, R)\mu(dx)$ Section 2 presents the solution of the problem under the discounted cost measure. Building upon the work of Blackwell [4] and Karlin [12], Derman [9] has shown that an optimal non-randomized stationary rule exists for the case where $\mathbf{X}$ is denumerable. Blackwell [5] recently has given a complete discussion of the general use. The rule is characterized by a functional equation of the dynamic programming type. Iterative methods for solving such functional equations are now almost commonplace. Section 3 uses the known results in the discounted cost model: (a) to show the existence of a non-randomized stationary solution in the average cost case, (b) to show the existence of a functional equation characterizing the solution in the average cost case, and (c) to show that the average cost solution is the limit, in some sense, of the discounted cost solutions as the discount factor approaches unity. Section 4 presents some applications of the theory. The attempt is to show how the work of several authors fits into this general theory of control of replacement processes. For example, while supporting one claim in a quality control paper by Girshick and Rubin [10], the theory also provides a counter example for another of their claims.

## Citation

Howard M. Taylor III. "Markovian Sequential Replacement Processes." Ann. Math. Statist. 36 (6) 1677 - 1694, December, 1965. https://doi.org/10.1214/aoms/1177699796

## Information