A Note on Memoryless Rules for Controlling Sequential Control Processes

Cyrus Derman; Ralph E. Strauch

doi:10.1214/aoms/1177699618

February, 1966 A Note on Memoryless Rules for Controlling Sequential Control Processes

Cyrus Derman, Ralph E. Strauch

Ann. Math. Statist. 37(1): 276-278 (February, 1966). DOI: 10.1214/aoms/1177699618

Abstract

We are concerned with a dynamic system which at times $t = 0, 1, \cdots$ is observed to be in one of a finite number of states. We shall denote the space of possible states by $I$. After each observation the system is "controlled" by making one of a finite number of possible decisions. We shall denote by $K_i$ the set of possible decisions when the system is in state $i, i \varepsilon I$. Let $\{Y_t\}, t = 0, 1, \cdots$, denote the sequence of observed states and $\{\Delta_t\}, t = 0, 1, \cdots$, the sequence of observed decisions. The fundamental assumption regarding $\{Y_t, \Delta_t\}, t = 0, 1, \cdots$, is \begin{equation*}\tag{A}P(Y_{t+1} = j \mid Y_0, \Delta_{0, \cdots,t}Y_t = i, \Delta_t = k) = q_{ij}(k),\quad t = 0, 1, \cdots; j \varepsilon I; k \varepsilon K_i\end{equation*} where the $q_{ij}(k)$'s are non-negative numbers satisfying $\sum_j q_{ij}(k) = 1, k \varepsilon K_i; i \varepsilon I$. A rule for making successive decisions can be summarized in the form of a collection of non-negative functions $D_k(Y_0, \Delta_0, \cdots, \Delta_{t-1}, Y_t),\quad t = 0, 1, \cdots; k \varepsilon K_{Y_t},$ where in every case $\sum_k D_k(\cdot) = 1$. We set $P(\Delta_t = k \mid Y_0, \Delta_0, \cdots, \Delta_{t-1}, Y_t) = D_k(Y_0, D_0, \cdots, \Delta_{t-1}, Y_t)$ for $t = 0, 1, \cdots$. Thus, given $Y_0 = i$ and any rule $R$ for making successive decisions, the sequence $\{Y_t, \Delta_t\}, t = 0, 1, \cdots$, is a stochastic process with its probability measure dependent upon the rule $R$. We refer to such a process as a sequential control process. Let $C$ denote the class of all possible rules; $C'$ denote the class of all rules such that $D_k(Y_0, \Delta_0, \cdots, \Delta_{t-1}, Y_t = i) = D_{ik},\quad t = 0, 1, \cdots; k \varepsilon K_i; i \varepsilon I$. That is, $C'$ is the class of all rules such that the mechanism for making a decision at any time $t$ is dependent only on the state of the system at time $t$. A rule $R \varepsilon C'$ has a stationary Markovian character and, indeed, when $R \varepsilon C'$ is used, the resulting process $\{Y_t\}, t = 0, 1, \cdots$, is a Markov chain with stationary transition probabilities. We let $C''$ denote the subclass of $C'$ where the $D_{ik}$'s are zero or one. Rules in $C'$ allow for randomization; the rules in $C''$ are non-randomized. For a given $R \varepsilon C$ and initial state $Y_0 = i$, let $X_{T,j,k,R}(i) = (T + 1)^{-1}\sum^T_{t=0} P_R(Y_t = j, \Delta_t = k \mid Y_0 = i)$ and let $X_{T,R}(i)$ denote the vector of components $X_{T,j,k,R}(i)$ for all $k \varepsilon K_j$ and $j \varepsilon I$. Denote by $H_R(i)$ the set of limit points of $X_{T,R}(i)$ as $T \rightarrow \infty$. Let $H(i) = \mathbf{\bigcup}_{R\varepsilon C} H_R(i),\quad H'(i) = \mathbf{\bigcup}_{R\varepsilon C'} H_R(i),\quad H''(i) = \mathbf{\bigcup}_{R\varepsilon C''} H_R(i);$ and let $\bar{H}'(i)$ and $\bar{H}''(i)$ denote the convex hulls of $H'(i)$ and $H''(i)$, respectively. In [5] was proved Theorem 1. (a) $\bar H'(i) = \bar H''(i) \supset H(i)$. (b) If the Markov chain corresponding to $R$ is irreducible for every $R \varepsilon C''$, then $\bar H''(i) = H'(i) = H(i) = \mathbf{\bigcup}_{i\varepsilon I} H(i)$. Examples were given in [4] and [5] showing that $H(i)$ can be larger than $H'(i)$. In (b) the irreducibility assumption can be weakened to the condition that for each $R \varepsilon C''$ the corresponding Markov chain has, at most, one ergodic class. Blackwell [1], [2], and Maitra [6] have considered memoryless rules. By a memoryless rule we mean a rule $R$ such that $D_t(Y_0, \Delta_0, \cdots, \Delta_{t-1}, Y_t = i) = D^{(t)}_{ik}\quad t = 0, 1, \cdots, k \varepsilon K_i, i \varepsilon I.$ That is, with a memoryless rule the mechanism for making a decision is a function of the time $t$ and the state $i$ at time $t$. The memoryless rules of Blackwell and Maitra are non-randomized; i.e., $D^{(t)}_{ik} = 0$ or 1. We shall let $C^M$ denote the class of memoryless rules (both randomized and non-randomized). Thus $C \supset C^M \supset C' \supset C''$. If $R \varepsilon C^M - C'$, then $\{Y_t\}, t = 0, 1, \cdots$, is a finite state Markov chain with non-stationary transition probabilities. We remark that it is the memoryless rules (non-randomized) that are considered the rules of interest in the usual finite horizon dynamic programming problems. See Blackwell [2] for interesting remarks along these lines. We are concerned with optimization problems where the criterion to be optimized are functions of the points in $H(i)$. In [5] it was shown that one can construct problems where the optimal rule is in $C - C'$. This can occur, e.g., if the criterion to be optimized is a linear functional over the points in $H(i)$ but where a solution must also satisfy one or more linear constraints in $H(i)$. It is for the purpose of treating optimization problems of this kind that we are interested in the limit points of $X_{T,R}(i)$ for rules belonging to the various important sub-classes of $C$. Let $H^M(i) = \mathbf{\bigcup}_{R\varepsilon C^M} H_R(i)$. The result of this note (which is similar to Theorem 4.1 of [7]) is Theorem 2. $H(i) = H^M(i).$ In fact, for any $R \varepsilon C$ there exists an $R_0 \varepsilon C^M$ such that $X_{tR_0}(i) = X_{tR}(i)$ for all $t$. Proof. Define $R_0$ by $ D_{jk}(t) = P_{R_0}(\Delta_t = k \mid Y_t = j) = P_R(\Delta_t = k \mid Y_t = j, Y_0 = i).$ It is enough to show that for all $t, k \varepsilon K_j$ and $j \varepsilon I$, \begin{equation*} \tag{*} P_{R_0}(Y_t = j, \Delta_t = k \ Y_0 = i) = P_R(Y_t = j, \Delta_t = k \mid Y_0 = i).\end{equation*} The relation $(\ast)$ holds for $t = 0$, since $P_R(Y_0 = j, \Delta_0 = k \mid Y_0 = i) = 0 = P_{R_0}(Y_0 = j, \Delta_0 \mid Y_0 = i)$ if $j \neq i$ and \begin{align*}P_R(Y_0 = i, \Delta_0 = k \mid Y_0 = i) &= P_R(\Delta_0 = k \mid Y_0 = i) \\ &= P_{R_0} (\Delta_0 = k \mid Y_0 = i)\end{align*} by definition. Now assume $(\ast)$ holds for $t = 0, \cdots, T - 1$. Then $P_R(Y_T = j,\Delta_T = k \mid Y_0 = i) = P_R(Y_T = j \mid Y_0 = i)P_R(\Delta_T = k \mid Y_T = j, Y_0 = i)$ but $P_R(\Delta_T = k \mid Y_T = j, Y_0 = i) = D_{jk}(t)$ by definition, and \begin{align*}P_R(Y_T = j \mid Y_0 = i &= \sum_{l\varepsilon I} \sum_{k\varepsilon K_l} P_R(Y_{T-1} = l, \Delta_{T-1} = k \mid Y_0 = i) q_{lj}(k) \\ &= \sum_{l\varepsilon I} \sum_{k\varepsilon K_l} P_{R_0}(Y_{T-1} = l, \Delta_{T-1} = k \mid Y_0 = i) q_{lj}(k) \\ &= P_{R_0}(Y_T = j \mid Y_0 = i)\end{align*} by the introduction hypothesis. Thus \begin{align*}P_R(Y_T = j, \Delta_T = k \mid Y_0 = i) &= P_{R_0}(Y_T = j \mid Y_0 = i)D_{jk}(t) \\ &= P_{R_0}(Y_T = j, \Delta_T = k \mid Y_0 = i).\end{align*} This completes the proof.

Citation

Download Citation

Cyrus Derman. Ralph E. Strauch. "A Note on Memoryless Rules for Controlling Sequential Control Processes." Ann. Math. Statist. 37 (1) 276 - 278, February, 1966. https://doi.org/10.1214/aoms/1177699618