## Abstract

The problem of finding Bayes solutions for sequential experimental design problems motivates the study of the following type of one-person sequential game. If the game is stopped at any stage $a$, the loss to the player is the value of a random variable (rv), say $Z_a$. If the player chooses to continue the game, he can select the next rv to be observed from a class of rv's available at that stage, thus bringing the game to one of the stages succeeding stage $a$. (The class of all stages can be pictured as a "tree".) At this stage the player can again choose to stop, accepting the value of the chosen rv as his loss, or he may continue by selecting one of the class of rv's now available for the next observation. The player is required to stop sometime, and his decisions at any stage must depend only on information available at that stage. A model for this situation is given in Section 4. Control variables, which correspond to stopping variables in the usual formulation of sequential games, are defined which can be used by the player to decide whether to stop or not at any stage and, if he continues, which rv to observe next. A general characterization of control variables that minimize expected loss is given, and existence of such optimal control variables is proved under conditions applicable to statistical problems. The application to finding Bayes solutions to sequential experimental design problems is given in Section 5.

As a preliminary to the discussion on control variables, Section 3 provides a study of the theory of optimal stopping variables. Let $\{Z_n, F_n, n \geqq 1\}$ be a stochastic process on a probability space $(\Omega, F, P)$ with points $\omega$. A stopping variable (sv) is a rv $t$ with values in $\{1, 2, \cdots, \infty\}$ such that $t < \infty$ a.e. and $\{t = n\} \in F_n$ for each $n$.
For any such sv $t$, a rv $Z_t$ is defined by \begin{align*}Z_t(\omega) &= Z_n(\omega),\quad\text{if } t(\omega) = n, \\ &= \infty,\quad\text{if } t(\omega) = \infty.\end{align*} It is convenient to think of $Z_n$ as the loss after $n$ plays in a one-person sequential game and to consider the $\sigma$-field $F_n$ as representing the knowledge of the past after $n$ plays. The problem of finding a strategy for stopping the game to minimize the expected loss corresponds to finding a minimizing sv, i.e., one which minimizes $EZ_t$ among the class of all sv's $t$. The main results in Section 3 are new characterizations of Snell's solution in [12] to the problem of optimal stopping which generalized the well-known Arrow-Blackwell-Girshick theory in [1]. Under the assumption that there is an integrable rv $U$ such that $Z_n \geqq U$ a.e. for each $n$, Snell showed that when a minimizing sv $t^\ast$ exists, it can be defined as the first integer $j$ such that $X_j = Z_j$ (or $\infty$ if no such integer exists) where $\{X_n, F_n, n \geqq 1\}$ is the maximal regular generalized submartingale relative to $\{Z_n, F_n, n \geqq 1\}$. Under the additional assumption that $\{Z_n, n \geqq 1\}$ is an integrable sequence, we now show that the rv's $X_n$ can be further identified as $X_n = \operatorname{ess\,inf}_{t \in T_n} E(Z_t \mid F_n)\ \text{a.e.},$ where $T_n$ is the class of sv's $t$ such that $t \geqq n$. It follows that if there is a set $A_n$ in $F_n$ for each $n$ such that (i) $Z_n \leqq E(Z_t \mid F_n)$ a.e. on $A_n$ for each $t$ in $T_{n + 1}$, and (ii) $Z_n > \operatorname{ess\,inf}_{t \in T_{n + 1}} E(Z_t \mid F_n)$ a.e. on the complement $A'_n$, then the minimizing sv $t^\ast$ defined above is such that, for almost all points $\omega$, $t^\ast(\omega) = n$ if and only if $\omega \in A_n - \bigcup_{i = 1}^{n - 1} A_i$.
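When the game has a finite horizon $N$, the process $\{X_n\}$ above satisfies the backward recursion $X_N = Z_N$ and $X_n = \min(Z_n, E(X_{n+1} \mid F_n))$, and $t^\ast$ is the first stage at which $X_n = Z_n$. The following sketch illustrates this construction on an assumed toy problem (not from the paper): the player observes up to $N = 4$ fair $\pm 1$ steps, and the loss on stopping is a per-stage cost minus the positive part of the partial sum. All numerical choices here are illustrative.

```python
from fractions import Fraction
from functools import lru_cache
import itertools

N = 4                     # horizon: the player must stop by stage N (assumed)
COST = Fraction(1, 4)     # per-stage observation cost (assumed)

def Z(prefix):
    """Loss if the game is stopped after observing `prefix` of +/-1 steps:
    accumulated cost minus the positive part of the partial sum."""
    return COST * len(prefix) - max(sum(prefix), 0)

@lru_cache(maxsize=None)
def X(prefix):
    """Snell envelope by backward induction:
    X_N = Z_N and X_n = min(Z_n, E[X_{n+1} | F_n])."""
    if len(prefix) == N:
        return Z(prefix)
    cont = (X(prefix + (1,)) + X(prefix + (-1,))) / 2
    return min(Z(prefix), cont)

def t_star(path):
    """Snell's minimizing sv: the first stage n with X_n = Z_n."""
    return next(n for n in range(N + 1) if X(path[:n]) == Z(path[:n]))

# E[Z_{t*}] over all 2^N equally likely paths attains the envelope value X_1
# (here indexed from stage 0), confirming that t* is minimizing.
paths = list(itertools.product((1, -1), repeat=N))
expected_loss = sum(Z(p[:t_star(p)]) for p in paths) / len(paths)
assert expected_loss == X(())
assert X(()) < Z(())      # stopping immediately is strictly suboptimal here
```

With exact rational arithmetic the identity $EZ_{t^\ast} = X_1$ holds without rounding error; note that on paths starting with $-1$ the rule stops at stage 1 even though $X_1 = Z_1$ there with a worse continuation value, exactly as the first-entrance definition requires.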
This comes very close to stating that the player of the sequential game above should stop playing after $n$ plays if and only if there is no continuation (i.e., sv in $T_{n + 1}$) having conditional expected loss given the past which is less than the present loss $Z_n(\omega)$. Unfortunately, such an interpretation is not valid in general since it is equivalent to stopping the game after $n$ plays if and only if $Z_n(\omega) \leqq \inf_{t \in T_{n + 1}} E(Z_t \mid F_n)(\omega).$ The function $\inf_{t \in T_{n + 1}} E(Z_t \mid F_n)$ not only may not be measurable, but also in many cases it may be changed almost at will by choosing different versions of the conditional expectations involved. On the other hand, the interpretation above is valid, as is shown in Section 3, for the case where each $\sigma$-field $F_n$ is generated by finitely many discrete rv's. Section 3 also contains a reasonably general development of the theory of minimizing sv's, including a more direct proof of Snell's result using the definition $X_n = \operatorname{ess\,inf}_{t \in T_n} E(Z_t \mid F_n)$. Illustrations are given and comparisons are made with the approaches used by Arrow, Blackwell, and Girshick in [1] and Chow and Robbins in [4] and [5].
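In the discrete case just described, where each $F_n$ is generated by finitely many discrete rv's, the stop-iff-no-better-continuation rule is valid, and both decisions of the Section 4 game (whether to stop, and which rv to observe next) can be computed by backward induction over the finite tree of stages. A hypothetical two-experiment sketch follows; the experiments, costs, and outcome distributions are made up purely for illustration.

```python
from fractions import Fraction

COST = Fraction(1, 2)                       # cost per observation (assumed)
DIST = {                                    # outcome distributions (assumed)
    "A": [(0, Fraction(1, 2)), (3, Fraction(1, 2))],
    "B": [(1, Fraction(2, 3)), (2, Fraction(1, 3))],
}

def loss(history):
    """Loss on stopping now: observation costs minus the best outcome seen.
    `history` is a tuple of (experiment, outcome) pairs."""
    vals = [v for _, v in history]
    return COST * len(history) - (max(vals) if vals else 0)

def value(history, remaining):
    """Optimal expected loss from this stage on: the minimum of the stopping
    loss and the conditional expected value of each available experiment."""
    best = loss(history)
    for e in remaining:
        cont = sum(p * value(history + ((e, v),), remaining - {e})
                   for v, p in DIST[e])
        best = min(best, cont)
    return best

def decision(history, remaining):
    """Optimal control at this stage: 'stop', or the experiment to run next.
    Stopping is optimal iff no continuation has smaller conditional expected
    loss -- the discrete-case rule from the text."""
    choices = {e: sum(p * value(history + ((e, v),), remaining - {e})
                      for v, p in DIST[e]) for e in remaining}
    if not choices or loss(history) <= min(choices.values()):
        return "stop"
    return min(choices, key=choices.get)
```

For these numbers, `decision((), frozenset(DIST))` selects experiment "A" first, and after observing the favorable outcome $3$ the rule stops, since every continuation has larger conditional expected loss than the current loss.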

## Citation

Gus W. Haggstrom. "Optimal Stopping and Experimental Design." Ann. Math. Statist. 37(1): 7–29, February 1966. https://doi.org/10.1214/aoms/1177699594
