## Abstract

The problem of sequential selection of experiments, with fixed and optional stopping, is considered. Conditions are given which allow selection, stopping and terminal action rules to be based on a sequence $\{T_j\}$ of statistics, where $T_j$ is a function of the past observations $\mathbf{X}^j = (X_1, \cdots, X_j)$ and experiment selections $\mathbf{E}^j = (E_1, \cdots, E_j)$. Randomized stopping, selection, and terminal action rules are allowed, and all probability distributions are defined by densities relative to $\sigma$-finite measures over Euclidean spaces. Here we give a heuristic description of the principal results for the case of optional stopping. At each time $j$ the random variable $X_j$ is observed and a decision is made to stop or continue. If the procedure is stopped, a terminal action $A$ is taken; if it is continued, an experiment $E_{j+1}$, to be performed at time $j + 1$, is chosen. At time $j$, all decisions are based on $\mathbf{X}^j, \mathbf{E}^j$, the past observations and experiment selections. Upon stopping and taking action $A$, a loss $L(\theta, A)$ is incurred, where $\theta$ is the unknown state of nature. The sampling cost of stopping at $j$ is $C_j(\theta, \mathbf{X}^j, \mathbf{E}^j)$. Let the random variable $N$ denote the stopping time. A selection rule $\boldsymbol{\gamma} = (\gamma_0, \gamma_1, \cdots)$ is defined by the sequence of conditional densities $\gamma_j(e_{j+1}\mid\mathbf{x}^j, \mathbf{e}^j)$, a stopping rule $\boldsymbol{\phi} = (\phi_0, \phi_1, \cdots)$ by the probabilities $\phi_j(\mathbf{x}^j,\mathbf{e}^j) = P\{N = j\mid N \geqq j, \mathbf{x}^j,\mathbf{e}^j\}$, and a terminal action rule $\boldsymbol{\delta} = (\delta_0, \delta_1, \cdots)$ by the conditional densities $\delta_j(a\mid\mathbf{x}^j,\mathbf{e}^j)$. Specifying the population densities $f_\theta(x_{j+1}\mid\mathbf{x}^j, \mathbf{e}^{j+1})$ for $j = 0, 1, 2, \cdots$ completely fixes the probability structure.
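The sequence of decisions described above can be made concrete by simulating a single run of a procedure $(\boldsymbol{\phi}, \boldsymbol{\gamma}, \boldsymbol{\delta})$. The sketch below is illustrative only and assumes a hypothetical model not taken from the paper: two Bernoulli experiments $E_j \in \{0, 1\}$, with $X_j = 1$ occurring with probability $\theta_{E_j}$, zero-one loss for choosing the worse experiment, and linear sampling cost $C_j = cj$. All function names and the `max_steps` safety cap are assumptions added for the example.

```python
import random
from typing import List

# Hypothetical model: experiments (arms) 0 and 1; theta = (p0, p1) are the
# success probabilities of the Bernoulli observations they produce.

def run_procedure(theta, gamma, phi, delta, loss, cost, rng, max_steps=1000):
    """Simulate one run; return (N, terminal action A, loss + sampling cost)."""
    xs: List[int] = []  # observations x^j
    es: List[int] = []  # experiment selections e^j
    while True:
        j = len(xs)
        # Stop at time j with probability phi_j(x^j, e^j); max_steps is an added cap.
        if j >= max_steps or rng.random() < phi(xs, es):
            a = delta(xs, es, rng)  # A drawn from delta_j(. | x^j, e^j)
            return j, a, loss(theta, a) + cost(theta, xs, es)
        e_next = gamma(xs, es, rng)  # E_{j+1} drawn from gamma_j(. | x^j, e^j)
        # X_{j+1} ~ f_theta(. | x^j, e^{j+1}); here it depends only on the chosen arm.
        x_next = 1 if rng.random() < theta[e_next] else 0
        es.append(e_next)
        xs.append(x_next)

# Hypothetical rules, chosen only for illustration.
def gamma_uniform(xs, es, rng):
    return rng.randrange(2)  # randomized selection: each arm with probability 1/2

def phi_fixed20(xs, es):
    return 1.0 if len(xs) >= 20 else 0.0  # stop exactly at j = 20

def delta_best_rate(xs, es, rng):
    # Laplace-smoothed success rate per arm, so an unsampled arm scores 1/2.
    rates = []
    for arm in (0, 1):
        n = sum(1 for e in es if e == arm)
        s = sum(x for x, e in zip(xs, es) if e == arm)
        rates.append((s + 1) / (n + 2))
    if rates[0] == rates[1]:
        return rng.randrange(2)  # randomized terminal action on ties
    return 0 if rates[0] > rates[1] else 1

def zero_one_loss(theta, a):
    return 0.0 if theta[a] == max(theta) else 1.0

def linear_cost(theta, xs, es, c=0.01):
    return c * len(xs)  # C_j = c * j

rng = random.Random(0)
n, a, total = run_procedure((0.2, 0.8), gamma_uniform, phi_fixed20,
                            delta_best_rate, zero_one_loss, linear_cost, rng)
```

With `phi_fixed20` the run has a fixed sample size of 20; a stopping rule returning values strictly between 0 and 1 would give randomized optional stopping instead.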
Define $\{T_j\}$ to be parameter sufficient (PARS) if, for $j = 0, 1, 2, \cdots$, $\operatorname{Dist}_{\theta,\boldsymbol{\gamma}}(\mathbf{X}^j, \mathbf{E}^j\mid T_j)$ is independent of $\theta$ for all $\boldsymbol{\gamma}$, and policy sufficient (POLS) if, for $j = 0, 1, 2, \cdots$, $\operatorname{Dist}_{\theta,\boldsymbol{\phi},\boldsymbol{\gamma}}(T_{j+1}\mid T_j, E_{j+1}, N \geqq j + 1)$ is independent of $\boldsymbol{\phi}, \boldsymbol{\gamma}$ for all $\theta$. THEOREM. If $\{T_j\}$ is PARS, then the class of policies $\{\boldsymbol{\phi}, \boldsymbol{\gamma}, \boldsymbol{\delta}^0\}$, where $\boldsymbol{\delta}^0$ is based on $\{T_j\}$, is essentially complete. THEOREM. If $\{T_j\}$ is PARS and POLS, and the sampling cost is of the form $C_j(\theta, T_j)$, then the class of policies $\{\boldsymbol{\phi}^0, \boldsymbol{\gamma}^0, \boldsymbol{\delta}^0\}$, where $\boldsymbol{\phi}^0, \boldsymbol{\gamma}^0, \boldsymbol{\delta}^0$ are based on $\{T_j\}$, is essentially complete. Conditions are given to aid in the verification of PARS and POLS. The theorems are applied to examples, including versions of the two-armed bandit problem.
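The PARS condition can be checked numerically in a small case. The sketch below assumes a hypothetical two-armed Bernoulli bandit (not necessarily the version treated in the paper), a hypothetical history-dependent randomized selection rule, and the candidate statistic $T_j$ = (successes, trials) on each arm. It enumerates every history of length $j$, computes $\operatorname{Dist}_{\theta,\boldsymbol{\gamma}}(\mathbf{X}^j, \mathbf{E}^j \mid T_j)$ for several values of $\theta$, and reports the largest difference. This is a finite illustration, not a proof; the underlying reason is that the joint density factors into selection terms, which do not involve $\theta$, times Bernoulli terms, which depend on the history only through the counts.

```python
from itertools import product

def gamma_prob(e_next, xs, es):
    # Hypothetical randomized selection rule gamma_j(e_{j+1} | x^j, e^j):
    # first pull is a fair coin; afterwards stay on the last arm w.p. 0.9
    # after a success and w.p. 0.3 after a failure.
    if not es:
        return 0.5
    p_same = 0.9 if xs[-1] == 1 else 0.3
    return p_same if e_next == es[-1] else 1.0 - p_same

def joint_prob(xs, es, theta):
    # Density of (x^j, e^j): product of selection factors (free of theta)
    # and Bernoulli population factors f_theta(x_i | e_i).
    p = 1.0
    for i in range(len(xs)):
        p *= gamma_prob(es[i], xs[:i], es[:i])
        q = theta[es[i]]
        p *= q if xs[i] == 1 else 1.0 - q
    return p

def counts(xs, es):
    # Candidate T_j: (successes, trials) on arm 0, then on arm 1.
    s0 = sum(x for x, e in zip(xs, es) if e == 0)
    s1 = sum(x for x, e in zip(xs, es) if e == 1)
    return (s0, es.count(0), s1, es.count(1))

def conditional_given_T(j, theta, stat=counts):
    # Dist_{theta,gamma}(X^j, E^j | T_j) by exhaustive enumeration of histories.
    groups = {}
    for xs in product((0, 1), repeat=j):
        for es in product((0, 1), repeat=j):
            groups.setdefault(stat(xs, es), {})[(xs, es)] = joint_prob(xs, es, theta)
    out = {}
    for t, hists in groups.items():
        total = sum(hists.values())
        out[t] = {h: p / total for h, p in hists.items()}
    return out

def max_discrepancy(j, thetas, stat=counts):
    # Largest change in any conditional probability as theta varies;
    # a value at round-off level means the conditional law is free of theta.
    dists = [conditional_given_T(j, th, stat) for th in thetas]
    base = dists[0]
    return max(abs(d[t][h] - base[t][h])
               for d in dists[1:] for t in d for h in d[t])

pars_gap = max_discrepancy(4, [(0.2, 0.8), (0.5, 0.5), (0.9, 0.3)])
total_successes_gap = max_discrepancy(4, [(0.2, 0.8), (0.5, 0.5)],
                                      stat=lambda xs, es: sum(xs))
```

For the counts statistic the gap is at floating-point round-off level, while conditioning only on the total number of successes leaves a clearly positive gap, so that coarser statistic is not PARS under this model. Under the same independence assumption the counts also appear to satisfy POLS, since $T_{j+1}$ is $T_j$ updated on arm $E_{j+1}$ by an observation whose law depends only on $\theta_{E_{j+1}}$.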

## Citation

K. B. Gray Jr. "Sequential Selection of Experiments." Ann. Math. Statist. 39 (6) 1953-1977, December, 1968. https://doi.org/10.1214/aoms/1177698025

## Information