Statistical Decision Functions

Abraham Wald

doi:10.1214/aoms/1177730030

Ann. Math. Statist. 20(2): 165-205 (June, 1949). DOI: 10.1214/aoms/1177730030

Abstract

The foundations of a general theory of statistical decision functions, including the classical non-sequential case as well as the sequential case, were discussed by the author in a previous publication [3]. Several assumptions made in [3] appear, however, to be unnecessarily restrictive (see Conditions 1-7, pp. 297-298 in [3]). Moreover, these assumptions are not always fulfilled by statistical problems in their conventional form. In this paper the main results of [3], as well as several new results, are obtained from a considerably weaker set of conditions, which are fulfilled for most of the statistical problems treated in the literature. It seemed necessary to abandon most of the methods of proof used in [3] (particularly those in section 4 of [3]) and to develop the theory from the beginning. To make the present paper self-contained, the basic definitions already given in [3] are briefly restated in section 2.1. In [3] it is postulated (see Condition 3, p. 297) that the space $\Omega$ of all admissible distribution functions $F$ is compact. In problems where the distribution function $F$ is known except for the values of a finite number of parameters, i.e., where $\Omega$ is a parametric class of distribution functions, the compactness condition will usually not be fulfilled if no restrictions are imposed on the possible values of the parameters. For example, if $\Omega$ is the class of all univariate normal distributions with unit variance, $\Omega$ is not compact. It is true that compactness of $\Omega$ will usually be attained by restricting the parameter space to a bounded and closed subset of the unrestricted space. Since such a restriction can frequently be made in applied problems, the condition of compactness may not be too restrictive from the point of view of practical applications. Nevertheless, from the theoretical point of view it seems highly desirable to eliminate or to weaken the condition of compactness of $\Omega$. 
This is done in the present paper. The compactness condition is omitted completely in the discrete case (Theorems 2.1-2.5) and is replaced by the condition of separability of $\Omega$ in the continuous case (Theorems 3.1-3.4). The latter condition is fulfilled in most of the conventional statistical problems. Another restriction postulated in [3] (Condition 4, p. 297) is the continuity of the weight function $W(F, d)$ in $F$. As explained in section 2.1 of the present paper, the value of $W(F, d)$ is interpreted as the loss suffered when $F$ happens to be the true distribution of the chance variables under consideration and the decision $d$ is made by the statistician. While the assumption of continuity of $W(F, d)$ in $F$ may seem reasonable from the point of view of practical applications, it is rather undesirable from the theoretical point of view, for the following reasons. It is of considerable theoretical interest to consider simplified weight functions $W(F, d)$ which can take only the values 0 and 1 (the value 0 corresponds to a correct decision, and the value 1 to a wrong decision). Such weight functions are often unavoidably discontinuous. Consider, for example, the problem of testing the hypothesis $H$ that the mean $\theta$ of a normally distributed chance variable $X$ with unit variance is equal to zero. Let $d_1$ denote the decision to accept $H$, and $d_2$ the decision to reject $H$. Assigning the value 0 to the weight $W$ whenever a correct decision is made, and the value 1 whenever a wrong decision is made, we have

$$W(\theta, d_1) = \begin{cases} 0 & \text{for } \theta = 0, \\ 1 & \text{for } \theta \neq 0; \end{cases} \qquad W(\theta, d_2) = \begin{cases} 0 & \text{for } \theta \neq 0, \\ 1 & \text{for } \theta = 0. \end{cases}$$

This weight function is obviously discontinuous at $\theta = 0$. In the present paper the main results (Theorems 2.1-2.5 and Theorems 3.1-3.4) are obtained without making any continuity assumption regarding $W(F, d)$. 
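To see concretely what this discontinuity does, the sketch below computes the risk, i.e. the expected 0-1 loss, of one particular test of $H$. The test "reject $H$ if $|X| > c$" and the cutoff $c = 1.96$ are illustrative assumptions, not taken from the paper, and the helper names `normal_cdf` and `risk` are hypothetical. Because $W$ jumps at $\theta = 0$, the risk jumps as well: roughly 0.05 at $\theta = 0$, but roughly 0.95 for $\theta$ just above zero.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal CDF, computed via the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def risk(theta: float, c: float) -> float:
    """Expected 0-1 loss of the (assumed) test 'reject H if |X| > c', X ~ N(theta, 1).

    At theta == 0 a loss of 1 occurs only on rejection (decision d_2);
    at theta != 0 a loss of 1 occurs only on acceptance (decision d_1).
    """
    p_accept = normal_cdf(c - theta) - normal_cdf(-c - theta)
    if theta == 0:
        return 1.0 - p_accept  # probability of wrongly rejecting H
    return p_accept            # probability of wrongly accepting H


c = 1.96  # illustrative cutoff, an assumption
for theta in (0.0, 0.001, 1.0, 3.0):
    print(f"theta={theta:5.3f}  risk={risk(theta, c):.4f}")
```

The risk is small at $\theta = 0$ and jumps to nearly 1 for $\theta$ arbitrarily close to zero, so no continuity in $\theta$ can be expected of it.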
The restrictions imposed in the present paper on the cost function of experimentation are considerably weaker than those formulated in [3]. Condition 5 [3, p. 297], concerning the class $\Omega$ of admissible distribution functions, and Condition 7 [3, p. 298], concerning the class of decision functions at the disposal of the statistician, are omitted here altogether. One of the new results obtained here is the existence of so-called minimax solutions under rather weak conditions (Theorems 2.3 and 3.2). This result is a simple consequence of two lemmas (Lemmas 2.4 and 3.3), which seem to be of interest in themselves. The present paper consists of three sections. The first section gives several theorems concerning zero-sum two-person games that go somewhat beyond previously published results. The results of section 1 are then applied to statistical decision functions in sections 2 and 3: section 2 treats the case of discrete chance variables, and section 3 deals with the continuous case. The two cases have been treated separately, since the author was not able to find any simple and convenient way of combining them into a single more general theory.
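The game-theoretic view of decision functions can be illustrated in the simplest finite case: if randomized decision rules are allowed, a minimax solution is a mixed strategy of the statistician that minimizes the largest expected loss over the possible true distributions. The sketch below assumes only two decision rules and two possible distributions. The function name `minimax_mix_2x2` and the loss matrices in the usage lines are hypothetical, and the elementary "check both endpoints and the crossing point" argument is not the method of proof used in the paper.

```python
from typing import Sequence


def minimax_mix_2x2(loss: Sequence[Sequence[float]]) -> tuple[float, float]:
    """Minimax mixed strategy for the minimizing player in a 2x2 zero-sum game.

    loss[i][j] is the expected loss when the statistician uses rule i and
    nature picks distribution j. Returns (p, value): p is the probability of
    using rule 0, value is the smallest achievable worst-case expected loss.
    """
    (a, b), (c, d) = loss

    def worst(p: float) -> float:
        # Expected loss against each pure choice of nature; nature takes the larger.
        return max(p * a + (1 - p) * c, p * b + (1 - p) * d)

    # worst(p) is a maximum of two linear functions, hence convex and piecewise
    # linear on [0, 1]; its minimum lies at an endpoint or where the lines cross.
    candidates = [0.0, 1.0]
    denom = (a - c) - (b - d)
    if denom != 0:
        p_star = (d - c) / denom  # crossing point of the two lines
        if 0.0 <= p_star <= 1.0:
            candidates.append(p_star)
    p_best = min(candidates, key=worst)
    return p_best, worst(p_best)


print(minimax_mix_2x2([[0.0, 1.0], [1.0, 0.0]]))
print(minimax_mix_2x2([[1.0, 2.0], [3.0, 4.0]]))
```

With `[[0, 1], [1, 0]]` each rule is wrong under exactly one distribution, so the minimax rule randomizes with equal probabilities and guarantees a worst-case expected loss of 0.5. With `[[1, 2], [3, 4]]` rule 0 dominates and is chosen outright, with worst-case loss 2.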