## Abstract

Let $\pi_1, \pi_2$ be normal populations with means $m_1, m_2$ respectively and a common variance $\sigma^2$, the parameter point $\omega = (m_1, m_2 : \sigma)$ which characterizes the two populations being unknown, and let $\Omega$ be an arbitrary given set of possible points $\omega$. Random samples of fixed sizes $n_1, n_2$ are drawn from $\pi_1, \pi_2$ respectively, giving the combined sample point $\nu = (x_{11}, x_{12}, \cdots, x_{1n_1}; x_{21}, x_{22}, \cdots, x_{2n_2})$. For reasons which will be made clear later in connection with practical examples, any function $f(\nu)$ such that $0 \leq f(\nu) \leq 1$ is called a decision function, and for any such $f(\nu)$ the risk function is defined to be
\begin{equation*}\tag{1} r(f \mid \omega) = \max \lbrack m_1, m_2\rbrack - m_1 E\lbrack f \mid \omega\rbrack - m_2 E\lbrack 1 - f \mid \omega\rbrack \geq 0,\end{equation*}
where $E$ denotes the expectation operator. A decision function $\bar{f}(\nu)$ is said to be (a) uniformly better than $f(\nu)$ if $r(\bar{f} \mid \omega) \leq r(f \mid \omega)$ for all $\omega$ in $\Omega$, the strict inequality holding for at least one $\omega$, (b) admissible if no decision function is uniformly better than $\bar{f}(\nu)$, and (c) minimax if $\sup_{\omega \in \Omega} \lbrack r(\bar{f} \mid \omega)\rbrack = \inf_f \sup_{\omega \in \Omega} \lbrack r(f \mid \omega)\rbrack$. The "problem of the greater mean" is, for any given $\Omega$, to determine the minimax decision functions, particularly those which are also admissible. Special interest attaches to the case in which there exists a unique minimax decision function $\bar{f}(\nu)$ (in the sense that if $f(\nu)$ is any minimax decision function then $f(\nu) = \bar{f}(\nu)$ for almost every $\nu$ in the sample space); such an $\bar{f}(\nu)$ is automatically admissible. The problem of the greater mean is, of course, a special problem in Wald's general theory of statistical decision functions [1].
Our results will, however, be derived by very simple direct methods which make no use of Wald's general theorems.

We cite without proofs a few examples in order to show how strongly the solution of the problem of the greater mean depends on the structure of $\Omega$. In each case the minimax decision function is a function only of the two sample means $\bar{x}_1, \bar{x}_2$.

(i) Let $\Omega'$ consist of the two points $(a, b : \sigma)$ and $(b, a : \sigma)$, with $a < b$. Then
\begin{equation*}\tag{2} f^\ast(\nu) = \begin{cases}1 & \text{if } n_1\bar{x}_1 - n_2\bar{x}_2 > (n_1 - n_2)(a + b)/2,\\ 0 & \text{otherwise}\end{cases}\end{equation*}
is the unique minimax decision function.

(ii) Let $\Omega''$ consist of the two points $(c + h, c : \sigma)$ and $(c - h, c : \sigma)$, with $h > 0$. Then
\begin{equation*}\tag{3} f^0_c(\nu) = \begin{cases}1 & \text{if } \bar{x}_1 > c,\\ 0 & \text{otherwise}\end{cases}\end{equation*}
is the unique minimax decision function.

(iii) Let $\Omega'''$ consist of the three points $(\frac{1}{2}, -\frac{1}{2} : 1)$, $(\frac{1}{2}, \frac{3}{2} : 1)$, $(-\frac{3}{2}, -\frac{1}{2} : 1)$, and let $n_1 = n_2 = n$. Then
\begin{equation*}\tag{4} f^{\ast\ast}(\nu) = \begin{cases}1 & \text{if } e^{-2n\bar{x}_1} + e^{2n\bar{x}_2} < \lambda,\\ 0 & \text{otherwise,}\end{cases}\end{equation*}
where $\lambda$ is a certain definite constant, is the unique minimax decision function.

The parameter spaces of two or three points specified in these examples are rather trivial, but in fact the corresponding decision functions (2), (3), (4) remain the unique minimax solutions of the decision problem with respect to much more general parameter spaces. Thus, for example, it is clear that $f^\ast(\nu)$ will remain the unique minimax decision function with respect to any $\Omega$ which contains $\Omega'$ and is such that $\sup_{\omega \in \Omega} \lbrack r(f^\ast \mid \omega)\rbrack = \sup_{\omega \in \Omega'} \lbrack r(f^\ast \mid \omega)\rbrack$.
Corresponding remarks apply to $f^0_c(\nu)$ and $f^{\ast\ast}(\nu)$.

When $n_1 = n_2$, (2) reduces to
\begin{equation*}\tag{5} f^0(\nu) = \begin{cases}1 & \text{if } \bar{x}_1 > \bar{x}_2,\\ 0 & \text{otherwise.}\end{cases}\end{equation*}
This decision function is of particular interest when both the means $m_1, m_2$ are unknown. It will be shown that, whether or not $n_1 = n_2$, $f^0(\nu)$ is the unique minimax decision function under certain conditions on $\Omega$ which are likely to hold in practice, at least when both $n_1$ and $n_2$ are sufficiently large (Theorem 3). Likewise, $f^0_c(\nu)$, which is the analogue of $f^0(\nu)$ when one of the means ($m_2$) is known exactly, is apt to be the unique minimax decision function in such cases, at least when $n_1$ is sufficiently large (Theorem 4). These results on $f^0(\nu)$ and $f^0_c(\nu)$ form the main results of the present paper.

So much by way of a general summary. We shall now give a practical illustration (another is given in Section 3) to show how the problem of the greater mean arises in applications. Suppose that a consumer requires a certain number of manufactured articles which can be supplied at the same cost by each of two sources $\pi_1$ and $\pi_2$. The quality of an article is measured by a numerical characteristic $x$, and it is known that in the product of $\pi_i$, $x$ is normally distributed with mean $m_i$ and variance $\sigma^2$, but the values of these parameters are unknown. The consumer has obtained random samples of $n_1$ and $n_2$ articles from $\pi_1$ and $\pi_2$ respectively, and has found the values of $x$ to be $(x_{11}, x_{12}, \cdots, x_{1n_1}; x_{21}, x_{22}, \cdots, x_{2n_2}) = \nu$. What is the best way of ordering a total of $N$ articles from the two sources? The usual statistical theory, which confines itself to estimating the unknown parameters and to testing hypotheses of the form $H_0(m_1 = m_2)$, has at best an indirect bearing on the problem at hand.
We therefore adopt Wald's point of view and investigate the consequences of any given course of action. If the consumer orders $fN$ articles from $\pi_1$ and $(1 - f)N$ from $\pi_2$, where $0 \leq f \leq 1$, then the expectation of the sum of the $x$-values in the articles he obtains will be $N(m_1 f + m_2(1 - f))$. The maximum possible value of this quantity is $N \max\lbrack m_1, m_2\rbrack$, and the "loss" per article which he sustains may therefore be taken as $W(\omega, f) = \max \lbrack m_1, m_2\rbrack - m_1 f - m_2(1 - f) \geq 0$, where $\omega = (m_1, m_2 : \sigma)$ is the true parameter point. The consumer wants to choose $f$ so as to make $W$ as small as possible. If he knew $m_1$ to be greater, or to be less, than $m_2$, then by choosing $f = 1$ or $0$ respectively he could make $W = 0$. But since he does not know which $m_i$ is the greater he will presumably choose $f$ as some function of the sample point $\nu$. Suppose, therefore, that a "decision function" $f(\nu)$, such that $0 \leq f(\nu) \leq 1$ but not necessarily taking on only the values 0 and 1, is defined for all points $\nu$ in the sample space and that the consumer sets $f = f(\nu)$. In repeated applications of this procedure, the "risk" or expected loss per article (a double expectation is involved: the expected loss for a given $f$ and the expected value of $f$ in using the decision function $f(\nu)$) is given by (1), and the consumer will try to find an $f(\nu)$ which minimizes this risk. Since the value of the risk depends on $\omega$ it is necessary to specify which values of $\omega$ are to be regarded as possible in the given problem; let the set of all such $\omega$ be denoted by $\Omega$. If the consumer agrees to adopt the "conservative" criterion of minimizing the maximum possible risk, then the statistician's problem is to find the minimax decision functions in the sense defined above. We have given the solutions of this problem for certain types of parameter spaces.
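
As a rough numerical illustration of the risk (1), the following sketch evaluates it in closed form for the rule $f^0$ of (5), under which the consumer orders everything from the source with the larger sample mean. It relies on the standard fact that $\bar{x}_1 - \bar{x}_2$ is normally distributed with mean $m_1 - m_2$ and variance $\sigma^2(1/n_1 + 1/n_2)$. The function names and the numerical values are illustrative assumptions and do not come from the paper.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def risk_f0(m1: float, m2: float, sigma: float, n1: int, n2: int) -> float:
    """Risk (1) of the rule f0 = 1 if xbar1 > xbar2, else 0.

    Hypothetical helper: it assumes xbar1 - xbar2 ~ N(m1 - m2, sigma^2 (1/n1 + 1/n2)).
    """
    sd = sigma * math.sqrt(1.0 / n1 + 1.0 / n2)
    p_choose_1 = normal_cdf((m1 - m2) / sd)  # E[f0 | omega] = P(xbar1 > xbar2)
    return max(m1, m2) - m1 * p_choose_1 - m2 * (1.0 - p_choose_1)


# Illustrative values only: the risk at a fixed omega for growing sample sizes.
for n in (5, 20, 80):
    print(n, risk_f0(m1=1.0, m2=1.5, sigma=1.0, n1=n, n2=n))
```

For a fixed $\omega$ with $m_1 \neq m_2$ the computed risk decreases as the sample sizes grow, and it vanishes when $m_1 = m_2$, since either choice is then equally good.
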
The reader will observe that each of the minimax decision functions (2), (3), (4) was of the "all or nothing" type, with values 0 and 1 only. (Whether this remains true for every $\Omega$ we do not know.) By using one of these decision functions in a given instance one arrives at either the best possible decision or the worst. The attitudes of doubt sometimes associated with the non-rejection of the hypothesis $H_0(m_1 = m_2)$ are therefore irrelevant to the problem of the greater mean in the examples cited. (Cf. footnote 2; also Example 1 in Section 3.)

The risk function (1) is but one of a general class $R$ of risk functions, to be defined in Section 2, which are associated with the problem of the greater mean. The most important members of $R$ are (1) and
\begin{equation*}\tag{6} \bar{r}(f \mid \omega) = P(\text{incorrect decision using } f(\nu) \mid \omega),\end{equation*}
where "$m_1 \leq m_2$" and "$m_1 \geq m_2$" are the two possible decisions. The risk function (6) is relevant to applications of a purely "scientific" nature in which the statistician is asked merely to give his opinion as to which population has the greater mean. Although the problem of constructing a suitable decision function for (6) is akin in spirit to the problems considered in the now classical Neyman-Pearson theory of statistical tests, no satisfactory solutions seem to be available. It is easy to see, however, that (1) and (6) are quite similar. Of course, in the case of (1) a decision function $f(\nu)$ may take on any value between 0 and 1 inclusive, while for (6) we allow only functions which take on only the values 0 and 1, corresponding respectively to the decisions "$m_1 \leq m_2$" and "$m_1 \geq m_2$".
We then have for any such $f(\nu)$,
\begin{equation*}\tag{6'} \bar{r}(f \mid \omega) = \begin{cases}P(f(\nu) = 1 \mid \omega) = E\lbrack f \mid \omega\rbrack & \text{if } m_1 < m_2,\\ P(f(\nu) = 0 \mid \omega) = E\lbrack 1 - f \mid \omega\rbrack & \text{if } m_1 > m_2,\\ 0 & \text{if } m_1 = m_2,\end{cases}\end{equation*}
and by comparison with (1) we see that $r(f \mid \omega) = \lvert m_1 - m_2 \rvert \, \bar{r}(f \mid \omega)$ for all $\omega$. Now, in the three examples (i), (ii), (iii) cited above the unique minimax decision functions happen to take on only the values 0 and 1, and $\lvert m_1 - m_2 \rvert$ is constant on each of the respective parameter sets. It follows that (2), (3), (4) are also the unique minimax decision functions relative to (6) and to $\Omega', \Omega'', \Omega'''$ respectively. The remarks above following Example (iii) also remain valid for the risk function (6).

We conclude this section with a remark on the methods of this paper. Any decision function relevant to (6) is equivalent to a test of the hypothesis $H_0(m_1 < m_2)$ against the alternative $H_1(m_1 > m_2)$, the region $\{\nu : f(\nu) = 1\}$ being the "critical region." Hence the Neyman-Pearson probability ratio method can be used to obtain the unique minimax decision function with respect to (6) and an $\Omega$ consisting of two (or more) points, and the result carries over to more general types of $\Omega$ in the manner already indicated. It turns out, however, that the dominant properties of the probability ratio tests are not confined to the class of tests alone, but extend to the class of all functions $f(\nu)$ such that $0 \leq f(\nu) \leq 1$. This result (Theorem 1) enables us to solve the problem of the greater mean for the risk function (1) as well as for (6). The reader who is interested in applications may turn to Section 3.
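
The relation between the risk functions (1) and (6), and the behaviour of rule (2) on the two-point set $\Omega'$, can be checked numerically. The sketch below assumes the standard fact that $n_1\bar{x}_1 - n_2\bar{x}_2$ is normally distributed with mean $n_1 m_1 - n_2 m_2$ and variance $\sigma^2(n_1 + n_2)$. The function names and the values chosen for $a$, $b$, $\sigma$, $n_1$, $n_2$ are illustrative assumptions, not taken from the paper.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal distribution function, computed from the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def prob_f_star_is_1(m1, m2, sigma, n1, n2, a, b):
    """P(f* = 1 | omega) for rule (2): f* = 1 iff n1*xbar1 - n2*xbar2 > (n1 - n2)(a + b)/2.

    Hypothetical helper: assumes n1*xbar1 - n2*xbar2 ~ N(n1*m1 - n2*m2, sigma^2 (n1 + n2)).
    """
    threshold = (n1 - n2) * (a + b) / 2.0
    mean = n1 * m1 - n2 * m2
    sd = sigma * math.sqrt(n1 + n2)
    return 1.0 - normal_cdf((threshold - mean) / sd)


def risks(m1, m2, sigma, n1, n2, a, b):
    """Return (r, r_bar): the risk (1) and the error probability (6) of f* at omega."""
    p1 = prob_f_star_is_1(m1, m2, sigma, n1, n2, a, b)
    r = max(m1, m2) - m1 * p1 - m2 * (1.0 - p1)
    if m1 < m2:
        r_bar = p1          # deciding "m1 >= m2" (f* = 1) is the error
    elif m1 > m2:
        r_bar = 1.0 - p1    # deciding "m1 <= m2" (f* = 0) is the error
    else:
        r_bar = 0.0
    return r, r_bar


# Illustrative values only; Omega' consists of the points (a, b) and (b, a).
a, b, sigma, n1, n2 = 0.0, 1.0, 2.0, 4, 9
r_ab, rbar_ab = risks(a, b, sigma, n1, n2, a, b)
r_ba, rbar_ba = risks(b, a, sigma, n1, n2, a, b)
print("risk (1) at the two points:", r_ab, r_ba)
print("risk (6) at the two points:", rbar_ab, rbar_ba)
```

With these values the two points of $\Omega'$ give the same risk even though $n_1 \neq n_2$, and $r$ equals $(b - a)\,\bar{r}$ at each point. Equal risks are consistent with the minimax property stated in Example (i) but do not by themselves prove it.
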

## Citation

Raghu Raj Bahadur. Herbert Robbins. "The Problem of the Greater Mean." Ann. Math. Statist. 21 (4) 469 - 487, December, 1950. https://doi.org/10.1214/aoms/1177729746

## Information