## Abstract

In recent years considerable research has been devoted to a class of problems concerned with ranking and/or selecting a subset of $k$ given populations, where the ranking or the selection is defined in terms of a (scalar) parameter of the populations. In these problems the interest is often centered on the populations having large (small) values of the ranking parameter. One usually refers to the $t(< k)$ populations with the largest (smallest) values of the ranking parameter as the $t$ best populations. In this paper we consider the problem of selecting a subset of specified size, from a given set of $k$ populations, which contains a subset of the $t$ best populations.

Suppose that $\Pi_1, \Pi_2, \ldots, \Pi_k$ is a given set of $k$ populations, where the distribution function of each observation from $\Pi_i$ is $F(\cdot \mid \theta_i)$. The parameter $\theta_i$ is unknown, but it belongs to an interval $\Theta$ of the real line $(1 \leqq i \leqq k)$. We assume that the functional form of $F$ is known. Let $\theta_{\lbrack 1\rbrack} \leqq \theta_{\lbrack 2\rbrack} \leqq \cdots \leqq \theta_{\lbrack k\rbrack}$ be the ranked $\theta_i$; we assume that it is not known with which population $\theta_{\lbrack i\rbrack}$ is associated $(1 \leqq i \leqq k)$. The $t$ populations with the largest $\theta$-values are defined as the $t$ best populations, and we refer to $\theta$ as the ranking parameter.

The problem of selecting the $t$ best populations in an unordered manner has been studied extensively from the sampling point of view, in relation to several distributions. The usual formulation of the problem is the following: the experimenter's goal is to select the $t$ best populations in an unordered manner. He specifies two positive constants $d^\ast$ and $P^\ast$, where $\binom{k}{t}^{-1} < P^\ast < 1$.
He desires to have a fixed-sample procedure which has a probability of at least $P^\ast$ of selecting the $t$ best populations whenever $\theta_{\lbrack k - t + 1\rbrack}$ is at a distance not less than $d^\ast$ from $\theta_{\lbrack k - t\rbrack}$. Bechhofer [3] developed a procedure based on a predetermined number of observations from each population when $F$ is the normal distribution function with unknown mean $\theta$ and known variance. Bechhofer and Sobel [4] considered a similar problem in relation to normal populations where the ranking parameter is the variance. Sobel and Huyett [9], Sobel [8], Rizvi [7], and Barr and Rizvi [2] considered similar problems for other distributions.

Here we consider a generalized version of the above selection problem. We solve the problem in broad generality by not considering a specific family of distributions $F(\cdot \mid \theta)$, but by assuming certain properties of $F$. Thus many of the available results concerning the above selection problem can be obtained as special cases of our results.

Let $c, s, t$ be integers such that $\max (1, s + t + 1 - k) \leqq c \leqq \min (s, t)$, which implies that $\max (s, t) \leqq k - 1$. The experimenter's goal, which is referred to as Goal I, is to select a subset of size $s$ which contains at least $c$ of the $t$ best populations. The experimenter specifies two positive constants $d^\ast$ and $P^\ast(<1)$. He desires to have a fixed-sample procedure for which the probability of selecting the desired type of subset of populations is not less than $P^\ast$ whenever the distance between $\theta_{\lbrack k - t\rbrack}$ and $\theta_{\lbrack k - t + 1\rbrack}$ is at least $d^\ast$. Two particular cases of this goal are of special interest: (1) selection of a subset of size $s(\geqq t)$ which contains the $t$ best populations; (2) selection of a subset of size $s(\leqq t)$ which includes any $s$ of the $t$ best populations.
Sobel pointed out (see the footnote on page 22 of [3]) that Case 2 is sometimes of interest. These two cases correspond to $c = t$, $s \geqq t$ and $c = s$, $s \leqq t$, respectively. Further, when $c = s = t$, Goal I reduces to the goal of selecting the $t$ best populations in an unordered manner.

The proposed selection procedure, $R_s$, is based on suitable statistics $T_1, T_2, \ldots, T_k$, where $T_i$ is computed from a random sample of size $n$ from $\Pi_i$ $(1 \leqq i \leqq k)$. The procedure $R_s$ selects the subset of populations which corresponds to the $s$ largest $T$-values. Having specified the procedure as above, the problem is to determine the common sample size $n$ so that the probability requirement imposed on the procedure is satisfied. This problem is solved in Section 5 under the assumption that $T_i$ is an absolutely continuous random variable and its distribution function is stochastically increasing in $\theta_i$ for each value of $n$.

It should be noted (by considering the subset not selected) that the above selection problem is logically equivalent to a similar selection problem where the experimenter's goal is to select a subset of size $(k - s)$ which contains at least $(k - t) - (s - c)$ of the $(k - t)$ populations with the smallest $\theta$-values. Thus solutions to the above problem for all admissible values of $c, s$ and $t$ (with fixed $k$) will provide solutions to the selection problem where the goal is the selection of a subset of size $s$ which contains at least $c$ of those $t$ populations with the smallest values of the ranking parameter.

Section 6 gives results for the two particular cases of Goal I mentioned above. A theorem which relates the sample sizes necessary to achieve Goal I and its particular cases is given in that section. Section 7 deals with an easily verifiable sufficient condition for the existence of the required sample size.
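
The equivalence with the complementary problem, obtained by considering the subset not selected, follows from a short counting argument. The notation here is assumed for illustration and is not taken from the paper: $B$ is the set of the $t$ best populations, $S$ is the selected subset of size $s$, and $B^c$, $S^c$ are their complements among the $k$ populations.

$$
\lvert S^c \cap B^c \rvert
= \lvert B^c \rvert - \lvert S \cap B^c \rvert
= (k - t) - \bigl(s - \lvert S \cap B \rvert\bigr).
$$

Hence $\lvert S \cap B \rvert \geqq c$ holds exactly when $\lvert S^c \cap B^c \rvert \geqq (k - t) - (s - c)$, that is, when the unselected subset of size $k - s$ contains at least $(k - t) - (s - c)$ of the $k - t$ populations with the smallest $\theta$-values.
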
In Section 8, the general results have been applied to normal distributions with unknown mean and known common variance.
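
For the normal case, the behavior of $R_s$ can be explored numerically. The following is a minimal Monte Carlo sketch, not the paper's analytic method. It assumes that $T_i$ is the sample mean, that the distance between parameters is the difference of means, and that it suffices to check the configuration in which the $k - t$ worst means are equal and the $t$ best means exceed them by exactly $d^\ast$ (an assumption made here, not a result quoted from the paper). The function names `prob_goal_met` and `smallest_n` are hypothetical.

```python
import random


def prob_goal_met(k, t, s, c, n, d_star, sigma=1.0, trials=20000, seed=0):
    """Estimate P(R_s selects at least c of the t best) by simulation.

    Assumed configuration (a labeled assumption, not taken from the paper):
    k - t populations have mean 0 and the t best have mean d_star.
    """
    if not max(1, s + t + 1 - k) <= c <= min(s, t):
        raise ValueError("need max(1, s + t + 1 - k) <= c <= min(s, t)")
    rng = random.Random(seed)
    means = [0.0] * (k - t) + [d_star] * t
    best = set(range(k - t, k))  # indices of the t best populations
    se = sigma / n ** 0.5        # standard deviation of a mean of n observations
    hits = 0
    for _ in range(trials):
        # T_i is the sample mean, drawn directly from N(theta_i, sigma^2 / n).
        t_values = [rng.gauss(m, se) for m in means]
        # R_s: select the populations with the s largest T-values.
        order = sorted(range(k), key=lambda i: t_values[i], reverse=True)
        chosen = order[:s]
        if sum(1 for i in chosen if i in best) >= c:
            hits += 1
    return hits / trials


def smallest_n(k, t, s, c, d_star, p_star, sigma=1.0, n_max=200, **kwargs):
    """Return the first n whose estimated probability reaches p_star, else None."""
    for n in range(1, n_max + 1):
        if prob_goal_met(k, t, s, c, n, d_star, sigma, **kwargs) >= p_star:
            return n
    return None


if __name__ == "__main__":
    # Example: k = 5 populations, subset of size 3 containing both of the 2 best.
    print(smallest_n(k=5, t=2, s=3, c=2, d_star=0.5, p_star=0.9))
```

Because the estimate is subject to simulation error, the value returned by `smallest_n` is only approximate near the threshold; increasing `trials` tightens it at the cost of run time.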

## Citation

D. M. Mahamunulu. "Some Fixed-Sample Ranking and Selection Problems." Ann. Math. Statist. 38 (4) 1079 - 1091, August, 1967. https://doi.org/10.1214/aoms/1177698778

## Information