Nonparametric Ranking Procedures for Comparison with a Control

M. Haseeb Rizvi; Milton Sobel; George G. Woodworth

doi:10.1214/aoms/1177698035

Ann. Math. Statist. 39(6): 2075-2093 (December, 1968). DOI: 10.1214/aoms/1177698035

Abstract

A decision maker is confronted with $k$ populations $\pi_1, \cdots, \pi_k$ (say, $k$ lots of items available for purchase) and a control population $\pi_0$ and must, on the basis of random samples of common size $n$ from $\pi_0, \cdots, \pi_k$, select those which are at least as good as $\pi_0$. We suppose that items are judged on the basis of a continuously distributed attribute $X$ and that a known fraction $\alpha$ ($0 < \alpha < 1$) of the items in the control population are deficient (their $X$-values are too small). A population is considered to be better than the control if it has a smaller proportion of deficient items; that is, letting $F_j$, $j = 0, \cdots, k$, denote the distribution function (df) of $X$ for population $\pi_j$ and $x_\alpha(F_j)$ its $\alpha$th quantile, $\pi_j$ is better than $\pi_0$ if $x_\alpha(F_j) \geqq x_\alpha(F_0)$. We also consider the possibility that $F_0$ is known, in which case $\pi_0$ is called a standard and is not sampled. In Section 2 we propose a nonparametric procedure $R$ based on order statistics which guarantees a preassigned minimum probability $P^\ast$ that, when each $F_j$ is stochastically ordered with respect to $F_0$, all populations better than the control will be selected; such a selection is called a correct selection (CS). The corresponding problem of selecting a subset containing the best population (without any control) was treated in [11]. Since the trivial procedure $R_0$, which includes all $k$ populations in the selected subset, also meets the probability requirement, it is necessary to investigate the expected number of misclassifications; this is done exactly in Section 3 and asymptotically in Section 5. Exact results for a known standard $F_0$ are given in Section 4. Some other aspects of the problem are briefly discussed in Section 8.
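The abstract does not state the form of $R$. Purely as an illustration, assume a rule of the hypothetical form "select $\pi_i$ when its $r$th order statistic is at least the $(r-c)$th order statistic of the control sample", with $r$ an index near $(n+1)\alpha$ and $c$ an integer constant (both the rule and the choice of $r$ are assumptions, not the paper's definition). When $F_1 = \cdots = F_k = F_0$ is continuous, every population counts as at least as good as the control, so a correct selection means all $k$ are selected; the event depends only on ranks, so uniform samples can stand in for any common df. The sketch below estimates this probability by simulation and returns the smallest $c$ meeting $P^\ast$ in that configuration (treated here, as an assumption, as the case to calibrate against). The names `select` and `calibrate_c` are hypothetical, and a Monte Carlo estimate is only an approximation to the exact constants the paper tabulates.

```python
import random


def order_stat(sample, j):
    """Return the j-th smallest value (1-based) of a sample."""
    return sorted(sample)[j - 1]


def select(samples, r, c):
    """HYPOTHETICAL rule (not taken from the paper): keep population i >= 1
    when its r-th order statistic is at least the (r - c)-th order statistic
    of the control sample samples[0]. Returns the selected indices."""
    threshold = order_stat(samples[0], r - c)
    return [i for i in range(1, len(samples))
            if order_stat(samples[i], r) >= threshold]


def calibrate_c(n, k, r, p_star, reps=10000, seed=1):
    """Monte Carlo estimate of the smallest c in 0..r-1 whose probability of
    selecting all k populations is >= p_star when F_1 = ... = F_k = F_0.
    The event depends only on ranks, so Uniform(0, 1) samples stand in for
    any continuous common df. Returns (c, estimate), or (None, None)."""
    rng = random.Random(seed)
    needed = []  # per replication: smallest c that selects all k populations
    for _ in range(reps):
        control = [rng.random() for _ in range(n)]
        # smallest r-th order statistic over the k non-control samples
        m = min(order_stat([rng.random() for _ in range(n)], r)
                for _ in range(k))
        # the (r - c)-th control order statistic is <= m iff count >= r - c
        count = sum(x <= m for x in control)
        needed.append(max(0, r - count))  # value r: no c in 0..r-1 works
    for c in range(r):
        p_cs = sum(v <= c for v in needed) / reps
        if p_cs >= p_star:
            return c, p_cs
    return None, None


if __name__ == "__main__":
    # n = 15 and alpha = 1/2 give (n + 1) * alpha = 8, so r = 8
    c, p = calibrate_c(n=15, k=3, r=8, p_star=0.90)
    print(f"c = {c}, estimated P(CS) = {p:.3f}")
```

Because each replication records the smallest $c$ that would have produced a correct selection, one simulation serves every candidate $c$, and a larger $P^\ast$ can only increase the returned $c$.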
As a secondary problem, we suppose that for some preassigned fraction $\delta^\ast$ the decision maker considers a population $\pi_i$ to be $\delta^\ast$-inferior to $\pi_0$ if more than $100(\alpha + \delta^\ast)$ percent of the items in $\pi_i$ are as bad as at least one of the worst $100(\alpha - \delta^\ast)$ percent of the items in $\pi_0$; i.e., $\pi_i$ is $\delta^\ast$-inferior if $x_{\alpha-\delta^\ast}(F_0) \geqq x_{\alpha+\delta^\ast}(F_i)$. In Section 5 we give asymptotic expressions for the smallest sample size needed to guarantee that the expected proportion of $\delta^\ast$-inferior populations selected by $R$ will be less than a preassigned number $\beta^\ast$. An equally reasonable definition of $\delta^\ast$-inferiority is that more than $100(\alpha + 2\delta^\ast)$ percent of the items in $\pi_i$ are deficient. Our results with $\alpha$ replaced by $\alpha' = \alpha + \delta^\ast$ also apply to this problem. We show in Section 6 that for small values of $\delta^\ast$ a competing nonparametric procedure $S$ based on rank sums and a competing asymptotically nonparametric procedure $M$ based on sample means both require sample sizes proportional to the square of that required by $R$ to achieve the same degree of rejection of $\delta^\ast$-inferiors. For moderate $\delta^\ast$-values, it is shown that $S$ requires a sample size of the same order of magnitude as that required by $R$. In Section 7 we study a related minimax procedure. For $\alpha = \frac{1}{2}$, we append tables of (1) the integer constant $c$ needed to make procedure $R$ explicit, (2) some values required to make the minimax procedure explicit, and (3) efficiency comparisons of $S$ with respect to $R$.

A Basic Inequality

Let $\mathbf{X} = \{X_{ji}, 1 \leqq j \leqq n, 0 \leqq i \leqq k\}$ denote the combined sample, so that for each $i$, $X_{1i}, \cdots, X_{ni}$ are independent random variables with df $F_i(x)$.
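The two definitions of $\delta^\ast$-inferiority given in the abstract can be checked directly once the dfs are known. The sketch below assumes, for illustration only, that $F_0$ and $F_i$ are normal (using `statistics.NormalDist` from the Python standard library); the function names are hypothetical. The first function uses the quantile condition $x_{\alpha-\delta^\ast}(F_0) \geqq x_{\alpha+\delta^\ast}(F_i)$; the second checks whether more than $100(\alpha + 2\delta^\ast)$ percent of the items in $\pi_i$ fall below the control's $\alpha$th quantile, i.e. are deficient.

```python
from statistics import NormalDist


def is_delta_inferior(F0, Fi, alpha, delta):
    """Quantile form: pi_i is delta*-inferior when
    x_{alpha - delta}(F0) >= x_{alpha + delta}(Fi)."""
    return F0.inv_cdf(alpha - delta) >= Fi.inv_cdf(alpha + delta)


def is_delta_inferior_alt(F0, Fi, alpha, delta):
    """Alternative definition: more than 100(alpha + 2 delta) percent of the
    items in pi_i are deficient, i.e. fall below x_alpha(F0)."""
    return Fi.cdf(F0.inv_cdf(alpha)) > alpha + 2 * delta


if __name__ == "__main__":
    F0 = NormalDist(0, 1)  # control df, assumed normal for illustration
    for shift in (0.0, -0.4, -0.6, -1.0):
        Fi = NormalDist(shift, 1)  # pi_i shifted toward worse X-values
        print(shift,
              is_delta_inferior(F0, Fi, alpha=0.5, delta=0.1),
              is_delta_inferior_alt(F0, Fi, alpha=0.5, delta=0.1))
```

For unit-variance normal shifts with $\alpha = \frac{1}{2}$ and $\delta^\ast = 0.1$, the quantile form flags $\pi_i$ once its mean lies more than about $0.51$ below the control mean, since $x_{0.4}(F_0) \approx -0.253$ and $x_{0.6}(F_i) \approx \mu_i + 0.253$.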
We regard $\omega = (F_0, F_1, \cdots, F_k)$ as the unknown "parameter" and, for an arbitrary function $\psi$, use the symbol $E_\omega\psi(\mathbf{X})$ to denote the expected value of $\psi(\mathbf{X})$ computed under the assumption that $\omega$ is the true parameter value. The following lemma is used extensively in this paper; we state it without proof, since it follows easily from Lemma 2.1 of [1].

LEMMA 1.1. Let $\psi(\mathbf{x})$ be non-increasing in each $x_{j0}$, $j = 1, \cdots, n$, and non-decreasing in each $x_{ji}$, $1 \leqq j \leqq n$, $1 \leqq i \leqq k$, and let $\omega = (F_0, F_1, \cdots, F_k)$ and $\omega' = (F'_0, F'_1, \cdots, F'_k)$ satisfy $F_0(x) \leqq F'_0(x)$ and $F_i(x) \geqq F'_i(x)$ for $i = 1, \cdots, k$ and all $x$. Then $E_\omega\psi(\mathbf{X}) \leqq E_{\omega'}\psi(\mathbf{X})$.
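One standard way to see why an inequality like Lemma 1.1 holds (not necessarily the argument of Lemma 2.1 in [1]) is the quantile coupling $X = F^{-1}(U)$: pushing the same uniforms $U$ through both parameter points makes every control value under $\omega$ at least its counterpart under $\omega'$ and every other value at most its counterpart, so $\psi(\mathbf{X}) \leqq \psi(\mathbf{X}')$ draw by draw and the expectations are ordered. The sketch below checks this numerically for an illustrative $\psi$ (an indicator comparing order statistics, chosen only because it has the required monotonicity) and for normal shift families; `psi`, `coupled_means`, and all parameter values are assumptions made for the example.

```python
import random
from statistics import NormalDist


def kth(sample, j):
    """Return the j-th smallest value (1-based) of a sample."""
    return sorted(sample)[j - 1]


def psi(x, r, c):
    """Illustrative psi: 1 if every non-control sample's r-th order statistic
    is >= the control sample's (r - c)-th order statistic, else 0. It is
    non-increasing in the control values x[0] and non-decreasing in
    x[1], ..., x[k], as Lemma 1.1 requires."""
    threshold = kth(x[0], r - c)
    return int(all(kth(xi, r) >= threshold for xi in x[1:]))


def coupled_means(omega, omega_prime, n, r, c, reps=2000, seed=2):
    """Estimate E_omega psi and E_omega' psi from ONE shared set of uniforms
    pushed through each df's quantile function (X = F^{-1}(U))."""
    rng = random.Random(seed)
    total = total_prime = 0
    for _ in range(reps):
        # open interval keeps inv_cdf away from its invalid endpoints 0 and 1
        u = [[rng.uniform(1e-12, 1 - 1e-12) for _ in range(n)] for _ in omega]
        x = [[F.inv_cdf(p) for p in row] for F, row in zip(omega, u)]
        xp = [[F.inv_cdf(p) for p in row] for F, row in zip(omega_prime, u)]
        a, b = psi(x, r, c), psi(xp, r, c)
        assert a <= b  # pointwise ordering delivered by the coupling
        total += a
        total_prime += b
    return total / reps, total_prime / reps


if __name__ == "__main__":
    # F0 shifted right (F0 <= F0'); F1, F2 shifted left (Fi >= Fi')
    omega = [NormalDist(0.3, 1), NormalDist(-0.2, 1), NormalDist(-0.2, 1)]
    omega_prime = [NormalDist(0, 1)] * 3
    print(coupled_means(omega, omega_prime, n=11, r=6, c=2))
```

The coupling turns an inequality between expectations into a pointwise inequality, which is why the `assert` inside the loop never fails; the simulated means merely illustrate the conclusion $E_\omega\psi(\mathbf{X}) \leqq E_{\omega'}\psi(\mathbf{X})$.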

Citation

M. Haseeb Rizvi, Milton Sobel, and George G. Woodworth. "Nonparametric Ranking Procedures for Comparison with a Control." Ann. Math. Statist. 39(6): 2075-2093, December 1968. https://doi.org/10.1214/aoms/1177698035