## The Annals of Mathematical Statistics

### On Two-Stage Non-Parametric Estimation

Elizabeth H. Yen

#### Abstract

In this paper, a two-sample, two-stage nonparametric estimation problem is studied. The parameter $\theta = \theta(F, G)$ under consideration is estimable (i.e., there exists an unbiased estimator $\phi = \phi(X_1, \cdots, X_r; Y_1, \cdots, Y_s)$ of $\theta$). $\phi$ is a function of independent observations from two populations with cumulative distribution functions $F(X)$ and $G(Y)$. The functions $F(X)$ and $G(Y)$ belong to a specified class $D$, such that a $U$-statistic based on $\phi$ is the unique minimum variance unbiased estimator of $\theta$. The total number of observations on populations $X$ and $Y$ is a fixed number $N$. The sampling procedure is carried out in two stages. First, take $M$ observations from each of the populations; then allocate the remaining $N - 2M$ observations between the populations. The method of allocation utilizes the information from the first-stage observations.

Two kinds of two-stage estimators, denoted by $U'$ and $U''$, are introduced in this paper. Both $U'$ and $U''$ are $U$-statistics with random sample sizes. $U'$ is based essentially on the second-stage observations only. $U''$ is defined on all $N$ observations. Intuitively, the statistic $U''$ is more appealing: the first-stage observations are used not only to determine the allocation of the second-stage observations, but also to estimate the parameter $\theta$ (see Section 3). One of the main results (Section 4) is that $U'$ is unbiased and, under certain conditions, the variance of $U'$ asymptotically approaches a particular value $V_0$. (Here we consider only the cases in which the variances of both $U'$ and $U''$ are finite.) $U''$ is in general biased. However, under the same conditions the value $E(U'' - \theta)^2$ asymptotically approaches the same value $V_0$.
This value $V_0$ is the smallest variance of any one-stage $U$-statistic estimator of $\theta$, subject to the restriction that the total number of observations on $X$ and on $Y$ is $N$. $V_0$ is computed (see Section 2) when the best one-stage allocation of $N$ observations to the two populations is made with the help of partial or even complete information about the distributions $F(X)$ and $G(Y)$. Such information about $F$ and $G$ is represented by the "nuisance parameters" $b_{10} = b_{10}(F, G)$, $b_{01} = b_{01}(F, G)$, etc., defined in Section 2. No prior knowledge of $b_{10}$ and $b_{01}$ is required to compute $\operatorname{Var}(U')$ and $E(U'' - \theta)^2$.

In Section 5, the "optimal" choice of the first-stage sample size $M$ relative to the fixed total sample size $N$ is discussed. Here "optimal" refers to choices of $M$, in order of magnitude relative to $N$, such that as $N$ goes to infinity the ratios $\operatorname{Var}(U')/V_0$ and $E(U'' - \theta)^2/V_0$ approach unity as fast as possible in order of magnitude of $N$. Three cases with different conditions on $\phi$ are considered. It is found that the "optimal" choices depend on the specific conditions. Section 6 contains some examples. For each $\theta(F, G)$, the corresponding estimators for $b_{10}$ and $b_{01}$, together with their behavior under different conditions on $F$ and $G$, are given. The examples include cases where the proposed procedures can be applied as well as cases where they cannot be applied. Section 7 shows the asymptotic normality of $U'$ and $U''$. Section 8 indicates that the proposed procedures can be extended to the $k$-sample case, for $k > 2$, with similar results.

The technique of two-stage estimation has been used in several papers. Stein [12] used it to determine a confidence interval of preassigned length for the mean of a normal population with unknown variance.
Putter [8] used it to estimate the mean of a stratified normal population. Robbins [10] discussed such a technique for the design of experiments. Later, Ghurye and Robbins [4] used it to estimate the difference between the means of two normal populations (or some other specified populations). Richter [9] discussed the estimation of the common mean of two normal populations. During the preparation of the present paper, Alam [1] discussed the estimation of the common mean of $k \geqq 2$ normal populations. This paper generalizes these two-stage procedures in two ways. First, the underlying cumulative distributions $F, G$ are members of a larger class of distributions. Second, the underlying parameters $\theta(F, G)$ are not restricted to population means or functions of means. Consequently, in such a general setup the question of "the best" estimator of any particular parameter $\theta(F, G)$ is not considered in this paper.
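As a rough illustration of the two-stage idea, the sketch below works through the simplest special case, the difference of two means $\theta = E(X) - E(Y)$, for which the $U$-statistic is the difference of sample means. It is not the procedure defined in the paper: the allocation rule (total sample sizes proportional to the first-stage standard deviation estimates, which minimizes $\sigma_X^2/m + \sigma_Y^2/n$ subject to $m + n = N$) and the function names `allocate`, `one_stage_benchmark`, and `two_stage_mean_difference` are assumptions made for this example. The estimator shown uses all $N$ observations, in the spirit of $U''$.

```python
import random
import statistics


def allocate(sd_x, sd_y, total_n, first_stage_m):
    """Split total_n observations between two populations.

    Assumed rule (not taken from the paper): sample sizes proportional
    to the estimated standard deviations, with at least first_stage_m
    observations per population because those are already taken.
    """
    if sd_x + sd_y == 0:
        n_x = total_n // 2
    else:
        n_x = round(total_n * sd_x / (sd_x + sd_y))
    # Respect the first-stage observations on both sides.
    n_x = min(max(n_x, first_stage_m), total_n - first_stage_m)
    return n_x, total_n - n_x


def one_stage_benchmark(sd_x, sd_y, total_n):
    """Smallest one-stage variance for the mean difference when the
    standard deviations are known: (sd_x + sd_y)^2 / N."""
    return (sd_x + sd_y) ** 2 / total_n


def two_stage_mean_difference(draw_x, draw_y, total_n, first_stage_m):
    """Two-stage estimate of E(X) - E(Y) using all total_n observations.

    draw_x and draw_y are zero-argument callables that each return one
    new observation (hypothetical interface for this sketch).
    """
    if first_stage_m < 2 or 2 * first_stage_m > total_n:
        raise ValueError("need 2 <= M and 2M <= N")
    # Stage 1: M observations from each population.
    xs = [draw_x() for _ in range(first_stage_m)]
    ys = [draw_y() for _ in range(first_stage_m)]
    # Use first-stage spread estimates to decide the final sample sizes.
    n_x, n_y = allocate(statistics.stdev(xs), statistics.stdev(ys),
                        total_n, first_stage_m)
    # Stage 2: take the remaining N - 2M observations as allocated.
    xs += [draw_x() for _ in range(n_x - first_stage_m)]
    ys += [draw_y() for _ in range(n_y - first_stage_m)]
    return statistics.fmean(xs) - statistics.fmean(ys), n_x, n_y


if __name__ == "__main__":
    rng = random.Random(0)
    estimate, n_x, n_y = two_stage_mean_difference(
        lambda: rng.gauss(0.0, 1.0), lambda: rng.gauss(0.0, 3.0),
        total_n=200, first_stage_m=20)
    print(estimate, n_x, n_y)
```

In this special case `one_stage_benchmark` plays the role of $V_0$: it is the variance attained by the best fixed allocation when the standard deviations are known, and the two-stage procedure tries to approach it without that knowledge.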

#### Article information

Source
Ann. Math. Statist., Volume 35, Number 3 (1964), 1099-1114.

Dates
First available in Project Euclid: 27 April 2007

https://projecteuclid.org/euclid.aoms/1177703268

Digital Object Identifier
doi:10.1214/aoms/1177703268

Mathematical Reviews number (MathSciNet)
MR165634

Zentralblatt MATH identifier
0128.13203
