## The Annals of Mathematical Statistics

- Ann. Math. Statist.
- Volume 39, Number 3 (1968), 1057-1068.

### Sequential Maximum Likelihood Estimation of the Size of a Population

#### Abstract

Consider the following model, the practical applications of which will be discussed elsewhere. An urn contains an unknown number, $N$, of white balls, and no others. An estimate of $N$ is desired, based on the following sampling procedure. Balls are drawn at random, one at a time, from the urn. A white ball is colored black before it is returned; a black ball is returned unchanged. The ball is always returned before the next ball is drawn. We are interested in two problems: (i) what stopping rule $t$ to use to terminate sampling, and (ii) how to estimate $N$ after we stop. The present problem (also in a more general setup) has been considered by several authors, notably L. A. Goodman [5], Chapman [1], Darroch [3] and Darling and Robbins [2]. We shall refer to their results in the sequel.

Let $w_i, b_i$ denote the (random) numbers of white balls and black balls, respectively, observed in the first $i$ draws $(w_i + b_i = i)$. We shall consider mainly the following stopping rules.

- RULE A. Let $A > 0$ be a fixed integer. $t_A = A$.
- RULE B. Let $B > 0$ be a fixed integer. $t_B = \inf \{i \mid b_i = B\}$.
- RULE C. Let $C > 0$ be fixed. $t_C = \inf \{i \mid b_i \geqq Cw_i\} = \inf \{i \mid i \geqq (C + 1)w_i\}$.
- RULE D. Let $- \infty < D < \infty$ be fixed.
  \begin{align*}t_D &= \inf \{i \mid b_i \geqq \max (1, w_i \log w_i + w_iD)\} \\ &= \inf \{i \mid i \geqq \max (w_i + 1, w_i \log w_i + w_i(D + 1))\}.\end{align*}
- RULE E. Let $\{D_j\}$ be such that $\lim D_j = \infty$. $t_E = \inf \{i \mid b_i \geqq \max (1, w_i \log w_i + w_iD_{w_i})\}$.

Since $w_i \leqq N$, each of these rules is bounded, and thus clearly stops with probability one. Of these rules, Rule B has been investigated most; see [5], [1] and [3]. Rule D has been considered in a recent paper [2] by Darling and Robbins, who show that for any $0 < \alpha < 1$ and a suitable choice of $D$ one can have $P_N(W_D = N) \geqq 1 - \alpha$ uniformly in $N$, where $W_D$ is the total number of white balls observed before stopping. (See Section 6.)

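The sampling scheme and Rules A to D can be illustrated with a short simulation. This is only a sketch under stated assumptions: the function names (`sample_until`, `rule_a`, and so on) are hypothetical and not from the paper, and the urn is represented only by the count of balls already colored black, which is enough because a draw hits a black ball with probability $w_i/N$.

```python
import math
import random


def sample_until(N, should_stop, rng):
    """Simulate the urn with N balls until should_stop(w, b) is true.

    Returns (w, b): the numbers of white and black balls observed.
    Hypothetical helper for illustration; not taken from the paper.
    """
    w = b = 0
    while True:
        # After w distinct balls have been colored, a draw is black
        # with probability w / N (so the first draw is always white).
        if rng.random() < w / N:
            b += 1
        else:
            w += 1
        if should_stop(w, b):
            return w, b


def rule_a(A):
    # Rule A: stop after a fixed number A of draws.
    return lambda w, b: w + b >= A


def rule_b(B):
    # Rule B: stop at the B-th black ball.
    return lambda w, b: b >= B


def rule_c(C):
    # Rule C: stop once b >= C * w.
    return lambda w, b: b >= C * w


def rule_d(D):
    # Rule D: stop once b >= max(1, w log w + w D); w >= 1 after the first draw.
    return lambda w, b: b >= max(1, w * math.log(w) + w * D)


if __name__ == "__main__":
    rng = random.Random(0)
    for name, rule in [("A", rule_a(20)), ("B", rule_b(5)),
                       ("C", rule_c(1.0)), ("D", rule_d(0.0))]:
        print(name, sample_until(100, rule, rng))
```

Rule E would follow the same pattern as `rule_d`, with the constant `D` replaced by a sequence value indexed by `w`.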
The motivation for consideration of Rules A to E stems from a theorem on the limiting distribution, as $N \rightarrow \infty$, of $w_i$ in a sample of fixed size $i$, for various relationships between $i$ and $N$. We restate the theorem here, since we shall need part of it in the sequel. Different parts of the theorem have been proved by various authors; see for example Rényi [8], where proper references are also given. Let $u_i = N - w_i$ be the number of white balls not observed in the sample of size $i$ (that is, the number of white balls in the urn after $i$ draws). Since there are linear relationships between $w_i, b_i, u_i$, we shall express the limiting distribution for one of these variables only. Let $\Phi$ denote the distribution function of a standard normal variable. We have:

THEOREM 1. Let $N \rightarrow \infty$.

- Case A. If $i = (2N\lambda_N)^{\frac{1}{2}}$ where $\lambda_N \rightarrow 0$, then $P_N(b_i = 0) \rightarrow 1$.
- Case B. If $i = (2N\lambda_N)^{\frac{1}{2}}$ where $\lambda_N \rightarrow \lambda$ and $0 < \lambda < \infty$, then $P_N(b_i = k) \rightarrow e^{-\lambda}\lambda^k/k!$, $k = 0, 1, \cdots$.
- Case C. If $i = a_NN$ where $\xi_NN^{-\frac{1}{2}} < a_N < \log N - \mu_N$ and $\xi_N \rightarrow \infty, \mu_N \rightarrow \infty$, then $P_N((w_i - Ew_i)/(\operatorname{Var} w_i)^{\frac{1}{2}} \leqq x) \rightarrow \Phi(x)$, $- \infty < x < \infty$.
- Case D. If $i = N \log N + Na_N$ where $a_N \rightarrow a$ and $- \infty < a < \infty$, then $P_N(u_i = k) \rightarrow e^{-\lambda}\lambda^k/k!$, $k = 0, 1, \cdots$, where $\lambda = e^{-a}$.
- Case E. If $i = N \log N + Na_N$ where $a_N \rightarrow \infty$, then $P_N(w_i = N) \rightarrow 1$.

We consider mainly the Maximum Likelihood Estimate (MLE) of $N$, denoted by $\hat{N}$.

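As a small numerical illustration of Case B of Theorem 1 at $k = 0$: the event $b_i = 0$ means all $i$ draws hit distinct balls, so $P_N(b_i = 0) = \prod_{k=0}^{i-1} (1 - k/N)$ exactly. The sketch below compares this product with the Poisson limit $e^{-\lambda}$. The function names are hypothetical, and rounding $i$ to the nearest integer is an assumption made only so that the sample size is a whole number.

```python
import math


def prob_no_black(N, i):
    """Exact P_N(b_i = 0): the first i draws are all distinct balls."""
    p = 1.0
    for k in range(i):
        p *= (N - k) / N
    return p


def case_b_gap(N, lam):
    """|P_N(b_i = 0) - exp(-lam)| with i = (2 N lam)^(1/2), rounded."""
    i = round(math.sqrt(2 * N * lam))
    return abs(prob_no_black(N, i) - math.exp(-lam))


if __name__ == "__main__":
    for N in (10**2, 10**4, 10**6):
        print(N, case_b_gap(N, 1.0))
```

The gap should shrink as $N$ grows, in line with the stated limit; the sketch checks only the $k = 0$ term, not the full Poisson distribution.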
It turns out that if, when we stop, we have seen $w$ white and $b$ black balls, then $\hat{N} = \hat{N}(w, b)$ does not depend on the stopping rule used, though the distribution of $\hat{N}$ clearly will depend on the stopping rule. The value of $\hat{N}(w, b)$ is discussed in Section 2. In Section 3 we briefly consider Rule A. It satisfies $P_N(\hat{N} = \infty) > 0$ for all $N \geqq A$. Rule B is discussed in Section 4, and it is shown that $2B\hat{N}/N$ has an asymptotic chi-square distribution with $2B$ degrees of freedom (to be denoted $\chi^2_{2B}$) as $N \rightarrow \infty$. In Section 5 Rule C is considered, and bounds on the distribution of $(\hat{N} - N)N^{-\frac{1}{2}}$ in terms of the normal distribution are given. Rules D and E are considered in Section 6. Let $\lbrack x\rbrack^\ast$ be the largest integer not exceeding $x$. For Rule D it is shown that $\hat{N} - N + \lbrack\lambda\rbrack^\ast$ has an asymptotic Poisson distribution with parameter $\lambda = \exp (-D - 1)$, and for Rule E that $P_N(\hat{N} = N) \rightarrow 1$. The exact and asymptotic distributions of the corresponding $t$'s are also considered, and are closely related to the distribution of $\hat{N}$.
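The abstract does not give the closed form of $\hat{N}(w, b)$ (that is deferred to Section 2), but the estimate can be computed numerically from the model. A particular sequence of draws with $w$ white and $b$ black balls has probability proportional, as a function of $N$, to $N(N-1)\cdots(N-w+1)/N^{w+b}$, which depends on the data only through $(w, b)$. The sketch below maximizes this over integers $N \geqq w$. It is an illustration derived from the urn model, not the paper's own formula; the function names are hypothetical, and the upward search relies on the likelihood rising and then falling in $N$ when $b \geqq 1$.

```python
import math


def log_likelihood(N, w, b):
    """Log of N(N-1)...(N-w+1) / N^(w+b), up to a constant free of N."""
    return math.lgamma(N + 1) - math.lgamma(N - w + 1) - (w + b) * math.log(N)


def mle_population_size(w, b):
    """Integer N >= w maximizing the likelihood (hypothetical helper).

    With no black ball observed the likelihood increases in N without
    bound, so the estimate is infinite.
    """
    if b == 0:
        return math.inf
    N = w
    # Step upward while the next integer is strictly more likely.
    while log_likelihood(N + 1, w, b) > log_likelihood(N, w, b):
        N += 1
    return N


if __name__ == "__main__":
    print(mle_population_size(3, 1))
    print(mle_population_size(5, 0))
```

For example, with $w = 3$ and $b = 1$ the likelihood is proportional to $(N-1)(N-2)/N^3$, which is largest at $N = 5$ among the integers. The infinite estimate when $b = 0$ is consistent with the statement that $P_N(\hat{N} = \infty) > 0$ under Rule A.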

#### Article information

**Source**

Ann. Math. Statist., Volume 39, Number 3 (1968), 1057-1068.

**Dates**

First available in Project Euclid: 27 April 2007

**Permanent link to this document**

https://projecteuclid.org/euclid.aoms/1177698338

**Digital Object Identifier**

doi:10.1214/aoms/1177698338

**Mathematical Reviews number (MathSciNet)**

MR225435

**Zentralblatt MATH identifier**

0193.16701

**JSTOR**

links.jstor.org

#### Citation

Samuel, Ester. Sequential Maximum Likelihood Estimation of the Size of a Population. Ann. Math. Statist. 39 (1968), no. 3, 1057--1068. doi:10.1214/aoms/1177698338. https://projecteuclid.org/euclid.aoms/1177698338