The Annals of Mathematical Statistics

Estimating the Size of a Multinomial Population

Lalitha Sanathanan

Abstract

This paper deals with the problem of estimating the number of trials of a multinomial distribution, from an incomplete observation of the cell totals, under constraints on the cell probabilities. More specifically, let $(n_1, \cdots, n_k)$ be distributed according to the multinomial law $M(N; p_1, \cdots, p_k)$ where $N$ is the number of trials and the $p_i$'s are the cell probabilities, $\sum^k_{i=1}p_i$ being equal to 1. Suppose that only a proper subset of $(n_1, \cdots, n_k)$ is observable, that $N, p_1, \cdots, p_k$ are unknown and that $N$ is to be estimated. Without loss of generality, $(n_1, \cdots, n_{l-1})$, $l \leqq k$, may be taken to be the observable random vector. For fixed $N$, $(n_1, \cdots, n_{l-1}, N - n)$ has the multinomial distribution $M(N; p_1, \cdots, p_l)$ where $n$ denotes $\sum^{l-1}_{i=1}n_i$ and $p_l$ denotes $1 - \sum^{l-1}_{i=1}p_i$. If the parameter space is such that $N$ can take any nonnegative integral value and each $p_i$ can take any value between 0 and 1, such that $\sum^{l-1}_{i=1}p_i < 1$, then, clearly, the only inference one can make about $N$ is that $N \geqq n$. In specific situations, it might, however, be possible to postulate constraints of the type
\begin{equation*}\tag{1.1} p_i = f_i(\theta),\quad i = 1, \cdots, l\end{equation*}
where $\theta = (\theta_1, \cdots, \theta_r)$ is a vector of $r$ independent parameters and the $f_i$ are known functions. This may lead to estimability of $N$. The problem of estimating $N$ in such a situation is studied here.

The present investigation is motivated by the following problem. Experiments in particle physics often involve visual scanning of film containing photographs of particles (occurring, for instance, inside a bubble chamber). The scanning is done with a view to counting the number $N$ of particles of a predetermined type (these particles will be referred to as events).
But owing to poor visibility caused by such characteristics as low momentum, the distribution and configuration of nearby track patterns, etc., some events are likely to be missed during the scanning process. The question, then, is: How does one get an estimate of $N$? The usual procedure for estimating $N$ is as follows. Film containing the $N$ (unknown) events is scanned separately by $w$ scanners (ordered in some specific way) using the same instructions. For each event $E$ let a $w$-vector $Z(E)$ be defined, such that the $j$th component $Z_j$ of $Z(E)$ is 1 if $E$ is detected by the $j$th scanner and is 0 otherwise. Let $\mathscr{J}$ be the set of $2^w$ $w$-vectors of 1's and 0's and let $I_0$ be the vector of 0's. Let $x_I$ be the number of events $E$ for which $Z(E) = I$. For $I \in \mathscr{J} - \{I_0\}$, the $x_I$'s are observed. A probability model is assumed for the results of the scanning process. That is, it is assumed that there is a probability $p_I$ that $Z(E)$ assumes the value $I$ and that these $p_I$'s are constrained by equations of the type (1.1). (These constraints vary according to the assumptions made about the scanners and events, thus giving rise to different models. An example of $p_I(\theta)$ would be $E(\nu^{\sum^w_{j=1}I_j}(1 - \nu)^{w-\sum^w_{j=1}I_j})$ where $I_j$ is the $j$th component of $I$ and expectation is taken with respect to the two-parameter beta density for $\nu$. This is the result of assuming that all scanners are equally efficient in detecting events, that the probability $\nu$ that an event is seen by any scanner is a random variable and that the results of the different scans are locally independent. For a discussion of various models, see Sanathanan (1969), Chapter III.) $N$ is then estimated using the observed $x_I$'s and the constraints on the $p_I$'s, provided certain conditions (e.g., the minimum number of scans required) are met.
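As an illustration of the beta model for $p_I(\theta)$ mentioned above, the following minimal Python sketch evaluates $E(\nu^{s}(1-\nu)^{w-s})$, with $s = \sum_j I_j$, using the standard identity $B(\alpha + s, \beta + w - s)/B(\alpha, \beta)$ for a beta density with parameters $\alpha$ and $\beta$. The function names and the parameter values ($w = 3$, $\alpha = 2$, $\beta = 3$) are assumptions chosen for illustration and do not come from the paper.

```python
from itertools import product
from math import exp, lgamma


def log_beta(a, b):
    """Logarithm of the beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def cell_probability(pattern, alpha, beta):
    """p_I = E[nu^s (1 - nu)^(w - s)] for nu ~ Beta(alpha, beta).

    `pattern` is a tuple of 0's and 1's (the vector I); s is its number of 1's.
    """
    w = len(pattern)
    s = sum(pattern)
    return exp(log_beta(alpha + s, beta + w - s) - log_beta(alpha, beta))


def all_cell_probabilities(w, alpha, beta):
    """Cell probabilities for all 2^w detection patterns."""
    return {I: cell_probability(I, alpha, beta) for I in product((0, 1), repeat=w)}


# Illustrative values only (not taken from the paper).
probs = all_cell_probabilities(3, 2.0, 3.0)
total = sum(probs.values())      # the 2^w cell probabilities sum to 1
p_missed = probs[(0, 0, 0)]      # probability of the unobservable cell I_0
```

Here `p_missed` is the probability that an event is missed by every scanner, which is the cell whose count is not observed.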
The following formulation of the problem of estimating $N$, however, leads to some systematic study including a development of the relevant asymptotic distribution theory for the estimators. The $Z(E)$'s may be regarded as realizations of $N$ independent identically distributed random variables whose common distribution is discrete with probabilities $p_I$ at $I$. (In particle counting problems, it is usually true that the particles of interest are sparsely distributed throughout the film on account of their Poisson distribution with low intensity. Thus, in spite of the factors affecting their visibility outlined earlier, the events can be assumed to be independent.) The joint distribution of the $x_I$'s is, then, multinomial $M(N; p_I, I \in \mathscr{J})$. The problem of estimating $N$ is now in the form stated at the beginning of this section. Since the estimate depends on the constraints provided for the $p_I$'s, it is important to test the "fit" of the model selected. The conditional distribution of the $x_I$'s $(I \neq I_0)$ given $x$ is multinomial $M(x; p_I/p, I \neq I_0)$ where $x$ is defined as $\sum_{I\neq I_0} x_I$ and $p$ as $\sum_{I\neq I_0}p_I$. The corresponding $\chi^2$ goodness of fit test may therefore be used to test the adequacy of the model in question.

Various estimators of $N$ are considered in this paper and among them is, of course, the maximum likelihood estimator of $N$. Asymptotic theory for maximum likelihood estimation of the parameters of a multinomial distribution has been developed before for the case where $N$ is known but not for the case where $N$ is unknown. Asymptotic theory related to the latter case is developed in Section 4. The result on the asymptotic joint distribution of the relevant maximum likelihood estimators is stated in Theorem 2. A second method of estimation considered is that of maximizing the likelihood based on the conditional probability of observing $(n_1,\cdots, n_{l-1})$, given $n$.
This method is called the conditional maximum likelihood (C.M.L.) method. The C.M.L. estimator of $N$ is shown (Theorem 2) to be asymptotically equivalent to the maximum likelihood estimator. Section 5 contains an extension of these results to the situation involving several multinomial distributions. This situation arises in the particle scanning context when the detected events are classified into groups based on some factor, such as momentum, which is related to the visibility of an event, and a separate scanning record is available for each group.

A third method of estimation considered is that of equating certain linear combinations of the cell totals (presumably chosen on the basis of some criterion) to their respective expected values. Asymptotic theory for this method is given in Section 6. This discussion is motivated by a particular case which is applicable to some models in the particle scanning problem, using a criterion based on the method of moments for the unobservable random variable given by the number of scanners detecting an event. (Discussion of the particular case can be found in Sanathanan (1969), Chapter III.) In the next section we give some definitions and a preliminary lemma.
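The conditional maximum likelihood idea can be sketched in Python for the simplest instance of constraints of type (1.1): $w$ independent scanners sharing a single fixed detection probability $\theta$, so that $p_I = \theta^{s}(1-\theta)^{w-s}$ with $s = \sum_j I_j$ and $p = 1 - (1-\theta)^w$. This model, the function names, the ternary search and the counts below are all assumptions made for illustration, not the paper's procedure. The sketch maximizes the conditional likelihood of the detected counts given their total $x$, and then forms the ratio $x/p(\hat\theta)$ as a natural estimate of $N$, of which one would take the integer part in practice.

```python
from math import log


def conditional_log_likelihood(theta, counts, w):
    """Log-likelihood of the detected events given their total x.

    counts[s] is the number of detected events seen by exactly s scanners
    (s = 1, ..., w). Terms that do not depend on theta are omitted.
    """
    x = sum(counts.values())
    detections = sum(s * c for s, c in counts.items())
    p_detect = 1.0 - (1.0 - theta) ** w
    return (detections * log(theta)
            + (w * x - detections) * log(1.0 - theta)
            - x * log(p_detect))


def maximize_unimodal(f, lo, hi, iterations=200):
    """Ternary search for the maximizer of a unimodal function on (lo, hi)."""
    for _ in range(iterations):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if f(m1) < f(m2):
            lo = m1
        else:
            hi = m2
    return (lo + hi) / 2.0


def cml_estimate(counts, w):
    """Return (theta_hat, x / p(theta_hat)) under the assumed model."""
    theta_hat = maximize_unimodal(
        lambda t: conditional_log_likelihood(t, counts, w), 1e-9, 1.0 - 1e-9)
    x = sum(counts.values())
    return theta_hat, x / (1.0 - (1.0 - theta_hat) ** w)


# Hypothetical scanning record: 70 detected events, 3 scanners.
counts = {1: 30, 2: 30, 3: 10}
theta_hat, n_hat_raw = cml_estimate(counts, w=3)
```

With these hypothetical counts the likelihood equation is satisfied at $\theta = 0.5$, so `theta_hat` is close to 0.5 and `n_hat_raw` is close to 80, i.e. about 10 events are estimated to have been missed by all three scanners. The search assumes the maximum lies strictly inside $(0, 1)$, which requires the mean number of detections per detected event to lie strictly between 1 and $w$.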

Article information

Source
Ann. Math. Statist., Volume 43, Number 1 (1972), 142-152.

Dates
First available in Project Euclid: 27 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aoms/1177692709

Digital Object Identifier
doi:10.1214/aoms/1177692709

Mathematical Reviews number (MathSciNet)
MR298815

Zentralblatt MATH identifier
0241.62007
