Abstract
Consider taking a random sample of size $n$ from a finite population that consists of $N$ categories, with $M_{i}$ copies in the $i$th category for $i=1,\dots,N$. Each observed unit in the sample is presumed to have probability $1-p$ ($0<p<1$) of being lost from the sample. Let $S$ denote the number of categories not observed in the sample, and let $S_{j}$ denote the number of categories in which exactly $j$ units are observed, for $j=1,\dots,n$. In this paper, the probability distributions and factorial moments of $S$ and $S_{j}$ are studied. A matrix inversion algorithm is used to facilitate numerical computation of the probabilities and factorial moments. Examples of the problem considered in this paper include a filing or storage process, in which objects are randomly assigned to files or storage bins and may from time to time go missing; a capture-recapture problem, with species as the categories; and DNA sequence studies.
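The sampling model described above can be illustrated with a small simulation. The following is a minimal sketch, not the matrix inversion algorithm of the paper: it assumes that units are lost independently of one another, and the function names `simulate_counts` and `expected_S` are hypothetical, introduced only for illustration. The second function computes $E[S]$ by linearity of expectation from the hypergeometric distribution of each category's count in the sample, which gives a simple check on the simulation.

```python
import random
from collections import Counter
from math import comb


def simulate_counts(M, n, p, rng):
    """Simulate one sample; return (S, S_j) with S_j[j-1] = S_j for j = 1..n.

    Assumption: each sampled unit is lost independently with probability 1 - p.
    """
    # Population with M[i] copies of category i.
    population = [i for i, m in enumerate(M) for _ in range(m)]
    # Draw n units without replacement (multivariate hypergeometric counts).
    sample = rng.sample(population, n)
    # Keep each unit with probability p, i.e. lose it with probability 1 - p.
    kept = [c for c in sample if rng.random() < p]
    per_category = Counter(kept)
    # S: categories with no retained unit.
    S = len(M) - len(per_category)
    # S_j: categories with exactly j retained units.
    occupancy = Counter(per_category.values())
    S_j = [occupancy.get(j, 0) for j in range(1, n + 1)]
    return S, S_j


def expected_S(M, n, p):
    """E[S] by linearity: sum over categories of P(no retained unit)."""
    total = sum(M)
    result = 0.0
    for m in M:
        for k in range(min(m, n) + 1):
            # P(k units of this category are sampled) * P(all k are lost).
            pmf = comb(m, k) * comb(total - m, n - k) / comb(total, n)
            result += pmf * (1.0 - p) ** k
    return result


if __name__ == "__main__":
    rng = random.Random(2018)
    M, n, p = [3, 2, 5], 4, 0.7
    draws = [simulate_counts(M, n, p, rng)[0] for _ in range(20000)]
    print("simulated mean of S:", sum(draws) / len(draws))
    print("exact E[S]:         ", expected_S(M, n, p))
```

Every simulated draw satisfies $S + \sum_{j} S_{j} = N$, since each category is either unobserved or observed exactly $j$ times for some $j \ge 1$.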
Citation
Sungsu Kim, Chong Jin Park. "On the number of unobserved and observed categories when sampling from a multivariate hypergeometric population." Braz. J. Probab. Stat. 32 (2) 309-319, May 2018. https://doi.org/10.1214/16-BJPS344
Information