The Annals of Applied Statistics

Population size estimation based upon ratios of recapture probabilities

Irene Rocchetti, John Bunge, and Dankmar Böhning

Full-text: Open access

Abstract

Estimating the size of an elusive target population is of prominent interest in many areas of the life and social sciences. Our aim is to provide an efficient and workable method to estimate the unknown population size, given the frequency distribution of counts of repeated identifications of units of the population of interest. This counting variable is necessarily zero-truncated, since units that have never been identified are not in the sample. We consider several applications: clinical medicine, where interest is in estimating the number of patients with adenomatous polyps that have been overlooked by the diagnostic procedure; drug user studies, where interest is in estimating the number of hidden drug users who have not been identified; veterinary surveillance of scrapie in the UK, where interest is in estimating the hidden amount of scrapie; and entomology and microbial ecology, where interest is in estimating the number of unobserved species of organisms. In all these examples, simple models such as the homogeneous Poisson are not appropriate, since they do not account for present and latent heterogeneity. The Poisson–Gamma (negative binomial) model provides a flexible alternative and often leads to well-fitting models. It has a long history and was recently used in the development of the Chao–Bunge estimator. Here we use a different property of the Poisson–Gamma model: if we consider ratios of neighboring Poisson–Gamma probabilities, then these are linearly related to the counts of repeated identifications. Also, ratios have the useful property that they are identical for truncated and untruncated distributions. In this paper we propose a weighted logarithmic regression model to estimate the zero frequency counts, assuming a Poisson–Gamma distribution for the counts. A detailed explanation of the chosen weights and a goodness-of-fit index are presented, along with extensions to other distributions.
To evaluate the proposed estimator, we applied it to the benchmark examples mentioned above and compared the results with those obtained through the Chao–Bunge and other estimators. The major benefit of the proposed estimator is that it is defined under mild conditions, whereas the Chao–Bunge estimator fails to be well defined in several of the examples presented; in cases where the Chao–Bunge estimator is defined, its behavior is comparable to that of the proposed estimator in terms of bias and MSE, as a simulation study shows. Furthermore, the proposed estimator is relatively insensitive to inclusion or exclusion of large outlying frequencies, while sensitivity to outliers is characteristic of most other methods. The implications and limitations of such methods are discussed.
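
The ratio idea described in the abstract can be sketched in code. For a Poisson–Gamma (negative binomial) count distribution, the ratios r_x = (x+1) p_{x+1} / p_x are linear in x, and the normalizing constant of a zero-truncated distribution cancels in each ratio, so observed frequencies f_x can stand in for the p_x. The Python sketch below is one possible reading of the approach, not the authors' implementation: it regresses log((x+1) f_{x+1} / f_x) on x by weighted least squares and extrapolates to x = 0, where r_0 = f_1 / f_0. The function name ratio_regression_f0, the diagonal delta-method weights (which ignore the correlation between neighboring ratios), and the synthetic Poisson frequencies are all assumptions made for illustration.

```python
import math


def ratio_regression_f0(freq):
    """Estimate the unobserved zero-count frequency f0 (illustrative sketch).

    freq maps a count x >= 1 to f_x, the number of units identified
    exactly x times. Returns (intercept, slope, f0_hat).
    """
    f1 = freq.get(1, 0)
    if f1 <= 0:
        raise ValueError("f_1 must be positive to extrapolate to x = 0")

    xs, ys, ws = [], [], []
    for x in sorted(freq):
        fx, fx1 = freq[x], freq.get(x + 1, 0)
        if fx > 0 and fx1 > 0:
            xs.append(x)
            # Log of the ratio of neighboring frequencies, scaled by (x + 1).
            ys.append(math.log((x + 1) * fx1 / fx))
            # Assumed weight: inverse of the delta-method variance
            # 1/f_x + 1/f_{x+1}; correlations between ratios are ignored.
            ws.append(1.0 / (1.0 / fx + 1.0 / fx1))
    if len(xs) < 2:
        raise ValueError("need at least two usable ratios")

    # Closed-form weighted least squares for y = intercept + slope * x.
    sw = sum(ws)
    xbar = sum(w * x for w, x in zip(ws, xs)) / sw
    ybar = sum(w * y for w, y in zip(ws, ys)) / sw
    sxx = sum(w * (x - xbar) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - xbar) * (y - ybar) for w, x, y in zip(ws, xs, ys))
    slope = sxy / sxx
    intercept = ybar - slope * xbar

    # At x = 0 the ratio is f_1 / f_0 = exp(intercept), so f_0 = f_1 * exp(-intercept).
    f0_hat = f1 * math.exp(-intercept)
    return intercept, slope, f0_hat


# Synthetic (not real) expected frequencies from a Poisson with mean 2,
# for a hypothetical population of 1000 units, counts 1..8 observed.
lam, n_total = 2.0, 1000.0
synthetic = {
    x: n_total * math.exp(-lam) * lam ** x / math.factorial(x)
    for x in range(1, 9)
}
intercept, slope, f0_hat = ratio_regression_f0(synthetic)
n_hat = sum(synthetic.values()) + f0_hat  # observed units plus estimated hidden units
print(f0_hat, n_hat)  # f0_hat should be close to n_total * exp(-lam)
```

With real data the frequencies are integers and noisy, so the fitted line will not be exact; the paper's own weights and goodness-of-fit index should be consulted before relying on a sketch like this one.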

Article information

Source
Ann. Appl. Stat., Volume 5, Number 2B (2011), 1512–1533.

Dates
First available in Project Euclid: 13 July 2011

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1310562731

Digital Object Identifier
doi:10.1214/10-AOAS436

Mathematical Reviews number (MathSciNet)
MR2849784

Zentralblatt MATH identifier
1223.62165

Keywords
Chao–Bunge estimator; Katz distribution; species problem; negative binomial distribution; weighted linear regression; zero-truncation

Citation

Rocchetti, Irene; Bunge, John; Böhning, Dankmar. Population size estimation based upon ratios of recapture probabilities. Ann. Appl. Stat. 5 (2011), no. 2B, 1512–1533. doi:10.1214/10-AOAS436. https://projecteuclid.org/euclid.aoas/1310562731



References

  • Agresti, A. (2002). Categorical Data Analysis. Wiley, New York.
  • Alberts, D. S., Martinez, M. E., Roe, D. J., Guillen-Rodriguez, J. M., Marshall, J. R., Van Leeuwen, B., Reid, M. E., Reitenbaugh, C., Vargas, P. A., Bhattacharyya, E. D. L., Sampliner, R., The Phoenix Colon Cancer Prevention Physician’s Network (2000). Lack of effect of a high-fiber cereal supplement on the recurrence of colorectal adenomas. New England Journal of Medicine 342 1156–1162.
  • Björck, A. (1996). Numerical Methods for Least Squares Problems. SIAM, Philadelphia.
  • Böhning, D. (2008). A simple variance formula for population size estimators by conditioning. Statist. Methodol. 5 410–423.
  • Böhning, D., Dietz, E., Kuhnert, R. and Schön, D. (2005). Mixture models for capture–recapture count data. Statist. Methods Appl. 14 29–43.
  • Böhning, D. and van der Heijden, P. G. M. (2009). A covariate adjustment for zero-truncated approaches to estimating the size of hidden and elusive populations. Ann. Appl. Statist. 3 595–610.
  • Böhning, D. and Schön, D. (2005). Nonparametric maximum likelihood estimation of the population size based upon the counting distribution. J. Roy. Statist. Soc. Ser. C 54 721–737.
  • Böhning, D., Suppawattanabodee, B., Kusolvisitkul, W. and Viwatwongkasem, C. (2004). Estimating the number of drug users in Bangkok 2001: A capture–recapture approach using repeated entries in one list. European Journal of Epidemiology 19 1075–1083.
  • Böhning, D. and Del Rio Vilas, V. (2008). Estimating the hidden number of scrapie affected holdings in Great Britain using a simple, truncated count model allowing for heterogeneity. Journal of Agricultural, Biological, and Environmental Statistics 13 1–22.
  • Bishop, Y. M. M., Fienberg, S. E. and Holland, P. W., with the collaboration of Light, R. J. and Mosteller, F. (1995). Discrete Multivariate Analysis. MIT, Cambridge, MA.
  • Bunge, J. and Barger, K. (2008). Parametric models for estimating the number of classes. Biom. J. 50 971–982.
  • Bunge, J. and Fitzpatrick, M. (1993). Estimating the number of species: A review. J. Amer. Statist. Assoc. 88 364–373.
  • Chao, A. (1987). Estimating the population size for capture–recapture data with unequal catchability. Biometrics 43 783–791.
  • Chao, A. (1989). Estimating population size for sparse data in capture–recapture experiments. Biometrics 45 427–438.
  • Chao, A. and Bunge, J. (2002). Estimating the number of species in a stochastic abundance model. Biometrics 58 531–539.
  • Chao, A., Tsay, P. K., Lin, S. H., Shau, W. Y. and Chao, D. Y. (2001). Tutorial in biostatistics: The applications of capture–recapture models to epidemiological data. Stat. Med. 20 3123–3157.
  • Fisher, R. A., Corbet, A. S. and Williams, C. B. (1943). The relation between the number of species and the number of individuals in a random sample of an animal population. The Journal of Animal Ecology 12 44–58.
  • Hay, G. and Smit, F. (2003). Estimating the number of drug injectors from needle exchange data. Addiction Research and Theory 11 235–243.
  • Van Hest, N. A. H., De Vries, G., Smit, F., Grant, A. D. and Richardus, J. H. (2008). Estimating the coverage of Tuberculosis screening among drug users and homeless persons with truncated models. Epidemiology and Infection 136 628–635.
  • Van der Heijden, P. G. M., Cruyff, M. and van Houwelingen, H. C. (2003). Estimating the size of a criminal population from police records using the truncated Poisson regression model. Statist. Neerlandica 57 1–16.
  • Hsu, C.-H. (2007). A weighted zero-inflated Poisson model for estimation of recurrence of adenomas. Statistical Methods in Medical Research 16 155–166.
  • Johnson, N. L., Kemp, A. W. and Kotz, S. (2005). Univariate Discrete Distributions. Wiley, Hoboken, NJ.
  • Lindsay, B. G. and Roeder, K. (1987). A unified treatment of integer parameter models. J. Amer. Statist. Assoc. 82 758–764.
  • Meurant, G. (1992). A review on the inverse of symmetric tridiagonal and block tridiagonal matrices. SIAM J. Matrix Anal. Appl. 13 707–728.
  • Pledger, S. A. (2000). Unified maximum likelihood estimates for closed capture–recapture models using mixtures. Biometrics 56 434–442.
  • Pledger, S. A. (2005). The performance of mixture models in heterogeneous closed population capture–recapture. Biometrics 61 868–876.
  • Quince, C., Curtis, T. P. and Sloan, W. T. (2008). The rational exploration of microbial diversity. ISME J. 2 997–1006.
  • Roberts, J. M. and Brewer, D. D. (2006). Estimating the prevalence of male clients of prostitute women in Vancouver with a simple capture–recapture method. J. Roy. Statist. Soc. Ser. A 169 745–756.
  • Stock, A., Jürgens, K., Bunge, J. and Stoeck, T. (2009). Protistan diversity in the suboxic and anoxic waters of the Gotland Deep (Baltic Sea) as revealed by 18S rRNA clone libraries. Aquatic Microbial Ecology 55 267–284.
  • Wilson, R. M. and Collins, M. F. (1992). Capture–recapture estimation with samples of size one using frequency data. Biometrika 79 543–553.
  • Wohlin, C., Runeson, P. and Brantestam, J. (1995). An experimental evaluation of capture–recapture in software inspections. Journal of Software Testing, Verification and Reliability 5 213–232.
  • Zelterman, D. (1988). Robust estimation in truncated discrete distributions with applications to capture–recapture experiments. J. Statist. Plann. Inference 18 225–237.