The Annals of Statistics

Estimating the number of classes

Chang Xuan Mao and Bruce G. Lindsay

Full-text: Open access


Estimating the unknown number of classes in a population has numerous important applications. In a Poisson mixture model, the problem is reduced to estimating the odds that a class is undetected in a sample. The discontinuity of the odds prevents the existence of locally unbiased and informative estimators and restricts confidence intervals to be one-sided. Confidence intervals for the number of classes are also necessarily one-sided. A sequence of lower bounds to the odds is developed and used to define pseudo maximum likelihood estimators for the number of classes.
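As a rough illustration of the reduction described above, the sketch below assumes the standard Poisson mixture setup: if n classes are observed and theta is the odds that a class goes undetected, then the number of classes is approximately n(1 + theta). The code uses Chao's (1984) classical Cauchy-Schwarz lower bound f1^2 / (2 f2) on the number of undetected classes. That bound is a stand-in, not the sequence of lower bounds or the pseudo maximum likelihood estimator developed in this paper. The function names and the input format (one count per observed class) are hypothetical choices made for this example.

```python
from collections import Counter


def frequency_counts(class_counts):
    """Return a map k -> number of classes observed exactly k times (k >= 1)."""
    return Counter(c for c in class_counts if c > 0)


def chao_lower_bound(class_counts):
    """Lower-bound estimates of the undetected odds and the number of classes.

    Assumption: each class's count is Poisson with its own rate (a Poisson
    mixture). Cauchy-Schwarz then gives f0 >= f1^2 / (2 * f2) in expectation,
    where fk is the number of classes seen exactly k times.
    """
    freq = frequency_counts(class_counts)
    n_observed = sum(freq.values())
    f1, f2 = freq.get(1, 0), freq.get(2, 0)
    if f2 == 0:
        raise ValueError("bound is undefined when no class is seen exactly twice")
    # Estimated odds that a class is undetected: (undetected classes) / (detected classes).
    odds_lower = f1 ** 2 / (2 * f2 * n_observed)
    # Number of classes = detected classes * (1 + odds).
    return odds_lower, n_observed * (1 + odds_lower)


# Hypothetical sample: 10 singletons, 5 doubletons, 5 classes seen three times.
sample = [1] * 10 + [2] * 5 + [3] * 5
odds, n_classes = chao_lower_bound(sample)
```

Because the quantity is only a lower bound on the odds, the resulting figure is best read as a lower bound on the number of classes, in keeping with the one-sided inference the abstract describes.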

Article information

Ann. Statist., Volume 35, Number 2 (2007), 917-930.

First available in Project Euclid: 5 July 2007

Digital Object Identifier: doi:10.1214/009053606000001280

Primary: 62G15: Tolerance and confidence regions
Secondary: 62G05: Estimation

Keywords: Hankel matrix; moment problem; one-sided inference


Mao, Chang Xuan; Lindsay, Bruce G. Estimating the number of classes. Ann. Statist. 35 (2007), no. 2, 917--930. doi:10.1214/009053606000001280.
