The Annals of Statistics

Consistent estimation of mixture complexity

Lancelot F. James, Carey E. Priebe, and David J. Marchette

Full-text: Open access

Abstract

The consistent estimation of mixture complexity is of fundamental importance in many applications of finite mixture models. An enormous body of literature exists regarding the application, computational issues and theoretical aspects of mixture models when the number of components is known, but estimating the unknown number of components remains an area of intense research effort. This article presents a semiparametric methodology yielding almost sure convergence of the estimated number of components to the true but unknown number of components. The scope of application is vast, as mixture models are routinely employed across the entire diverse application range of statistics, including nearly all of the social and experimental sciences.

Article information

Source
Ann. Statist. Volume 29, Number 5 (2001), 1281-1296.

Dates
First available in Project Euclid: 8 February 2002

Permanent link to this document
http://projecteuclid.org/euclid.aos/1013203454

Digital Object Identifier
doi:10.1214/aos/1013203454

Mathematical Reviews number (MathSciNet)
MR1873331

Zentralblatt MATH identifier
01829055

Subjects
Primary: 62G05: Estimation
Secondary: 62G07: Density estimation

Keywords
Finite mixture model; number of components; semiparametric

Citation

James, Lancelot F.; Priebe, Carey E.; Marchette, David J. Consistent estimation of mixture complexity. Ann. Statist. 29 (2001), no. 5, 1281-1296. doi:10.1214/aos/1013203454. http://projecteuclid.org/euclid.aos/1013203454.


