The Annals of Statistics

Consistent estimation of mixture complexity

Lancelot F. James, David J. Marchette, and Carey E. Priebe
Source: Ann. Statist. Volume 29, Number 5 (2001), 1281-1296.

Abstract

The consistent estimation of mixture complexity is of fundamental importance in many applications of finite mixture models. An enormous body of literature exists regarding the application, computational issues and theoretical aspects of mixture models when the number of components is known, but estimating the unknown number of components remains an area of intense research effort. This article presents a semiparametric methodology yielding almost sure convergence of the estimated number of components to the true but unknown number of components. The scope of application is vast, as mixture models are routinely employed across the entire diverse application range of statistics, including nearly all of the social and experimental sciences.

Primary Subjects: 62G05
Secondary Subjects: 62G07
Full-text: Open access
Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1013203454
Digital Object Identifier: doi:10.1214/aos/1013203454
Mathematical Reviews number (MathSciNet): MR1873331
Zentralblatt MATH identifier: 01829055

References

Barron, A. R. and Cover, T. M. (1991). Minimum complexity density estimation. IEEE Trans. Inform. Theory 37 1034-1054.
Mathematical Reviews (MathSciNet): MR1111806
Digital Object Identifier: doi:10.1109/18.86996
Beran, R. (1977). Minimum Hellinger distance estimates for parametric models. Ann. Statist. 5 445-463.
Mathematical Reviews (MathSciNet): MR56:7005
Zentralblatt MATH: 0381.62028
Digital Object Identifier: doi:10.1214/aos/1176343842
Project Euclid: euclid.aos/1176343842
Bickel, P. J., Klaassen, C. A. J., Ritov, Y. and Wellner, J. A. (1993). Efficient and Adaptive Estimation for Semiparametric Models. Johns Hopkins Univ. Press.
Mathematical Reviews (MathSciNet): MR94m:62007
Cao, R., Cuevas, A. and Fraiman, R. (1995). Minimum distance density-based estimation. Comput. Statist. Data Anal. 20 611-631.
Zentralblatt MATH: 0875.62157
Mathematical Reviews (MathSciNet): MR1369185
Cao, R. and Devroye, L. (1996). The consistency of a smoothed minimum distance estimate. Scand. J. Statist. 23 405-418.
Zentralblatt MATH: 0898.62045
Mathematical Reviews (MathSciNet): MR1439705
Chen, J. and Kalbfleisch, J. D. (1996). Penalized minimum distance estimates in finite mixture models. Canad. J. Statist. 24 167-175.
Zentralblatt MATH: 0858.62019
Mathematical Reviews (MathSciNet): MR1406173
Digital Object Identifier: doi:10.2307/3315623
Clarke, B. R. and Heathcote, C. R. (1994). Robust estimation of k-component univariate normal mixtures. Ann. Inst. Statist. Math. 46 83-93.
Zentralblatt MATH: 0802.62039
Mathematical Reviews (MathSciNet): MR1272750
Digital Object Identifier: doi:10.1007/BF00773595
Cordero-Braña, O. I. and Cutler, A. (2001). On the asymptotic properties of the minimum Hellinger estimation in the case of a mixture model. Research Report 7/01/104, Dept. Mathematics and Statistics, Utah State Univ.
Cutler, A. and Cordero-Braña, O. I. (1996). Minimum Hellinger distance estimation for finite mixture models. J. Amer. Statist. Assoc. 91 1716-1723.
Mathematical Reviews (MathSciNet): MR98b:62057
Zentralblatt MATH: 0881.62035
Digital Object Identifier: doi:10.2307/2291601
Dacunha-Castelle, D. and Gassiat, E. (1997). The estimation of the order of a mixture model. Bernoulli 3 279-299.
Mathematical Reviews (MathSciNet): MR98j:62019
Zentralblatt MATH: 0889.62012
Digital Object Identifier: doi:10.2307/3318593
Project Euclid: euclid.bj/1177334456
Dacunha-Castelle, D. and Gassiat, E. (1999). Testing the order of a model using locally conic parameterization: population mixtures and stationary ARMA processes. Ann. Statist. 27 1178-1209.
Mathematical Reviews (MathSciNet): MR1740115
Zentralblatt MATH: 0957.62073
Digital Object Identifier: doi:10.1214/aos/1017938921
Project Euclid: euclid.aos/1017938921
Escobar, M. D. and West, M. (1995). Bayesian density estimation and inference using mixtures. J. Amer. Statist. Assoc. 90 577-588.
Mathematical Reviews (MathSciNet): MR96d:62054
Zentralblatt MATH: 0826.62021
Digital Object Identifier: doi:10.2307/2291069
Henna, J. (1985). On estimating of the number of constituents of a finite mixture of continuous distributions. Ann. Inst. Statist. Math. 37 235-240.
Mathematical Reviews (MathSciNet): MR87a:62044
Zentralblatt MATH: 0577.62031
Digital Object Identifier: doi:10.1007/BF02481094
Keribin, C. (2000). Consistent estimation of the order of mixture models. Sankhyā Ser. A 62 49-62.
Mathematical Reviews (MathSciNet): MR2001c:62026
Zentralblatt MATH: 01644942
Kiefer, J. and Wolfowitz, J. (1956). Consistency of the maximum likelihood estimator in the presence of infinitely many incidental parameters. Ann. Math. Statist. 27 887-906.
Mathematical Reviews (MathSciNet): MR19,189a
Zentralblatt MATH: 0073.14701
Digital Object Identifier: doi:10.1214/aoms/1177728066
Project Euclid: euclid.aoms/1177728066
Leroux, B. G. (1992). Consistent estimation of a mixing distribution. Ann. Statist. 20 1350-1360.
Mathematical Reviews (MathSciNet): MR93i:62033
Zentralblatt MATH: 0763.62015
Digital Object Identifier: doi:10.1214/aos/1176348772
Project Euclid: euclid.aos/1176348772
Marchette, D. J., Priebe, C. E., Rogers, G. W. and Solka, J. L. (1996). The filtered kernel estimator. Comp. Statist. 11 95-112.
Marron, J. S. and Schmitz, H.-P. (1992). Simultaneous density estimation of several income distributions. Econometric Theory 8 476-488.
Marron, J. S. and Wand, M. P. (1992). Exact mean integrated squared error. Ann. Statist. 20 712-736.
Mathematical Reviews (MathSciNet): MR93f:62056
Zentralblatt MATH: 0746.62040
Digital Object Identifier: doi:10.1214/aos/1176348653
Project Euclid: euclid.aos/1176348653
McLachlan, G. J. (1987). On bootstrapping the likelihood ratio test statistic for the number of components in a normal mixture. Appl. Statist. 36 318-324.
Nolan, D. and Marron, J. S. (1989). Uniform consistency of automatic and location adaptive delta sequence estimators. Probab. Theory Related Fields 80 619-632.
Mathematical Reviews (MathSciNet): MR90g:62086
Zentralblatt MATH: 0644.62041
Digital Object Identifier: doi:10.1007/BF00318909
Pfanzagl, J. (1988). Consistency of maximum likelihood estimators for certain nonparametric families, in particular: mixtures. J. Statist. Plann. Inference 19 137-158.
Mathematical Reviews (MathSciNet): MR89g:62063
Zentralblatt MATH: 0656.62044
Digital Object Identifier: doi:10.1016/0378-3758(88)90069-9
Pollard, D. (1984). Convergence of Stochastic Processes. Springer, New York.
Mathematical Reviews (MathSciNet): MR86i:60074
Zentralblatt MATH: 0544.60045
Priebe, C. E. and Marchette, D. J. (2000). Alternating kernel and mixture density estimates. Comput. Statist. Data Anal. 35 43-65.
Mathematical Reviews (MathSciNet): MR1815573
Zentralblatt MATH: 1142.62338
Redner, R. A. (1981). Note on the consistency of the maximum likelihood estimate for nonidentifiable distributions. Ann. Statist. 9 225-228.
Mathematical Reviews (MathSciNet): MR83c:62046
Zentralblatt MATH: 0453.62021
Digital Object Identifier: doi:10.1214/aos/1176345353
Project Euclid: euclid.aos/1176345353
Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26 195-239.
Mathematical Reviews (MathSciNet): MR85j:62027
Zentralblatt MATH: 0536.62021
Digital Object Identifier: doi:10.1137/1026034
Rissanen, J. (1978). Modeling by shortest data description. Automatica 14 465-471.
Zentralblatt MATH: 0418.93079
Ritov, Y. and Bickel, P. J. (1990). Achieving information bounds in non- and semiparametric models. Ann. Statist. 18 925-938.
Mathematical Reviews (MathSciNet): MR91g:62034
Zentralblatt MATH: 0722.62025
Digital Object Identifier: doi:10.1214/aos/1176347633
Project Euclid: euclid.aos/1176347633
Roeder, K. and Wasserman, L. (1997). Practical Bayesian density estimation using mixtures of normals. J. Amer. Statist. Assoc. 92 894-902.
Mathematical Reviews (MathSciNet): MR98k:62038
Zentralblatt MATH: 0889.62021
Digital Object Identifier: doi:10.2307/2965553
Silverman, B. W. (1986). Density Estimation for Statistics and Data Analysis. Chapman and Hall, New York.
Mathematical Reviews (MathSciNet): MR87k:62074
Zentralblatt MATH: 0617.62042
Tamura, R. N. and Boos, D. D. (1986). Minimum Hellinger distance estimation for multivariate location and covariance. J. Amer. Statist. Assoc. 81 223-229.
Zentralblatt MATH: 0601.62051
Mathematical Reviews (MathSciNet): MR830585
Digital Object Identifier: doi:10.2307/2287994

2012 © Institute of Mathematical Statistics
