Electronic Journal of Statistics

Estimation and model selection for model-based clustering with the conditional classification likelihood

Jean-Patrick Baudry



The Integrated Completed Likelihood (ICL) criterion was introduced by Biernacki, Celeux and Govaert (2000) in the model-based clustering framework to select a relevant number of classes and has been used by statisticians in various application areas. A theoretical study of ICL is proposed.

A contrast related to the clustering objective is introduced: the conditional classification likelihood. An estimator and model selection criteria are derived from it. The properties of these new procedures are studied, and ICL is proved to be an approximation of one of these criteria. These results contrast with the currently prevailing view that ICL is not consistent. Moreover, they give insight into the notion of class underlying ICL and inform a broader reflection on the notion of class in clustering.
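For concreteness, the quantities involved can be written out as below. This is a sketch in our own notation, not copied from the paper: a sample x_1, ..., x_n, a K-component mixture density f(x; θ) = Σ_k π_k φ(x; ω_k), and conditional membership probabilities τ_k(x; θ) = π_k φ(x; ω_k) / f(x; θ). Averaging the completed log-likelihood over the unknown labels given the data yields the log-likelihood minus an entropy term. The BIC-type penalty (D_K/2) log n, with D_K the number of free parameters, is an assumption made here for illustration.

```latex
% Completed log-likelihood, using log(pi_k phi(x_i; omega_k)) = log f(x_i; theta) + log tau_k(x_i; theta)
\log L_c(\theta; x, z) = \sum_{i=1}^n \sum_{k=1}^K z_{ik} \log\bigl(\pi_k \varphi(x_i;\omega_k)\bigr)
  = \log L(\theta; x) + \sum_{i=1}^n \sum_{k=1}^K z_{ik} \log \tau_k(x_i;\theta)

% Conditional classification likelihood: expectation over Z given x, with E_theta[Z_ik | x] = tau_k(x_i; theta)
L_{cc}(\theta; x) = \mathbb{E}_\theta\bigl[\log L_c(\theta; x, Z) \mid x\bigr]
  = \log L(\theta; x) - \mathrm{ENT}(\theta; x),
\qquad
\mathrm{ENT}(\theta; x) = -\sum_{i=1}^n \sum_{k=1}^K \tau_k(x_i;\theta) \log \tau_k(x_i;\theta)

% Estimator and a penalized selection criterion built on L_cc (penalty form assumed)
\hat\theta^{\mathrm{MLccE}}_K \in \arg\max_{\theta \in \Theta_K} L_{cc}(\theta; x),
\qquad
\hat K = \arg\min_K \Bigl\{ -L_{cc}\bigl(\hat\theta^{\mathrm{MLccE}}_K; x\bigr) + \tfrac{D_K}{2} \log n \Bigr\}

% A commonly used BIC-like form of ICL (to be maximized over K), plugging in the MLE instead
\mathrm{ICL}(K) = \log L\bigl(\hat\theta^{\mathrm{MLE}}_K; x\bigr) - \mathrm{ENT}\bigl(\hat\theta^{\mathrm{MLE}}_K; x\bigr) - \tfrac{D_K}{2} \log n
```

Since ENT ≥ 0, L_cc never exceeds log L. The entropy term penalizes parameter values under which the components overlap, so observations cannot be assigned confidently.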

General results on penalized minimum contrast criteria and upper bounds on the bracketing entropy in parametric situations are derived; these may be of independent interest.
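The two general notions above can be recalled in standard notation, as in Massart (2007) or van der Vaart (1998). The paper's precise assumptions are not reproduced here. Take a collection of models (Θ_m) indexed by m in 𝓜, an empirical contrast γ_n and a penalty pen. A penalized minimum contrast procedure estimates within each model and then selects a model. Taking γ_n to be minus the normalized conditional classification log-likelihood gives procedures of the kind studied in this article. For bracketing entropy, an ε-bracket [l, u] = {f : l ≤ f ≤ u} is a bracket with d(l, u) ≤ ε. N_[](ε, 𝓕, d) denotes the minimal number of ε-brackets needed to cover a class 𝓕.

```latex
% Penalized minimum contrast estimation and model selection
\hat\theta_m \in \arg\min_{\theta \in \Theta_m} \gamma_n(\theta),
\qquad
\hat m \in \arg\min_{m \in \mathcal{M}} \bigl\{ \gamma_n(\hat\theta_m) + \mathrm{pen}(m) \bigr\}

% Bracketing entropy of a class F of functions with respect to a distance d
H_{[\,]}(\varepsilon, \mathcal{F}, d) = \log N_{[\,]}(\varepsilon, \mathcal{F}, d)
```

In parametric situations, bounds of the form H_[](ε) ≲ D log(1/ε), with D a dimension, are the typical target. Such bounds control the fluctuations of the contrast uniformly over a model.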

Practical solutions for computing the proposed procedures are given, notably an adapted EM algorithm and a new initialization method for EM-like algorithms, which helps improve estimation in Gaussian mixture models.
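For orientation only, here is a minimal stdlib-only Python sketch of standard EM (Dempster, Laird and Rubin, 1977) for a univariate Gaussian mixture. It is followed by the BIC-like form of ICL (log-likelihood minus entropy minus (D/2) log n) used to choose the number of components. This is not the adapted EM algorithm or the initialization method of the paper. The contiguous-block initialization, the variance floor, the fixed iteration count and the function names (e_step, em_gmm_1d, icl_bic_like) are all illustrative assumptions.

```python
import math


def log_normal(x, mean, var):
    """Log-density of N(mean, var) at x."""
    return -0.5 * ((x - mean) ** 2 / var + math.log(2.0 * math.pi * var))


def e_step(data, weights, means, variances):
    """Return (log-likelihood, per-point conditional probabilities tau_ik)."""
    loglik, taus = 0.0, []
    for x in data:
        logs = [math.log(w) + log_normal(x, m, v)
                for w, m, v in zip(weights, means, variances)]
        top = max(logs)  # log-sum-exp trick for numerical stability
        log_total = top + math.log(sum(math.exp(lp - top) for lp in logs))
        loglik += log_total
        taus.append([math.exp(lp - log_total) for lp in logs])
    return loglik, taus


def em_gmm_1d(data, k, n_iter=200, var_floor=1e-6):
    """Standard EM for a univariate k-component Gaussian mixture (assumes len(data) >= k)."""
    xs = sorted(data)
    n = len(xs)
    # Hypothetical deterministic start (not the paper's method):
    # split the sorted sample into k contiguous blocks.
    blocks = [xs[i * n // k:(i + 1) * n // k] for i in range(k)]
    means = [sum(b) / len(b) for b in blocks]
    centre = sum(xs) / n
    var0 = max(sum((x - centre) ** 2 for x in xs) / n, var_floor)
    variances = [var0] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        _, taus = e_step(data, weights, means, variances)
        # M step: weighted updates; floors guard against empty or degenerate components.
        for j in range(k):
            nj = max(sum(t[j] for t in taus), 1e-12)
            means[j] = sum(t[j] * x for t, x in zip(taus, data)) / nj
            variances[j] = max(
                sum(t[j] * (x - means[j]) ** 2 for t, x in zip(taus, data)) / nj,
                var_floor)
            weights[j] = nj / n
    return weights, means, variances


def icl_bic_like(data, k):
    """log L - ENT - (D/2) log n at the EM fit, with D = 3k - 1 free parameters."""
    params = em_gmm_1d(data, k)
    loglik, taus = e_step(data, *params)
    entropy = -sum(t * math.log(t) for row in taus for t in row if t > 0.0)
    return loglik - entropy - 0.5 * (3 * k - 1) * math.log(len(data))


# Toy sample: two well-separated groups of 21 evenly spaced points each.
data = [c + 0.1 * i for c in (-5.0, 5.0) for i in range(-10, 11)]
scores = {k: icl_bic_like(data, k) for k in (1, 2, 3)}
best_k = max(scores, key=scores.get)
print(best_k)
```

On this toy sample the criterion should favour two components. With real data, EM only reaches a local maximum, so several starting values are advisable (see Biernacki, Celeux and Govaert, 2003).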

Article information

Electron. J. Statist., Volume 9, Number 1 (2015), 1041-1077.

Received: March 2014
First available in Project Euclid: 27 May 2015


Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 62H12: Estimation

Keywords: bracketing entropy; ICL; model-based clustering; model selection; number of classes; penalized criteria


Baudry, Jean-Patrick. Estimation and model selection for model-based clustering with the conditional classification likelihood. Electron. J. Statist. 9 (2015), no. 1, 1041–1077. doi:10.1214/15-EJS1026. https://projecteuclid.org/euclid.ejs/1432732304



  • [1] Akaike, H. (1973). Information theory and an extension of the maximum likelihood principle. In Proceedings, 2nd Internat. Symp. on Information Theory 267–281.
  • [2] Ambroise, C. and Govaert, G. (2000). Clustering by maximizing a fuzzy classification maximum likelihood criterion. In COMPSTAT 187–192. Springer.
  • [3] Arlot, S. (2007). Resampling and model selection. PhD thesis, Univ. Paris-Sud.
  • [4] Barron, A., Birgé, L. and Massart, P. (1999). Risk bounds for model selection via penalization. Probability Theory and Related Fields 113 301–413.
  • [5] Baudry, J.-P. (2009). Model selection for clustering. Choosing the number of classes. PhD thesis, Univ. Paris-Sud. http://tel.archives-ouvertes.fr/tel-00461550/fr/.
  • [6] Baudry, J.-P., Celeux, G. and Marin, J.-M. (2008). Selecting models focussing on the modeler's purpose. In COMPSTAT 2008: Proceedings in Computational Statistics 337–348. Physica-Verlag, Heidelberg.
  • [7] Baudry, J.-P., Maugis, C. and Michel, B. (2011). Slope heuristics: overview and implementation. Statist. Comput. 22 455–470.
  • [8] Baudry, J.-P., Raftery, A. E., Celeux, G., Lo, K. and Gottardo, R. (2010). Combining mixture components for clustering. J. Comput. Graph. Statist. 19 332–353.
  • [9] Biernacki, C., Celeux, G. and Govaert, G. (2000). Assessing a mixture model for clustering with the integrated completed likelihood. IEEE Trans. PAMI 22 719–725.
  • [10] Biernacki, C., Celeux, G. and Govaert, G. (2003). Choosing starting values for the EM algorithm for getting the highest likelihood in multivariate Gaussian mixture models. Computational Statistics & Data Analysis 41 567–575.
  • [11] Biernacki, C. and Govaert, G. (1997). Using the classification likelihood to choose the number of clusters. Computing Science and Statistics 29 451–457.
  • [12] Birgé, L. and Massart, P. (2001). Gaussian model selection. Journal of the European Mathematical Society 3 203–268.
  • [13] Birgé, L. and Massart, P. (2007). Minimal penalties for Gaussian model selection. Probab. Theory Related Fields 138 33–73.
  • [14] Celeux, G. and Govaert, G. (1993). Comparison of the mixture and the classification maximum likelihood in cluster analysis. Journal of Statistical Computation and Simulation 47 127–146.
  • [15] Celeux, G. and Govaert, G. (1995). Gaussian parsimonious clustering models. Pattern Recognition 28 781–793.
  • [16] De Granville, C., Southerland, J. and Fagg, A. H. (2006). Learning grasp affordances through human demonstration. In Proceedings of the International Conference on Development and Learning, electronically published.
  • [17] Dempster, A. P., Laird, N. M. and Rubin, D. B. (1977). Maximum likelihood from incomplete data via the EM algorithm. Journal of the Royal Statistical Society. Series B 39 1–38.
  • [18] Dudley, R. M. (1999). Uniform Central Limit Theorems. Cambridge Univ. Press.
  • [19] Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. J. Amer. Statist. Assoc. 97 611–631.
  • [20] Goutte, C., Hansen, L. K., Liptrot, M. G. and Rostrup, E. (2001). Feature-space clustering for fMRI meta-analysis. Human Brain Mapping 13 165–183.
  • [21] Hamelryck, T., Kent, J. T. and Krogh, A. (2006). Sampling realistic protein conformations using local structural bias. PLoS Comput. Biol. 2 e131.
  • [22] Hennig, C. (2010). Methods for merging Gaussian mixture components. Adv. Data Anal. Classif. 4 3–34.
  • [23] Keribin, C. (2000). Consistent estimation of the order of mixture models. Sankhya A 62 49–66.
  • [24] Laloë, T. and Servien, R. (2013). The X-alter algorithm: a parameter-free method of unsupervised clustering. Journal of Modern Applied Statistical Methods 12 14.
  • [25] Lange, K. (1999). Numerical Analysis for Statisticians. Springer-Verlag, New York.
  • [26] Lee, S. X. and McLachlan, G. J. (2013). Model-based clustering and classification with non-normal mixture distributions. Statistical Methods & Applications 22 427–454.
  • [27] Mariadassou, M., Robin, S. and Vacher, C. (2010). Uncovering latent structure in valued graphs: a variational approach. Ann. Appl. Stat. 4 715–742.
  • [28] Massart, P. (2007). Concentration Inequalities and Model Selection. Lecture Notes in Math. Springer.
  • [29] Maugis, C. and Michel, B. (2011). A non asymptotic penalized criterion for Gaussian mixture model selection. ESAIM Probab. Stat. 15 41–68.
  • [30] McLachlan, G. and Peel, D. (2000). Finite Mixture Models. Wiley, New York.
  • [31] McQuarrie, A. D. R. and Tsai, C. L. (1998). Regression and Time Series Model Selection. World Scientific.
  • [32] Nishii, R. (1988). Maximum likelihood principle and model selection when the true model is unspecified. J. Multivariate Anal. 27 392–403.
  • [33] Pelleg, D. and Moore, A. W. (2000). X-means: extending K-means with efficient estimation of the number of clusters. In ICML 727–734.
  • [34] Pigeau, A. and Gelgon, M. (2005). Building and tracking hierarchical geographical & temporal partitions for image collection management on mobile devices. In Proceedings 13th Annual ACM Internat. Conf. on Multimedia 141–150. ACM, New York, NY, USA.
  • [35] Redner, R. A. and Walker, H. F. (1984). Mixture densities, maximum likelihood and the EM algorithm. SIAM Rev. 26 195–239.
  • [36] Rigaill, G., Lebarbier, E. and Robin, S. (2012). Exact posterior distributions and model selection criteria for multiple change-point detection problems. Statist. Comput. 1–13.
  • [37] Schwarz, G. (1978). Estimating the dimension of a model. Ann. Statist. 6 461–464.
  • [38] Steele, R. J. and Raftery, A. E. (2010). Performance of Bayesian model selection criteria for Gaussian mixture models. In Frontiers of Statistical Decision Making and Bayesian Analysis 113–130. Springer.
  • [39] Symons, M. J. (1981). Clustering criteria and multivariate normal mixtures. Biometrics 37 35–43.
  • [40] Titterington, D. M., Smith, A. F. M. and Makov, U. E. (1985). Statistical Analysis of Finite Mixture Distributions. Wiley, New York.
  • [41] van der Vaart, A. (1998). Asymptotic Statistics. Cambridge University Press.