Statistics Surveys

Finite mixture models and model-based clustering

Volodymyr Melnykov and Ranjan Maitra


Finite mixture models have a long history in statistics, having been used to model population heterogeneity, to generalize distributional assumptions, and, more recently, to provide a convenient yet formal framework for clustering and classification. This paper provides a detailed review of mixture models and model-based clustering. Recent trends and open problems in the area are also discussed.

Article information

Statist. Surv., Volume 4 (2010), 80-116.

First available in Project Euclid: 29 April 2010

Keywords: EM algorithm, model selection, variable selection, diagnostics, two-dimensional gel electrophoresis data, proteomics, text mining, magnitude magnetic resonance images


Melnykov, Volodymyr; Maitra, Ranjan. Finite mixture models and model-based clustering. Statist. Surv. 4 (2010), 80–116. doi:10.1214/09-SS053.


