Statistical Science

An Overview of Semiparametric Extensions of Finite Mixture Models

Sijia Xiang, Weixin Yao, and Guangren Yang



Finite mixture models provide an important tool for exploring complex data structures in many scientific areas, such as economics, epidemiology and finance. Over the past decade, semiparametric extensions of traditional finite mixture models have brought exciting developments in methodology, theory and applications. In this article, we provide a selective overview of newly developed semiparametric mixture models and discuss their estimation methods, their theoretical properties where available, and some open questions. Recent developments are also discussed.
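The abstract refers to estimation methods for semiparametric mixtures without giving details. As an illustration only, the sketch below fits a k-component location mixture, sum_j lam_j * f(x - mu_j), in which the component density f is unknown but assumed symmetric about zero. It alternates an E-step for the posterior weights, weighted updates of lam_j and mu_j, and a symmetrised weighted kernel density estimate of f. The function names, the hard nearest-centre initialisation, the Gaussian kernel and the fixed bandwidth `h` are all assumptions of this sketch and are not necessarily the procedures surveyed in the article.

```python
import numpy as np


def gauss_kernel(z):
    """Standard normal density, used as the smoothing kernel."""
    return np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)


def symmetric_kde(u, resid, weights, h):
    """Weighted KDE of the residuals, symmetrised about zero.

    Each residual r contributes kernels at both +r and -r, so the
    estimate is symmetric and integrates to one.
    """
    r = resid.ravel()
    w = weights.ravel()
    k_pos = gauss_kernel((u[:, None] - r[None, :]) / h)
    k_neg = gauss_kernel((u[:, None] + r[None, :]) / h)
    return (k_pos + k_neg) @ w / (2.0 * w.sum() * h)


def em_symmetric_location_mixture(x, mu_init, h, n_iter=200, tol=1e-6):
    """Hypothetical EM-like fit of sum_j lam_j * f(x - mu_j), f symmetric."""
    x = np.asarray(x, dtype=float)
    mu = np.asarray(mu_init, dtype=float).copy()
    n, k = x.size, mu.size
    # Initialisation (an assumption): hard assignment to the nearest mu_init.
    w = np.zeros((n, k))
    w[np.arange(n), np.argmin(np.abs(x[:, None] - mu[None, :]), axis=1)] = 1.0
    for _ in range(n_iter):
        # M-step: mixing proportions and weighted means as location estimates.
        lam = w.mean(axis=0)
        mu_new = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
        # Nonparametric step: estimate f from the weighted residuals.
        resid = x[:, None] - mu_new[None, :]
        f_vals = symmetric_kde(resid.ravel(), resid, w, h).reshape(n, k)
        # E-step: posterior probability that x_i came from component j.
        num = lam[None, :] * f_vals
        w = num / num.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(mu_new - mu)) < tol
        mu = mu_new
        if converged:
            break
    return w.mean(axis=0), mu, w
```

For a one-dimensional NumPy array `x`, a call such as `em_symmetric_location_mixture(x, mu_init=[-1.0, 5.0], h=0.5)` returns the estimated proportions, locations and posterior weights. The bandwidth is fixed here only to keep the sketch short; in practice it would need a data-driven choice.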

Article information

Statist. Sci., Volume 34, Number 3 (2019), 391-404.

First available in Project Euclid: 11 October 2019


Digital Object Identifier: doi:10.1214/19-STS698


Keywords: EM algorithm; mixture models; mixture regression models; semiparametric mixture models


Xiang, Sijia; Yao, Weixin; Yang, Guangren. An Overview of Semiparametric Extensions of Finite Mixture Models. Statist. Sci. 34 (2019), no. 3, 391--404. doi:10.1214/19-STS698.


