May 2020 A Bayesian sparse finite mixture model for clustering data from a heterogeneous population
Erlandson F. Saraiva, Adriano K. Suzuki, Luís A. Milan
Braz. J. Probab. Stat. 34(2): 323-344 (May 2020). DOI: 10.1214/18-BJPS425


In this paper, we introduce a Bayesian approach for clustering data using a sparse finite mixture model (SFMM). The SFMM is a finite mixture model with a large, prespecified number of components $k$, many of which may be empty. In this model, $k$ can be interpreted as the maximum number of distinct mixture components. We then explore the use of a prior distribution for the mixture weights that takes into account the possibility that the number of clusters $k_{\mathbf{c}}$ (i.e., the number of nonempty components) is random and smaller than the number of components $k$ of the finite mixture model. To determine the clusters, we develop an MCMC algorithm called the Split-Merge allocation sampler. In this algorithm, the split-merge strategy is data-driven and is included to improve the mixing of the Markov chain with respect to the number of clusters. The performance of the method is verified using simulated datasets and three real datasets. The first real dataset is the benchmark galaxy data, while the second and third are the publicly available Enzyme and Acidity datasets, respectively.
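To illustrate the distinction between the number of components $k$ and the number of clusters $k_{\mathbf{c}}$, the following minimal sketch simulates allocations under a symmetric Dirichlet prior on the weights and counts the nonempty components. The symmetric Dirichlet prior, the function names, and the parameter values are assumptions made for illustration only; the abstract does not specify the prior used in the paper, and this is not the Split-Merge allocation sampler.

```python
import random


def sample_weights(k, alpha, rng):
    """Draw k mixture weights from a symmetric Dirichlet(alpha, ..., alpha).

    Assumption: a symmetric Dirichlet prior is used purely for illustration.
    """
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    if total == 0.0:
        # Guard against numerical underflow when alpha is very small.
        return [1.0 / k] * k
    return [g / total for g in draws]


def count_nonempty(n, k, alpha, rng):
    """Allocate n observations to k components and return k_c,
    the number of components that received at least one observation."""
    weights = sample_weights(k, alpha, rng)
    labels = rng.choices(range(k), weights=weights, k=n)
    return len(set(labels))


def mean_clusters(n, k, alpha, reps=200, seed=0):
    """Monte Carlo estimate of the prior mean of k_c."""
    rng = random.Random(seed)
    return sum(count_nonempty(n, k, alpha, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    # Hypothetical settings: n = 100 observations, k = 10 components.
    for alpha in (0.05, 1.0, 5.0):
        print(alpha, mean_clusters(100, 10, alpha))
```

Under these assumptions, a small concentration parameter places most of the prior mass on configurations with few nonempty components, so $k_{\mathbf{c}}$ tends to be well below $k$, whereas a large concentration parameter tends to fill nearly all $k$ components.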



Erlandson F. Saraiva, Adriano K. Suzuki, Luís A. Milan. "A Bayesian sparse finite mixture model for clustering data from a heterogeneous population." Braz. J. Probab. Stat. 34 (2) 323-344, May 2020.


Received: 1 April 2018; Accepted: 1 November 2018; Published: May 2020
First available in Project Euclid: 4 May 2020

zbMATH: 07232932
MathSciNet: MR4093262
Digital Object Identifier: 10.1214/18-BJPS425

Rights: Copyright © 2020 Brazilian Statistical Association


