Open Access
May 2020
A Bayesian sparse finite mixture model for clustering data from a heterogeneous population
Erlandson F. Saraiva, Adriano K. Suzuki, Luís A. Milan
Braz. J. Probab. Stat. 34(2): 323-344 (May 2020). DOI: 10.1214/18-BJPS425

Abstract

In this paper, we introduce a Bayesian approach for clustering data using a sparse finite mixture model (SFMM). The SFMM is a finite mixture model with a large number of components $k$, fixed in advance, in which many components can be empty. In this model, $k$ can be interpreted as the maximum number of distinct mixture components. We then explore the use of a prior distribution for the mixture weights that takes into account the possibility that the number of clusters $k_{\mathbf{c}}$ (i.e., nonempty components) is random and smaller than the number of components $k$ of the finite mixture model. To determine the clusters, we develop an MCMC algorithm called the Split-Merge allocation sampler. In this algorithm, the split-merge strategy is data-driven and is included to improve the mixing of the Markov chain with respect to the number of clusters. The performance of the method is verified using simulated data sets and three real data sets. The first real data set is the benchmark Galaxy data, while the second and third are the publicly available Enzyme and Acidity data sets, respectively.
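The distinction between the number of components $k$ and the number of nonempty clusters $k_{\mathbf{c}}$ can be illustrated with a small simulation. The sketch below is not the authors' prior or sampler: it assumes a symmetric Dirichlet prior on the weights with a hypothetical concentration parameter `alpha`, which is one common choice in the sparse finite mixture literature, and it only draws allocations from the prior. The function names, `alpha` values, `n`, and `k` are illustrative assumptions.

```python
import random
from collections import Counter


def sample_dirichlet(alpha, k, rng):
    """Draw k weights from a symmetric Dirichlet(alpha) via normalized Gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(k)]
    total = sum(draws)
    return [d / total for d in draws]


def count_nonempty(n, k, alpha, rng):
    """Allocate n observations to k components and count the nonempty ones (k_c)."""
    weights = sample_dirichlet(alpha, k, rng)
    allocations = rng.choices(range(k), weights=weights, k=n)
    return len(Counter(allocations))


def mean_nonempty(n, k, alpha, reps, rng):
    """Monte Carlo estimate of the prior mean of k_c."""
    return sum(count_nonempty(n, k, alpha, rng) for _ in range(reps)) / reps


if __name__ == "__main__":
    rng = random.Random(0)
    n, k = 100, 20  # assumed sample size and maximum number of components
    for alpha in (0.05, 1.0, 10.0):  # assumed concentration values
        avg = mean_nonempty(n, k, alpha, reps=200, rng=rng)
        print(f"alpha={alpha}: average k_c over 200 draws = {avg:.2f} (k = {k})")
```

Under this assumed prior, a small `alpha` concentrates the weight mass on a few components, so $k_{\mathbf{c}}$ tends to be well below $k$, while a large `alpha` tends to fill almost all components.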

Citation


Erlandson F. Saraiva, Adriano K. Suzuki, Luís A. Milan. "A Bayesian sparse finite mixture model for clustering data from a heterogeneous population." Braz. J. Probab. Stat. 34 (2) 323-344, May 2020. https://doi.org/10.1214/18-BJPS425

Information

Received: 1 April 2018; Accepted: 1 November 2018; Published: May 2020
First available in Project Euclid: 4 May 2020

zbMATH: 07232932
MathSciNet: MR4093262
Digital Object Identifier: 10.1214/18-BJPS425

Keywords: Bayesian approach, Gibbs sampling, Metropolis–Hastings, mixture model, split-merge update

Rights: Copyright © 2020 Brazilian Statistical Association
