Open Access
Anchored Bayesian Gaussian mixture models
Deborah Kunkel, Mario Peruggia
Electron. J. Statist. 14(2): 3869-3913 (2020). DOI: 10.1214/20-EJS1756

Abstract

Finite mixtures are a flexible modeling tool for irregularly shaped densities and samples from heterogeneous populations. When modeling with mixtures using an exchangeable prior on the component features, the component labels are arbitrary and are indistinguishable in posterior analysis. This makes it impossible to attribute any meaningful interpretation to the marginal posterior distributions of the component features. We propose a model in which a small number of observations are assumed to arise from some of the labeled component densities. The resulting model is not exchangeable, allowing inference on the component features without post-processing. Our method assigns meaning to the component labels at the modeling stage and can be justified as a data-dependent informative prior on the labelings. We show that our method produces interpretable results, often (but not always) similar to those resulting from relabeling algorithms, with the added benefit that the marginal inferences originate directly from a well-specified probability model rather than a post hoc manipulation. We provide asymptotic results leading to practical guidelines for model selection that are motivated by maximizing prior information about the class labels, and we demonstrate our method on real and simulated data.
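To make the anchoring idea concrete, the following is a minimal illustrative sketch (not the paper's actual model or anchor-selection procedure): a Gibbs sampler for a two-component Gaussian mixture with known unit variance, in which one observation per component is pre-assigned ("anchored") to a labeled component so that its allocation is never resampled. The anchor choice here (smallest point to component 0, largest to component 1) and all priors are assumptions made for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-component data (illustrative only, not from the paper)
y = np.concatenate([rng.normal(-3, 1, 50), rng.normal(3, 1, 50)])
n, K = len(y), 2

# Anchoring: pre-assign a few observations to labeled components.
# Here, one anchor per component via a simple heuristic (a stand-in
# for the paper's anchor-selection guidelines).
anchors = {int(np.argmin(y)): 0, int(np.argmax(y)): 1}

# Conjugate setup: mu_k ~ N(0, 10^2), sigma = 1 known, weights ~ Dirichlet(1,1)
mu = np.array([-1.0, 1.0])
w = np.full(K, 1.0 / K)
z = rng.integers(0, K, n)
for i, k in anchors.items():
    z[i] = k  # anchored allocations are fixed for the whole run

mu_draws = []
for it in range(2000):
    # 1. Sample allocations z_i for the non-anchored observations
    logp = np.log(w) - 0.5 * (y[:, None] - mu[None, :]) ** 2
    p = np.exp(logp - logp.max(axis=1, keepdims=True))
    p /= p.sum(axis=1, keepdims=True)
    for i in range(n):
        if i in anchors:
            continue  # anchored labels are never resampled
        z[i] = rng.choice(K, p=p[i])
    # 2. Sample component means from their conjugate normal posteriors
    for k in range(K):
        yk = y[z == k]
        prec = len(yk) + 1.0 / 100.0  # posterior precision (sigma = 1)
        mu[k] = rng.normal(yk.sum() / prec, np.sqrt(1.0 / prec))
    # 3. Sample mixture weights from their Dirichlet posterior
    counts = np.bincount(z, minlength=K)
    w = rng.dirichlet(1 + counts)
    if it >= 500:  # discard burn-in
        mu_draws.append(mu.copy())

post_mean = np.mean(mu_draws, axis=0)
print(post_mean)
```

Because the anchored points pin each label to one region of the data, the marginal posteriors of mu[0] and mu[1] stay attached to the lower and upper clusters respectively; with a fully exchangeable prior, the sampler could swap labels mid-run and those marginals would be uninterpretable.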

Citation


Deborah Kunkel, Mario Peruggia. "Anchored Bayesian Gaussian mixture models." Electron. J. Statist. 14 (2) 3869 - 3913, 2020. https://doi.org/10.1214/20-EJS1756

Information

Received: 1 October 2019; Published: 2020
First available in Project Euclid: 22 October 2020

zbMATH: 07270280
MathSciNet: MR4165496
Digital Object Identifier: 10.1214/20-EJS1756

Keywords: data-dependent prior, EM algorithm, identifiability, label switching
