Finite mixtures are a flexible modeling tool for irregularly shaped densities and samples from heterogeneous populations. When modeling with mixtures using an exchangeable prior on the component features, the component labels are arbitrary and are indistinguishable in posterior analysis. This makes it impossible to attribute any meaningful interpretation to the marginal posterior distributions of the component features. We propose a model in which a small number of observations are assumed to arise from some of the labeled component densities. The resulting model is not exchangeable, allowing inference on the component features without post-processing. Our method assigns meaning to the component labels at the modeling stage and can be justified as a data-dependent informative prior on the labelings. We show that our method produces interpretable results, often (but not always) similar to those resulting from relabeling algorithms, with the added benefit that the marginal inferences originate directly from a well specified probability model rather than a post hoc manipulation. We provide asymptotic results leading to practical guidelines for model selection that are motivated by maximizing prior information about the class labels and demonstrate our method on real and simulated data.
"Anchored Bayesian Gaussian mixture models." Electron. J. Statist. 14 (2) 3869 - 3913, 2020. https://doi.org/10.1214/20-EJS1756