Open Access
June 2009
Improved criteria for clustering based on the posterior similarity matrix
Arno Fritsch, Katja Ickstadt
Bayesian Anal. 4(2): 367-391 (June 2009). DOI: 10.1214/09-BA414

Abstract

In this paper we address the problem of obtaining a single clustering estimate $\hat{c}$ based on an MCMC sample of clusterings $c^{(1)},c^{(2)},\ldots,c^{(M)}$ from the posterior distribution of a Bayesian cluster model. Methods to derive $\hat{c}$ when the number of groups $K$ varies between the clusterings are reviewed and discussed. These include the maximum a posteriori (MAP) estimate and methods based on the posterior similarity matrix, a matrix containing the posterior probabilities that the observations $i$ and $j$ are in the same cluster. The posterior similarity matrix is related to a commonly used loss function by Binder (1978). Minimization of the loss is shown to be equivalent to maximizing the Rand index between estimated and true clustering. We propose new criteria for estimating a clustering, which are based on the posterior expected adjusted Rand index. The criteria are shown to possess a shrinkage property and outperform Binder's loss in a simulation study and in an application to gene expression data. They also perform favorably compared to other clustering procedures.
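The quantities named in the abstract can be illustrated with a short Python sketch (NumPy assumed). Given $M$ sampled clusterings stored as integer label vectors, the posterior similarity matrix is estimated by the fraction of samples in which $i$ and $j$ share a cluster. With equal unit costs for the two kinds of pairwise error (an assumption; Binder's loss also allows unequal costs), Binder's loss between two clusterings counts the pairs on which they disagree, which equals $\binom{n}{2}$ times one minus their Rand index; this is why minimizing the loss maximizes the Rand index. Its posterior expectation reduces to $\sum_{i<j} |1\{\hat{c}_i = \hat{c}_j\} - \pi_{ij}|$, where $\pi_{ij}$ is the posterior similarity. The posterior expected adjusted Rand index is approximated here by a plain Monte Carlo average of the ARI against the sampled clusterings, and candidates are restricted to the sampled clusterings themselves. Both choices are simplifications for illustration, not necessarily the estimator or search strategy used in the paper, and all function names are hypothetical.

```python
import numpy as np

# Hypothetical helper names; an illustrative sketch, not the paper's implementation.


def posterior_similarity(samples):
    """Estimate the posterior similarity matrix from an (M, n) array of
    sampled clusterings: entry (i, j) is the fraction of samples in which
    observations i and j share a cluster label."""
    samples = np.asarray(samples)
    same = samples[:, :, None] == samples[:, None, :]  # (M, n, n) booleans
    return same.mean(axis=0)


def expected_binder_loss(c_hat, psm):
    """Posterior expected Binder loss, assuming equal unit costs:
    sum over pairs i < j of |1{c_hat_i == c_hat_j} - psm[i, j]|."""
    c_hat = np.asarray(c_hat)
    together = (c_hat[:, None] == c_hat[None, :]).astype(float)
    iu = np.triu_indices(len(c_hat), k=1)  # each unordered pair once
    return np.abs(together - psm)[iu].sum()


def rand_index(c1, c2):
    """Fraction of pairs i < j on which the two clusterings agree."""
    c1, c2 = np.asarray(c1), np.asarray(c2)
    iu = np.triu_indices(len(c1), k=1)
    s1 = (c1[:, None] == c1[None, :])[iu]
    s2 = (c2[:, None] == c2[None, :])[iu]
    return np.mean(s1 == s2)


def pairs(x):
    """Number of unordered pairs, x choose 2, elementwise."""
    return x * (x - 1) / 2.0


def adjusted_rand_index(c1, c2):
    """Hubert-Arabie adjusted Rand index from the contingency table (n >= 2)."""
    _, a = np.unique(np.asarray(c1), return_inverse=True)
    _, b = np.unique(np.asarray(c2), return_inverse=True)
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)  # count observations per (cluster in c1, cluster in c2)
    sum_ij = pairs(table).sum()
    sum_a = pairs(table.sum(axis=1)).sum()
    sum_b = pairs(table.sum(axis=0)).sum()
    expected = sum_a * sum_b / pairs(len(a))
    max_index = 0.5 * (sum_a + sum_b)
    if max_index == expected:
        # Happens only when both are all-singletons or both are one cluster,
        # i.e. the two clusterings are identical.
        return 1.0
    return (sum_ij - expected) / (max_index - expected)


def select_from_samples(samples):
    """Among the sampled clusterings, return the minimizer of expected Binder
    loss and the maximizer of the Monte Carlo average ARI against all samples."""
    samples = np.asarray(samples)
    psm = posterior_similarity(samples)
    binder = [expected_binder_loss(c, psm) for c in samples]
    pear = [np.mean([adjusted_rand_index(c, s) for s in samples]) for c in samples]
    return samples[int(np.argmin(binder))], samples[int(np.argmax(pear))]


# Toy "MCMC output": 4 sampled clusterings of n = 5 observations.
samples = np.array([
    [0, 0, 1, 1, 2],
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 2],
    [1, 1, 0, 0, 2],  # same partition as row 0, different labels
])
binder_pick, pear_pick = select_from_samples(samples)
print("Binder:", binder_pick, "Monte Carlo expected ARI:", pear_pick)
```

Because the posterior similarity matrix only records whether labels match, it is invariant to label switching, which is what makes it usable when $K$ varies across samples. The averaged-ARI step costs $O(M^2)$ ARI evaluations, so for long chains one would likely thin the sample or use a cheaper approximation.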

Citation


Arno Fritsch. Katja Ickstadt. "Improved criteria for clustering based on the posterior similarity matrix." Bayesian Anal. 4 (2) 367 - 391, June 2009. https://doi.org/10.1214/09-BA414

Information

Published: June 2009
First available in Project Euclid: 22 June 2012

zbMATH: 1330.62249
MathSciNet: MR2507368
Digital Object Identifier: 10.1214/09-BA414

Keywords: Adjusted Rand index, cluster analysis, Dirichlet process mixture model, Markov chain Monte Carlo

Rights: Copyright © 2009 International Society for Bayesian Analysis

Vol.4 • No. 2 • June 2009