August 2021 Nonexchangeable random partition models for microclustering
Giuseppe Di Benedetto, François Caron, Yee Whye Teh
Ann. Statist. 49(4): 1931-1957 (August 2021). DOI: 10.1214/20-AOS2003

Abstract

Many popular random partition models, such as the Chinese restaurant process and its two-parameter extension, fall in the class of exchangeable random partitions and have found wide applicability in various fields. While the exchangeability assumption is sensible in many cases, it implies that the size of the clusters necessarily grows linearly with the sample size, and such a feature may be undesirable for some applications. We present here a flexible class of nonexchangeable random partition models that can generate partitions whose cluster sizes grow sublinearly with the sample size, with the growth rate controlled by one parameter. Along with this result, we provide the asymptotic behaviour of the number of clusters of a given size and show that the model can exhibit power-law behaviour, controlled by another parameter. The construction is based on completely random measures and a Poisson embedding of the random partition, and inference is performed using a Sequential Monte Carlo algorithm. Experiments on real data sets emphasise the usefulness of the approach compared to a two-parameter Chinese restaurant process.
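The exchangeable baseline that the abstract compares against can be simulated directly. The following is a minimal Python sketch of the standard sequential (restaurant) description of the two-parameter Chinese restaurant process. It is not the authors' code and does not implement their nonexchangeable model. With i observations already assigned to K clusters of sizes n_1, ..., n_K, the next observation opens a new cluster with probability (theta + K*alpha)/(i + theta) and joins cluster j with probability (n_j - alpha)/(i + theta). The names `two_param_crp` and `largest_fraction`, and the labels `alpha` (discount) and `theta` (concentration), are our own. The sketch reports the fraction of observations in the largest cluster, which for this exchangeable model does not vanish as n grows; equivalently, the largest cluster grows linearly in n.

```python
import random


def two_param_crp(n, alpha, theta, seed=0):
    """Sequentially assign n observations under a two-parameter Chinese
    restaurant process (discount alpha, concentration theta); return the
    list of cluster sizes."""
    if not (0.0 <= alpha < 1.0) or theta <= -alpha:
        raise ValueError("require 0 <= alpha < 1 and theta > -alpha")
    rng = random.Random(seed)
    sizes = []
    for i in range(n):  # i observations have already been assigned
        if i == 0:
            sizes.append(1)  # the first observation always opens a cluster
            continue
        k = len(sizes)
        new_weight = theta + k * alpha  # unnormalised weight of a new cluster
        u = rng.random() * (i + theta)  # total unnormalised weight is i + theta
        if u < new_weight:
            sizes.append(1)
            continue
        u -= new_weight
        for j, s in enumerate(sizes):
            w = s - alpha  # unnormalised weight of joining cluster j
            if u < w:
                sizes[j] += 1
                break
            u -= w
        else:
            sizes[-1] += 1  # guard against floating-point round-off
    return sizes


def largest_fraction(sizes):
    """Size of the largest cluster divided by the number of observations."""
    return max(sizes) / sum(sizes)


if __name__ == "__main__":
    for n in (1_000, 4_000, 16_000):
        sizes = two_param_crp(n, alpha=0.5, theta=1.0, seed=1)
        print(n, len(sizes), round(largest_fraction(sizes), 3))
```

As n increases, the printed largest-cluster fraction typically fluctuates around a positive value rather than shrinking toward zero. A microclustering model is designed so that this fraction tends to zero instead.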

Funding Statement

GDB acknowledges support from EPSRC under grant EP/L016710/1. FC and YWT acknowledge funding from the ERC under the European Union's Seventh Framework Programme (FP7/2007-2013), ERC grant agreement no. 617071. FC acknowledges support from EPSRC under grant EP/P026753/1.

Citation

Giuseppe Di Benedetto. François Caron. Yee Whye Teh. "Nonexchangeable random partition models for microclustering." Ann. Statist. 49 (4) 1931 - 1957, August 2021. https://doi.org/10.1214/20-AOS2003

Information

Received: 1 June 2019; Revised: 1 May 2020; Published: August 2021
First available in Project Euclid: 29 September 2021

Digital Object Identifier: 10.1214/20-AOS2003

Subjects:
Primary: 60G55, 60G57, 62F15

Keywords: completely random measure, power-law, random partitions, sparse random graph, stochastic process

Rights: Copyright © 2021 Institute of Mathematical Statistics

JOURNAL ARTICLE
27 PAGES
