Open Access
May 2020
Reliable clustering of Bernoulli mixture models
Amir Najafi, Seyed Abolfazl Motahari, Hamid R. Rabiee
Bernoulli 26(2): 1535-1559 (May 2020). DOI: 10.3150/19-BEJ1173

Abstract

A Bernoulli Mixture Model (BMM) is a finite mixture of random binary vectors with independent dimensions. The problem of clustering BMM data arises in a variety of real-world applications, ranging from population genetics to activity analysis in social networks. In this paper, we analyze the clusterability of BMMs from a theoretical perspective, when the number of clusters is unknown. In particular, we stipulate a set of conditions on the sample complexity and dimension of the model in order to guarantee the Probably Approximately Correct (PAC)-clusterability of a dataset. To the best of our knowledge, these findings are the first non-asymptotic bounds on the sample complexity of learning or clustering BMMs.
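
To make the model definition above concrete, the following is a minimal sketch of drawing samples from a BMM: pick a cluster according to the mixing weights, then draw each coordinate as an independent Bernoulli variable. The function name `sample_bmm`, the parameter names, and the example weights and probabilities are illustrative assumptions and do not come from the paper, which concerns theoretical clusterability conditions rather than any particular implementation.

```python
import random


def sample_bmm(weights, probs, n, seed=0):
    """Draw n binary vectors from a Bernoulli mixture (illustrative sketch).

    weights: mixing proportions, one per cluster (assumed to sum to 1).
    probs:   probs[k][d] is the success probability of dimension d in cluster k.
    Returns (samples, labels), where labels[i] is the cluster of samples[i].
    """
    rng = random.Random(seed)
    clusters = list(range(len(weights)))
    samples, labels = [], []
    for _ in range(n):
        # Choose the latent cluster according to the mixing weights.
        k = rng.choices(clusters, weights=weights)[0]
        # Given the cluster, each dimension is an independent Bernoulli draw.
        x = [1 if rng.random() < p else 0 for p in probs[k]]
        samples.append(x)
        labels.append(k)
    return samples, labels


# Hypothetical two-cluster model over four binary dimensions.
weights = [0.5, 0.5]
probs = [[0.9, 0.9, 0.1, 0.1],
         [0.1, 0.1, 0.9, 0.9]]
samples, labels = sample_bmm(weights, probs, n=1000)
```

In the clustering problem the abstract describes, only `samples` would be observed; the labels, and the number of clusters itself, are unknown.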

Citation

Amir Najafi, Seyed Abolfazl Motahari, Hamid R. Rabiee. "Reliable clustering of Bernoulli mixture models." Bernoulli 26(2): 1535-1559, May 2020. https://doi.org/10.3150/19-BEJ1173

Information

Received: 1 June 2019; Revised: 1 October 2019; Published: May 2020
First available in Project Euclid: 31 January 2020

zbMATH: 07166573
MathSciNet: MR4058377
Digital Object Identifier: 10.3150/19-BEJ1173

Keywords: High-dimensional statistics, mixture model analysis, PAC-learnability, sample complexity

Rights: Copyright © 2020 Bernoulli Society for Mathematical Statistics and Probability
