Open Access
2021 Clustering of measures via mean measure quantization
Frédéric Chazal, Clément Levrard, Martin Royer
Electron. J. Statist. 15(1): 2060-2104 (2021). DOI: 10.1214/21-EJS1834

Abstract

This paper addresses the case where data come as point sets, or more generally as measures. Our goal is to build from the data an embedding of these measures into a finite-dimensional Euclidean space that allows for provably efficient clustering of the source measures.

The vectorization technique we propose relies on finding a compactly supported approximation of the mean of the measure-generating process, which coincides with the intensity measure in the point process framework. To this end, we provide two algorithms that we prove to be almost minimax optimal.

We assess the practical validity of our approach, first by showing that our results apply in the framework of persistence-based shape classification via the ATOL procedure described in [34]. Finally, numerical experiments are carried out on simulated and real datasets, encompassing text classification and large-scale graph classification.
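
As a rough illustration of the idea summarized above, the following Python sketch quantizes the empirical mean measure of a sample of point sets and then maps each point set to a vector with one coordinate per codepoint. The abstract does not specify the two algorithms, so everything here is an assumption made for illustration: the Lloyd-type k-means iteration on the pooled points, the exponential kernel, and the names `quantize_mean_measure`, `vectorize`, and `scale` are hypothetical and should not be read as the authors' procedure.

```python
import numpy as np


def quantize_mean_measure(point_sets, k, n_iter=50, seed=0):
    """Lloyd-type k-means on the pooled points (illustrative assumption).

    Pooling all points with equal weight amounts to working with the
    empirical mean measure of the sample of point sets.
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack(point_sets).astype(float)
    # Initialize the k codepoints on distinct pooled points.
    centers = pooled[rng.choice(len(pooled), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assign every pooled point to its nearest codepoint.
        dists = np.linalg.norm(pooled[:, None, :] - centers[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Move each codepoint to the mean of its cell (skip empty cells).
        for j in range(k):
            members = pooled[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers


def vectorize(points, centers, scale=1.0):
    """Embed one point set into R^k (hypothetical kernel choice).

    Coordinate j sums exp(-distance / scale) over the points, i.e. it
    measures how much mass the point set puts near codepoint j.
    """
    points = np.asarray(points, dtype=float)
    dists = np.linalg.norm(points[:, None, :] - centers[None, :, :], axis=2)
    return np.exp(-dists / scale).sum(axis=0)


if __name__ == "__main__":
    # Toy data: three point sets near the origin, three near (5, 5).
    rng = np.random.default_rng(1)
    sets = [rng.normal(0.0, 0.1, size=(20, 2)) for _ in range(3)]
    sets += [rng.normal(5.0, 0.1, size=(20, 2)) for _ in range(3)]
    centers = quantize_mean_measure(sets, k=2)
    embedding = np.array([vectorize(s, centers) for s in sets])
    print(embedding)
```

The resulting rows of `embedding` live in a finite-dimensional Euclidean space, so any standard clustering routine could then be applied to them; which routine is appropriate is left open here.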

Citation


Frédéric Chazal, Clément Levrard, Martin Royer. "Clustering of measures via mean measure quantization." Electron. J. Statist. 15 (1) 2060-2104, 2021. https://doi.org/10.1214/21-EJS1834

Information

Received: 1 November 2020; Published: 2021
First available in Project Euclid: 7 April 2021

Digital Object Identifier: 10.1214/21-EJS1834

Subjects:
Primary: 62H30

Keywords: clustering, quantization, topological data analysis

Vol.15 • No. 1 • 2021