This paper addresses the case where data come as point sets or, more generally, as measures. Our goal is to build from data an embedding of these measures into a finite-dimensional Euclidean space that allows for provably efficient clustering of the source measures.
The vectorization technique we propose relies on finding a compactly supported approximation of the mean measure of the generating process, which coincides with the intensity measure in the point process framework. To this end, we provide two algorithms that we prove almost minimax optimal.
We assess the practical validity of our approach, first by showing that our results apply in the framework of persistence-based shape classification via the ATOL procedure. Finally, numerical experiments are carried out on simulated and real datasets, encompassing text classification and large-scale graph classification.
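As an illustration of the idea only, the following minimal sketch quantizes the empirical mean measure of a collection of point sets and then embeds each point set into a finite-dimensional vector. It is not one of the two algorithms of the paper: plain Lloyd-type k-means on the pooled points is swapped in for the quantization step, the kernel `exp(-d / scale)` is an assumed choice, and the names `quantize_mean_measure`, `vectorize`, `k` and `scale` are hypothetical.

```python
import numpy as np


def quantize_mean_measure(point_sets, k, n_iter=50, seed=0):
    """Lloyd-type k-means on the pooled points.

    For unweighted point sets, the pooled sample is the empirical mean
    measure up to normalization, so the returned centers form a
    k-point quantizer of it. (Assumed stand-in, not the paper's method.)
    """
    rng = np.random.default_rng(seed)
    pooled = np.vstack(point_sets)
    # Initialize the centers with k distinct pooled points.
    init = rng.choice(len(pooled), size=k, replace=False)
    centers = pooled[init].astype(float)
    for _ in range(n_iter):
        # Squared distances between every pooled point and every center.
        d2 = ((pooled[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        for j in range(k):
            members = pooled[labels == j]
            if len(members) > 0:  # keep the old center if its cell is empty
                centers[j] = members.mean(axis=0)
    return centers


def vectorize(point_set, centers, scale=1.0):
    """Embed one point set in R^k: total kernel mass around each center.

    The kernel exp(-d / scale) is an assumption made for this sketch.
    """
    diff = point_set[:, None, :] - centers[None, :, :]
    d = np.sqrt((diff ** 2).sum(axis=2))
    return np.exp(-d / scale).sum(axis=0)


# Toy data: four point sets of 20 points each, drawn around two locations.
rng = np.random.default_rng(1)
locations = ([0.0, 0.0], [0.0, 0.0], [3.0, 3.0], [3.0, 3.0])
sets = [rng.normal(loc=c, scale=0.1, size=(20, 2)) for c in locations]

centers = quantize_mean_measure(sets, k=2)
vectors = np.array([vectorize(s, centers) for s in sets])
```

Each row of `vectors` is a point in a Euclidean space of dimension `k`, so any standard clustering routine can then be run on the rows in place of the original point sets.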
"Clustering of measures via mean measure quantization." Electron. J. Statist. 15 (1) 2060-2104, 2021. https://doi.org/10.1214/21-EJS1834