Open Access
2025 Topological K-means clustering in reproducing kernel Hilbert spaces
Matthew Dixon, Yuzhou Chen, Yulia R. Gel
Electron. J. Statist. 19(1): 204-239 (2025). DOI: 10.1214/24-EJS2329

Abstract

We propose a new topological clustering methodology, based on generalizing an empirical risk minimization framework, using a reproducing kernel Hilbert space (RKHS) for vectorized persistent homology representations of point clouds. In contrast to conventional Euclidean-based clustering methods, which address only pairwise similarity among data points, our new approach of topological K-means clusters data based on the similarity of the shapes exhibited by the local vicinity of each data point at multiple scales. Thereby, topological clustering systematically captures the inherent local and global higher-order data characteristics that are otherwise inaccessible with Euclidean-based clustering. We summarize the extracted shape characteristics of each local vicinity in the form of a persistence diagram (PD) and embed the PDs into an RKHS, which induces a distance among shapes of local vicinities in Hilbert space. Our derived theoretical guarantees on stability and consistency of the topological partitions are the first theoretical results of this kind at the intersection of topological data analysis and statistical inference. Additionally, we establish a number of new theoretical results on bounds of covering numbers in Hilbert spaces which are of independent interest in statistical learning theory. We demonstrate the superior performance of the new topological K-means clustering on simulations and on US COVID-19 data.
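
The pipeline described in the abstract (vectorize each persistence diagram, embed it in an RKHS through a kernel, then run K-means on the induced distances) can be illustrated with a minimal sketch. This is not the authors' implementation: the vectorization (largest lifetimes, zero-padded), the Gaussian kernel, the function names, and the toy diagrams below are all assumptions chosen only to make the example self-contained. The one standard fact it relies on is that a kernel k induces the squared distance k(x, x) - 2 k(x, y) + k(y, y) in its RKHS.

```python
import math

def vectorize(pd, m=3):
    """Hypothetical vectorization: the m largest lifetimes (death - birth), zero-padded."""
    lifetimes = sorted((d - b for b, d in pd), reverse=True)[:m]
    return lifetimes + [0.0] * (m - len(lifetimes))

def gaussian_kernel(u, v, sigma=1.0):
    """Gaussian kernel on the vectorized diagrams (an assumed choice of kernel)."""
    sq = sum((a - b) ** 2 for a, b in zip(u, v))
    return math.exp(-sq / (2.0 * sigma ** 2))

def gram_matrix(vectors, sigma=1.0):
    """Gram matrix K[i][j] = k(x_i, x_j)."""
    return [[gaussian_kernel(u, v, sigma) for v in vectors] for u in vectors]

def rkhs_sq_dist(K, i, j):
    """Squared RKHS distance between embedded points i and j."""
    return K[i][i] - 2.0 * K[i][j] + K[j][j]

def kernel_kmeans(K, init_labels, n_clusters, max_iter=100):
    """Lloyd-style K-means in the RKHS, using only the Gram matrix K."""
    n = len(K)
    labels = list(init_labels)
    for _ in range(max_iter):
        members = [[i for i in range(n) if labels[i] == c] for c in range(n_clusters)]
        # Squared norm of each cluster mean in the RKHS.
        within = [sum(K[j][l] for j in C for l in C) / len(C) ** 2 if C else 0.0
                  for C in members]
        new_labels = []
        for i in range(n):
            best_c, best_d = labels[i], math.inf
            for c, C in enumerate(members):
                if not C:
                    continue  # skip empty clusters
                # Squared distance from point i to the mean of cluster c.
                d = K[i][i] - 2.0 * sum(K[i][j] for j in C) / len(C) + within[c]
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:
            break  # assignments are stable
        labels = new_labels
    return labels

# Toy persistence diagrams as (birth, death) pairs, made up for illustration:
# even indices contain one long-lived feature, odd indices only short-lived ones.
diagrams = [
    [(0.0, 2.0), (0.1, 0.3)],
    [(0.0, 0.2), (0.1, 0.2)],
    [(0.0, 1.9), (0.2, 0.3)],
    [(0.0, 0.3)],
    [(0.1, 2.2), (0.0, 0.1)],
    [(0.0, 0.1), (0.0, 0.2)],
]

K = gram_matrix([vectorize(pd) for pd in diagrams])
labels = kernel_kmeans(K, init_labels=[0, 0, 0, 1, 1, 1], n_clusters=2)
print(labels)  # [0, 1, 0, 1, 0, 1]
```

In this sketch the clustering never touches the original point coordinates: everything is computed from the Gram matrix, so any positive definite kernel on persistence diagrams could be substituted for the assumed Gaussian kernel without changing `kernel_kmeans`.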

Funding Statement

This work was supported by the NSF grant TIP-2333703 and the ONR grant N00014-21-1-2530. Also, the paper is based upon work supported by (while Y.R.G. was serving at) the NSF. The views expressed in the article do not necessarily represent the views of NSF or ONR.

Acknowledgments

The authors are very grateful to the reviewer and Associate Editor for many constructive suggestions and thought-provoking questions. The authors are also very thankful to Dr. Zagvozdkin for his numerous suggestions on the manuscript and careful proofreading.

Citation

Matthew Dixon, Yuzhou Chen, Yulia R. Gel. "Topological K-means clustering in reproducing kernel Hilbert spaces." Electron. J. Statist. 19(1): 204-239, 2025. https://doi.org/10.1214/24-EJS2329

Information

Received: 1 September 2023; Published: 2025
First available in Project Euclid: 13 January 2025

Digital Object Identifier: 10.1214/24-EJS2329

Subjects:
Primary: 60K35, 60K35
Secondary: 60K35

Keywords: clustering stability, k-means clustering, reproducing kernel Hilbert spaces, topological data analysis

Vol. 19 • No. 1 • 2025