Abstract
We propose a new topological clustering methodology based on a generalization of an empirical risk minimization framework that uses a reproducing kernel Hilbert space (RKHS) for vectorized persistent homology representations of point clouds. Conventional Euclidean-based clustering methods address only pairwise similarity among data points. In contrast, our new approach of topological K-means clusters data by the similarity of the shapes exhibited by the local vicinity of each data point at multiple scales. In this way, topological clustering systematically captures inherent local and global higher-order data characteristics that are otherwise inaccessible to Euclidean-based clustering. We summarize the extracted shape characteristics of each local vicinity in the form of a persistence diagram (PD) and embed the PDs into an RKHS, which induces a distance among the shapes of local vicinities in Hilbert space. Our theoretical guarantees on the stability and consistency of the topological partitions are the first theoretical results of this kind at the intersection of topological data analysis and statistical inference. Additionally, we establish a number of new theoretical results on bounds for covering numbers in Hilbert spaces, which are of independent interest in statistical learning theory. We demonstrate the superior performance of the new topological K-means clustering on simulated data and on US COVID-19 data.
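The following minimal Python sketch makes the "embed PDs into an RKHS, then cluster" step more concrete. It is not the authors' implementation. It assumes that the persistence diagrams of the local vicinities have already been computed, for example with a TDA library, and are given as lists of (birth, death) pairs. As a stand-in for whichever RKHS embedding the paper adopts, it uses the persistence scale-space kernel of Reininghaus et al. (2015). The function names `pss_kernel`, `rkhs_distance` and `kernel_kmeans`, the bandwidth `sigma`, and the round-robin initialization are hypothetical, illustrative choices.

```python
import math


def pss_kernel(F, G, sigma=1.0):
    """Persistence scale-space kernel (Reininghaus et al., 2015), used here as an
    assumed stand-in RKHS kernel between diagrams F and G of (birth, death) pairs."""
    total = 0.0
    for (b1, d1) in F:
        for (b2, d2) in G:
            direct = (b1 - b2) ** 2 + (d1 - d2) ** 2
            # Squared distance to the mirror image (d2, b2) of the point across the diagonal
            mirrored = (b1 - d2) ** 2 + (d1 - b2) ** 2
            total += math.exp(-direct / (8 * sigma)) - math.exp(-mirrored / (8 * sigma))
    return total / (8 * math.pi * sigma)


def rkhs_distance(F, G, sigma=1.0):
    """Distance between the RKHS embeddings of F and G induced by the kernel."""
    sq = pss_kernel(F, F, sigma) + pss_kernel(G, G, sigma) - 2 * pss_kernel(F, G, sigma)
    return math.sqrt(max(sq, 0.0))  # clip tiny negative round-off


def kernel_kmeans(diagrams, k, sigma=1.0, max_iter=50):
    """Lloyd-style K-means in the RKHS, using only kernel evaluations (kernel trick)."""
    n = len(diagrams)
    K = [[pss_kernel(a, b, sigma) for b in diagrams] for a in diagrams]
    labels = [i % k for i in range(n)]  # hypothetical round-robin initialization
    for _ in range(max_iter):
        members = [[j for j in range(n) if labels[j] == c] for c in range(k)]
        new_labels = []
        for i in range(n):
            best_c, best_d = None, math.inf
            for c, idx in enumerate(members):
                if not idx:  # skip clusters that became empty
                    continue
                m = len(idx)
                # ||phi(x_i) - mu_c||^2 expanded entirely in terms of the Gram matrix K
                d = (K[i][i]
                     - 2.0 * sum(K[i][j] for j in idx) / m
                     + sum(K[j][l] for j in idx for l in idx) / m ** 2)
                if d < best_d:
                    best_c, best_d = c, d
            new_labels.append(best_c)
        if new_labels == labels:  # assignments are stable, so stop
            break
        labels = new_labels
    return labels


if __name__ == "__main__":
    # Toy diagrams: three with only short-lived features, three with a long-lived one
    short = [[(0.0, 0.2)], [(0.0, 0.25)], [(0.0, 0.3)]]
    long_lived = [[(0.0, 3.0)], [(0.0, 3.2)], [(0.0, 2.8)]]
    print(kernel_kmeans(short + long_lived, k=2))  # [0, 0, 0, 1, 1, 1]
```

The kernel trick writes the squared distance to each cluster mean purely through kernel evaluations. As a result, cluster centers never have to be formed explicitly in the (possibly infinite-dimensional) Hilbert space. In the toy run, diagrams whose features lie close to the diagonal have small embeddings. These end up in a different cluster from diagrams carrying a long-lived feature.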
Funding Statement
This work was supported by the NSF grant TIP-2333703 and the ONR grant N00014-21-1-2530. The paper is also based upon work supported by the NSF while Y.R.G. was serving at the NSF. The views expressed in the article do not necessarily represent the views of the NSF or the ONR.
Acknowledgments
The authors are very grateful to the reviewer and the Associate Editor for many constructive suggestions and thought-provoking questions. The authors are also very thankful to Dr. Zagvozdkin for his numerous suggestions on the manuscript and his careful proofreading.
Citation
Matthew Dixon. Yuzhou Chen. Yulia R. Gel. "Topological K-means clustering in reproducing kernel Hilbert spaces." Electron. J. Statist. 19 (1) 204 - 239, 2025. https://doi.org/10.1214/24-EJS2329
Information