August 2021 Robust k-means clustering for distributions with two moments
Yegor Klochkov, Alexey Kroshnin, Nikita Zhivotovskiy
Ann. Statist. 49(4): 2206-2230 (August 2021). DOI: 10.1214/20-AOS2033

Abstract

We consider robust algorithms for the k-means clustering problem, in which a quantizer is constructed from N independent observations. Our main results are median-of-means-based nonasymptotic excess distortion bounds that hold under the assumption of two bounded moments in a general separable Hilbert space. In particular, our results extend the renowned asymptotic result of Pollard (Ann. Statist. 9 (1981) 135–140), who showed that the existence of two moments is sufficient for strong consistency of an empirically optimal quantizer in R^d. In the special case of clustering in R^d, under two bounded moments, we prove matching (up to constant factors) nonasymptotic upper and lower bounds on the excess distortion, which depend on the probability mass of the lightest cluster of an optimal quantizer. Our bounds have sub-Gaussian form, and the proofs are based on versions of uniform bounds for robust mean estimators.
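
For readers unfamiliar with the median-of-means idea mentioned in the abstract, the following is a minimal sketch of the basic one-dimensional estimator: split the sample into blocks, average each block, and take the median of the block averages. It is an illustration only and not the quantizer construction analyzed in the paper; the function name `median_of_means`, the `n_blocks` and `seed` parameters, and the toy data are assumptions made for this example.

```python
import random
import statistics


def median_of_means(samples, n_blocks, seed=0):
    """Median-of-means estimate of the mean of a one-dimensional sample.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if not 1 <= n_blocks <= len(samples):
        raise ValueError("n_blocks must be between 1 and len(samples)")
    # Shuffle a copy so the block assignment does not depend on input order.
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    # Strided slicing yields n_blocks disjoint blocks of nearly equal size.
    blocks = [shuffled[i::n_blocks] for i in range(n_blocks)]
    block_means = [statistics.fmean(block) for block in blocks]
    # A single gross outlier can corrupt at most one block mean,
    # so the median of the block means is unaffected by it.
    return statistics.median(block_means)


# Toy data (assumed): 99 clean observations and one gross outlier.
data = [1.0] * 99 + [1e6]
robust_estimate = median_of_means(data, n_blocks=5)
naive_estimate = statistics.fmean(data)
```

On this toy sample the empirical mean is dominated by the single outlier, while the median-of-means estimate stays at the value of the clean observations. With `n_blocks=1` the estimator reduces to the ordinary sample mean.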

Funding Statement

The work of Alexey Kroshnin was conducted within the framework of the HSE University Basic Research Program. Results of Section 4 have been obtained under support of the RSF Grant No. 19-71-30020.

Acknowledgments

We would like to thank Olivier Bachem for stimulating discussions, Gábor Lugosi for valuable feedback, and Marco Cuturi and Nikita Puchkin for providing several important references. We are also thankful to the three anonymous referees for their useful comments and suggestions.

Citation


Yegor Klochkov, Alexey Kroshnin, Nikita Zhivotovskiy. "Robust k-means clustering for distributions with two moments." Ann. Statist. 49 (4) 2206-2230, August 2021. https://doi.org/10.1214/20-AOS2033

Information

Received: 1 February 2020; Revised: 1 October 2020; Published: August 2021
First available in Project Euclid: 29 September 2021

MathSciNet: MR4319247
zbMATH: 1487.62070
Digital Object Identifier: 10.1214/20-AOS2033

Subjects:
Primary: 62F35
Secondary: 62F12

Keywords: clustering, excess distortion bounds, K-means, robust estimation

Rights: Copyright © 2021 Institute of Mathematical Statistics

JOURNAL ARTICLE
25 PAGES

Vol.49 • No. 4 • August 2021