The Annals of Statistics
- Ann. Statist.
- Volume 43, Number 5 (2015), 2132-2167.
Fully adaptive density-based clustering
The clusters of a distribution are often defined by the connected components of a density level set. However, this definition depends on the user-specified level. We address this issue by proposing a simple, generic algorithm, which uses an almost arbitrary level set estimator to estimate the smallest level at which there is more than one connected component. In the case where this algorithm is fed with histogram-based level set estimates, we provide a finite sample analysis, which is then used to show that the algorithm consistently estimates both the smallest level and the corresponding connected components. We further establish rates of convergence for the two estimation problems, and finally we present a simple yet adaptive strategy for determining the width parameter of the involved density estimator in a data-dependent way.
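To make the idea in the abstract concrete, the following is a minimal one-dimensional sketch, not the algorithm analyzed in the paper. It builds a histogram density estimate, treats the bins at or above a level as the estimated level set, and scans levels upward until that set has more than one connected component. The function names, the fixed level grid, and the rule that adjacent bins are connected are all assumptions made for illustration. The sketch omits the safeguards against spurious components from sampling noise and the adaptive choice of bin width that the abstract describes.

```python
def histogram_density(samples, lo, hi, n_bins):
    """Histogram density estimate on n_bins equal-width bins of [lo, hi)."""
    width = (hi - lo) / n_bins
    counts = [0] * n_bins
    for x in samples:
        if lo <= x < hi:
            # Clamp the index to guard against floating-point edge cases.
            idx = min(int((x - lo) / width), n_bins - 1)
            counts[idx] += 1
    n = len(samples)
    return [c / (n * width) for c in counts]


def count_components(density, level):
    """Count maximal runs of adjacent bins whose density is >= level."""
    components = 0
    inside = False
    for value in density:
        above = value >= level
        if above and not inside:
            components += 1  # a new run of bins starts here
        inside = above
    return components


def smallest_split_level(density, n_levels=200):
    """Scan a grid of levels upward and return the first level at which the
    level set has more than one component, or None if it never splits."""
    top = max(density)
    for i in range(1, n_levels + 1):
        level = top * i / n_levels
        if count_components(density, level) > 1:
            return level
    return None


# Hypothetical bin heights with two modes separated by a valley at 0.2.
bimodal = [0.1, 0.5, 0.2, 0.6, 0.1]
split = smallest_split_level(bimodal)
clusters_at_split = count_components(bimodal, split)
```

With these hypothetical bin heights, the scan stops at the first grid level just above the valley height, where the two modes separate. On real samples a naive scan like this tends to split on empty tail bins, which is one reason a careful finite sample analysis is needed.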
Received: March 2015
Revised: March 2015
First available in Project Euclid: 16 September 2015
Steinwart, Ingo. Fully adaptive density-based clustering. Ann. Statist. 43 (2015), no. 5, 2132--2167. doi:10.1214/15-AOS1331. https://projecteuclid.org/euclid.aos/1442364148
- Supplement to “Fully adaptive density-based clustering”. We provide two appendices A and B. In Appendix A, several auxiliary results, which are partially taken from , are presented, and the assumptions made in the paper are discussed in more detail. In Appendix B, we present a couple of two-dimensional examples that show that the assumptions imposed in the paper are not only met by many discontinuous densities, but also by many continuous densities.