The Annals of Statistics

Fully adaptive density-based clustering

Ingo Steinwart



The clusters of a distribution are often defined as the connected components of a density level set. This definition, however, depends on the user-specified level. We address this issue by proposing a simple, generic algorithm that uses an almost arbitrary level set estimator to estimate the smallest level at which there is more than one connected component. For the case in which this algorithm is fed with histogram-based level set estimates, we provide a finite sample analysis, which is then used to show that the algorithm consistently estimates both the smallest level and the corresponding connected components. We further establish rates of convergence for the two estimation problems, and finally, we present a simple yet adaptive strategy for determining the width parameter of the involved density estimator in a data-dependent way.
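The level-scanning idea described in the abstract can be illustrated with a minimal one-dimensional sketch. The code below is an illustration only, not the paper's actual algorithm: it builds a histogram density estimate, scans candidate levels from below, and returns the first (smallest) level at which the estimated level set splits into more than one connected component. The function name, the bin count, and the level grid are all illustrative choices.

```python
import numpy as np

def first_split_level(sample, bins=30, num_levels=200):
    """Return the smallest scanned level at which the histogram-based
    level set estimate {x : density(x) >= level} has more than one
    connected component, or None if no split is found."""
    density, _ = np.histogram(sample, bins=bins, density=True)
    for level in np.linspace(0.0, density.max(), num_levels):
        above = density >= level  # bins belonging to the estimated level set
        # A connected component is a maximal run of consecutive True bins;
        # count the 0 -> 1 transitions to count the runs.
        runs = np.count_nonzero(
            np.diff(np.concatenate(([0], above.astype(int)))) == 1
        )
        if runs > 1:
            return level
    return None
```

On a clearly bimodal sample (e.g., two well-separated Gaussian components), the returned level approximates the smallest level at which the estimated density separates into two clusters; the paper's analysis concerns how accurately such estimates recover the true split level and components.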

Article information

Ann. Statist., Volume 43, Number 5 (2015), 2132-2167.

Received: March 2015
Revised: March 2015
First available in Project Euclid: 16 September 2015


Primary: 62H30 (Classification and discrimination; cluster analysis), 91C20 (Clustering)
Secondary: 62G07 (Density estimation)

Keywords: cluster analysis; consistency; rates; adaptivity


Steinwart, Ingo. Fully adaptive density-based clustering. Ann. Statist. 43 (2015), no. 5, 2132--2167. doi:10.1214/15-AOS1331.



  • [1] Baíllo, A., Cuesta-Albertos, J. A. and Cuevas, A. (2001). Convergence rates in nonparametric estimation of level sets. Statist. Probab. Lett. 53 27–35.
  • [2] Baíllo, A., Cuevas, A. and Justel, A. (2000). Set estimation and nonparametric detection. Canad. J. Statist. 28 765–782.
  • [3] Ben-David, S. and Lindenbaum, M. (1997). Learning distributions by their density levels: A paradigm for learning without a teacher. J. Comput. System Sci. 55 171–182.
  • [4] Chacón, J. E. (2014). A population background for nonparametric density-based clustering. Technical report. Available at arXiv:1408.1381.
  • [5] Chaudhuri, K. and Dasgupta, S. (2010). Rates of convergence for the cluster tree. In Advances in Neural Information Processing Systems 23 (J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel and A. Culotta, eds.) 343–351.
  • [6] Cuevas, A. and Fraiman, R. (1997). A plug-in approach to support estimation. Ann. Statist. 25 2300–2312.
  • [7] Devroye, L. and Wise, G. L. (1980). Detection of abnormal behavior via nonparametric estimation of the support. SIAM J. Appl. Math. 38 480–488.
  • [8] Donoho, D. L. (1988). One-sided inference about functionals of a density. Ann. Statist. 16 1390–1420.
  • [9] Giné, E. and Guillou, A. (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Ann. Inst. Henri Poincaré Probab. Stat. 38 907–921.
  • [10] Hartigan, J. A. (1975). Clustering Algorithms. Wiley, New York.
  • [11] Hartigan, J. A. (1981). Consistency of single linkage for high-density clusters. J. Amer. Statist. Assoc. 76 388–394.
  • [12] Hartigan, J. A. (1987). Estimation of a convex density contour in two dimensions. J. Amer. Statist. Assoc. 82 267–270.
  • [13] Kpotufe, S. and von Luxburg, U. (2011). Pruning nearest neighbor cluster trees. In Proceedings of the 28th International Conference on Machine Learning (L. Getoor and T. Scheffer, eds.) 225–232. ACM, New York.
  • [14] Maier, M., Hein, M. and von Luxburg, U. (2009). Optimal construction of $k$-nearest-neighbor graphs for identifying noisy clusters. Theoret. Comput. Sci. 410 1749–1764.
  • [15] Müller, D. W. and Sawitzki, G. (1991). Excess mass estimates and tests for multimodality. J. Amer. Statist. Assoc. 86 738–746.
  • [16] Polonik, W. (1995). Measuring mass concentrations and estimating density contour clusters—An excess mass approach. Ann. Statist. 23 855–881.
  • [17] Rigollet, P. (2007). Generalized error bounds in semi-supervised classification under the cluster assumption. J. Mach. Learn. Res. 8 1369–1392.
  • [18] Rigollet, P. and Vert, R. (2009). Optimal rates for plug-in estimators of density level sets. Bernoulli 15 1154–1178.
  • [19] Rinaldo, A., Singh, A., Nugent, R. and Wasserman, L. (2012). Stability of density-based clustering. J. Mach. Learn. Res. 13 905–948.
  • [20] Rinaldo, A. and Wasserman, L. (2010). Generalized density clustering. Ann. Statist. 38 2678–2722.
  • [21] Scovel, C., Hush, D. and Steinwart, I. (2005). Learning rates for density level detection. Anal. Appl. (Singap.) 3 357–371.
  • [22] Singh, A., Scott, C. and Nowak, R. (2009). Adaptive Hausdorff estimation of density level sets. Ann. Statist. 37 2760–2782.
  • [23] Sriperumbudur, B. K. and Steinwart, I. (2012). Consistency and rates for clustering with DBSCAN. In Proceedings of the 15th International Conference on Artificial Intelligence and Statistics 2012 (N. Lawrence and M. Girolami, eds.). JMLR Workshop and Conference Proceedings 22 1090–1098.
  • [24] Steinwart, I. (2011). Adaptive density level set clustering. In Proceedings of the 24th Conference on Learning Theory 2011 (S. Kakade and U. von Luxburg, eds.). JMLR Workshop and Conference Proceedings 19 703–738.
  • [25] Steinwart, I. (2015). Supplement to “Fully adaptive density-based clustering.” DOI:10.1214/15-AOS1331SUPP.
  • [26] Steinwart, I., Hush, D. and Scovel, C. (2005). A classification framework for anomaly detection. J. Mach. Learn. Res. 6 211–232.
  • [27] Stuetzle, W. (2003). Estimating the cluster type of a density by analyzing the minimal spanning tree of a sample. J. Classification 20 25–47.
  • [28] Stuetzle, W. and Nugent, R. (2010). A generalized single linkage method for estimating the cluster tree of a density. J. Comput. Graph. Statist. 19 397–418.
  • [29] Tsybakov, A. B. (1997). On nonparametric estimation of density level sets. Ann. Statist. 25 948–969.
  • [30] Walther, G. (1997). Granulometric smoothing. Ann. Statist. 25 2273–2299.

Supplemental materials

  • Supplement to “Fully adaptive density-based clustering”. We provide two appendices, A and B. Appendix A presents several auxiliary results, partially taken from [24], and discusses the assumptions made in the paper in more detail. Appendix B presents several two-dimensional examples showing that the assumptions imposed in the paper are met not only by many discontinuous densities but also by many continuous densities.