Electronic Journal of Statistics

Analysis of a mode clustering diagram

Isabella Verdinelli and Larry Wasserman

Full-text: Open access

Abstract

Mode-based clustering methods define clusters in terms of the modes of a density estimate. The most common mode-based method is mean shift clustering, which defines clusters as the basins of attraction of the modes. Specifically, the gradient of the density defines a flow, which is estimated using a gradient ascent algorithm. Rodriguez and Laio (2014) introduced a new method that is faster and simpler than mean shift clustering. Furthermore, they define a clustering diagram that provides a simple, two-dimensional summary of the clustering information. We study the statistical properties of this diagram and propose some improvements and extensions. In particular, we show a connection between the diagram and robust linear regression.
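
The mean shift idea described in the abstract can be illustrated with a minimal sketch. It is not taken from the paper: the Gaussian kernel, the function names `mean_shift` and `label_by_mode`, the parameters `bandwidth` and `merge_radius`, and the synthetic two-blob data are all assumptions chosen for illustration. Each point is moved uphill on a kernel density estimate until it settles near a mode, and points that end near the same mode share a cluster.

```python
import numpy as np

def mean_shift(points, bandwidth, n_iter=200, tol=1e-6):
    """Move every point uphill on a Gaussian kernel density estimate."""
    points = np.asarray(points, dtype=float)
    shifted = points.copy()
    for _ in range(n_iter):
        # Differences between current positions and the original data.
        diff = shifted[:, None, :] - points[None, :, :]
        weights = np.exp(-0.5 * np.sum(diff ** 2, axis=2) / bandwidth ** 2)
        # The kernel-weighted mean of the data is one mean shift step,
        # which acts as a gradient ascent step on the density estimate.
        updated = weights @ points / weights.sum(axis=1, keepdims=True)
        converged = np.max(np.abs(updated - shifted)) < tol
        shifted = updated
        if converged:
            break
    return shifted

def label_by_mode(shifted, merge_radius):
    """Give the same label to points whose ascent ended within merge_radius."""
    modes = []
    labels = np.empty(len(shifted), dtype=int)
    for i, x in enumerate(shifted):
        for k, m in enumerate(modes):
            if np.linalg.norm(x - m) < merge_radius:
                labels[i] = k
                break
        else:
            # No existing mode is close enough, so record a new one.
            modes.append(x)
            labels[i] = len(modes) - 1
    return labels, np.array(modes)

# Hypothetical data: two well-separated Gaussian blobs in the plane.
rng = np.random.default_rng(0)
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.3, size=(50, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.3, size=(50, 2)),
])
shifted = mean_shift(data, bandwidth=0.5)
labels, modes = label_by_mode(shifted, merge_radius=0.5)
```

Each cluster here is the set of points in one basin of attraction. The result depends strongly on the assumed `bandwidth`, and the all-pairs weight computation costs quadratic time per iteration, which is one reason faster alternatives are of interest.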

Article information

Source
Electron. J. Statist., Volume 12, Number 2 (2018), 4288-4312.

Dates
Received: May 2018
First available in Project Euclid: 18 December 2018

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1545123625

Digital Object Identifier
doi:10.1214/18-EJS1510

Subjects
Primary: 62H30: Classification and discrimination; cluster analysis [See also 68T10, 91C20]
Secondary: 62H86: Multivariate analysis and fuzziness

Keywords
Modes; clustering; mean-shift

Rights
Creative Commons Attribution 4.0 International License.

Citation

Verdinelli, Isabella; Wasserman, Larry. Analysis of a mode clustering diagram. Electron. J. Statist. 12 (2018), no. 2, 4288–4312. doi:10.1214/18-EJS1510. https://projecteuclid.org/euclid.ejs/1545123625


References

  • Ery Arias-Castro, David Mason, and Bruno Pelletier. On the estimation of the gradient lines of a density and the consistency of the mean-shift algorithm. Journal of Machine Learning Research, 2015.
  • Chacón. Clusters and water flows: a novel approach to modal clustering through Morse theory. arXiv preprint arXiv:1212.1384, 2012.
  • José E Chacón, Tarn Duong, et al. Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting. Electronic Journal of Statistics, 7:499–532, 2013.
  • José E Chacón et al. A population background for nonparametric density-based clustering. Statistical Science, 30(4):518–532, 2015.
  • Frédéric Chazal, Brittany T Fasy, Fabrizio Lecci, Bertrand Michel, Alessandro Rinaldo, and Larry Wasserman. Robust topological inference: Distance to a measure and kernel distance. To appear: Journal of Machine Learning Research, 2017.
  • Yizong Cheng. Mean shift, mode seeking, and clustering. IEEE Transactions on Pattern Analysis and Machine Intelligence, 17(8):790–799, 1995.
  • Dorin Comaniciu and Peter Meer. Mean shift: A robust approach toward feature space analysis. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(5):603–619, 2002.
  • Vincent Courjault-Radé, Ludovic D’Estampes, and Stéphane Puechmorel. Improved density peak clustering for large datasets. 2016.
  • Mingjing Du, Shifei Ding, and Hongjie Jia. Study on density peaks clustering based on k-nearest neighbors and principal component analysis. Knowledge-Based Systems, 99:135–145, 2016.
  • Christopher R Genovese, Marco Perone-Pacifico, Isabella Verdinelli, and Larry Wasserman. Non-parametric inference for density modes. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 78(1):99–126, 2016.
  • Heinrich Jiang and Samory Kpotufe. Modal-set estimation with an application to clustering. arXiv preprint arXiv:1606.04166, 2016.
  • Jia Li, Surajit Ray, and Bruce G Lindsay. A nonparametric statistical approach to clustering via mode identification. Journal of Machine Learning Research, 8(Aug):1687–1723, 2007.
  • John Milnor. Morse Theory (AM-51), volume 51. Princeton University Press, 2016.
  • Alex Rodriguez and Alessandro Laio. Clustering by fast search and find of density peaks. Science, 344(6191):1492–1496, 2014.
  • Xiao-Feng Wang and Yifan Xu. Fast clustering using adaptive density peak detection. Statistical Methods in Medical Research, pages 2800–2811, 2017.