## Statistical Science

### A Population Background for Nonparametric Density-Based Clustering

José E. Chacón

#### Abstract

Despite the popularity of clustering, it is widely recognized that the investigation of its theoretical aspects has been relatively sparse. One of the main reasons for this lack of theoretical results is surely that, whereas for other statistical problems such as regression or classification the theoretical population goal is clearly defined, for some clustering methodologies it is difficult to specify the population goal that the data-based clustering algorithms should try to approximate. This paper aims to provide some insight into the theoretical foundations of clustering by focusing on two main objectives: to provide an explicit formulation for the ideal population goal of the modal clustering methodology, which understands clusters as regions of high density; and to present two new loss functions, applicable in fact to any clustering methodology, to evaluate the performance of a data-based clustering algorithm with respect to the ideal population goal. In particular, it is shown that only mild conditions on a sequence of density estimators are needed to ensure that the sequence of modal clusterings they induce is consistent.
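The modal clustering idea described in the abstract, where clusters are the domains of attraction of the modes of the underlying density, is commonly implemented in practice via the mean-shift algorithm of Fukunaga and Hostetler (1975), cited in the references below. The following is a minimal illustrative sketch, not the paper's own procedure: the Gaussian kernel, the bandwidth `h`, and the mode-merging threshold are all assumptions made for the example.

```python
import numpy as np

def mean_shift_modal_clustering(points, h, max_iter=500, tol=1e-7):
    """Assign each sample point to the kernel-density mode that its
    mean-shift iteration converges to (Gaussian kernel, bandwidth h)."""
    modes, labels = [], []
    for x in points:
        y = x.astype(float).copy()
        for _ in range(max_iter):
            # Gaussian kernel weights of every sample point, evaluated at y
            w = np.exp(-np.sum((points - y) ** 2, axis=1) / (2.0 * h * h))
            # Mean-shift update: move y to the weighted mean of the sample
            y_next = (w[:, None] * points).sum(axis=0) / w.sum()
            if np.linalg.norm(y_next - y) < tol:
                y = y_next
                break
            y = y_next
        # Merge with a previously found mode if within one bandwidth
        # (an ad hoc threshold chosen for this illustration)
        for k, m in enumerate(modes):
            if np.linalg.norm(y - m) < h:
                labels.append(k)
                break
        else:
            modes.append(y)
            labels.append(len(modes) - 1)
    return np.array(labels), np.array(modes)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Two well-separated Gaussian blobs: the density has two modes
    a = rng.normal(loc=[0.0, 0.0], scale=0.4, size=(60, 2))
    b = rng.normal(loc=[5.0, 5.0], scale=0.4, size=(60, 2))
    X = np.vstack([a, b])
    labels, modes = mean_shift_modal_clustering(X, h=0.8)
    print(len(modes))  # two density modes, hence two clusters
```

The consistency result in the abstract can be read through this lens: as the density estimate driving the mean-shift step improves, the induced partition into domains of attraction approaches the ideal population clustering.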

#### Article information

Source
Statist. Sci., Volume 30, Number 4 (2015), 518–532.

Dates
First available in Project Euclid: 9 December 2015

https://projecteuclid.org/euclid.ss/1449670856

Digital Object Identifier
doi:10.1214/15-STS526

Mathematical Reviews number (MathSciNet)
MR3432839

Zentralblatt MATH identifier
06946200

#### Citation

Chacón, José E. A Population Background for Nonparametric Density-Based Clustering. Statist. Sci. 30 (2015), no. 4, 518–532. doi:10.1214/15-STS526. https://projecteuclid.org/euclid.ss/1449670856

#### References

• Ackerman, M. and Ben-David, S. (2009). Measures of clustering quality: A working set of axioms for clustering. In Advances in Neural Information Processing Systems 21 (D. Koller, D. Schuurmans, Y. Bengio and L. Bottou, eds.) 121–128. Curran Associates, Red Hook, NY. Available at http://papers.nips.cc/paper/3491-measures-of-clustering-quality-a-working-set-of-axioms-for-clustering.pdf.
• Agrachev, A. A., Pallaschke, D. and Scholtes, S. (1997). On Morse theory for piecewise smooth functions. J. Dyn. Control Syst. 3 449–469.
• Arabie, P. and Boorman, S. A. (1973). Multidimensional scaling of measures of distance between partitions. J. Math. Psych. 10 148–203.
• Arias-Castro, E., Mason, D. and Pelletier, B. (2013). On the estimation of the gradient lines of a density and the consistency of the mean-shift algorithm. Preprint.
• Arnold, V. I., Goryunov, V. V., Lyashko, O. V. and Vasil’ev, V. A. (1998). Singularity Theory. I. Springer, Berlin.
• Azzalini, A. and Torelli, N. (2007). Clustering via nonparametric density estimation. Stat. Comput. 17 71–80.
• Ben-David, S., von Luxburg, U. and Pál, D. (2006). A sober look at clustering stability. In Learning Theory (G. Lugosi and H.-U. Simon, eds.). Lecture Notes in Computer Science 4005 5–19. Springer, Berlin.
• Bertrand-Retali, M. (1978). Convergence uniforme d’un estimateur de la densité par la méthode du noyau. Rev. Roumaine Math. Pures Appl. 23 361–385.
• Burkard, R., Dell’Amico, M. and Martello, S. (2009). Assignment Problems. SIAM, Philadelphia, PA.
• Cadre, B., Pelletier, B. and Pudlo, P. (2013). Estimation of density level sets with a given probability content. J. Nonparametr. Stat. 25 261–272.
• Carlsson, G. (2009). Topology and data. Bull. Amer. Math. Soc. (N.S.) 46 255–308.
• Carlsson, G. and Mémoli, F. (2013). Classifying clustering schemes. Found. Comput. Math. 13 221–252.
• Chacón, J. E. (2009). Data-driven choice of the smoothing parametrization for kernel density estimators. Canad. J. Statist. 37 249–265.
• Chacón, J. E. (2012). Clusters and water flows: A novel approach to modal clustering through Morse theory. Preprint. Available at arXiv:1212.1384.
• Chacón, J. E. and Duong, T. (2013). Data-driven density derivative estimation, with applications to nonparametric clustering and bump hunting. Electron. J. Stat. 7 499–532.
• Chacón, J. E. and Monfort, P. (2014). A comparison of bandwidth selectors for mean shift clustering. In Theoretical and Applied Issues in Statistics and Demography (C. H. Skiadas, ed.) 47–59. International Society for the Advancement of Science and Technology (ISAST), Athens.
• Charon, I., Denœud, L., Guénoche, A. and Hudry, O. (2006). Maximum transfer distance between partitions. J. Classification 23 103–121.
• Chaudhuri, K. and Dasgupta, S. (2010). Rates of convergence for the cluster tree. In Advances in Neural Information Processing Systems (J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel and A. Culotta, eds.) 23 343–351. Curran Associates, Red Hook, NY.
• Chazal, F., Guibas, L. J., Oudot, S. Y. and Skraba, P. (2013). Persistence-based clustering in Riemannian manifolds. J. ACM 60 Art. 41, 38.
• Cuevas, A., Febrero, M. and Fraiman, R. (2000). Estimating the number of clusters. Canad. J. Statist. 28 367–382.
• Cuevas, A., Febrero, M. and Fraiman, R. (2001). Cluster analysis: A further approach based on density estimation. Comput. Statist. Data Anal. 36 441–459.
• Cuevas, A. and Fraiman, R. (2010). Set estimation. In New Perspectives in Stochastic Geometry (W. Kendall and I. Molchanov, eds.) 374–397. Oxford Univ. Press, Oxford.
• Cuevas, A. and González Manteiga, W. (1991). Data-driven smoothing based on convexity properties. In Nonparametric Functional Estimation and Related Topics (Spetses, 1990) (G. Roussas, ed.). NATO Adv. Sci. Inst. Ser. C Math. Phys. Sci. 335 225–240. Kluwer Academic, Dordrecht.
• Day, W. H. E. (1980/81). The complexity of computing metric distances between partitions. Math. Social Sci. 1 269–287.
• Deheuvels, P. (1974). Conditions nécessaires et suffisantes de convergence ponctuelle presque sûre et uniforme presque sûre des estimateurs de la densité. C. R. Acad. Sci. Paris Sér. A 278 1217–1220.
• Denœud, L. (2008). Transfer distance between partitions. Adv. Data Anal. Classif. 2 279–294.
• Devroye, L., Györfi, L. and Lugosi, G. (1996). A Probabilistic Theory of Pattern Recognition. Applications of Mathematics (New York) 31. Springer, New York.
• Donoho, D. L. (1988). One-sided inference about functionals of a density. Ann. Statist. 16 1390–1420.
• Edelsbrunner, H. and Harer, J. (2008). Persistent homology—A survey. In Surveys on Discrete and Computational Geometry. Contemp. Math. 453 257–282. Amer. Math. Soc., Providence, RI.
• Einbeck, J. (2011). Bandwidth selection for mean-shift based unsupervised learning techniques: A unified approach via self-coverage. Journal of Pattern Recognition Research 6 175–192.
• Everitt, B. S., Landau, S., Leese, M. and Stahl, D. (2011). Cluster Analysis, 5th ed. Wiley, Chichester.
• Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S. and Singh, A. (2014). Statistical inference for persistent homology: Confidence sets for persistence diagrams. Available at arXiv:1303.7117v2.
• Fraley, C. and Raftery, A. E. (2002). Model-based clustering, discriminant analysis, and density estimation. J. Amer. Statist. Assoc. 97 611–631.
• Fukunaga, K. and Hostetler, L. D. (1975). The estimation of the gradient of a density function, with applications in pattern recognition. IEEE Trans. Inform. Theory IT-21 32–40.
• Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2015). Non-parametric inference for density modes. J. R. Stat. Soc. Ser. B. Stat. Methodol. To appear. DOI:10.1111/rssb.12111.
• Graf, S. and Luschgy, H. (2000). Foundations of Quantization for Probability Distributions. Lecture Notes in Math. 1730. Springer, Berlin.
• Hand, D., Mannila, H. and Smyth, P. (2001). Principles of Data Mining. MIT Press, Cambridge, MA.
• Hartigan, J. A. (1975). Clustering Algorithms. Wiley, New York.
• Hastie, T., Tibshirani, R. and Friedman, J. (2009). The Elements of Statistical Learning: Data Mining, Inference, and Prediction, 2nd ed. Springer, New York.
• Izenman, A. J. (2008). Modern Multivariate Statistical Techniques: Regression, Classification, and Manifold Learning. Springer, New York.
• Jost, J. (2011). Riemannian Geometry and Geometric Analysis, 6th ed. Universitext. Springer, Heidelberg.
• Klemelä, J. (2009). Smoothing of Multivariate Data: Density Estimation and Visualization. Wiley, Hoboken, NJ.
• Li, J., Ray, S. and Lindsay, B. G. (2007). A nonparametric statistical approach to clustering via mode identification. J. Mach. Learn. Res. 8 1687–1723.
• MacQueen, J. (1967). Some methods for classification and analysis of multivariate observations. In Proc. Fifth Berkeley Sympos. Math. Statist. and Probability (Berkeley, Calif., 1965/66) 281–297. Univ. California Press, Berkeley.
• Mason, D. M. and Polonik, W. (2009). Asymptotic normality of plug-in level set estimates. Ann. Appl. Probab. 19 1108–1142.
• Matsumoto, Y. (2002). An Introduction to Morse Theory. Translations of Mathematical Monographs 208. Amer. Math. Soc., Providence, RI.
• Meilă, M. (2005). Comparing clusterings—an axiomatic view. In Proceedings of the International Machine Learning Conference (ICML) (S. Wrobel and L. De Raedt, eds.) 577–584. ACM Press, New York.
• Meilă, M. (2007). Comparing clusterings—an information based distance. J. Multivariate Anal. 98 873–895.
• Meilă, M. (2012). Local equivalences of distances between clusterings—a geometric perspective. Mach. Learn. 86 369–389.
• Menardi, G. and Azzalini, A. (2014). An advancement in clustering via nonparametric density estimation. Stat. Comput. 24 753–767.
• Milnor, J. (1963). Morse Theory. Princeton Univ. Press, Princeton, NJ.
• Nugent, R. and Stuetzle, W. (2010). Clustering with confidence: A low-dimensional binning approach. In Classification as a Tool for Research (H. Locarek-Junge and C. Weihs, eds.) 117–125. Springer, Berlin.
• Pollard, D. (1981). Strong consistency of $k$-means clustering. Ann. Statist. 9 135–140.
• Ray, S. and Lindsay, B. G. (2005). The topography of multivariate normal mixtures. Ann. Statist. 33 2042–2065.
• Rinaldo, A., Singh, A., Nugent, R. and Wasserman, L. (2012). Stability of density-based clustering. J. Mach. Learn. Res. 13 905–948.
• Rodríguez-Casal, A. (2003). Estimación de Conjuntos y sus Fronteras. Un Enfoque Geométrico. Ph.D. thesis, Univ. Santiago de Compostela.
• Romano, J. P. (1988). On weak convergence and optimality of kernel density estimates of the mode. Ann. Statist. 16 629–647.
• Silverman, B. W. (1978). Weak and strong uniform consistency of the kernel estimate of a density and its derivatives. Ann. Statist. 6 177–184.
• Stuetzle, W. (2003). Estimating the cluster type of a density by analyzing the minimal spanning tree of a sample. J. Classification 20 25–47.
• Thom, R. (1949). Sur une partition en cellules associée à une fonction sur une variété. C. R. Acad. Sci. Paris 228 973–975.
• Tsybakov, A. B. (1997). On nonparametric estimation of density level sets. Ann. Statist. 25 948–969.
• Vitalli, M. (2010). Morse decomposition of geometric meshes with applications. Ph.D. thesis, Università di Genova.
• von Luxburg, U. (2004). Statistical learning with similarity and dissimilarity functions. Ph.D. thesis, Technical Univ. Berlin.
• von Luxburg, U. and Ben-David, S. (2005). Towards a statistical theory for clustering. In PASCAL Workshop on Statistics and Optimization of Clustering.
• Wand, M. P. and Jones, M. C. (1993). Comparison of smoothing parameterizations in bivariate kernel density estimation. J. Amer. Statist. Assoc. 88 520–528.
• Wand, M. P. and Jones, M. C. (1995). Kernel Smoothing. Monographs on Statistics and Applied Probability 60. Chapman & Hall, London.
• Wang, X., Qiu, W. and Zamar, R. H. (2007). CLUES: A non-parametric clustering method based on local shrinking. Comput. Statist. Data Anal. 52 286–298.
• Wishart, D. (1969). Mode analysis: A generalization of nearest neighbor which reduces chaining effects. In Numerical Taxonomy (A. J. Cole, ed.) 282–311. Academic Press, New York.
• Zadeh, R. B. and Ben-David, S. (2009). A uniqueness theorem for clustering. In UAI’09 Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence 639–646.