Open Access
August 2019 Generalized cluster trees and singular measures
Yen-Chi Chen
Ann. Statist. 47(4): 2174-2203 (August 2019). DOI: 10.1214/18-AOS1744
Abstract

In this paper we study the $\alpha $-cluster tree ($\alpha $-tree) under both singular and nonsingular measures. The $\alpha $-tree uses probability contents within a set created by the ordering of points to construct a cluster tree so that it is well defined even for singular measures. We first derive the convergence rate for a density level set around critical points, which leads to the convergence rate for estimating an $\alpha $-tree under nonsingular measures. For singular measures, we study how the kernel density estimator (KDE) behaves and prove that the KDE is not uniformly consistent but pointwise consistent after rescaling. We further prove that the estimated $\alpha $-tree fails to converge in the $L_{\infty }$ metric but is still consistent under the integrated distance. We also observe a new type of critical points—the dimensional critical points (DCPs)—of a singular measure. DCPs are points that contribute to cluster tree topology but cannot be defined using density gradient. Building on the analysis of the KDE and DCPs, we prove the topological consistency of an estimated $\alpha $-tree.

References

1.

Balakrishnan, S., Narayanan, S., Rinaldo, A., Singh, A. and Wasserman, L. (2012). Cluster trees on manifolds. In Advances in Neural Information Processing Systems (C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani and K. Q. Weinberger, eds.) 26 2679–2687. Curran Associates, Red Hook, NY.Balakrishnan, S., Narayanan, S., Rinaldo, A., Singh, A. and Wasserman, L. (2012). Cluster trees on manifolds. In Advances in Neural Information Processing Systems (C. J. C. Burges, L. Bottou, M. Welling, Z. Ghahramani and K. Q. Weinberger, eds.) 26 2679–2687. Curran Associates, Red Hook, NY.

2.

Baryshnikov, Y., Bubenik, P. and Kahle, M. (2014). Min-type Morse theory for configuration spaces of hard spheres. Int. Math. Res. Not. IMRN 9 2577–2592. 1315.55011 10.1093/imrn/rnt012Baryshnikov, Y., Bubenik, P. and Kahle, M. (2014). Min-type Morse theory for configuration spaces of hard spheres. Int. Math. Res. Not. IMRN 9 2577–2592. 1315.55011 10.1093/imrn/rnt012

3.

Bobrowski, O., Mukherjee, S. and Taylor, J. E. (2017). Topological consistency via kernel estimation. Bernoulli 23 288–328. 1395.62073 10.3150/15-BEJ744 euclid.bj/1475001356Bobrowski, O., Mukherjee, S. and Taylor, J. E. (2017). Topological consistency via kernel estimation. Bernoulli 23 288–328. 1395.62073 10.3150/15-BEJ744 euclid.bj/1475001356

4.

Bubenik, P. (2015). Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 16 77–102. 1337.68221Bubenik, P. (2015). Statistical topological data analysis using persistence landscapes. J. Mach. Learn. Res. 16 77–102. 1337.68221

5.

Cadre, B. (2006). Kernel estimation of density level sets. J. Multivariate Anal. 97 999–1023. 1085.62039 10.1016/j.jmva.2005.05.004Cadre, B. (2006). Kernel estimation of density level sets. J. Multivariate Anal. 97 999–1023. 1085.62039 10.1016/j.jmva.2005.05.004

6.

Cadre, B., Pelletier, B. and Pudlo, P. (2009). Clustering by estimation of density level sets at a fixed probability. Available at  https://hal.archives-ouvertes.fr/file/index/docid/397437/filename/tlevel.pdf1297.62070 10.1080/10485252.2012.750319Cadre, B., Pelletier, B. and Pudlo, P. (2009). Clustering by estimation of density level sets at a fixed probability. Available at  https://hal.archives-ouvertes.fr/file/index/docid/397437/filename/tlevel.pdf1297.62070 10.1080/10485252.2012.750319

7.

Carlsson, G. (2009). Topology and data. Bull. Amer. Math. Soc. (N.S.) 46 255–308. 1172.62002 10.1090/S0273-0979-09-01249-XCarlsson, G. (2009). Topology and data. Bull. Amer. Math. Soc. (N.S.) 46 255–308. 1172.62002 10.1090/S0273-0979-09-01249-X

8.

Chaudhuri, K. and Dasgupta, S. (2010). Rates of convergence for the cluster tree. In Advances in Neural Information Processing Systems 343–351.Chaudhuri, K. and Dasgupta, S. (2010). Rates of convergence for the cluster tree. In Advances in Neural Information Processing Systems 343–351.

9.

Chaudhuri, K., Dasgupta, S., Kpotufe, S. and von Luxburg, U. (2014). Consistent procedures for cluster tree estimation and pruning. IEEE Trans. Inform. Theory 60 7900–7912. 1359.62234 10.1109/TIT.2014.2361055Chaudhuri, K., Dasgupta, S., Kpotufe, S. and von Luxburg, U. (2014). Consistent procedures for cluster tree estimation and pruning. IEEE Trans. Inform. Theory 60 7900–7912. 1359.62234 10.1109/TIT.2014.2361055

10.

Chazal, F., Fasy, B., Lecci, F., Michel, B., Rinaldo, A. and Wasserman, L. (2017). Robust topological inference: Distance to a measure and kernel distance. J. Mach. Learn. Res. 18 Paper No. 159, 40. 06982915Chazal, F., Fasy, B., Lecci, F., Michel, B., Rinaldo, A. and Wasserman, L. (2017). Robust topological inference: Distance to a measure and kernel distance. J. Mach. Learn. Res. 18 Paper No. 159, 40. 06982915

11.

Chen, Y.-C. (2017). A tutorial on kernel density estimation and recent advances. ArXiv preprint. Available at  arXiv:1704.039241704.03924Chen, Y.-C. (2017). A tutorial on kernel density estimation and recent advances. ArXiv preprint. Available at  arXiv:1704.039241704.03924

12.

Chen, Y.-C. (2019). Supplement to “Generalized cluster trees and singular measures”.  DOI:10.1214/18-AOS1744SUPP.Chen, Y.-C. (2019). Supplement to “Generalized cluster trees and singular measures”.  DOI:10.1214/18-AOS1744SUPP.

13.

Chen, Y.-C. and Dobra, A. (2017). Measuring human activity spaces with density ranking based on GPS data. ArXiv preprint. Available at  arXiv:1708.050171708.05017Chen, Y.-C. and Dobra, A. (2017). Measuring human activity spaces with density ranking based on GPS data. ArXiv preprint. Available at  arXiv:1708.050171708.05017

14.

Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2015). Asymptotic theory for density ridges. Ann. Statist. 43 1896–1928. 1327.62303 10.1214/15-AOS1329 euclid.aos/1438606848Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2015). Asymptotic theory for density ridges. Ann. Statist. 43 1896–1928. 1327.62303 10.1214/15-AOS1329 euclid.aos/1438606848

15.

Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2016). A comprehensive approach to mode clustering. Electron. J. Stat. 10 210–241. 1332.62200 10.1214/15-EJS1102Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2016). A comprehensive approach to mode clustering. Electron. J. Stat. 10 210–241. 1332.62200 10.1214/15-EJS1102

16.

Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2017). Density level sets: Asymptotics, inference, and visualization. J. Amer. Statist. Assoc. 112 1684–1696.Chen, Y.-C., Genovese, C. R. and Wasserman, L. (2017). Density level sets: Asymptotics, inference, and visualization. J. Amer. Statist. Assoc. 112 1684–1696.

17.

Chen, Y.-C., Kim, J., Balakrishnan, S., Rinaldo, A. and Wasserman, L. (2016). Statistical inference for cluster trees. ArXiv preprint. Available at  arXiv:1605.064161605.06416Chen, Y.-C., Kim, J., Balakrishnan, S., Rinaldo, A. and Wasserman, L. (2016). Statistical inference for cluster trees. ArXiv preprint. Available at  arXiv:1605.064161605.06416

18.

Cohen-Steiner, D., Edelsbrunner, H. and Harer, J. (2007). Stability of persistence diagrams. Discrete Comput. Geom. 37 103–120. 1117.54027 10.1007/s00454-006-1276-5Cohen-Steiner, D., Edelsbrunner, H. and Harer, J. (2007). Stability of persistence diagrams. Discrete Comput. Geom. 37 103–120. 1117.54027 10.1007/s00454-006-1276-5

19.

Edelsbrunner, H. and Harer, J. (2008). Persistent homology—A survey. In Surveys on Discrete and Computational Geometry. Contemp. Math. 453 257–282. Amer. Math. Soc., Providence, RI. 1145.55007Edelsbrunner, H. and Harer, J. (2008). Persistent homology—A survey. In Surveys on Discrete and Computational Geometry. Contemp. Math. 453 257–282. Amer. Math. Soc., Providence, RI. 1145.55007

20.

Edelsbrunner, H. and Morozov, D. (2013). Persistent homology: Theory and practice. In European Congress of Mathematics 31–50. Eur. Math. Soc., Zürich. 1364.55008Edelsbrunner, H. and Morozov, D. (2013). Persistent homology: Theory and practice. In European Congress of Mathematics 31–50. Eur. Math. Soc., Zürich. 1364.55008

21.

Einmahl, U. and Mason, D. M. (2005). Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist. 33 1380–1403. 1079.62040 10.1214/009053605000000129 euclid.aos/1120224106Einmahl, U. and Mason, D. M. (2005). Uniform in bandwidth consistency of kernel-type function estimators. Ann. Statist. 33 1380–1403. 1079.62040 10.1214/009053605000000129 euclid.aos/1120224106

22.

Eldridge, J., Belkin, M. and Wang, Y. (2015). Beyond hartigan consistency: Merge distortion metric for hierarchical clustering. In Proceedings of the 28th Conference on Learning Theory 588–606.Eldridge, J., Belkin, M. and Wang, Y. (2015). Beyond hartigan consistency: Merge distortion metric for hierarchical clustering. In Proceedings of the 28th Conference on Learning Theory 588–606.

23.

Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S. and Singh, A. (2014). Confidence sets for persistence diagrams. Ann. Statist. 42 2301–2339. 1310.62059 10.1214/14-AOS1252 euclid.aos/1413810729Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S. and Singh, A. (2014). Confidence sets for persistence diagrams. Ann. Statist. 42 2301–2339. 1310.62059 10.1214/14-AOS1252 euclid.aos/1413810729

24.

Federer, H. (1959). Curvature measures. Trans. Amer. Math. Soc. 93 418–491. 0089.38402 10.1090/S0002-9947-1959-0110078-1Federer, H. (1959). Curvature measures. Trans. Amer. Math. Soc. 93 418–491. 0089.38402 10.1090/S0002-9947-1959-0110078-1

25.

Friedman, G. (2014). Singular intersection homology. Preprint.Friedman, G. (2014). Singular intersection homology. Preprint.

26.

Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2009). On the path density of a gradient field. Ann. Statist. 37 3236–3271. 1191.62062 10.1214/08-AOS671 euclid.aos/1250515386Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2009). On the path density of a gradient field. Ann. Statist. 37 3236–3271. 1191.62062 10.1214/08-AOS671 euclid.aos/1250515386

27.

Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2014). Nonparametric ridge estimation. Ann. Statist. 42 1511–1545. 1310.62045 10.1214/14-AOS1218 euclid.aos/1407420007Genovese, C. R., Perone-Pacifico, M., Verdinelli, I. and Wasserman, L. (2014). Nonparametric ridge estimation. Ann. Statist. 42 1511–1545. 1310.62045 10.1214/14-AOS1218 euclid.aos/1407420007

28.

Giné, E. and Guillou, A. (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Ann. Inst. Henri Poincaré B, Probab. Stat. 38 907–921. 1011.62034 10.1016/S0246-0203(02)01128-7Giné, E. and Guillou, A. (2002). Rates of strong uniform consistency for multivariate kernel density estimators. Ann. Inst. Henri Poincaré B, Probab. Stat. 38 907–921. 1011.62034 10.1016/S0246-0203(02)01128-7

29.

Goresky, M. and MacPherson, R. (1980). Intersection homology theory. Topology 19 135–162. 0448.55004 10.1016/0040-9383(80)90003-8Goresky, M. and MacPherson, R. (1980). Intersection homology theory. Topology 19 135–162. 0448.55004 10.1016/0040-9383(80)90003-8

30.

Goresky, M. and MacPherson, R. (1988). Stratified Morse Theory. Ergebnisse der Mathematik und Ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)] 14. Springer, Berlin. 0639.14012Goresky, M. and MacPherson, R. (1988). Stratified Morse Theory. Ergebnisse der Mathematik und Ihrer Grenzgebiete (3) [Results in Mathematics and Related Areas (3)] 14. Springer, Berlin. 0639.14012

31.

Hartigan, J. A. (1981). Consistency of single linkage for high-density clusters. J. Amer. Statist. Assoc. 76 388–394. 0468.62053 10.1080/01621459.1981.10477658Hartigan, J. A. (1981). Consistency of single linkage for high-density clusters. J. Amer. Statist. Assoc. 76 388–394. 0468.62053 10.1080/01621459.1981.10477658

32.

Kent, B. P. (2013). Level set trees for applied statistics. Ph.D. thesis, Carnegie Mellon Univ., Pittsburgh, PA.Kent, B. P. (2013). Level set trees for applied statistics. Ph.D. thesis, Carnegie Mellon Univ., Pittsburgh, PA.

33.

Klemelä, J. (2004). Visualization of multivariate density estimates with level set trees. J. Comput. Graph. Statist. 13 599–620.Klemelä, J. (2004). Visualization of multivariate density estimates with level set trees. J. Comput. Graph. Statist. 13 599–620.

34.

Klemelä, J. (2006). Visualization of multivariate density estimates with shape trees. J. Comput. Graph. Statist. 15 372–397.Klemelä, J. (2006). Visualization of multivariate density estimates with shape trees. J. Comput. Graph. Statist. 15 372–397.

35.

Klemelä, J. (2009). Smoothing of Multivariate Data: Density Estimation and Visualization. Wiley, Hoboken, NJ.Klemelä, J. (2009). Smoothing of Multivariate Data: Density Estimation and Visualization. Wiley, Hoboken, NJ.

36.

Kpotufe, S. and Luxburg, U. V. (2011). Pruning nearest neighbor cluster trees. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (L. Getoor and T. Scheffer, eds.) 225–232. International Machine Learning Society, Madison, WI.Kpotufe, S. and Luxburg, U. V. (2011). Pruning nearest neighbor cluster trees. In Proceedings of the 28th International Conference on Machine Learning (ICML-11) (L. Getoor and T. Scheffer, eds.) 225–232. International Machine Learning Society, Madison, WI.

37.

Laloe, T. and Servien, R. (2013). Nonparametric estimation of regression level sets. J. Korean Statist. Soc. 1294.62064 10.1016/j.jkss.2012.10.001Laloe, T. and Servien, R. (2013). Nonparametric estimation of regression level sets. J. Korean Statist. Soc. 1294.62064 10.1016/j.jkss.2012.10.001

38.

Lee, J. M. (2013). Introduction to Smooth Manifolds, 2nd ed. Graduate Texts in Mathematics 218. Springer, New York. 1258.53002Lee, J. M. (2013). Introduction to Smooth Manifolds, 2nd ed. Graduate Texts in Mathematics 218. Springer, New York. 1258.53002

39.

Mammen, E. and Polonik, W. (2013). Confidence regions for level sets. J. Multivariate Anal. 122 202–214. 1280.62056 10.1016/j.jmva.2013.07.017Mammen, E. and Polonik, W. (2013). Confidence regions for level sets. J. Multivariate Anal. 122 202–214. 1280.62056 10.1016/j.jmva.2013.07.017

40.

Mason, D. M. and Polonik, W. (2009). Asymptotic normality of plug-in level set estimates. Ann. Appl. Probab. 19 1108–1142. 1180.62048 10.1214/08-AAP569 euclid.aoap/1245071021Mason, D. M. and Polonik, W. (2009). Asymptotic normality of plug-in level set estimates. Ann. Appl. Probab. 19 1108–1142. 1180.62048 10.1214/08-AAP569 euclid.aoap/1245071021

41.

Mattila, P. (1995). Geometry of Sets and Measures in Euclidean Spaces: Fractals and Rectifiability. Cambridge Studies in Advanced Mathematics 44. Cambridge Univ. Press, Cambridge. 0819.28004Mattila, P. (1995). Geometry of Sets and Measures in Euclidean Spaces: Fractals and Rectifiability. Cambridge Studies in Advanced Mathematics 44. Cambridge Univ. Press, Cambridge. 0819.28004

42.

Milnor, J. (1963). Morse Theory. Based on Lecture Notes by M. Spivak and R. Wells. Annals of Mathematics Studies, No. 51. Princeton Univ. Press, Princeton, NJ.Milnor, J. (1963). Morse Theory. Based on Lecture Notes by M. Spivak and R. Wells. Annals of Mathematics Studies, No. 51. Princeton Univ. Press, Princeton, NJ.

43.

Molchanov, I. S. (1990). Empirical estimation of quantiles of distributions of random closed sets. Teor. Veroyatn. Primen. 35 586–592. 0727.62041Molchanov, I. S. (1990). Empirical estimation of quantiles of distributions of random closed sets. Teor. Veroyatn. Primen. 35 586–592. 0727.62041

44.

Morse, M. (1925). Relations between the critical points of a real function of $n$ independent variables. Trans. Amer. Math. Soc. 27 345–396. 51.0451.01Morse, M. (1925). Relations between the critical points of a real function of $n$ independent variables. Trans. Amer. Math. Soc. 27 345–396. 51.0451.01

45.

Morse, M. (1930). The foundations of a theory of the calculus of variations in the large in $m$-space. II. Trans. Amer. Math. Soc. 32 599–631. 56.1079.01Morse, M. (1930). The foundations of a theory of the calculus of variations in the large in $m$-space. II. Trans. Amer. Math. Soc. 32 599–631. 56.1079.01

46.

Polonik, W. (1995). Measuring mass concentrations and estimating density contour clusters—An excess mass approach. Ann. Statist. 23 855–881. 0841.62045 10.1214/aos/1176324626 euclid.aos/1176324626Polonik, W. (1995). Measuring mass concentrations and estimating density contour clusters—An excess mass approach. Ann. Statist. 23 855–881. 0841.62045 10.1214/aos/1176324626 euclid.aos/1176324626

47.

Preiss, D. (1987). Geometry of measures in $\textbf{R}^{n}$: Distribution, rectifiability, and densities. Ann. of Math. (2) 125 537–643.Preiss, D. (1987). Geometry of measures in $\textbf{R}^{n}$: Distribution, rectifiability, and densities. Ann. of Math. (2) 125 537–643.

48.

Rinaldo, A. and Wasserman, L. (2010). Generalized density clustering. Ann. Statist. 38 2678–2722. 1200.62066 10.1214/10-AOS797 euclid.aos/1278861457Rinaldo, A. and Wasserman, L. (2010). Generalized density clustering. Ann. Statist. 38 2678–2722. 1200.62066 10.1214/10-AOS797 euclid.aos/1278861457

49.

Rinaldo, A., Singh, A., Nugent, R. and Wasserman, L. (2012). Stability of density-based clustering. J. Mach. Learn. Res. 13 905–948. 1283.62130Rinaldo, A., Singh, A., Nugent, R. and Wasserman, L. (2012). Stability of density-based clustering. J. Mach. Learn. Res. 13 905–948. 1283.62130

50.

Scott, D. W. (2015). Multivariate Density Estimation: Theory, Practice, and Visualization, 2nd ed. Wiley, Hoboken, NJ. 1311.62004Scott, D. W. (2015). Multivariate Density Estimation: Theory, Practice, and Visualization, 2nd ed. Wiley, Hoboken, NJ. 1311.62004

51.

Singh, A., Scott, C. and Nowak, R. (2009). Adaptive Hausdorff estimation of density level sets. Ann. Statist. 37 2760–2782. 1173.62019 10.1214/08-AOS661 euclid.aos/1247836668Singh, A., Scott, C. and Nowak, R. (2009). Adaptive Hausdorff estimation of density level sets. Ann. Statist. 37 2760–2782. 1173.62019 10.1214/08-AOS661 euclid.aos/1247836668

52.

Steinwart, I. (2011). Adaptive density level set clustering. In COLT 703–738.Steinwart, I. (2011). Adaptive density level set clustering. In COLT 703–738.

53.

Stuetzle, W. (2003). Estimating the cluster type of a density by analyzing the minimal spanning tree of a sample. J. Classification 20 25–47. 1055.62075 10.1007/s00357-003-0004-6Stuetzle, W. (2003). Estimating the cluster type of a density by analyzing the minimal spanning tree of a sample. J. Classification 20 25–47. 1055.62075 10.1007/s00357-003-0004-6

54.

Tsybakov, A. B. (1997). On nonparametric estimation of density level sets. Ann. Statist. 25 948–969. 0881.62039 10.1214/aos/1069362732 euclid.aos/1069362732Tsybakov, A. B. (1997). On nonparametric estimation of density level sets. Ann. Statist. 25 948–969. 0881.62039 10.1214/aos/1069362732 euclid.aos/1069362732

55.

Tu, L. W. (2008). An Introduction to Manifolds. Springer, New York. 1144.58001Tu, L. W. (2008). An Introduction to Manifolds. Springer, New York. 1144.58001

56.

Walther, G. (1997). Granulometric smoothing. Ann. Statist. 25 2273–2299. MR1604445 0919.62026 10.1214/aos/1069362379 euclid.aos/1030741072Walther, G. (1997). Granulometric smoothing. Ann. Statist. 25 2273–2299. MR1604445 0919.62026 10.1214/aos/1069362379 euclid.aos/1030741072

57.

Wasserman, L. (2006). All of Nonparametric Statistics. Springer Texts in Statistics. Springer, New York. 1099.62029Wasserman, L. (2006). All of Nonparametric Statistics. Springer Texts in Statistics. Springer, New York. 1099.62029

58.

Wasserman, L. (2018). Topological data analysis. Ann. Rev. Stat. Appl. 5 501–535.Wasserman, L. (2018). Topological data analysis. Ann. Rev. Stat. Appl. 5 501–535.
Copyright © 2019 Institute of Mathematical Statistics
Yen-Chi Chen "Generalized cluster trees and singular measures," The Annals of Statistics 47(4), 2174-2203, (August 2019). https://doi.org/10.1214/18-AOS1744
Received: 1 November 2016; Published: August 2019
Vol.47 • No. 4 • August 2019
Back to Top