In this paper, we study the $\alpha$-cluster tree ($\alpha$-tree) under both singular and nonsingular measures. The $\alpha$-tree constructs a cluster tree from the probability content of sets defined by an ordering of points, so it is well defined even for singular measures. We first derive the convergence rate for a density level set around critical points, which leads to the convergence rate for estimating an $\alpha$-tree under nonsingular measures. For singular measures, we study how the kernel density estimator (KDE) behaves and prove that the KDE is not uniformly consistent but is pointwise consistent after rescaling. We further prove that the estimated $\alpha$-tree fails to converge in the $L_{\infty}$ metric but is still consistent under the integrated distance. We also identify a new type of critical point of a singular measure, the dimensional critical point (DCP). DCPs are points that contribute to the cluster tree topology but cannot be defined using the density gradient. Building on the analysis of the KDE and DCPs, we prove the topological consistency of an estimated $\alpha$-tree.
Ann. Statist. 47(4): 2174-2203 (August 2019). DOI: 10.1214/18-AOS1744
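The abstract describes the $\alpha$-tree as being built from the probability content of sets induced by an ordering of points. A minimal sketch of one way to estimate such a quantity from a sample is given below. It assumes a plug-in form in which points are ordered by their KDE value and $\widehat{\alpha}(x)$ is the fraction of sample points whose estimated density does not exceed that of $x$. The Gaussian kernel, the bandwidth, and the function names `kde_values` and `alpha_hat` are illustrative assumptions, not taken from the paper.

```python
import numpy as np


def kde_values(points, data, bandwidth):
    """Gaussian KDE built from `data` (shape (n, d)), evaluated at `points` (shape (m, d))."""
    points = np.asarray(points, dtype=float)
    data = np.asarray(data, dtype=float)
    d = data.shape[1]
    # Pairwise squared distances between evaluation points and sample points.
    sq_dists = ((points[:, None, :] - data[None, :, :]) ** 2).sum(axis=2)
    kernel = np.exp(-0.5 * sq_dists / bandwidth**2)
    norm = (2.0 * np.pi) ** (d / 2.0) * bandwidth**d
    return kernel.mean(axis=1) / norm


def alpha_hat(points, data, bandwidth):
    """Assumed plug-in estimate: the fraction of sample points whose KDE value
    is no larger than the KDE value at each evaluation point."""
    dens_at_data = kde_values(data, data, bandwidth)
    dens_at_points = kde_values(points, data, bandwidth)
    return (dens_at_data[None, :] <= dens_at_points[:, None]).mean(axis=1)


# Illustrative sample from a singular measure on the real line:
# half the mass is a point mass at 0, half is uniform on [-1, 1].
rng = np.random.default_rng(0)
sample = np.concatenate([np.zeros(100), rng.uniform(-1.0, 1.0, 100)])[:, None]
ranks = alpha_hat(sample, sample, bandwidth=0.1)
```

Because the estimate depends only on the ordering of the KDE values, rescaling the KDE by a positive constant leaves `ranks` unchanged, which is why a ranking of this kind can remain meaningful when the density itself does not exist.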