Electronic Journal of Statistics

Rates of convergence for robust geometric inference

Frédéric Chazal, Pascal Massart, and Bertrand Michel

Full-text: Open access

Abstract

Distances to compact sets are widely used in the field of Topological Data Analysis for inferring geometric and topological features from point clouds. In this context, the distance to a probability measure (DTM) was introduced by Chazal et al. (2011b) as a robust alternative to the distance to a compact set. In practice, the DTM can be estimated by its empirical counterpart, that is, the distance to the empirical measure (DTEM). In this paper we give a tight control of the deviation of the DTEM. Our analysis relies on a local analysis of empirical processes. In particular, we show that the rate of convergence of the DTEM directly depends on the regularity at zero of a particular quantile function which contains some local information about the geometry of the support. This quantile function is the relevant quantity to describe precisely how difficult a geometric inference problem is. Several numerical experiments illustrate the convergence of the DTEM and also confirm that our bounds are tight.
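
The abstract does not spell out how the DTEM is computed, so the following is only a minimal sketch. It assumes the usual definition from Chazal et al. (2011b) with power 2 and mass parameter m = k/n, under which the squared DTEM at a point is the average of the squared distances to its k nearest sample points. The function name `empirical_dtm` and the toy sample are hypothetical and are not taken from the paper.

```python
import math


def empirical_dtm(query, sample, k):
    """Distance to the empirical measure at `query`, mass parameter m = k/n.

    Assumption: power-2 DTM, i.e. the square root of the mean of the squared
    Euclidean distances from `query` to its k nearest sample points.
    """
    if not 1 <= k <= len(sample):
        raise ValueError("k must satisfy 1 <= k <= len(sample)")
    # Squared Euclidean distances from the query to every sample point, sorted.
    sq_dists = sorted(
        sum((q - p) ** 2 for q, p in zip(query, point)) for point in sample
    )
    # Average the k smallest squared distances, then take the square root.
    return math.sqrt(sum(sq_dists[:k]) / k)


# Toy sample: four points on the unit circle plus one far outlier.
sample = [(1.0, 0.0), (0.0, 1.0), (-1.0, 0.0), (0.0, -1.0), (10.0, 10.0)]
dtm_center = empirical_dtm((0.0, 0.0), sample, k=4)
dtm_outlier = empirical_dtm((10.0, 10.0), sample, k=4)
```

In this toy sample the distance to the point cloud vanishes at the outlier, since the outlier is itself a sample point, whereas the DTEM with k = 4 remains large there. This is the kind of robustness to outliers that motivates using the DTM in place of the distance to a compact set.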

Article information

Source
Electron. J. Statist., Volume 10, Number 2 (2016), 2243-2286.

Dates
Received: March 2016
First available in Project Euclid: 25 August 2016

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1472125729

Digital Object Identifier
doi:10.1214/16-EJS1161

Mathematical Reviews number (MathSciNet)
MR3541971

Zentralblatt MATH identifier
1347.62055

Subjects
Primary: 62G05: Estimation
Secondary:
62G30: Order statistics; empirical distribution functions
68U05: Computer graphics; computational geometry [See also 65D18]
62-07: Data analysis
28A33: Spaces of measures, convergence of measures [See also 46E27, 60Bxx]

Keywords
Geometric inference; distance to measure; rates of convergence

Citation

Chazal, Frédéric; Massart, Pascal; Michel, Bertrand. Rates of convergence for robust geometric inference. Electron. J. Statist. 10 (2016), no. 2, 2243–2286. doi:10.1214/16-EJS1161. https://projecteuclid.org/euclid.ejs/1472125729



References

  • Arias-Castro, E., Donoho, D., and Huo, X. (2006). Adaptive multiscale detection of filamentary structures in a background of uniform random points. The Annals of Statistics, 34:326–349.
  • Biau, G., Chazal, F., Cohen-Steiner, D., Devroye, L., and Rodriguez, C. (2011). A weighted k-nearest neighbor density estimate for geometric inference. Electronic Journal of Statistics, 5:204–237.
  • Bobkov, S. and Ledoux, M. (2014). One-dimensional empirical measures, order statistics and Kantorovich transport distances. Preprint.
  • Buchet, M., Chazal, F., Dey, T. K., Fan, F., Oudot, S. Y., and Wang, Y. (2015a). Topological analysis of scalar fields with outliers. In Proc. Sympos. on Computational Geometry.
  • Buchet, M., Chazal, F., Oudot, S., and Sheehy, D. R. (2015b). Efficient and robust persistent homology for measures. In Proceedings of the 26th ACM-SIAM symposium on Discrete algorithms. SIAM.
  • Caillerie, C., Chazal, F., Dedecker, J., and Michel, B. (2011). Deconvolution for the Wasserstein metric and geometric inference. Electron. J. Stat., 5:1394–1423.
  • Cambanis, S., Simons, G., and Stout, W. (1976). Inequalities for $\mathbb{E}k(x,y)$ when the marginals are fixed. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete, 36(4):285–294.
  • Carlsson, G. (2009). Topology and data. Bulletin of the American Mathematical Society, 46(2):255–308.
  • Chazal, F., Chen, D., Guibas, L., Jiang, X., and Sommer, C. (2011a). Data-driven trajectory smoothing. In Proc. ACM SIGSPATIAL GIS.
  • Chazal, F., Cohen-Steiner, D., and Lieutier, A. (2009a). Normal cone approximation and offset shape isotopy. Computational Geometry, 42(6):566–581.
  • Chazal, F., Cohen-Steiner, D., Lieutier, A., and Thibert, B. (2009b). Stability of Curvature Measures. Computer Graphics Forum (proc. SGP 2009), pages 1485–1496.
  • Chazal, F., Cohen-Steiner, D., and Mérigot, Q. (2011b). Geometric inference for probability measures. Foundations of Computational Mathematics, 11(6):733–751.
  • Chazal, F., Fasy, B. T., Lecci, F., Michel, B., Rinaldo, A., and Wasserman, L. (2014a). Robust topological inference: Distance to a measure and kernel distance. arXiv preprint arXiv:1412.7197.
  • Chazal, F., Fasy, B. T., Lecci, F., Michel, B., Rinaldo, A., and Wasserman, L. (2014b). Subsampling methods for persistent homology. arXiv preprint 1406.1901, accepted for ICML15.
  • Chazal, F., Glisse, M., Labruère, C., and Michel, B. (2015). Convergence rates for persistence diagram estimation in topological data analysis. Journal of Machine Learning Research, 16:3603–3635.
  • Chazal, F., Guibas, L. J., Oudot, S. Y., and Skraba, P. (2013). Persistence-based clustering in Riemannian manifolds. Journal of the ACM (JACM), 60(6):41.
  • Chazal, F. and Lieutier, A. (2008). Smooth manifold reconstruction from noisy and non-uniform approximation with guarantees. Computational Geometry, 40(2):156–170.
  • Cuevas, A. (2009). Set estimation: another bridge between statistics and geometry. Bol. Estad. Investig. Oper., 25(2):71–85.
  • Cuevas, A. and Rodríguez-Casal, A. (2004). On boundary estimation. Advances in Applied Probability, pages 340–354.
  • del Barrio, E., Giné, E., and Matrán, C. (1999). The central limit theorem for the Wasserstein distance between the empirical and the true distributions. Ann. Probab., 27:1009–1071.
  • del Barrio, E., Giné, E., and Utzet, F. (2005). Asymptotics for $\mathbb{L}_2$ functionals of the empirical quantile process, with applications to tests of fit based on weighted Wasserstein distances. Bernoulli, 11:131–189.
  • Dereich, S., Scheutzow, M., and Schottstedt, R. (2013). Constructive quantization: Approximation by empirical measures. Ann. Inst. H. Poincaré Probab. Statist., 49:1183–1203.
  • Devroye, L. and Wise, G. L. (1980). Detection of abnormal behavior via nonparametric estimation of the support. SIAM Journal on Applied Mathematics, 38(3):480–488.
  • Dvoretzky, A., Kiefer, J., and Wolfowitz, J. (1956). Asymptotic minimax character of the sample distribution function and of the classical multinomial estimator. The Annals of Mathematical Statistics, pages 642–669.
  • Fasy, B. T., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S., Singh, A., et al. (2014). Confidence sets for persistence diagrams. The Annals of Statistics, 42(6):2301–2339.
  • Fournier, N. and Guillin, A. (2013). On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, pages 1–32.
  • Genovese, C., Perone-Pacifico, M., Verdinelli, I., and Wasserman, L. (2009). On the path density of a gradient field. The Annals of Statistics, 37:3236–3271.
  • Genovese, C. R., Perone-Pacifico, M., Verdinelli, I., and Wasserman, L. (2012). Manifold estimation and singular deconvolution under Hausdorff loss. The Annals of Statistics, 40(2):941–963.
  • Guibas, L., Morozov, D., and Mérigot, Q. (2013). Witnessed k-distance. Discrete Comput. Geom., 49:22–45.
  • Hastie, T. and Stuetzle, W. (1989). Principal curves. J. Amer. Statist. Assoc., 84(406):502–516.
  • Mammen, E., Tsybakov, A. B., et al. (1999). Smooth discrimination analysis. The Annals of Statistics, 27(6):1808–1829.
  • Massart, P. (1990). The tight constant in the Dvoretzky-Kiefer-Wolfowitz inequality. The Annals of Probability, 18(3):1269–1283.
  • Massart, P. (2007). Concentration inequalities and model selection. Springer, Berlin. Lectures from the 33rd Summer School on Probability Theory held in Saint-Flour, July 6–23, 2003.
  • Niyogi, P., Smale, S., and Weinberger, S. (2008). Finding the homology of submanifolds with high confidence from random samples. Discrete & Computational Geometry, 39(1-3):419–441.
  • Phillips, J. M., Wang, B., and Zheng, Y. (2014). Geometric inference on kernel density estimates. arXiv preprint 1307.7760.
  • R Core Team (2014). R: A Language and Environment for Statistical Computing. R Foundation for Statistical Computing, Vienna, Austria.
  • Rachev, S. and Rüschendorf, L. (1998). Mass transportation problems, volume II of Probability and its Applications. Springer-Verlag.
  • Shorack, G. R. and Wellner, J. A. (2009). Empirical processes with applications to statistics, volume 59. SIAM.
  • Singh, A., Scott, C., and Nowak, R. (2009). Adaptive Hausdorff estimation of density level sets. The Annals of Statistics, 37(5B):2760–2782.
  • Villani, C. (2008). Optimal Transport: Old and New. Grundlehren Der Mathematischen Wissenschaften. Springer-Verlag.
  • Yu, B. (1997). Assouad, Fano, and Le Cam. In Festschrift for Lucien Le Cam, pages 423–435. Springer, New York.