## Electronic Journal of Statistics

### A statistical test of isomorphism between metric-measure spaces using the distance-to-a-measure signature

Claire Brécheteau

#### Abstract

We introduce the notion of DTM-signature, a measure on $\mathbb{R}$ that can be associated to any metric-measure space. This signature is based on the distance-to-a-measure (DTM) function introduced in 2009 by Chazal, Cohen-Steiner and Mérigot. It leads to a pseudo-metric between metric-measure spaces, which is bounded above by the Gromov-Wasserstein distance. This pseudo-metric is used to build a statistical test of isomorphism between two metric-measure spaces, from the observation of two $N$-samples.
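As a rough illustration of the objects named in this paragraph, the following Python sketch computes an empirical DTM-signature for a point cloud and compares two signatures with the $L_{1}$-Wasserstein distance. It is a minimal sketch under stated assumptions, not the paper's implementation. It assumes that the empirical DTM with mass parameter $m = k/N$ is the mean Euclidean distance from a point to its $k$ nearest sample points (the point itself included), that the signature is the empirical distribution of these DTM values over the sample, and that both samples have the same size. The names `dtm_signature` and `w1_equal_size` are hypothetical.

```python
import numpy as np

def dtm_signature(points, k):
    """Sorted empirical DTM values of a sample (assumed definition).

    Assumption: the DTM at a sample point is the mean Euclidean distance
    to its k nearest sample points, the point itself included.
    """
    pts = np.asarray(points, dtype=float)
    # Pairwise Euclidean distances, shape (N, N).
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # Keep the k smallest distances in each row (zero self-distance included).
    knn = np.sort(dist, axis=1)[:, :k]
    return np.sort(knn.mean(axis=1))

def w1_equal_size(sig_a, sig_b):
    """L1-Wasserstein distance between two equal-size empirical measures on R."""
    a, b = np.sort(sig_a), np.sort(sig_b)
    if a.shape != b.shape:
        raise ValueError("this sketch assumes equal sample sizes")
    # For equal-size samples on the real line, W1 is the mean gap
    # between order statistics.
    return float(np.abs(a - b).mean())

rng = np.random.default_rng(0)
x = rng.normal(size=(200, 2))

# y is an isometric copy of x (rotation plus translation).
theta = 0.7
rot = np.array([[np.cos(theta), -np.sin(theta)],
                [np.sin(theta), np.cos(theta)]])
y = x @ rot.T + 3.0

# z is a dilated copy of x, so it is not isometric to x.
z = 2.0 * x

d_iso = w1_equal_size(dtm_signature(x, 20), dtm_signature(y, 20))
d_scaled = w1_equal_size(dtm_signature(x, 20), dtm_signature(z, 20))
```

Because this signature depends only on pairwise distances, the isometric copy gives a distance that is zero up to floating-point error, while the dilated copy gives a strictly positive one.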

The test is based on subsampling methods and comes with theoretical guarantees. It is proven to be of the correct level asymptotically. Also, when the measures are supported on compact subsets of $\mathbb{R}^{d}$, rates of convergence are derived for the $L_{1}$-Wasserstein distance between the distribution of the test statistic and its subsampling approximation. These rates depend on some parameter $\rho >1$. In addition, we prove that the type II error is bounded above by $\exp (-CN^{1/\rho })$, with $C$ proportional to the square of the aforementioned pseudo-metric between the metric-measure spaces. Under some geometrical assumptions, we also derive lower bounds for this pseudo-metric.

An algorithm is proposed for the implementation of this statistical test, and its performance is compared to the performance of other methods through numerical experiments.
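The abstract does not spell out the algorithm, so the following Python sketch only shows one plausible way to organize a subsampling-based test on DTM values. Every design choice here is an assumption made for illustration: the mean-of-$k$-nearest-distances form of the empirical DTM, the statistic $\sqrt{n}\,W_{1}$ between two $n$-subsamples of DTM values, the within-sample resampling used to approximate the null distribution, and drawing with replacement. The function names are hypothetical.

```python
import numpy as np

def dtm_values(points, k):
    """Assumed empirical DTM: mean distance to the k nearest sample points."""
    pts = np.asarray(points, dtype=float)
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return np.sort(dist, axis=1)[:, :k].mean(axis=1)

def w1(a, b):
    """L1-Wasserstein distance between equal-size empirical measures on R."""
    return float(np.abs(np.sort(a) - np.sort(b)).mean())

def subsampling_pvalue(x, y, k, n, n_rep=500, seed=0):
    """Monte Carlo p-value of a subsampling test (illustrative scheme only)."""
    rng = np.random.default_rng(seed)
    dx, dy = dtm_values(x, k), dtm_values(y, k)

    def draw(values):
        # n draws with replacement from the empirical DTM values (assumption).
        return rng.choice(values, size=n, replace=True)

    # Test statistic: scaled W1 between subsampled signatures of x and y.
    stat = np.sqrt(n) * w1(draw(dx), draw(dy))

    # Null approximation: the same statistic computed within each sample,
    # pooling both samples with equal weight.
    null = []
    for _ in range(n_rep):
        null.append(np.sqrt(n) * w1(draw(dx), draw(dx)))
        null.append(np.sqrt(n) * w1(draw(dy), draw(dy)))
    return float(np.mean(np.array(null) >= stat))

rng = np.random.default_rng(1)
x = rng.uniform(size=(300, 2))
y = rng.uniform(size=(300, 2))   # same distribution as x
z = 5.0 * x                      # dilated copy, not isomorphic to x

p_same = subsampling_pvalue(x, y, k=15, n=30)
p_diff = subsampling_pvalue(x, z, k=15, n=30)
```

In this toy setting `p_diff` should be close to zero, since the dilation shifts every DTM value. The sketch makes no claim about the level or power guarantees stated in the abstract, which depend on the paper's actual construction and on the choice of $n$ relative to $N$.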

#### Article information

Source
Electron. J. Statist., Volume 13, Number 1 (2019), 795-849.

Dates
First available in Project Euclid: 26 March 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1553565705

Digital Object Identifier
doi:10.1214/19-EJS1539

Subjects
Primary: 62G10: Hypothesis testing
Secondary: 62G09: Resampling methods

#### Citation

Brécheteau, Claire. A statistical test of isomorphism between metric-measure spaces using the distance-to-a-measure signature. Electron. J. Statist. 13 (2019), no. 1, 795–849. doi:10.1214/19-EJS1539. https://projecteuclid.org/euclid.ejs/1553565705

#### References

• [1] de Acosta, A. and Giné, E. (1979). Convergence Of Moments And Related Functionals In The Central Limit Theorem In Banach Spaces. Zeitschrift für Wahrscheinlichkeitstheorie und verwandte Gebiete 48 213–231.
• [2] Araujo, A. and Giné, E. (1980). The Central Limit Theorem for Real and Banach Valued Random Variables. John Wiley & Sons Inc.
• [3] Arlot, S. (2007). Rééchantillonnage et Sélection de Modèles. PhD thesis, Université Paris-Sud – Paris XI.
• [4] del Barrio, E., Giné, E. and Matrán, C. (1999). Central Limit Theorems For The Wasserstein Distance Between The Empirical And The True Distributions. The Annals of Probability 27 1009–1071.
• [5] del Barrio, E., Lescornel, H. and Loubes, J-M. (2015). A statistical analysis of a deformation model with Wasserstein barycenters: estimation procedure and goodness of fit test. Unpublished.
• [6] Bickel, P. and Doksum, K. (1977). Mathematical statistics: basic ideas and selected topics. Englewood Cliffs, N.J.: Prentice Hall.
• [7] Billingsley, P. (1999). Convergence of Probability Measures. Wiley-Interscience.
• [8] Bobkov, S. and Ledoux, M. (2014). One-Dimensional Empirical Measures, Order Statistics And Kantorovich Transport Distances. Memoirs of the American Mathematical Society, to be published.
• [9] Buchet, M. (2014). Topological Inference From Measures. PhD thesis, Université Paris-Sud – Paris XI.
• [10] Buet, B. and Leonardi, G. (2015). Recovering Measures From Approximate Values On Balls. Unpublished.
• [11] Cazals, F. and Lhéritier, A. (2015). Beyond Two-sample-tests: Localizing Data Discrepancies in High-dimensional Spaces. IEEE/ACM DSAA.
• [12] Chazal, F., Cohen-Steiner, D., Guibas, L. J., Mémoli, F. and Oudot, S. (2009). Gromov-Hausdorff Stable Signatures for Shapes using Persistence. Computer Graphics Forum (proc. SGP 2009) 1393–1403.
• [13] Chazal, F., Cohen-Steiner, D. and Mérigot, Q. (2011). Geometric Inference for Probability Measures. Foundations of Computational Mathematics 11 733–751.
• [14] Chazal, F., Fasy, B., Lecci, F., Michel, B., Rinaldo, A. and Wasserman, L. (2018). Robust Topological Inference: Distance To a Measure and Kernel Distance. Journal of Machine Learning Research 18 1–40.
• [15] Chazal, F., Fasy, B. T., Lecci, F., Michel, B., Rinaldo, A. and Wasserman, L. (2015). Subsampling methods for persistent homology. Proceedings of the 32nd International Conference on Machine Learning, PMLR 37 2143–2151.
• [16] Chazal, F., Fasy, B., Lecci, F., Rinaldo, A., Singh, A. and Wasserman, L. (2013). On the Bootstrap for Persistence Diagrams and Landscapes. Modeling and Analysis of Information Systems 20 96–105.
• [17] Chazal, F., Glisse, M., Labruère, C. and Michel, B. (2015). Convergence rates for persistence diagram estimation in topological data analysis. Journal of Machine Learning Research 16 3603–3635.
• [18] Chazal, F., Massart, P. and Michel, B. (2016). Rates Of Convergence For Robust Geometric Inference. Electronic Journal of Statistics 10 2243–2286.
• [19] Chazal, F., De Silva, V. and Oudot, S. (2014). Persistence stability for geometric complexes. Geometriae Dedicata 173 193–214.
• [20] Cuevas, A. (2009). Set estimation: another bridge between statistics and geometry. Boletín de Estadística e Investigación Operativa 25 71–85.
• [21] Cuevas, A. and Rodríguez-Casal, A. (2004). On boundary estimation. Advances in Applied Probability 340–354.
• [22] Efron, B. (1979). Bootstrap Methods: Another Look at the Jackknife. Annals of Statistics 7 1–26.
• [23] Fasy, B., Kim, J., Lecci, F. and Maria, C. (2014). Introduction to the R package TDA. Unpublished.
• [24] Fasy, B., Lecci, F., Rinaldo, A., Wasserman, L., Balakrishnan, S. and Singh, A. (2014). Confidence Sets For Persistence Diagrams. The Annals of Statistics 42 2301–2339.
• [25] Federer, H. (1959). Curvature Measures. Transactions of the American Mathematical Society 93 418–491.
• [26] Fournier, N. and Guillin, A. (2015). On The Rate Of Convergence In Wasserstein Distance Of The Empirical Measure. Probability Theory & Related Fields 162 707–738.
• [27] Fromont, M. and Laurent, B. (2006). Adaptive goodness-of-fit tests in a density model. Annals of Statistics 34 680–720.
• [28] Fromont, M., Laurent, B., Lerasle, M. and Reynaud-Bouret, P. (2012). Kernels Based Tests with Non-asymptotic Bootstrap Approaches for Two-sample Problems. Journal of Machine Learning Research: Workshop and Conference Proceedings, COLT 2012 23 23.1–23.22.
• [29] Gretton, A., Borgwardt, K., Rasch, M., Schölkopf, B. and Smola, A. (2012). A Kernel Two-Sample Test. Journal of Machine Learning Research 13 723–773.
• [30] Gromov, M. (2003). Metric Structures for Riemannian and Non-Riemannian Spaces. Birkhäuser Basel.
• [31] Johnson, B. McK. and Killeen, T. (1983). An Explicit Formula for the C.D.F. of the $L_1$ Norm of the Brownian Bridge. The Annals of Probability 11 807–808.
• [32] Rice, S. O. (1982). The Integral of the Absolute Value of the Pinned Wiener Process – Calculation of Its Probability Density by Numerical Integration. The Annals of Probability 10 240–243.
• [33] Lieutier, A. (2004). Any Open Bounded Subset of $\mathbb{R}^n$ Has the Same Homotopy Type As Its Medial Axis. Computer Aided Geometric Design 36 1029–1046.
• [34] Massart, P. (1990). The Tight Constant in the Dvoretzky-Kiefer-Wolfowitz Inequality. The Annals of Probability 18 1269–1283.
• [35] Mémoli, F. (2011). Gromov–Wasserstein Distances and the Metric Approach to Object Matching. Foundations of Computational Mathematics 11 417–487.
• [36] von Luxburg, U. and Alamgir, M. (2013). Density estimation from unweighted k-nearest neighbor graphs: a roadmap. Neural Information Processing Systems (NIPS).
• [37] Niyogi, P., Smale, S. and Weinberger, S. (2008). Finding the Homology of Submanifolds with High Confidence from Random Samples. Discrete and Computational Geometry 39 419–441.
• [38] Osada, R., Funkhouser, T., Chazelle, B. and Dobkin, D. (2002). Shape Distributions. ACM Transactions on Graphics 21 807–832.
• [39] Politis, D. and Romano, J. (1994). Large sample confidence regions based on subsamples under minimal assumptions. The Annals of Statistics 22 2031–2050.
• [40] Ramdas, A., Trillos, N. and Cuturi, M. (2015). On Wasserstein Two Sample Testing and Related Families of Nonparametric Tests. Unpublished.
• [41] van der Vaart, A. and Wellner, J. (1996). Weak Convergence and Empirical Processes. Springer Series in Statistics.
• [42] Villani, C. (2003). Topics in Optimal Transportation. American Mathematical Society.