Open Access
Two-sample goodness-of-fit tests on the flat torus based on Wasserstein distance and their relevance to structural biology
Javier González-Delgado, Alberto González-Sanz, Juan Cortés, Pierre Neuvial
Electron. J. Statist. 17(1): 1547-1586 (2023). DOI: 10.1214/23-EJS2135

Abstract

This work is motivated by the study of local protein structure, which is described by two variable dihedral angles whose values follow probability distributions on the flat torus. Our goal is to equip the space $\mathcal{P}(\mathbb{R}^2/\mathbb{Z}^2)$ with a metric that quantifies local structural modifications caused by changes in the protein sequence, and to define associated two-sample goodness-of-fit testing approaches. Because it adapts to the geometry of the underlying space, we focus on the Wasserstein distance as a metric between distributions.
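To make the metric concrete, here is a minimal sketch (not the authors' R implementation) of the $p$-Wasserstein distance between two equal-size empirical samples on $\mathbb{T}^d=\mathbb{R}^d/\mathbb{Z}^d$, with the cost given by the wrap-around (geodesic) Euclidean distance. It assumes the angles have already been rescaled to $[0,1)$, e.g. a dihedral angle in degrees divided by 360; the function names `torus_dist` and `torus_wasserstein_bruteforce` are hypothetical. Because both empirical measures put mass $1/n$ on each point, an optimal transport plan can be taken to be a permutation, so for tiny samples the sketch simply enumerates permutations; for realistic sample sizes one would swap in an assignment solver such as `scipy.optimize.linear_sum_assignment`.

```python
import itertools

import numpy as np


def torus_dist(a, b):
    """Geodesic distance on the flat torus R^d/Z^d (coordinates rescaled to [0, 1))."""
    d = np.abs(np.mod(a, 1.0) - np.mod(b, 1.0))
    d = np.minimum(d, 1.0 - d)  # wrap around in each coordinate
    return np.sqrt(np.sum(d ** 2, axis=-1))


def torus_wasserstein_bruteforce(x, y, p=2):
    """W_p between the uniform empirical measures of equal-size samples x, y.

    Hypothetical helper: exact but exponential in n, so only for tiny samples.
    """
    x = np.atleast_2d(np.asarray(x, dtype=float))
    y = np.atleast_2d(np.asarray(y, dtype=float))
    n = len(x)
    if len(y) != n:
        raise ValueError("this sketch assumes equal sample sizes")
    # cost[i, j] = d(x_i, y_j)^p, computed by broadcasting to shape (n, n).
    cost = torus_dist(x[:, None, :], y[None, :, :]) ** p
    # With equal weights 1/n, some optimal plan is a permutation (Birkhoff).
    best = min(
        sum(cost[i, s[i]] for i in range(n))
        for s in itertools.permutations(range(n))
    )
    return float((best / n) ** (1.0 / p))
```

The wrap-around in `torus_dist` is what separates this from the Euclidean distance on $[0,1)^2$: the coordinates 0.1 and 0.9 are 0.2 apart on the torus, not 0.8.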

We extend existing results of Optimal Transport theory to the $d$-dimensional flat torus $\mathbb{T}^d=\mathbb{R}^d/\mathbb{Z}^d$, in particular a Central Limit Theorem for the fluctuations of the empirical optimal transport cost. Moreover, we propose several approaches to two-sample goodness-of-fit testing in the one- and two-dimensional cases, based on the Wasserstein distance, and prove their validity and consistency. We provide an implementation of these tests in R. Their performance is assessed by numerical experiments on synthetic data and illustrated by an application to protein structure data.
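In the one-dimensional case (a single angle on the circle $\mathbb{R}/\mathbb{Z}$), $W_1$ has a closed form in terms of the cumulative distribution functions $F$ and $G$: $W_1 = \min_{\alpha \in \mathbb{R}} \int_0^1 |F(t)-G(t)-\alpha|\,dt$, with the minimum attained at a weighted median of $F-G$. The sketch below computes this for two empirical samples and wraps it in a generic permutation test. The permutation test is a swapped-in, assumption-light calibration chosen for illustration; it is not one of the testing procedures proposed in the paper. The function names and defaults (`circular_w1`, `permutation_test`, `n_perm`, `seed`) are hypothetical, and angles are again assumed rescaled to $[0,1)$.

```python
import numpy as np


def circular_w1(x, y):
    """W1 between the empirical measures of samples x and y on the circle R/Z."""
    x = np.sort(np.mod(np.asarray(x, dtype=float), 1.0))
    y = np.sort(np.mod(np.asarray(y, dtype=float), 1.0))
    # Breakpoints of the piecewise-constant function D = F - G on [0, 1].
    grid = np.sort(np.concatenate(([0.0], x, y, [1.0])))
    left = grid[:-1]
    lengths = np.diff(grid)
    # Empirical CDFs, constant on each interval [grid[i], grid[i + 1]).
    F = np.searchsorted(x, left, side="right") / len(x)
    G = np.searchsorted(y, left, side="right") / len(y)
    D = F - G
    # The optimal shift alpha is a weighted median of D (weights = interval lengths).
    order = np.argsort(D)
    cum = np.cumsum(lengths[order])
    alpha = D[order][np.searchsorted(cum, 0.5 * cum[-1])]
    return float(np.sum(lengths * np.abs(D - alpha)))


def permutation_test(x, y, n_perm=999, seed=0):
    """Observed W1 and a permutation p-value for H0: x and y share a distribution."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    observed = circular_w1(x, y)
    pooled = np.concatenate([x, y])
    n = len(x)
    exceed = 0
    for _ in range(n_perm):
        # Reassign group labels at random and recompute the statistic.
        shuffled = rng.permutation(pooled)
        if circular_w1(shuffled[:n], shuffled[n:]) >= observed:
            exceed += 1
    # The add-one correction keeps the p-value strictly positive and valid.
    return observed, (exceed + 1) / (n_perm + 1)
```

Permutation calibration costs `n_perm` evaluations of the statistic, which stays cheap in one dimension because `circular_w1` only needs to sort the pooled sample.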

Funding Statement

This work was supported by the AI Interdisciplinary Institute ANITI, which is funded by the French “Investing for the Future – PIA3” program under the Grant agreement ANR-19-PI3A-0004, and by the ANR LabEx CIMI (grant ANR-11-LABX-0040) within the French State Programme “Investissements d’Avenir”.

Acknowledgments

The authors are grateful to the anonymous referees whose comments and suggestions have greatly improved the manuscript.

Citation


Javier González-Delgado. Alberto González-Sanz. Juan Cortés. Pierre Neuvial. "Two-sample goodness-of-fit tests on the flat torus based on Wasserstein distance and their relevance to structural biology." Electron. J. Statist. 17 (1) 1547 - 1586, 2023. https://doi.org/10.1214/23-EJS2135

Information

Received: 1 April 2022; Published: 2023
First available in Project Euclid: 8 June 2023

MathSciNet: MR4598874
zbMATH: 07725163
Digital Object Identifier: 10.1214/23-EJS2135

Keywords: central limit theorem, flat torus, goodness-of-fit test, intrinsically disordered proteins, optimal transport, structural biology, Wasserstein distance

Vol.17 • No. 1 • 2023