We consider the problem of two-sample testing for data generated under the manifold setting, where potentially high-dimensional observations are made of underlying objects concentrated near a low-dimensional manifold. Existing two-sample tests typically lose power in high dimensions; under the manifold setting, they also largely ignore the underlying geometric structure of the data, which yields misleading representations of similarity. To address these issues, we propose a non-parametric two-sample test for general data objects that takes into account the intrinsic geometry of the data. A data-driven metric is used to characterize the distance between points while respecting the manifold structure. The test statistic behaves like a distance metric between distributions and is shown to be consistent against all alternatives where the two distributions have a positive energy distance. Empirical studies and a data analysis of speech recordings demonstrate the test’s superior performance for manifold data.
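As a rough illustration of the kind of statistic referred to above, the following sketch computes the classical plug-in energy distance, 2 E d(X, Y) - E d(X, X') - E d(Y, Y'), from a pooled pairwise distance matrix and calibrates it by permutation. It is not the procedure of the paper: the function names are hypothetical, the permutation calibration is an assumption, and plain Euclidean distance stands in for the data-driven metric, which would be substituted by passing in a different distance matrix (for example, one built from shortest paths on a neighborhood graph).

```python
import numpy as np


def euclidean_distances(Z):
    """Pairwise Euclidean distances between the rows of Z (placeholder metric)."""
    diff = Z[:, None, :] - Z[None, :, :]
    return np.sqrt((diff ** 2).sum(axis=-1))


def energy_statistic(D, n):
    """Plug-in energy distance from a pooled distance matrix D.

    The first n rows/columns of D index sample X, the remainder sample Y.
    """
    d_xy = D[:n, n:].mean()   # mean between-sample distance
    d_xx = D[:n, :n].mean()   # mean within-X distance (zero diagonal included)
    d_yy = D[n:, n:].mean()   # mean within-Y distance (zero diagonal included)
    return 2.0 * d_xy - d_xx - d_yy


def permutation_pvalue(D, n, n_perm=999, seed=0):
    """Permutation p-value: relabel the pooled points and recompute the statistic."""
    rng = np.random.default_rng(seed)
    observed = energy_statistic(D, n)
    total = D.shape[0]
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(total)
        D_perm = D[np.ix_(perm, perm)]  # same distances, shuffled group labels
        if energy_statistic(D_perm, n) >= observed:
            exceed += 1
    return observed, (exceed + 1) / (n_perm + 1)


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    X = rng.normal(0.0, 1.0, size=(40, 3))
    Y = rng.normal(0.5, 1.0, size=(40, 3))
    D = euclidean_distances(np.vstack([X, Y]))
    stat, pval = permutation_pvalue(D, n=len(X))
    print(f"energy statistic = {stat:.4f}, permutation p-value = {pval:.4f}")
```

Because the functions take only a distance matrix, the choice of metric is decoupled from the test itself; any symmetric matrix of pairwise distances over the pooled sample can be supplied in place of the Euclidean one.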
Lynna Chu, Xiongtao Dai. "Manifold energy two-sample test." Electron. J. Statist. 18 (1) 145-166, 2024. https://doi.org/10.1214/23-EJS2203