We present a study of a kernel-based two-sample test statistic related to the Maximum Mean Discrepancy (MMD) in the manifold data setting, assuming that high-dimensional observations are close to a low-dimensional manifold. We characterize the test level and power in relation to the kernel bandwidth, the number of samples, and the intrinsic dimensionality of the manifold. Specifically, when data densities p and q are supported on a d-dimensional sub-manifold M embedded in an m-dimensional space and are Hölder with order β (up to 2) on M, we prove a guarantee of the test power for finite sample size n that exceeds a threshold depending on d, β, and Δ_2, the squared L^2-divergence between p and q on the manifold, and with a properly chosen kernel bandwidth γ. For small density departures, we show that with large n they can be detected by the kernel test when Δ_2 is greater than n^{-2β/(d+4β)} up to a certain constant and γ scales as n^{-1/(d+4β)}. The analysis extends to cases where the manifold has a boundary and the data samples contain high-dimensional additive noise. Our results indicate that the kernel two-sample test has no curse-of-dimensionality when the data lie on or near a low-dimensional manifold. We validate our theory and the properties of the kernel test for manifold data through a series of numerical experiments.
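To make the setting concrete, the following is a minimal illustrative sketch, not the authors' implementation. It computes a standard biased Gaussian-kernel MMD² estimate for two samples lying near a circle (intrinsic dimension 1) embedded in a 20-dimensional space with small additive noise, and calibrates the test by permutation. The helper names (`gaussian_gram`, `mmd2_biased`, `permutation_test`, `sample_near_circle`), the hand-picked bandwidth `gamma = 0.5`, the von Mises alternative, and the use of a permutation threshold are all assumptions made for illustration.

```python
import numpy as np

def gaussian_gram(A, B, gamma):
    """Gaussian kernel matrix exp(-||a - b||^2 / (2 gamma^2))."""
    sq = np.sum(A**2, axis=1)[:, None] + np.sum(B**2, axis=1)[None, :] - 2.0 * A @ B.T
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * gamma**2))

def mmd2_biased(X, Y, gamma):
    """Biased (V-statistic) estimate of the squared MMD."""
    return (gaussian_gram(X, X, gamma).mean()
            + gaussian_gram(Y, Y, gamma).mean()
            - 2.0 * gaussian_gram(X, Y, gamma).mean())

def permutation_test(X, Y, gamma, n_perm=200, alpha=0.05, seed=0):
    """Permutation-calibrated test; returns (statistic, p-value, reject)."""
    rng = np.random.default_rng(seed)
    Z = np.vstack([X, Y])
    n = len(X)
    K = gaussian_gram(Z, Z, gamma)  # computed once, then re-indexed per permutation

    def stat(idx):
        a, b = idx[:n], idx[n:]
        return (K[np.ix_(a, a)].mean() + K[np.ix_(b, b)].mean()
                - 2.0 * K[np.ix_(a, b)].mean())

    observed = stat(np.arange(len(Z)))
    null = np.array([stat(rng.permutation(len(Z))) for _ in range(n_perm)])
    p_value = (1 + np.sum(null >= observed)) / (1 + n_perm)
    return observed, p_value, bool(p_value <= alpha)

def sample_near_circle(angles, m, sigma, rng):
    """Embed angles on the unit circle in R^m and add Gaussian noise."""
    pts = np.zeros((len(angles), m))
    pts[:, 0] = np.cos(angles)
    pts[:, 1] = np.sin(angles)
    return pts + sigma * rng.standard_normal(pts.shape)

# Hypothetical example: p is uniform on the circle, q is a von Mises density.
rng = np.random.default_rng(1)
n, m, sigma, gamma = 200, 20, 0.05, 0.5
X = sample_near_circle(rng.uniform(-np.pi, np.pi, n), m, sigma, rng)
Y = sample_near_circle(rng.vonmises(0.0, 1.0, n), m, sigma, rng)
observed, p_value, reject = permutation_test(X, Y, gamma)
print(f"MMD^2 = {observed:.4f}, p-value = {p_value:.4f}, reject = {reject}")
```

In this sketch the ambient dimension m only enters through the pairwise distances, which is one informal way to see why the intrinsic dimension d, rather than m, could govern the behavior of such a test.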
Funding Statement
The work was supported by NSF DMS-2134037. X.C. was also partially supported by NSF DMS-2237842 and DMS-2007040. Y.X. was also partially supported by an NSF CAREER CCF-1650913, NSF DMS-2134037, CMMI-2015787, CMMI-2112533, DMS-1938106, and DMS-1830210.
The authors would like to thank the anonymous referees and the Associate Editor for their constructive comments that improved the quality of this paper.
Xiuyuan Cheng. Yao Xie. "Kernel two-sample tests for manifold data." Bernoulli 30 (4) 2572 - 2597, November 2024. https://doi.org/10.3150/23-BEJ1685