Abstract
This article develops statistical methods for testing the equality of two distributions based on two independent samples generated in some separable metric space. Such methods are broadly applicable for identifying similarities or differences between two complicated data sets (e.g., high-dimensional data or functional data) collected in a wide range of research and industry areas, including biology, bioinformatics, medicine, and materials science, among others. Recently, a maximum mean discrepancy (MMD) based approach to this two-sample problem has been proposed, resulting in several interesting tests. However, the main theoretical and numerical results for these MMD-based tests depend on the very restrictive assumption that the two samples have equal sizes. In addition, these tests are generally implemented via permutation when the equal sample size assumption is violated. In real data analysis, the equal sample size assumption is rarely satisfied, and discarding some of the observations often means losing valuable information. It is also of interest to know whether an MMD-based test can be conducted in general without using permutation. In this paper, we further study the MMD-based approach with the equal sample size assumption removed. We establish the asymptotic null and alternative distributions of the MMD test statistic and its root-n consistency. We propose methods for approximating the null distribution, resulting in easy and quick implementation. Numerical experiments based on artificial data and two real data sets from two different application areas demonstrate that, in terms of type I error control and power, the resulting tests perform better than or no worse than several existing competitors.
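To make the quantity in the abstract concrete, the following is a minimal sketch of a generic (biased, V-statistic) estimate of the squared MMD for two samples of unequal sizes. It assumes Euclidean data and a Gaussian kernel with a hand-picked bandwidth; the function names `gaussian_gram` and `mmd2_biased`, the bandwidth value, and the simulated data are illustrative assumptions, and the sketch does not reproduce the null-distribution approximations proposed in the paper.

```python
import numpy as np

def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix between the rows of a and the rows of b."""
    # Squared Euclidean distances via the expansion |u - v|^2 = |u|^2 + |v|^2 - 2 u.v
    sq = np.sum(a**2, axis=1)[:, None] + np.sum(b**2, axis=1)[None, :] - 2.0 * a @ b.T
    sq = np.maximum(sq, 0.0)  # guard against tiny negative values from rounding
    return np.exp(-sq / (2.0 * bandwidth**2))

def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD; sample sizes may differ."""
    kxx = gaussian_gram(x, x, bandwidth)  # n x n
    kyy = gaussian_gram(y, y, bandwidth)  # m x m
    kxy = gaussian_gram(x, y, bandwidth)  # n x m
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()

# Illustrative data (an assumption): unequal sample sizes n = 40 and m = 70.
rng = np.random.default_rng(0)
x = rng.normal(0.0, 1.0, size=(40, 5))
y_same = rng.normal(0.0, 1.0, size=(70, 5))   # same distribution as x
y_shift = rng.normal(1.0, 1.0, size=(70, 5))  # mean-shifted distribution

stat_same = mmd2_biased(x, y_same, bandwidth=2.0)
stat_shift = mmd2_biased(x, y_shift, bandwidth=2.0)
print(stat_same, stat_shift)
```

Under these assumptions the statistic should be close to zero when both samples come from the same distribution and clearly larger under the mean shift. Turning it into a test still requires a reference null distribution, which is the subject of the paper.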
Funding Statement
Zhang’s work was supported by the National University of Singapore Academic Research grant R-155-000-212-114.
Acknowledgments
Part of the calculations for the simulation studies was carried out at the Poznań Supercomputing and Networking Center.
Citation
Jin-Ting Zhang. Łukasz Smaga. "Two-sample test for equal distributions in separable metric space: New maximum mean discrepancy based approaches." Electron. J. Statist. 16 (2) 4090 - 4132, 2022. https://doi.org/10.1214/22-EJS2033
Information