Open Access
2022 Two-sample test for equal distributions in separable metric space: New maximum mean discrepancy based approaches
Jin-Ting Zhang, Łukasz Smaga
Electron. J. Statist. 16(2): 4090-4132 (2022). DOI: 10.1214/22-EJS2033

Abstract

This article develops statistical methods for testing the equality of two distributions based on two independent samples generated in some separable metric space. Such methods are broadly applicable in identifying similarity or distinction of two complicated data sets (e.g., high-dimensional data or functional data) collected in a wide range of research or industry areas, including biology, bioinformatics, medicine, and material science, among others. Recently, a so-called maximum mean discrepancy (MMD) based approach to the above two-sample problem has been proposed, resulting in several interesting tests. However, the main theoretical and numerical results for these MMD-based tests depend on the very restrictive assumption that the two samples have equal sizes. In addition, these tests are generally implemented via permutation when the equal sample size assumption is violated. In real data analysis, the equal sample size assumption is rarely satisfied, and discarding some of the observations often means losing valuable information. It is also of interest to know whether an MMD-based test can be conducted in general without using permutation. In this paper, we further study the MMD-based approach with the equal sample size assumption removed. We establish the asymptotic null and alternative distributions of the MMD test statistic and its root-n consistency. We propose methods for approximating the null distribution, resulting in easy and quick implementation. Numerical experiments based on artificial data and two real data sets from two different application areas demonstrate that, in terms of type I error control and power, the resulting tests perform better than or no worse than several existing competitors.
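
For readers unfamiliar with the MMD statistic, the following is a minimal sketch of how a squared-MMD estimate can be computed for two samples of unequal sizes and calibrated by permutation, which is the baseline the abstract refers to. It is not the authors' implementation and does not reproduce their approximation of the null distribution. The Gaussian kernel, the fixed `bandwidth`, the biased (V-statistic) form of the estimate, and the function names `gaussian_gram`, `mmd2_biased`, and `mmd_permutation_test` are all assumptions made for illustration.

```python
import numpy as np


def gaussian_gram(a, b, bandwidth):
    """Gaussian kernel matrix between rows of a and rows of b (assumed kernel)."""
    # Pairwise squared Euclidean distances via the expansion |u - v|^2.
    sq = (
        np.sum(a**2, axis=1)[:, None]
        + np.sum(b**2, axis=1)[None, :]
        - 2.0 * a @ b.T
    )
    # Clip tiny negative values caused by floating-point rounding.
    return np.exp(-np.maximum(sq, 0.0) / (2.0 * bandwidth**2))


def mmd2_biased(x, y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD; sample sizes may differ."""
    kxx = gaussian_gram(x, x, bandwidth)
    kyy = gaussian_gram(y, y, bandwidth)
    kxy = gaussian_gram(x, y, bandwidth)
    return kxx.mean() + kyy.mean() - 2.0 * kxy.mean()


def mmd_permutation_test(x, y, bandwidth=1.0, n_perm=200, seed=0):
    """Permutation p-value for H0: both samples come from the same distribution."""
    rng = np.random.default_rng(seed)
    n = x.shape[0]
    pooled = np.vstack([x, y])
    observed = mmd2_biased(x, y, bandwidth)
    exceed = 0
    for _ in range(n_perm):
        # Reassign group labels at random while keeping the group sizes fixed.
        idx = rng.permutation(pooled.shape[0])
        stat = mmd2_biased(pooled[idx[:n]], pooled[idx[n:]], bandwidth)
        if stat >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

A call such as `mmd_permutation_test(x, y)` with `x` of shape `(n, d)` and `y` of shape `(m, d)` returns the statistic and a p-value. Each permutation recomputes three kernel matrices, so the cost grows with `n_perm`; this is the computational burden that a direct approximation of the null distribution would avoid.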

Funding Statement

Zhang’s work was supported by the National University of Singapore Academic Research grant R-155-000-212-114.

Acknowledgments

A part of calculations for simulation studies was made at the Poznań Supercomputing and Networking Center.

Citation

Jin-Ting Zhang, Łukasz Smaga. "Two-sample test for equal distributions in separable metric space: New maximum mean discrepancy based approaches." Electron. J. Statist. 16 (2) 4090 - 4132, 2022. https://doi.org/10.1214/22-EJS2033

Information

Received: 1 February 2021; Published: 2022
First available in Project Euclid: 5 August 2022

MathSciNet: MR4462281
zbMATH: 07577512
Digital Object Identifier: 10.1214/22-EJS2033

Subjects:
Primary: 62H15
Secondary: 62G10

Keywords: equality of distribution, hypothesis testing, maximum mean discrepancy, three-cumulant matched chi-square approximation, two-sample problem

Vol.16 • No. 2 • 2022