Spectral regularized kernel two-sample tests
Omar Hagrass, Bharath Sriperumbudur, Bing Li
Ann. Statist. 52(3): 1076-1101 (June 2024). DOI: 10.1214/24-AOS2383

Abstract

Over the last decade, an approach that has gained a lot of popularity to tackle nonparametric testing problems on general (i.e., non-Euclidean) domains is based on the notion of reproducing kernel Hilbert space (RKHS) embedding of probability distributions. The main goal of our work is to understand the optimality of two-sample tests constructed based on this approach. First, we show the popular MMD (maximum mean discrepancy) two-sample test to be not optimal in terms of the separation boundary measured in Hellinger distance. Second, we propose a modification to the MMD test based on spectral regularization by taking into account the covariance information (which is not captured by the MMD test) and prove the proposed test to be minimax optimal with a smaller separation boundary than that achieved by the MMD test. Third, we propose an adaptive version of the above test which involves a data-driven strategy to choose the regularization parameter and show the adaptive test to be almost minimax optimal up to a logarithmic factor. Moreover, our results hold for the permutation variant of the test where the test threshold is chosen elegantly through the permutation of the samples. Through numerical experiments on synthetic and real data, we demonstrate the superior performance of the proposed test in comparison to the MMD test and other popular tests in the literature.
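For orientation, the baseline that the abstract compares against, an MMD two-sample test with a permutation-based threshold, can be sketched as follows. This is a minimal illustration and not the authors' spectral regularized test: the Gaussian kernel, the fixed `bandwidth`, the number of permutations `n_perm`, and all function names are assumptions chosen for the example.

```python
import numpy as np


def gaussian_gram(Z, bandwidth):
    """Gram matrix of a Gaussian kernel on the rows of Z (kernel choice is an assumption)."""
    sq = np.sum(Z ** 2, axis=1)
    d2 = sq[:, None] + sq[None, :] - 2.0 * Z @ Z.T
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * bandwidth ** 2))


def mmd2_unbiased(K, idx_x, idx_y):
    """Unbiased U-statistic estimate of squared MMD from a pooled Gram matrix K."""
    Kxx = K[np.ix_(idx_x, idx_x)]
    Kyy = K[np.ix_(idx_y, idx_y)]
    Kxy = K[np.ix_(idx_x, idx_y)]
    n, m = len(idx_x), len(idx_y)
    # Within-sample terms exclude the diagonal (i != j).
    term_x = (Kxx.sum() - np.trace(Kxx)) / (n * (n - 1))
    term_y = (Kyy.sum() - np.trace(Kyy)) / (m * (m - 1))
    return term_x + term_y - 2.0 * Kxy.mean()


def mmd_permutation_test(X, Y, bandwidth=1.0, n_perm=200, seed=0):
    """Return (statistic, p-value) for H0: X and Y come from the same distribution."""
    rng = np.random.default_rng(seed)
    n, m = len(X), len(Y)
    K = gaussian_gram(np.vstack([X, Y]), bandwidth)
    idx = np.arange(n + m)
    observed = mmd2_unbiased(K, idx[:n], idx[n:])
    # Relabel the pooled sample at random and recompute the statistic each time.
    exceed = 0
    for _ in range(n_perm):
        perm = rng.permutation(n + m)
        if mmd2_unbiased(K, perm[:n], perm[n:]) >= observed:
            exceed += 1
    # The +1 terms count the observed labelling as one of the permutations.
    p_value = (exceed + 1) / (n_perm + 1)
    return observed, p_value
```

Calling `mmd_permutation_test(X, Y)` on two NumPy arrays of shape `(n, d)` and `(m, d)` and rejecting when the returned p-value falls below the chosen level gives a permutation test of the kind the abstract refers to. The test proposed in the paper additionally uses covariance information through spectral regularization, which this sketch does not attempt to reproduce.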

Funding Statement

OH and BKS are partially supported by the National Science Foundation (NSF) CAREER award DMS-1945396.
BL is supported by NSF grant DMS-2210775.

Acknowledgments

The authors thank the Associate Editor and reviewers for their valuable comments and constructive feedback, which helped to significantly improve the paper.

Citation

Omar Hagrass, Bharath Sriperumbudur, Bing Li. "Spectral regularized kernel two-sample tests." Ann. Statist. 52 (3) 1076-1101, June 2024. https://doi.org/10.1214/24-AOS2383

Information

Received: 1 December 2022; Revised: 1 March 2024; Published: June 2024
First available in Project Euclid: 11 August 2024

Digital Object Identifier: 10.1214/24-AOS2383

Subjects:
Primary: 62G10
Secondary: 46E22, 47A52, 65J20, 65J22

Keywords: Adaptivity, Bernstein’s inequality, Covariance operator, maximum mean discrepancy, Permutation test, reproducing kernel Hilbert space, Spectral regularization, two-sample test, U-statistics

Rights: Copyright © 2024 Institute of Mathematical Statistics
