May 2021 Interpoint distance based two sample tests in high dimension
Changbo Zhu, Xiaofeng Shao
Bernoulli 27(2): 1189-1211 (May 2021). DOI: 10.3150/20-BEJ1270

Abstract

In this paper, we study a class of two sample test statistics based on inter-point distances in the high dimensional and low/medium sample size setting. Our test statistics include the well-known energy distance and maximum mean discrepancy with Gaussian and Laplacian kernels, and the critical values are obtained via permutations. We show that all these tests are inconsistent when the two high dimensional distributions correspond to the same marginal distributions but differ in other aspects of the distributions. The tests based on energy distance and maximum mean discrepancy mainly target the differences between marginal means and variances, whereas the test based on L1-distance can capture the difference in marginal distributions. Our theory sheds new light on the limitation of inter-point distance based tests, the impact of different distance metrics, and the behavior of permutation tests in high dimension. Some simulation results and a real data illustration are also presented to corroborate our theoretical findings.
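To make the setup concrete, here is a minimal sketch of a permutation two sample test built on an energy-distance-type statistic of inter-point distances. It is an illustration under stated assumptions, not the authors' implementation: the function names (`pairwise_dist`, `energy_statistic`, `permutation_test`), the U-statistic normalization, the number of permutations, and the way the L1 variant is obtained (swapping the Euclidean distance for the L1 distance inside the same statistic) are all hypothetical choices made here, and the paper's exact statistics and scaling may differ.

```python
import numpy as np


def pairwise_dist(a, b, metric="l2"):
    """Distances between rows of a and rows of b (Euclidean or L1)."""
    diff = a[:, None, :] - b[None, :, :]
    if metric == "l1":
        return np.abs(diff).sum(axis=2)
    return np.sqrt((diff ** 2).sum(axis=2))


def energy_statistic(d, n):
    """Energy-distance-type statistic from a pooled distance matrix.

    d is the (n + m) x (n + m) matrix of distances for the pooled sample,
    with the first n rows/columns belonging to the first sample.
    Assumed form: 2 * mean(between) - mean(within X) - mean(within Y),
    where within-sample means exclude the zero diagonal.
    """
    m = d.shape[0] - n
    between = d[:n, n:].mean()
    within_x = d[:n, :n].sum() / (n * (n - 1))
    within_y = d[n:, n:].sum() / (m * (m - 1))
    return 2.0 * between - within_x - within_y


def permutation_test(x, y, metric="l2", n_perm=200, seed=0):
    """Return (observed statistic, permutation p-value)."""
    rng = np.random.default_rng(seed)
    pooled = np.vstack([x, y])
    n = x.shape[0]
    # Distances are computed once; each permutation only relabels rows.
    d = pairwise_dist(pooled, pooled, metric)
    observed = energy_statistic(d, n)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(pooled.shape[0])
        if energy_statistic(d[np.ix_(idx, idx)], n) >= observed:
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    p_value = (1 + exceed) / (1 + n_perm)
    return observed, p_value
```

Calling `permutation_test(x, y)` with two arrays of shape `(n, p)` and `(m, p)` returns the observed statistic and a permutation p-value; passing `metric="l1"` switches the underlying distance, which is the kind of choice whose effect the abstract says the paper analyzes.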

Citation

Changbo Zhu, Xiaofeng Shao. "Interpoint distance based two sample tests in high dimension." Bernoulli 27 (2) 1189-1211, May 2021. https://doi.org/10.3150/20-BEJ1270

Information

Received: 1 September 2019; Revised: 1 June 2020; Published: May 2021
First available in Project Euclid: 24 March 2021

Digital Object Identifier: 10.3150/20-BEJ1270

Keywords: high dimensionality, permutation test, power analysis, two sample test

Rights: Copyright © 2021 ISI/BS

JOURNAL ARTICLE
23 PAGES


Vol.27 • No. 2 • May 2021