Abstract
For independent $d$-variate random samples $X_1, \ldots, X_{n_1}$ i.i.d. with density $f$ and $Y_1, \ldots, Y_{n_2}$ i.i.d. with density $g$, where $f$ and $g$ are assumed to be continuous a.e., consider the number $T$ of all $k$-nearest-neighbor comparisons in the pooled sample in which an observation and its neighbor belong to the same sample. We show that, if $f = g$ a.e., the limiting (normal) distribution of $T$, as $\min(n_1, n_2) \rightarrow \infty$ with $n_1/(n_1 + n_2) \rightarrow \tau$, $0 < \tau < 1$, does not depend on $f$. An omnibus procedure for testing the hypothesis $H_0: f = g$ a.e. is obtained by rejecting $H_0$ for large values of $T$. The result applies to a general distance, generated by a norm on $\mathbb{R}^d$, for determining nearest neighbors, and it generalizes to the multisample situation.
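To make the statistic $T$ concrete, the sketch below counts, in the pooled sample of $n = n_1 + n_2$ points, how many of the $nk$ nearest-neighbor comparisons join two points from the same sample, so $0 \le T \le nk$. It is a minimal illustration under stated assumptions, not the paper's procedure: the function names (`nn_type_coincidences`, `permutation_pvalue`) are hypothetical, the default distance is Euclidean (any norm-induced distance can be passed as `dist`), and distance ties, which have probability zero under continuous densities, are broken by pooled index. Because the abstract does not give the mean and variance of the limiting normal law, the p-value here is calibrated by a label-permutation test instead, which is a swapped-in technique rather than Henze's asymptotic approximation.

```python
import math
import random
from typing import Callable, List, Sequence

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    # Assumed default: the Euclidean norm; the abstract allows any norm on R^d.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_indices(points: List[Point], k: int, dist: Distance) -> List[List[int]]:
    # For each pooled point, the indices of its k nearest other points.
    # Sorting (distance, index) pairs breaks distance ties by pooled index.
    n = len(points)
    nbrs = []
    for i in range(n):
        others = sorted((dist(points[i], points[j]), j) for j in range(n) if j != i)
        nbrs.append([j for _, j in others[:k]])
    return nbrs


def count_coincidences(nbrs: List[List[int]], labels: List[int]) -> int:
    # T = number of (point, neighbor) pairs whose sample labels agree.
    return sum(1 for i, row in enumerate(nbrs) for j in row if labels[i] == labels[j])


def nn_type_coincidences(xs, ys, k: int = 1, dist: Distance = euclidean) -> int:
    points = list(xs) + list(ys)
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < n1 + n2")
    labels = [0] * len(xs) + [1] * len(ys)
    return count_coincidences(knn_indices(points, k, dist), labels)


def permutation_pvalue(xs, ys, k: int = 1, n_perm: int = 999,
                       seed: int = 0, dist: Distance = euclidean) -> float:
    # Swapped-in calibration: the neighbor graph does not depend on the labels,
    # so it is computed once and only the sample labels are reshuffled.
    points = list(xs) + list(ys)
    if not 1 <= k < len(points):
        raise ValueError("k must satisfy 1 <= k < n1 + n2")
    labels = [0] * len(xs) + [1] * len(ys)
    nbrs = knn_indices(points, k, dist)
    observed = count_coincidences(nbrs, labels)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(labels)
        # One-sided: H0 is rejected for large values of T.
        if count_coincidences(nbrs, labels) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)
```

For two well-separated clusters every nearest neighbor lies in its own sample, so $T$ reaches its maximum $nk$; for perfectly interleaved one-dimensional samples it can drop to $0$. Computing the neighbor lists once keeps each permutation at $O(nk)$ work, at the cost of an $O(n^2 \log n)$ brute-force search that a spatial index could replace for large samples.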
Citation
Norbert Henze. "A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences." Ann. Statist. 16 (2) 772 - 783, June, 1988. https://doi.org/10.1214/aos/1176350835