The Annals of Statistics

A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences

Norbert Henze

Full-text: Open access

Abstract

For independent $d$-variate random samples $X_1, \ldots, X_{n_1}$ i.i.d. with density $f$ and $Y_1, \ldots, Y_{n_2}$ i.i.d. with density $g$, where $f$ and $g$ are assumed to be continuous a.e., consider the number $T$ of all $k$ nearest neighbor comparisons in which observations and their neighbors belong to the same sample. We show that, if $f = g$ a.e., the limiting (normal) distribution of $T$, as $\min(n_1, n_2) \rightarrow \infty$ and $n_1/(n_1 + n_2) \rightarrow \tau$ with $0 < \tau < 1$, does not depend on $f$. An omnibus procedure for testing the hypothesis $H_0: f = g$ a.e. is obtained by rejecting $H_0$ for large values of $T$. The result applies to a general distance (generated by a norm on $\mathbb{R}^d$) for determining nearest neighbors, and it generalizes to the multisample situation.
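
As an illustration only, the statistic $T$ described in the abstract can be sketched in a few lines of Python with NumPy. The function names `nn_coincidences` and `permutation_pvalue`, the parameter `p` (which selects the norm), and the tie-breaking by sort order are assumptions of this sketch, not part of the paper. The paper calibrates the test through the limiting normal distribution of $T$; since the abstract does not give the centering and scaling constants, the sketch substitutes a permutation p-value, which is a different calibration.

```python
import numpy as np


def nn_coincidences(x, y, k=1, p=2):
    """Count the k-nearest-neighbor comparisons in which a point and its
    neighbor belong to the same sample (the statistic T of the abstract)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.vstack([x, y])  # pooled sample
    labels = np.r_[np.zeros(len(x), dtype=int), np.ones(len(y), dtype=int)]
    # Pairwise distances under the chosen vector norm (p = 1, 2, np.inf, ...).
    dist = np.linalg.norm(z[:, None, :] - z[None, :, :], ord=p, axis=2)
    np.fill_diagonal(dist, np.inf)  # a point is not its own neighbor
    # Indices of the k nearest neighbors of each pooled point.
    # Assumption: ties are broken by sort order.
    nbrs = np.argsort(dist, axis=1)[:, :k]
    return int((labels[nbrs] == labels[:, None]).sum())


def permutation_pvalue(x, y, k=1, p=2, n_perm=999, seed=0):
    """Permutation p-value for large values of T (a stand-in for the
    asymptotic normal calibration used in the paper)."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    z = np.vstack([x, y])
    n1 = len(x)
    t_obs = nn_coincidences(x, y, k, p)
    exceed = 0
    for _ in range(n_perm):
        idx = rng.permutation(len(z))  # relabel the pooled points at random
        t_perm = nn_coincidences(z[idx[:n1]], z[idx[n1:]], k, p)
        exceed += t_perm >= t_obs
    return t_obs, (exceed + 1) / (n_perm + 1)
```

For two well-separated clusters every nearest neighbor lies in the same sample, so $T$ attains its maximum $k(n_1 + n_2)$; for samples from the same density, $T$ should be noticeably smaller. The pairwise distance matrix makes this sketch quadratic in memory, so it is only suitable for small samples.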

Article information

Source
Ann. Statist. Volume 16, Number 2 (1988), 772-783.

Dates
First available in Project Euclid: 12 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aos/1176350835

Digital Object Identifier
doi:10.1214/aos/1176350835

Mathematical Reviews number (MathSciNet)
MR947577

Zentralblatt MATH identifier
0645.62062

JSTOR
links.jstor.org

Subjects
Primary: 62H15: Hypothesis testing in multivariate analysis
Secondary: 62G10: Nonparametric hypothesis testing

Keywords
Multivariate two-sample test; nearest neighbor-type coincidences

Citation

Henze, Norbert. A Multivariate Two-Sample Test Based on the Number of Nearest Neighbor Type Coincidences. Ann. Statist. 16 (1988), no. 2, 772-783. doi:10.1214/aos/1176350835. https://projecteuclid.org/euclid.aos/1176350835

