The Annals of Mathematical Statistics

Two-Sample Tests for Multivariate Distributions

Lionel Weiss


Abstract

$X(1), X(2), \cdots, X(m), Y(1), Y(2), \cdots, Y(n)$ are independent $k$-variate random variables. The distribution of $X(i)$ has pdf $f(x)$, say, where $x$ denotes a $k$-dimensional vector throughout this paper, and the distribution of $Y(j)$ has pdf $g(x)$, say. We assume that $f(x)$ and $g(x)$ are piecewise continuous and that each has a finite upper bound, which it is not necessary to specify. Denote by $2R_i$ the distance from $X(i)$ to the nearest of the points $X(1), \cdots, X(i - 1), X(i + 1), \cdots, X(m)$, and denote by $S_i$ the number of points $Y(1), \cdots, Y(n)$ contained in the open sphere $\{x: |x - X(i)| < R_i\}$. Clearly, the joint distribution of $S_i, S_j$ is the same as the joint distribution of $S_{i'}, S_{j'}$, for any subscripts with $i \neq j, i' \neq j'$. Let $r$ be a non-negative integer and $\alpha$ any fixed positive value. $Q(r)$ denotes the Lebesgue integral $\int_{E_k} \frac{2^k \alpha f^2(x) [g(x)]^r}{[g(x) + 2^k \alpha f(x)]^{r + 1}}\, dx,$ where $E_k$ denotes Euclidean $k$-space. We will show that $\lim_{m \rightarrow \infty, m/n = \alpha} P_{m, n}[S_1 = s_1, S_2 = s_2] = Q(s_1)Q(s_2)$ for any non-negative integers $s_1, s_2$, the approach being uniform in $s_1, s_2$. Thus, in the limit $S_1, S_2$ are independently distributed, with $\lim_{m \rightarrow \infty, m/n = \alpha} P_{m, n}[S_1 = s_1] = Q(s_1)$.

In [1], which discussed the univariate case, $S_i$ was defined as the number of $Y$'s closer to $X(i)$ than to any other $X$ to their right. In the present paper, $S_i$ is defined as the number of $Y$'s in a different neighborhood of $X(i)$: for $k = 1$ our definition does not reduce to the one used in [1], but instead counts the $Y$'s lying within a distance $R_i$ on either side of $X(i)$. However, if $\lim_{m \rightarrow \infty, m/n = \alpha} P_{m, n}[S_1 = s_1, S_2 = s_2]$ is computed for the univariate case using the definition of $S_i$ given in [1], the only way in which it differs from $Q(s_1)Q(s_2)$ is that $\alpha$ is replaced by $\alpha/2$. Thus it seems reasonable to treat the $S_i$ as defined here as $k$-dimensional analogues of the $S_i$ defined in [1], at least for large samples. An intuitive reason for $\alpha$ being replaced by $\alpha/2$ is that in the present case $\sum^m_{i = 1} S_i$ may be less than $n$, whereas in [1] this sum must always equal $n$. Thus in the present case we are, in a sense, discarding some of the $Y$'s, which lowers $n$ relative to $m$ and thus raises $\alpha$ by a certain factor (2, as it happens). The sum $\sum S_i$ may be less than $n$ because the $R_i$ are chosen to make the spheres around the $X$'s non-overlapping, which simplifies the analysis. The $R_i$ were chosen to give the largest possible non-overlapping spheres because, intuitively, the larger the spheres, the more rapid the approach of the probabilities to their limiting values.
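To make the construction concrete, here is a minimal Python sketch (not part of the paper) that computes the $R_i$ and $S_i$ from two hypothetical samples stored as NumPy arrays `X` of shape $(m, k)$ and `Y` of shape $(n, k)$, using Euclidean distance. As a rough check, it compares the empirical frequencies of the $S_i$ with $Q(s)$ in the special case $f = g$, where the integral above reduces to $2^k\alpha / (1 + 2^k\alpha)^{s + 1}$.

```python
import numpy as np

def nn_counts(X, Y):
    """For each X(i), return R_i (half the distance from X(i) to the nearest
    other X) and S_i (the number of Y's in the open sphere of radius R_i
    centred at X(i)).  X has shape (m, k), Y has shape (n, k)."""
    dXX = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    np.fill_diagonal(dXX, np.inf)          # exclude the point itself
    R = dXX.min(axis=1) / 2.0              # 2*R_i = distance to nearest other X
    dXY = np.linalg.norm(X[:, None, :] - Y[None, :, :], axis=-1)
    S = (dXY < R[:, None]).sum(axis=1)     # strict inequality: open sphere
    return R, S

# Illustration (hypothetical data): f = g = standard normal in k = 3 dimensions,
# m/n = alpha = 2.  With f = g, Q(s) = 2^k*alpha / (1 + 2^k*alpha)^(s+1), so the
# empirical frequencies of the S_i should be close to these values for large m, n.
rng = np.random.default_rng(0)
k, m, n = 3, 2000, 1000
alpha = m / n
X = rng.standard_normal((m, k))
Y = rng.standard_normal((n, k))
_, S = nn_counts(X, Y)
c = 2**k * alpha
for s in range(4):
    print(s, round((S == s).mean(), 4), round(c / (1 + c) ** (s + 1), 4))
```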

Article information

Source
Ann. Math. Statist., Volume 31, Number 1 (1960), 159-164.

Dates
First available in Project Euclid: 27 April 2007

Permanent link to this document
https://projecteuclid.org/euclid.aoms/1177705995

Digital Object Identifier
doi:10.1214/aoms/1177705995

Mathematical Reviews number (MathSciNet)
MR119305

Zentralblatt MATH identifier
0092.36401


Citation

Weiss, Lionel. Two-Sample Tests for Multivariate Distributions. Ann. Math. Statist. 31 (1960), no. 1, 159--164. doi:10.1214/aoms/1177705995. https://projecteuclid.org/euclid.aoms/1177705995
