The Annals of Mathematical Statistics
Ann. Math. Statist., Volume 31, Number 1 (1960), 159-164.
Two-Sample Tests for Multivariate Distributions
Abstract
$X(1), X(2), \cdots, X(m), Y(1), Y(2), \cdots, Y(n)$ are independent $k$-variate random variables. The distribution of $X(i)$ has pdf $f(x)$, where $x$ denotes a $k$-dimensional vector throughout this paper, and the distribution of $Y(j)$ has pdf $g(x)$. We assume that $f(x)$ and $g(x)$ are piecewise continuous and that each has a finite upper bound, which need not be specified. Denote by $2R_i$ the distance from $X(i)$ to the nearest of the points $X(1), \cdots, X(i - 1), X(i + 1), \cdots, X(m)$, and denote by $S_i$ the number of points $Y(1), \cdots, Y(n)$ contained in the open sphere $\{x: | x - X(i) | < R_i\}$. Clearly, the joint distribution of $S_i, S_j$ is the same as the joint distribution of $S_{i'}, S_{j'}$ for any subscripts with $i \neq j, i' \neq j'$.
Let $r$ be a non-negative integer and $\alpha$ any fixed positive value. $Q(r)$ denotes the Lebesgue integral $\int_{E_k} \frac{2^k \alpha f^2 (x)\lbrack g(x) \rbrack^r}{\lbrack g(x) + 2^k\alpha f(x) \rbrack^{r + 1}} dx$, where $E_k$ denotes Euclidean $k$-space. We will show that $\lim_{m \rightarrow \infty, m/n = \alpha} P_{m, n}\lbrack S_1 = s_1, S_2 = s_2\rbrack = Q(s_1)Q(s_2)$ for any non-negative integers $s_1, s_2$, the approach to the limit being uniform in $s_1, s_2$. Thus, in the limit, $S_1, S_2$ are independently distributed, with $\lim_{m \rightarrow \infty, m/n = \alpha} P_{m, n}\lbrack S_1 = s_1\rbrack = Q(s_1)$.
In [1], which discussed the univariate case, $S_i$ was defined as the number of $Y$'s closer to $X(i)$ than to any other $X$ to their right. In the present paper, $S_i$ is defined as the number of $Y$'s in another neighborhood of $X(i)$. For $k = 1$, our present definition of $S_i$ does not reduce to the definition of $S_i$ in [1]. Rather, in the univariate case, our present $S_i$ is the number of $Y$'s lying within a distance $R_i$ on either side of $X(i)$.
However, if $\lim_{m \rightarrow \infty, m/n = \alpha} P_{m, n}\lbrack S_1 = s_1, S_2 = s_2\rbrack$ is computed for the univariate case using the definition of $S_i$ given in [1], it differs from $Q(s_1)Q(s_2)$ only in that $\alpha$ is replaced by $\alpha/2$. Thus it seems reasonable to treat the $S_i$ as defined here as $k$-dimensional analogues of the $S_i$ as defined in [1], at least for large samples. An intuitive reason for $\alpha$ being replaced by $\alpha/2$ is that in our present case $\sum^m_{i = 1} S_i$ may be less than $n$, whereas in [1] this sum must always equal $n$. Thus in our present case we are, in a sense, discarding some of the $Y$'s, which lowers $n$ relative to $m$ and thus raises $\alpha$ by a certain factor (2, as it happens). In our present case, $\sum S_i$ may be less than $n$ because the $R_i$ are chosen to make the spheres around the $X$'s non-overlapping, thus simplifying the analysis. The $R_i$ were chosen to give the largest possible non-overlapping spheres because it would seem intuitively that the larger the spheres, the more rapid the approach of the probabilities to their limiting values.
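The definitions of $R_i$ and $S_i$ given in the abstract can be illustrated with a short computational sketch. The Python below is not taken from the paper: the function names `sphere_counts` and `null_limit_pmf` are hypothetical, the distance is assumed to be Euclidean, and a brute-force search stands in for any more efficient nearest-neighbor method. The second function evaluates $Q(r)$ only in the special case $f = g$, where direct substitution into the integral gives $Q(r) = c/(1 + c)^{r + 1}$ with $c = 2^k\alpha$. This reduction is a consequence of the formula for $Q(r)$, not a statement quoted from the abstract.

```python
import math
import random


def sphere_counts(xs, ys):
    """Return [S_1, ..., S_m] for the X sample xs and the Y sample ys.

    Each point is a sequence of k floats. 2 * R_i is the Euclidean distance
    from xs[i] to the nearest of the other X's, and S_i counts the Y's that
    lie strictly inside the open sphere of radius R_i centred at xs[i].
    """
    if len(xs) < 2:
        raise ValueError("need at least two X points to define R_i")
    counts = []
    for i, x in enumerate(xs):
        # Brute-force nearest-neighbour distance among the other X's.
        nearest = min(math.dist(x, other) for j, other in enumerate(xs) if j != i)
        radius = nearest / 2.0
        # Strict inequality: the sphere is open.
        counts.append(sum(1 for y in ys if math.dist(x, y) < radius))
    return counts


def null_limit_pmf(r, k, alpha):
    """Q(r) in the special case f = g: c / (1 + c)**(r + 1), c = 2**k * alpha."""
    c = (2 ** k) * alpha
    return c / (1.0 + c) ** (r + 1)


if __name__ == "__main__":
    # Illustrative setup (an assumption): both samples uniform on the unit square.
    rng = random.Random(0)
    k, m, n = 2, 200, 400  # alpha = m / n = 0.5
    xs = [[rng.random() for _ in range(k)] for _ in range(m)]
    ys = [[rng.random() for _ in range(k)] for _ in range(n)]
    s = sphere_counts(xs, ys)
    print("observed share of S_i equal to 0:", s.count(0) / m)
    print("limiting Q(0) when f = g:", null_limit_pmf(0, k, m / n))
    print("sum of S_i:", sum(s), "out of n =", n)
```

Because the spheres do not overlap, each $Y$ is counted at most once, so the printed sum of the $S_i$ cannot exceed $n$ and will typically fall below it, as the abstract notes.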
Article information
Source
Ann. Math. Statist. Volume 31, Number 1 (1960), 159-164.
Dates
First available in Project Euclid: 27 April 2007
Permanent link to this document
http://projecteuclid.org/euclid.aoms/1177705995
Digital Object Identifier
doi:10.1214/aoms/1177705995
Mathematical Reviews number (MathSciNet)
MR119305
Zentralblatt MATH identifier
0092.36401
Citation
Weiss, Lionel. Two-Sample Tests for Multivariate Distributions. Ann. Math. Statist. 31 (1960), no. 1, 159--164. doi:10.1214/aoms/1177705995. http://projecteuclid.org/euclid.aoms/1177705995.

