## The Annals of Mathematical Statistics

### Tables for the Distribution of the Number of Exceedances

Benjamin Epstein

#### Abstract

Consider a random sample of size $n$ taken from a continuous distribution $f(x)$. Let another random sample, independent of the first and also of size $n$, be drawn from the same population. Let $U^n_r$ be the number of values in the second sample which exceed the $r$th smallest value in the first sample. Similarly, let $V^n_s$ be the number of values in the second sample which exceed the $s$th largest value in the first sample. Since the $r$th smallest value in a sample of size $n$ is at the same time the $s$th largest value in the sample with $s = n - r + 1$, it follows that
\begin{equation*}\tag{1} \mathrm{Pr} (U^n_r = x) \equiv \mathrm{Pr} (V^n_s = x),\end{equation*}
$s = n - r + 1; \quad r = 1,2, \cdots, n;\quad x = 0,1,2, \cdots, n.$ The probability distribution of $U^n_r$ (and hence of $V^n_s$) is given by
\begin{equation*}\tag{2} \mathrm{Pr} (U^n_r = x) = \binom{n-x+r-1}{r-1}\binom{n-r+x}{x}/\binom{2n}{n} = \frac{1}{2}P_{n-x+r-1,r-1} P_{n-r+x,x}/P_{2n,n},\end{equation*}
$x = 0, 1, 2, \cdots, n.$ Formula (2) can be proved by combinatorial methods; details are omitted. An alternative formula, derived in another way [3], is
\begin{equation*}\tag{2a} \mathrm{Pr}(U^n_r = x) = \frac{1}{2}\binom{n-1}{r-1}\binom{n}{x}/\binom{2n-1}{n-r+x} = \frac{1}{2} P_{n-1,r-1}P_{n,x}/P_{2n-1,n-r+x}.\end{equation*}
In formulae (2) and (2a), $P_{n,x} = \left(\frac{1}{2}\right)^n\binom{n}{x}$. Formulae in terms of $P_{n,x}$ are particularly convenient for hand computation, since one can use the extensive tables of the binomial probability distribution published by the National Bureau of Standards.
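
As an illustrative check, not part of the paper, formulae (2) and (2a) can be evaluated exactly with rational arithmetic. The sketch below takes the lower index of the last binomial in (2a) to be $n - r + x$, as in the accompanying $P$ form; the function names `pr_u_eq` and `pr_u_eq_alt` are hypothetical.

```python
from fractions import Fraction
from math import comb


def pr_u_eq(n, r, x):
    """Pr(U_r^n = x) from formula (2), as an exact fraction."""
    return Fraction(comb(n - x + r - 1, r - 1) * comb(n - r + x, x),
                    comb(2 * n, n))


def pr_u_eq_alt(n, r, x):
    """Pr(U_r^n = x) from formula (2a), as an exact fraction."""
    return Fraction(comb(n - 1, r - 1) * comb(n, x),
                    2 * comb(2 * n - 1, n - r + x))


if __name__ == "__main__":
    n = 5
    for r in range(1, n + 1):
        dist = [pr_u_eq(n, r, x) for x in range(n + 1)]
        # The two formulae should agree term by term,
        # and each row should be a probability distribution.
        assert dist == [pr_u_eq_alt(n, r, x) for x in range(n + 1)]
        assert sum(dist) == 1
        print(r, [str(p) for p in dist])
```

For instance, `pr_u_eq(5, 1, 0)` evaluates to `Fraction(1, 252)`, the chance that all five values of the second sample fall below the smallest value of the first.
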
If the values of $\mathrm{Pr} (U^n_r \leqq x)$, for $x = 0, 1, 2, \cdots, n - 1$ and $r = 1,2, \cdots, n$, are written (for fixed $n$) in matrix form, one notes certain useful symmetries, which can be expressed by the identities
\begin{align*} \mathrm{Pr} (U^n_r \leqq x) &= \mathrm{Pr} (U^n_{x+1} \leqq r - 1), \tag{3} \\ \mathrm{Pr} (U^n_r \leqq x) + \mathrm{Pr} (U^n_{n-r+1} \leqq n - x - 1) &= 1. \tag{4} \end{align*}
If one takes $x = n - r$ in (4) and uses the relation (3), it is readily verified that
\begin{equation*}\tag{5} \mathrm{Pr} (U^n_r \leqq n - r) = \frac{1}{2}.\end{equation*}
Proofs of (3), (4), and (5) can be obtained by using the results of pages 257-258 of [3]. Because of these symmetries, the complete matrix (for any fixed $n$) can be constructed if one knows only the quantities $\mathrm{Pr} (U^n_r \leqq x)$, $r = 1(1)\lbrack n/2\rbrack$, $x = r - 1, r, r + 1, \cdots, n - r - 1.$ In Table 1 these values are given for $n = 2(1)15(5)20.$ To see how the complete matrix is obtained from Table 1, it is instructive to verify, using (3), (4), and (5), that the complete matrix, in the special case $n = 5,$ is given by Table 2.
A somewhat different, but related, exceedance problem is to take two random samples of size $n$ from a continuous distribution $f(x).$ Let us for convenience attach the letter $x$ to one of the samples and the letter $y$ to the other sample. Further, let $x_{r,n}$ and $y_{r,n}$ be respectively the $r$th smallest observations in each of the samples.
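
The symmetries (3), (4), and (5) can be checked numerically by accumulating formula (2). The following is a minimal sketch under that assumption; it builds the cumulative matrix for a fixed $n$ (rows $r = 1, \cdots, n$, columns $x = 0, \cdots, n - 1$) and makes no claim to reproduce the printed layout of Tables 1 or 2. The helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def pr_u_eq(n, r, x):
    """Pr(U_r^n = x) from formula (2)."""
    return Fraction(comb(n - x + r - 1, r - 1) * comb(n - r + x, x),
                    comb(2 * n, n))


def pr_u_le(n, r, x):
    """Pr(U_r^n <= x), obtained by summing formula (2)."""
    return sum((pr_u_eq(n, r, k) for k in range(x + 1)), Fraction(0))


def cumulative_matrix(n):
    """Rows r = 1..n, columns x = 0..n-1 of Pr(U_r^n <= x)."""
    return [[pr_u_le(n, r, x) for x in range(n)] for r in range(1, n + 1)]


def check_symmetries(n):
    """Return True if identities (3), (4), and (5) all hold for this n."""
    for r in range(1, n + 1):
        for x in range(n):
            if pr_u_le(n, r, x) != pr_u_le(n, x + 1, r - 1):               # (3)
                return False
            if pr_u_le(n, r, x) + pr_u_le(n, n - r + 1, n - x - 1) != 1:   # (4)
                return False
        if pr_u_le(n, r, n - r) != Fraction(1, 2):                         # (5)
            return False
    return True


if __name__ == "__main__":
    print(check_symmetries(5))
    for row in cumulative_matrix(5):
        print("  ".join(f"{float(p):.4f}" for p in row))
```

Exact fractions are used so that the identities are tested as equalities rather than to within rounding error.
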
Let us define $Z_{r,n} = \max (x_{r,n}, y_{r,n}).$ If $Z_{r,n} = x_{r,n}$, count the number of $y$'s which are $\geqq x_{r,n};$ if $Z_{r,n} = y_{r,n}$, count the number of $x$'s which are $\geqq y_{r,n}.$ If the number of exceedances so counted is denoted by $W^n_r,$ it is readily seen from (2) that the probability distribution of $W^n_r$ is given by
\begin{equation*}\tag{6} \mathrm{Pr}(W^n_r = x) = 2\binom{n-x+r-1}{r-1}\binom{n-r+x}{x}/\binom{2n}{n},\quad x = 0,1,2,\cdots, n - r.\end{equation*}
It is evident from the definition that
\begin{equation*}\tag{7} \mathrm{Pr}(W^n_r \leqq x) = 1,\quad x \geqq n - r.\end{equation*}
Clearly one can find the values of $\mathrm{Pr}(W^n_r \leqq x)$ by using Table 1. Thus, for example, in the special case $n = 5$ one obtains Table 3.
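
Formula (6) can be cross-checked for small $n$ by direct enumeration. The sketch below takes (6) to be twice the right-hand side of (2) on $x = 0, \cdots, n - r$, and relies on the fact that, for continuous data, every assignment of the $2n$ pooled ranks to the two samples is equally likely. The function names are hypothetical, and the enumeration is an illustration rather than the derivation used in the paper.

```python
from fractions import Fraction
from itertools import combinations
from math import comb


def pr_w_eq(n, r, x):
    """Pr(W_r^n = x) from formula (6); zero outside 0 <= x <= n - r."""
    if not 0 <= x <= n - r:
        return Fraction(0)
    return Fraction(2 * comb(n - x + r - 1, r - 1) * comb(n - r + x, x),
                    comb(2 * n, n))


def pr_w_eq_enumerated(n, r, x):
    """Pr(W_r^n = x) by listing all C(2n, n) equally likely rank splits."""
    hits = 0
    total = 0
    for xs in combinations(range(2 * n), n):      # pooled ranks of the x-sample
        xs_set = set(xs)
        ys = [k for k in range(2 * n) if k not in xs_set]
        x_r, y_r = xs[r - 1], ys[r - 1]           # r-th smallest in each sample
        if x_r > y_r:
            w = sum(1 for k in ys if k > x_r)     # y's exceeding x_{r,n}
        else:
            w = sum(1 for k in xs if k > y_r)     # x's exceeding y_{r,n}
        hits += (w == x)
        total += 1
    return Fraction(hits, total)


if __name__ == "__main__":
    n = 4
    for r in range(1, n + 1):
        formula = [pr_w_eq(n, r, x) for x in range(n + 1)]
        brute = [pr_w_eq_enumerated(n, r, x) for x in range(n + 1)]
        print(r, formula == brute, [str(p) for p in formula])
```

The probabilities from (6) sum to one over $x = 0, \cdots, n - r$, which is consistent with (5) and (7).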

#### Article information

Source
Ann. Math. Statist., Volume 25, Number 4 (1954), 762-768.

Dates
First available in Project Euclid: 28 April 2007

https://projecteuclid.org/euclid.aoms/1177728662

Digital Object Identifier
doi:10.1214/aoms/1177728662

Mathematical Reviews number (MathSciNet)
MR65074

Zentralblatt MATH identifier
0056.37503
