On the Coupon Collector's Waiting Time

Bengt Rosen

doi:10.1214/aoms/1177696696

Ann. Math. Statist. 41(6): 1952-1969 (December, 1970). DOI: 10.1214/aoms/1177696696

Abstract

We shall introduce a set of random variables and give interpretations of them in terms of coupon collection. A person collects coupons with different colors. Let there be in all $N$ different colors, which we label $1, 2, \cdots, N$. The different colors may occur with different frequencies. The colors of successive coupons are independent. Let $J_\nu$ be the color of the $\nu$th coupon. Our formal assumptions are: $J_1, J_2, \cdots$ are independent random variables, all with the following distribution \begin{equation*}\tag{1.1} P(J = s) = p_s,\quad s = 1, 2, \cdots, N\end{equation*} where \begin{equation*}\tag{1.2} p_s \geqq 0,\quad p_1 + p_2 + \cdots + p_N = 1.\end{equation*} Thus, $p_s$ is the probability that a coupon has color $s$.

Let \begin{equation*}\tag{1.3} M_n = \#\,\text{different elements among } (J_1, J_2, \cdots, J_n),\quad n = 1, 2, \cdots.\end{equation*} Thus, $M_n$ is the number of different colors in the collection after $n$ coupons. Let \begin{equation*}\tag{1.4} T_n = \min \{\nu: M_\nu = n\},\quad n = 1, 2, \cdots, N.\end{equation*} $T_n$ is the number of coupons needed to obtain a collection containing $n$ different colors. Define \begin{align*} \tag{1.5} D_\nu &= 1\quad\text{if } J_\nu \notin (J_1, J_2, \cdots, J_{\nu-1}),\quad \nu = 1, 2, \cdots \\ &= 0\quad \text{otherwise}.\end{align*} Thus, $D_\nu$ tells whether or not the $\nu$th coupon adds a new color to the collection.

We shall assume that the coupons also carry a bonus value, which is a real number. All coupons with the same color have the same bonus value, while the bonus value may differ from color to color. Let $a_s$ be the bonus value of coupons with color $s$, $s = 1, 2, \cdots, N$. The bonus sum of a collection of coupons is obtained by adding the bonus values of the different colors which are represented in the collection. Thus, duplicates do not count. Formally, we define the bonus sum as follows.
\begin{equation*}\tag{1.6} Q_n = a_{J_1}D_1 + a_{J_2}D_2 + \cdots + a_{J_n}D_n,\quad n = 1, 2, \cdots.\end{equation*} The random variable $Q_n$ will be referred to as the bonus sum after $n$ coupons for a collector in the situation $\Omega = ((p_1, a_1), (p_2, a_2), \cdots, (p_N, a_N))$. We define for $B > 0$ \begin{equation*}\tag{1.7} W(B) = \min \{n: Q_n \geqq B\}.\end{equation*} $W(B)$ will be referred to as the waiting time to obtain bonus sum $B$ for a coupon collector in the situation $\Omega$.

The following lemma, which is obvious, states that we have introduced a slight abundance of terminology and notation.

LEMMA 1.1. The random variables $M_n$ and $T_n$ in (1.3) and (1.4) are respectively the bonus sum after $n$ coupons and the waiting time to obtain bonus sum $n$ for a coupon collector in the situation $((p_1, 1), (p_2, 1), \cdots, (p_N, 1))$.

Our main concern will be to study the random variable $W(B)$ and its particular case $T_n$. We confine ourselves to the case when all bonus values, $a_s$, are positive. The main result is that $W(B)$, under general conditions, is asymptotically (as $n$ and $N$ increase simultaneously) normally distributed.

We give a brief sketch of the idea of proof, which is well known. When all $a$'s are positive, the distributions of the random variables $W(B)$ and $Q_n$ are related according to the formula \begin{equation*}\tag{1.8} P(W(B) > x) = P(Q_{\lbrack x \rbrack} < B),\quad x, B > 0.\end{equation*} With the aid of formula (1.8) one can "invert" results concerning either of the random variables $Q$ or $W$ to yield results concerning the other variable. In [5] we showed that $Q_n$, under general conditions, is asymptotically normally distributed. The asymptotic normality of $W$ will be derived by inversion of the results in [5].

The asymptotic behavior of the collector's waiting time has, to the best of our knowledge, earlier only been considered in the classical case, i.e. $p_s = 1/N$ and $a_s = 1$, $s = 1, 2, \cdots, N$.
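The definitions (1.5)-(1.7) and the pathwise fact behind (1.8) can be illustrated with a small simulation. The sketch below is not from the paper; the function names `draw_colors`, `bonus_sums`, and `waiting_time`, and the example values of $p_s$ and $a_s$, are assumptions chosen only for illustration. It relies on the observation that when all $a_s$ are positive, $Q_n$ is nondecreasing in $n$, so the event $W(B) > x$ coincides with the event $Q_x < B$ for every positive integer $x$.

```python
import random


def draw_colors(p, n, rng):
    """Draw J_1, ..., J_n independently with P(J = s) = p[s] (colors indexed from 0)."""
    return rng.choices(range(len(p)), weights=p, k=n)


def bonus_sums(colors, a):
    """Return [Q_1, ..., Q_n] as in (1.6): a color's bonus counts only on first appearance."""
    seen = set()
    total = 0
    sums = []
    for j in colors:
        if j not in seen:  # D_nu = 1: the coupon adds a new color
            seen.add(j)
            total += a[j]
        sums.append(total)
    return sums


def waiting_time(sums, b):
    """Return W(B) as in (1.7), or None if bonus sum b is not reached within len(sums) coupons."""
    for n, q in enumerate(sums, start=1):
        if q >= b:
            return n
    return None


if __name__ == "__main__":
    rng = random.Random(1970)   # fixed seed, illustrative only
    p = [0.5, 0.3, 0.2]         # hypothetical color probabilities
    a = [1.0, 2.5, 0.5]         # hypothetical positive bonus values
    sums = bonus_sums(draw_colors(p, 15, rng), a)
    w = waiting_time(sums, 3.0)
    # Pathwise version of (1.8): W(B) > x exactly when Q_x < B.
    for x in range(1, len(sums) + 1):
        assert ((w is None) or (w > x)) == (sums[x - 1] < 3.0)
    print("bonus sums:", sums, "W(3.0):", w)
```

Passing `a = [1, 1, 1]` to `bonus_sums` gives the number of distinct colors seen so far, which is the content of Lemma 1.1 in this toy setting.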
In [4], Section 3, Renyi derives results about $M$ by first deriving results about $T$ and then "inverting." His basic tool is the representation \begin{equation*}\tag{1.9} T_n = U_1 + U_2 + \cdots + U_n\end{equation*} where $U_\nu$ is the waiting time from bonus sum $\nu - 1$ to bonus sum $\nu$. In the classical case $U_1, U_2, \cdots$ are independent random variables and $P(U_\nu = k) = ((\nu - 1)/N)^{k-1}(N - \nu + 1)/N$, $k = 1, 2, \cdots$. Thus, results concerning the asymptotic behavior of sums of independent random variables can be applied. A complete investigation along these lines is given by Baum and Billingsley in [1].

A generalized version of the problem is considered by Ivchenko and Medvedev in [2]. In their problem, as in our problem here, a representation of the type (1.9) no longer holds. They proceed along the path we shall follow here, i.e. to obtain results about the waiting time by "inverting" results concerning the bonus sum.

The following notation will be used throughout the paper. $E$ and $\sigma^2$ stand for expectation and variance. $^c$ denotes centering at expectation, i.e. $X^c = X - EX$. $X =_{\mathscr{L}} Y$ means that the random variables $X$ and $Y$ have the same distribution. $\Rightarrow$ denotes convergence in law. The normal distribution with mean $\mu$ and variance $\sigma^2$ is denoted by $N(\mu, \sigma^2)$. The integral part of a real number is denoted by $\lbrack\,\cdot\,\rbrack$.
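The classical-case representation (1.9) can be checked numerically. The following is a minimal sketch, not part of the paper: the function names are hypothetical, and it assumes only the geometric distribution of $U_\nu$ stated above, whose mean is $N/(N - \nu + 1)$, so that $E T_n = \sum_{\nu=1}^{n} N/(N - \nu + 1)$. It compares that exact value with two Monte Carlo estimates, one drawing coupons directly and one summing independent geometric waiting times.

```python
import random
from fractions import Fraction


def expected_T(n, N):
    """Exact E T_n in the classical case: the sum of the geometric means N / (N - nu + 1)."""
    return sum(Fraction(N, N - nu + 1) for nu in range(1, n + 1))


def sample_T_direct(n, N, rng):
    """Draw uniform colors until n distinct ones have appeared; return the number of draws."""
    seen = set()
    draws = 0
    while len(seen) < n:
        seen.add(rng.randrange(N))
        draws += 1
    return draws


def sample_T_geometric(n, N, rng):
    """Sample T_n as U_1 + ... + U_n with independent geometric U_nu, as in (1.9)."""
    total = 0
    for nu in range(1, n + 1):
        success = (N - nu + 1) / N  # chance that the next coupon shows a new color
        u = 1
        while rng.random() >= success:  # failure: the coupon repeats an old color
            u += 1
        total += u
    return total


if __name__ == "__main__":
    rng = random.Random(41)  # fixed seed, illustrative only
    n, N, reps = 5, 10, 20000
    exact = float(expected_T(n, N))
    direct = sum(sample_T_direct(n, N, rng) for _ in range(reps)) / reps
    geometric = sum(sample_T_geometric(n, N, rng) for _ in range(reps)) / reps
    print(f"exact={exact:.4f} direct={direct:.4f} geometric={geometric:.4f}")
```

With unequal $p_s$ the waiting times between new colors are no longer independent geometric variables, so `sample_T_geometric` applies only to the classical case.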

Citation

Bengt Rosen. "On the Coupon Collector's Waiting Time." Ann. Math. Statist. 41 (6) 1952 - 1969, December, 1970. https://doi.org/10.1214/aoms/1177696696