Perfect simulation from the Quicksort limit distribution

The weak limit of the normalized number of comparisons needed by the Quicksort algorithm to sort n randomly permuted items is known to be determined implicitly by a distributional fixed-point equation. We give an algorithm for perfect random variate generation from this distribution.


Introduction
Let C_n denote the number of key comparisons needed to sort a list of n randomly permuted items by Quicksort. It is known that IEC_n = 2(n + 1)H_n − 4n ∼ 2n ln n and Var C_n ∼ (7 − 2π²/3)n², where H_n denotes the nth harmonic number. Furthermore,

X_n := (C_n − IEC_n)/n → X

in distribution. This limit theorem was first obtained by Régnier [8] by an application of the martingale convergence theorem. Rösler [9] gave a different proof of this limit law via the contraction method. Rösler's approach identifies the distribution of X as the unique solution, with zero mean and finite variance, of the distributional fixed-point equation

X =_D U X^(1) + (1 − U) X^(2) + g(U),    (1)

where X^(1), X^(2), and U are independent; X^(1) and X^(2) are distributed as X; U is uniform [0, 1]; g is given by g(u) := 1 + 2u ln u + 2(1 − u) ln(1 − u); and =_D denotes equality in distribution.
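As a quick sanity check (not part of the paper), the closed form IEC_n = 2(n + 1)H_n − 4n can be verified exactly against the standard Quicksort mean recurrence IEC_n = n − 1 + (2/n) Σ_{k=0}^{n−1} IEC_k using rational arithmetic:

```python
from fractions import Fraction

def mean_comparisons(nmax):
    """Exact IEC_n via the recurrence IEC_n = n - 1 + (2/n) * sum_{k<n} IEC_k."""
    means = [Fraction(0)]          # IEC_0 = 0
    running = Fraction(0)          # running sum of IEC_0, ..., IEC_{n-1}
    for n in range(1, nmax + 1):
        running += means[-1]
        means.append(Fraction(n - 1) + Fraction(2, n) * running)
    return means

def closed_form(n):
    """2(n+1)H_n - 4n, with H_n the nth harmonic number."""
    h = sum(Fraction(1, j) for j in range(1, n + 1))
    return 2 * (n + 1) * h - 4 * n

means = mean_comparisons(30)
assert all(means[n] == closed_form(n) for n in range(1, 31))
print(means[3])  # → 8/3
```

The agreement is exact (as fractions), not merely numerical, for every n checked.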
The limit random variable X has finite moments of every order, which are computable from the fixed-point equation (1). Tan and Hadjicostas [10] proved that X has a Lebesgue density. Not much else was known rigorously about this distribution until Fill and Janson recently derived some properties of the limiting density [5] and results about the rate of convergence of the law of X_n to that of X [6]. Some of these results are restated for the reader's convenience in the next section.
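For instance, taking second moments in the fixed-point equation (1) (using IEX = 0 and IEU² = IE(1 − U)² = 1/3) gives IEX² = 3 IE g(U)², which can be checked numerically against the variance constant 7 − 2π²/3 quoted above. The integration scheme below is our own illustration, not from the paper:

```python
import math

def g(u):
    """Toll function g(u) = 1 + 2u ln u + 2(1-u) ln(1-u) from equation (1)."""
    return 1.0 + 2.0 * u * math.log(u) + 2.0 * (1.0 - u) * math.log(1.0 - u)

# Midpoint rule for IE g(U)^2 = int_0^1 g(u)^2 du (midpoints avoid the endpoints,
# where u ln u has a removable singularity in its derivative).
N = 200_000
second_moment_g = sum(g((j + 0.5) / N) ** 2 for j in range(N)) / N

# Second moment of X from the fixed-point equation: IEX^2 = 3 IE g(U)^2.
var_x = 3.0 * second_moment_g
print(var_x, 7.0 - 2.0 * math.pi ** 2 / 3.0)
```

The two printed values agree to several decimal places, confirming Var X = 7 − 2π²/3.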
We develop an algorithm, based on the results of Fill and Janson, which returns a perfect sample of the limit random variable X. We assume that we have available an infinite sequence of i.i.d. uniform [0, 1] random variables. Our solution is based on a modified rejection method, in which we use a convergent sequence of approximations to the density to decide the outcome of each rejection test. Such an approach was recently used by Devroye [3] to sample perfectly from perpetuities.

Properties of the Quicksort density
Our rejection sampling algorithm is based on a simple upper bound for, and an approximation of, (the unique continuous version of) the Quicksort limit density f. We use the following properties of f established in [5] and [6]. Let F_n denote the distribution function of X_n.

P1. f is bounded [5]: f(x) ≤ K for all x ∈ IR, where K := 16.

P2. f is Lipschitz continuous [5] with Lipschitz constant K̂ := 2466.

P3. With c := 589 and ĉ := (54cK̂²)^{1/3}, the results of [6] yield, for suitable δ_n → 0, explicitly computable approximations f_n built from F_n such that sup_{x ∈ IR} |f_n(x) − f(x)| ≤ R_n, where R_n → 0 is explicit in terms of ĉ.

By property P2, f is Lipschitz continuous with Lipschitz constant K̂. Therefore, Theorem 3.5 in Devroye [2, p. 320] implies the upper bound

f(x) ≤ [2K̂ min(F(x), 1 − F(x))]^{1/2}, x ∈ IR.

Here, F denotes the distribution function corresponding to f. Markov's inequality yields F(x) = IP(X ≤ x) ≤ (IEX^4)/x^4 for all x < 0. Similarly, 1 − F(x) = IP(X > x) ≤ (IEX^4)/x^4 for x > 0. The fourth moment of X can be derived explicitly in terms of the zeta function, either by Hennequin's formula for the cumulants of X (this formula was conjectured in Hennequin [7] and proved later in his thesis) or through the fixed-point equation (1). From (1), Cramer [1] computed IEX^4 = 0.7379 . . . (accurate to the indicated precision), so IEX^4 < 1. Therefore, if we define

g(x) := min(K, (2K̂)^{1/2}/x²),

we have f ≤ g. The scaled version ḡ := ξg is the density of a probability measure for ξ := 1/||g||_{L¹} = [4K^{1/2}(2K̂)^{1/4}]^{−1}.

Remark. According to the results of [5], f enjoys superpolynomial decay at ±∞, so certainly f ≤ g̃ for some g̃ of the form g̃(x) := min(K, Cx^{−2}). One way to obtain an explicit constant C is to use the Fourier inversion formula

x²f(x) = −(1/2π) ∫ e^{−itx} φ″(t) dt,

where φ is the characteristic function corresponding to f, and to bound |φ″(t)| [e.g., by min(c₁, c₂t^{−2}) for suitable constants c₁, c₂] as explained in the proof of Theorem 2.9 in [5]. But we find that our approach is just as straightforward, and gives a smaller value of C (although we have made no attempt to find the best C possible using the Fourier techniques of [5]).
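The total mass of the dominating curve is easy to check numerically. The sketch below takes K = 16 and K̂ = 2466, values consistent with ||g||_{L¹} = 4K^{1/2}(2K̂)^{1/4} ≈ 134.1 as quoted later; treat these constants as assumptions of this illustration:

```python
import math

K, KHAT = 16.0, 2466.0           # assumed values of the bounds in P1 and P2
C = math.sqrt(2.0 * KHAT)        # tail coefficient: f(x) <= C / x^2

def g(x):
    """Dominating curve g(x) = min(K, sqrt(2 K-hat) / x^2)."""
    return K if x == 0 else min(K, C / (x * x))

# Closed form: the two pieces meet at x0 = (C/K)^{1/2}, and integrating the
# flat part and the two tails gives ||g||_1 = 4 sqrt(K) (2 K-hat)^{1/4}.
closed = 4.0 * math.sqrt(K) * (2.0 * KHAT) ** 0.25

# Numerical check: midpoint rule on [-B, B] plus the exact tail mass 2 C / B.
B, N = 1000.0, 400_000
h = 2.0 * B / N
numeric = sum(g(-B + (j + 0.5) * h) for j in range(N)) * h + 2.0 * C / B
print(closed, numeric)
```

Both values come out near 134.1, matching the expected number of outer rejection loops reported for the algorithm.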

The rejection algorithm
We have found an explicit, integrable upper bound g for f. Furthermore, an approximation of f with an explicit error estimate is given by P3: with δ_n and f_n as in P3, we have |f_n(x) − f(x)| ≤ R_n for all x ∈ IR, and R_n → 0 as n → ∞.
To calculate the values of f_n we require knowledge of the probabilities of the events {C_n = i}. Let N(n, i) denote the number of permutations of n distinct numbers for which Quicksort needs exactly i key comparisons to sort. Then

IP(C_n = i) = N(n, i)/n!.

These probabilities are non-zero only if n − 1 ≤ i ≤ n(n − 1)/2. With the initializing conventions N(0, 0) := 1 and N(0, i) := 0 for i ≥ 1, and the obvious values N(n, i) = 0 for i < n − 1 and for i > n(n − 1)/2, we have the following recursion for n ≥ 1 and for n − 1 ≤ i ≤ n(n − 1)/2:

N(n, i) = Σ_{k=1}^{n} Σ_l binom(n−1, k−1) N(k − 1, l) N(n − k, i − (n − 1) − l).

This recurrence is well known. To verify it, assume that the first pivot element is the kth smallest element out of n. Then the number of permutations leading to i key comparisons is the number N(k − 1, l) of permutations of the items less than the pivot element which are sorted with l key comparisons, multiplied by the corresponding number N(n − k, i − (n − 1) − l) of permutations for the elements greater than the pivot element and by the binom(n−1, k−1) ways of interleaving the two groups, summed over all possible values of k and l. Note that n − 1 key comparisons are used for the splitting procedure. Observe that we also have

F_n(x) = IP(X_n ≤ x) = IP(C_n ≤ IEC_n + nx),

and that F_n, as well as IEC_n, can be computed from the previous tables (N(k, i)). Then, observe that, for y < z,

F_n(z) − F_n(y) = (1/n!) Σ_{IEC_n + ny < i ≤ IEC_n + nz} N(n, i),

and thus f_n(x) is computable from the table (N(n, i)). Now, a rejection algorithm gives a perfect sample X from the Quicksort limit distribution F: generate a candidate X with density ḡ and an independent uniform [0, 1] random variable U, and accept X if Ug(X) ≤ f(X). Since f cannot be evaluated exactly, this acceptance test is decided by computing f_n(X) for increasing n until either Ug(X) < f_n(X) − R_n (which certifies acceptance) or Ug(X) > f_n(X) + R_n (which certifies rejection and a restart). This algorithm halts with probability one, and produces a perfect sample from the Quicksort limit distribution. The expected number of outer loops is ||g||_{L¹} = 134.1. Note, however, that the constants K and K̂ are very crude upper bounds for ||f||_∞ and ||f′||_∞, which, from the results of numerical calculations reported in [10], appear to be on the order of 1 and 2, respectively. Moreover, considerable speed-up could be achieved for our algorithm by finding another approximation f_n to f that either is faster to compute or converges faster to f (or both). One promising approach, on which we hope to report more fully in future work, is to let f_1, f_2, . . . be the densities one obtains, starting from a suitably nice density f_0 (say, standard normal), by applying the method of successive substitutions to (1). Indeed, Fill and Janson [6] show that then f_n → f uniformly at an exponential rate. However, one difficulty is that these computations require repeated numerical integration; it should be possible to bound the errors in the numerical integrations using calculations similar to those in [5].
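The recursion for the comparison counts is straightforward to tabulate exactly. The following sketch (our own, with the interleaving factor binom(n−1, k−1) made explicit) builds the tables N(n, i) and checks them against n! and against the mean IEC_n = 2(n + 1)H_n − 4n:

```python
from fractions import Fraction
from math import comb, factorial

def quicksort_counts(nmax):
    """N[n][i] = number of permutations of n distinct items that Quicksort
    sorts with exactly i key comparisons; N[0] = {0: 1}."""
    N = [{0: 1}]  # N(0, 0) := 1, and N(0, i) = 0 for i >= 1
    for n in range(1, nmax + 1):
        row = {}
        for k in range(1, n + 1):           # pivot is the kth smallest item
            ways = comb(n - 1, k - 1)       # interleavings of the two groups
            for l, a in N[k - 1].items():
                for m, b in N[n - k].items():
                    i = (n - 1) + l + m     # n-1 comparisons for the split
                    row[i] = row.get(i, 0) + ways * a * b
        N.append(row)
    return N

N = quicksort_counts(12)
for n in range(1, 13):
    assert sum(N[n].values()) == factorial(n)       # counts cover all permutations
    mean = Fraction(sum(i * c for i, c in N[n].items()), factorial(n))
    h = sum(Fraction(1, j) for j in range(1, n + 1))
    assert mean == 2 * (n + 1) * h - 4 * n          # matches IEC_n exactly
print(sorted(N[3].items()))  # → [(2, 2), (3, 4)]
```

From such tables the probabilities IP(C_n = i) = N(n, i)/n!, and hence F_n and f_n, are exactly computable in rational arithmetic.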
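To make the structure of the modified rejection step concrete, here is a self-contained toy version; it is not the paper's algorithm verbatim. The standard normal density stands in for f (which in the real algorithm is accessible only through approximations), the dominating curve has the same shape min(K, C/x²) as above with illustrative constants K = 0.4 and C = 0.3 chosen so that the normal density lies below it, and the approximations f_n are roundings of f with known error R_n = 2^{−n}:

```python
import math, random

K, C = 0.4, 0.3                      # illustrative constants: phi(x) <= min(K, C/x^2)
X0 = math.sqrt(C / K)                # the two pieces of g meet at x0
MASS = 4.0 * math.sqrt(K * C)        # ||g||_1 = 4 sqrt(K C)

def g(x):
    return min(K, C / (x * x)) if x else K

def f(x):                            # stand-in target: standard normal density
    return math.exp(-0.5 * x * x) / math.sqrt(2.0 * math.pi)

def f_approx(x, n):                  # oracle with guarantee |f_n - f| <= R_n := 2^{-n}
    r = 2.0 ** -n
    return round(f(x) / r) * r

def sample_g(rng):
    """Inversion sampling from the normalized dominating density g / ||g||_1."""
    w = rng.random() * MASS / 2.0
    y = w / K if w < K * X0 else C / (C / X0 - (w - K * X0))
    return y if rng.random() < 0.5 else -y

def perfect_sample(rng):
    while True:                      # outer rejection loop
        x, u = sample_g(rng), rng.random()
        for n in range(1, 46):       # refine f_n until the test is decided
            r = 2.0 ** -n
            fn = f_approx(x, n)
            if u * g(x) <= fn - r:
                return x             # accept: u g(x) < f(x) is certified
            if u * g(x) >= fn + r:
                break                # reject: u g(x) > f(x) is certified
        # ties at float resolution (probability ~0) fall through as rejections

rng = random.Random(1)
xs = [perfect_sample(rng) for _ in range(20000)]
print(sum(xs) / len(xs))
```

The accept/reject decision never evaluates f exactly, only the two-sided bounds f_n ± R_n, exactly as in the algorithm above; the expected number of outer loops here is ||g||_{L¹} = 4(KC)^{1/2} ≈ 1.39.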