A characterization of the set of fixed points of the Quicksort transformation

The limiting distribution \mu of the normalized number of key comparisons required by the Quicksort sorting algorithm is known to be the unique fixed point of a certain distributional transformation T -- unique, that is, subject to the constraints of zero mean and finite variance. We show that a distribution is a fixed point of T if and only if it is the convolution of \mu with a Cauchy distribution of arbitrary center and scale. In particular, therefore, \mu is the unique fixed point of T having zero mean.


1 Introduction, motivation, and summary
Let M denote the class of all probability measures (distributions) on the real line. This paper concerns the transformation T defined on M by letting T ν be the distribution of

U Z + (1 − U) Z* + g(U),

where U, Z, and Z* are independent, with Z ∼ ν, Z* ∼ ν, and U ∼ unif(0, 1), and where

g(u) := 2u ln u + 2(1 − u) ln(1 − u) + 1.    (1.1)

It is well known [7] that (i) among distributions with zero mean and finite variance, T has a unique fixed point, call it µ; and (ii) if C_n denotes the random number of key comparisons required by the algorithm Quicksort to sort a file of n records, then the distribution of (C_n − E C_n)/n converges weakly to µ.
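As a quick numerical illustration (ours, not part of the paper's argument; the sampling scheme, sample sizes, and all function names are ad hoc), one can iterate an empirical version of T starting from the point mass at 0, a zero-mean, finite-variance distribution, and watch the sample settle near the Quicksort limit µ, whose variance is known to be 7 − 2π²/3 ≈ 0.4203:

```python
import numpy as np

rng = np.random.default_rng(0)

def g(u):
    # g(u) = 2u ln u + 2(1-u) ln(1-u) + 1, from (1.1)
    return 2 * u * np.log(u) + 2 * (1 - u) * np.log(1 - u) + 1

def apply_T(sample):
    # One step of T on an empirical sample: T(nu) is the law of
    # U Z + (1-U) Z* + g(U) with Z, Z* ~ nu and U ~ unif(0,1) independent.
    n = sample.size
    z = rng.choice(sample, size=n)        # Z ~ nu (resampled empirically)
    z_star = rng.choice(sample, size=n)   # Z* ~ nu, independent of Z
    u = rng.uniform(1e-12, 1 - 1e-12, size=n)  # avoid log(0) at endpoints
    return u * z + (1 - u) * z_star + g(u)

sample = np.zeros(200_000)  # point mass at 0: zero mean, finite variance
for _ in range(40):
    sample = apply_T(sample)

# The Quicksort limit mu has mean 0 and variance 7 - 2*pi^2/3 ~ 0.4203.
print(sample.mean(), sample.var())
```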
There are other fixed points. For example, it has been noted frequently in the literature that the location family generated by µ is a family of fixed points. But there are many more fixed points, as we now describe. Define the Cauchy(m, σ) distribution (where m ∈ R and σ ≥ 0) to be the distribution of m + σC, where C has the standard Cauchy distribution with density x ↦ [π(1 + x²)]^{−1}, x ∈ R; equivalently, Cauchy(m, σ) is the distribution with characteristic function t ↦ e^{imt−σ|t|}. [In particular, the Cauchy(m, 0) distribution is unit mass at m.] Now let F denote the class of all fixed points of T, and let C denote the class of convolutions of µ with a Cauchy distribution. Using characteristic functions it is easy to check that C ⊆ F, and that all of the distributions in C are distinct. In this paper we will prove that, conversely, F ⊆ C, and thereby establish the following main result.

Theorem 1.1. F = C; that is, a distribution is a fixed point of T if and only if it is the convolution of µ with a Cauchy(m, σ) distribution for some m ∈ R and σ ≥ 0.
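For the reader's convenience, here is the short characteristic-function check that C ⊆ F; it uses only the fixed-point identity for µ and the form of the Cauchy characteristic function:

```latex
% Fixed-point check for nu = mu * Cauchy(m, sigma): write
% phi(t) = psi_mu(t) e^{imt - sigma|t|} for its characteristic function.
\begin{align*}
\widehat{T\nu}(t)
 &= \int_0^1 \phi(ut)\,\phi((1-u)t)\,e^{itg(u)}\,du\\
 &= e^{imt-\sigma|t|}\int_0^1 \psi_\mu(ut)\,\psi_\mu((1-u)t)\,e^{itg(u)}\,du
   && \text{since } u|t| + (1-u)|t| = |t| \text{ for } 0 \le u \le 1\\
 &= e^{imt-\sigma|t|}\,\psi_\mu(t) \;=\; \phi(t)
   && \text{since } T\mu = \mu.
\end{align*}
```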
The following corollary is immediate and strengthens Rösler's [7] characterization of µ as the unique element of F having zero mean and finite variance.

Corollary 1.2. The Quicksort limit µ is the unique fixed point of T having zero mean. [Indeed, when σ > 0 the convolution of µ with Cauchy(m, σ) has no mean at all, and when σ = 0 it has mean m.]

The present paper can be motivated in two ways. First, the authors are writing a series of papers refining and extending Rösler's [7] probabilistic analysis of Quicksort. No closed-form expressions are known for any of the standard functionals (e.g., characteristic function, distribution function, density function) associated with µ; information about µ must be read from the fixed-point identity it satisfies. We were curious as to what extent additional known information about µ, such as the fact that it has everywhere finite moment generating function, must be brought to bear. As one example, it is believed that the continuous Lebesgue density f (treated in [3]) for µ decays at least exponentially quickly to 0 at ±∞; cf. [5]. But we now know from Theorem 1.1 that there can be no proof of this conjecture that makes use solely of the information that µ ∈ F.
Second, we view the present paper as a pilot study of fixed points for a class of distributional transformations on the line. In the more general setting, we would be given (the joint distribution of) a sequence (A_i : i ≥ 0) of random variables and would define a transformation T on M by letting T ν be the distribution of

A_0 + Σ_{i≥1} A_i Z_i,

where Z_1, Z_2, . . . are independent random variables with distribution ν, independent also of (A_i : i ≥ 0). [To ensure well-definedness, one might (for example) require that (almost surely) A_i ≠ 0 for only finitely many values of i.] The Quicksort transformation is the special case A_0 = g(U), A_1 = U, A_2 = 1 − U, and A_i = 0 for i ≥ 3. For probability measures ν on [0, ∞), rather than on R, and with the additional restrictions that A_0 = 0 and A_i ≥ 0 for all i ≥ 1, such transformations are called generalized smoothing transformations. These have been thoroughly studied by Durrett and Liggett [2], Guivarc'h [4], and Liu [6], and by other authors; consult the three papers we have cited here for further bibliographic references. Generalized smoothing transformations have applications to interacting particle systems, branching processes and branching random walk, random set constructions, and statistical turbulence theory. The arguments used to characterize the set of fixed points for generalized smoothing transformations make heavy use of Laplace transforms; unfortunately, these arguments do not carry over readily to distributions on the line. Other authors (see, e.g., [8]-[10]) have treated fixed points of transformations of measures ν on the whole line as discussed above, but not without finiteness conditions on the moments of ν.
We now outline our proof of Theorem 1.1. Let ψ be the characteristic function of a given ν ∈ F, and let r(t) := ψ(t) − 1, t ∈ R. In Section 2 we establish and solve (in a certain sense) an integral equation satisfied by r. In Section 3 we then use the method of successive substitutions to derive asymptotic information about r(t) as t ↓ 0, showing first that r(t) = O(t^{2/3}), next that r(t) = βt + o(t) for some β = −σ + im ∈ C with σ ≥ 0, and finally that r(t) = βt + O(t²). In Section 4 we use this information to argue that there exist random variables Z_1 ∼ ν, Z_2 ∼ µ, and C ∼ Cauchy(m, σ) such that Z_1 = Z_2 + C. We finish the proof by showing that one can take Z_2 and C to be independent, whence ν ∈ C.

2 An integral equation
Let ψ denote the characteristic function of a given ν ∈ F. Since ψ(−t) ≡ \overline{ψ(t)}, we shall only need to consider ψ(t) for t ≥ 0. Recall that r(t) := ψ(t) − 1 and that, in terms of characteristic functions, the fixed-point equation T ν = ν reads

ψ(t) = E[ψ(Ut) ψ((1 − U)t) e^{itg(U)}], with U ∼ unif(0, 1).

For notational convenience, define

b(t) := E[ψ(Ut) ψ((1 − U)t) (e^{itg(U)} − 1)] + E[r(Ut) r((1 − U)t)].    (2.1)

Rearranging the fixed-point integral equation (T ψ)(t) ≡ ψ(t), we obtain the following result.
Lemma 2.1. The function r satisfies the integral equation

r(t) = (2/t) ∫_0^t r(v) dv + b(t), t > 0.    (2.2)

Note that r and b are continuous on [0, ∞), with r(0) = 0 = b(0). Regarding b as "known", the integral equation in Lemma 2.1 is easily "solved" for r:

Proposition 2.2. For some constant c ∈ C,

r(t) = b(t) + 2t [ c − ∫_t^1 v^{−2} b(v) dv ], 0 < t ≤ 1.

Proof. Set h(t) := t^{−1} ∫_0^t r(v) dv for t > 0, so that (2.2) reads r = 2h + b. Thus h is continuously differentiable on (0, ∞) and satisfies the differential equation

h′(t) = [r(t) − h(t)]/t = [h(t) + b(t)]/t

there. This is an easy differential equation to solve for h, and we find that

h(t) = t [ c − ∫_t^1 v^{−2} b(v) dv ], 0 < t ≤ 1,

for some c ∈ C (namely c = h(1)). After rearrangement, the proposition is proved.
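For the record, the elementary computation behind the last two displays is an integrating-factor argument:

```latex
% Integrating factor 1/t for h'(t) = [h(t) + b(t)]/t on (0, infinity):
\[
  \frac{d}{dt}\,\frac{h(t)}{t}
  \;=\; \frac{h'(t)}{t} - \frac{h(t)}{t^{2}}
  \;=\; \frac{b(t)}{t^{2}},
\]
% integrating over [t, 1] and setting c := h(1) = \int_0^1 r(v)\,dv:
\[
  \frac{h(1)}{1} - \frac{h(t)}{t} \;=\; \int_t^1 v^{-2}\,b(v)\,dv
  \qquad\Longrightarrow\qquad
  h(t) \;=\; t\Bigl[\,c - \int_t^1 v^{-2}\,b(v)\,dv\,\Bigr].
\]
```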
3 Behavior of r near 0

We now proceed in stages, using Proposition 2.2 as our basic tool, to get ever more information about the behavior of r (especially near 0). The first stage is the a priori estimate announced in Section 1.

Lemma 3.1. The function r satisfies r(t) = O(t^{2/3}) as t ↓ 0.

Proof. From (2.1) and (2.2), we see immediately that, for 0 < t ≤ 1,

|b(t)| ≤ t + E |r(Ut)| |r((1 − U)t)|,

using |ψ| ≤ 1, |g(u)| ≤ 1 for 0 ≤ u ≤ 1, and |e^{ix} − 1| ≤ |x|. Therefore, for 0 < t < 1, Proposition 2.2 yields

|r(t)| ≤ |b(t)| + 2|c| t + 2t ∫_t^1 v^{−2} |b(v)| dv,

where c is the constant of Proposition 2.2. Fix 0 < a < 1; later in the proof we shall see that a = 1/8 suffices for our purposes. Since r is continuous with r(0) = 0, a short computation starting from the two displays above shows that we may choose t_0 ∈ (0, 1] so small that, for 0 < t ≤ t_0 and a suitable constant C̃ < ∞,

|r(t)| ≤ (1 − a) C̃ t^{2/3} + 7a E |r(Ut)|,

and since M(t) := sup_{0 < s ≤ t} |r(s)| is bounded (by 2), this is trivially true also for t > t_0, upon enlarging C̃ if necessary. Summarizing, for some constant C̃ < ∞ we have, with U ∼ unif(0, 1),

|r(t)| ≤ (1 − a) C̃ t^{2/3} + 7a E |r(Ut)| for all t ≥ 0.    (3.1)

Now fix the value of a to be any number in (0, 1/7), say a = 1/8. Then a straightforward induction [substituting (3.2) into (3.1) for the induction step] shows that for any nonnegative integer n we have, for all t ≥ 0,

|r(t)| ≤ (1 − a) C̃ [ Σ_{k=0}^{n−1} (7a)^k ] t^{2/3} + (7a)^n M(t).    (3.2)

Recalling that M is bounded and letting n → ∞, we obtain the desired conclusion, with C := (1 − a)(1 − 7a)^{−1} C̃.
Lemma 3.2. Let ψ ≡ 1 + r denote the characteristic function of a given ν ∈ F, and define b by (2.1). Then

r(t) = βt + o(t) as t ↓ 0, where β := 2(c − J),

and where J is the absolutely convergent integral

J := ∫_0^1 v^{−2} b(v) dv.

Proof. Combining (2.1)-(2.2) and Lemma 3.1, we obtain

b(t) = O(t^{4/3}) as t ↓ 0.

Thus the integral J converges absolutely, and from Proposition 2.2 we obtain the desired conclusion about r.
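The estimate b(t) = O(t^{4/3}) rests on Lemma 3.1 together with the fact that g has mean zero under unif(0, 1); we sketch the computation for the reader's convenience:

```latex
% g has mean zero under unif(0,1), since \int_0^1 u \ln u\,du = -1/4:
\[
  E\,g(U) \;=\; \int_0^1 \bigl[\,2u\ln u + 2(1-u)\ln(1-u) + 1\,\bigr]\,du
          \;=\; 2\bigl(-\tfrac14\bigr) + 2\bigl(-\tfrac14\bigr) + 1 \;=\; 0,
\]
\[
  \text{hence}\qquad
  \bigl|\,E[e^{itg(U)} - 1]\,\bigr|
  \;=\; \bigl|\,it\,E\,g(U) + O(t^{2})\,\bigr|
  \;=\; O(t^{2}),
\]
% while Lemma 3.1 gives E|r(Ut) r((1-U)t)| = O(t^{4/3}); together these
% bound the two terms of (2.1), so that v^{-2}|b(v)| = O(v^{-2/3}) is
% integrable near 0.
```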
Lemma 3.2 is all we will need in the next section, but the following refinement follows readily and takes us as far as we can go with the method of successive substitutions.

Proposition 3.3. In the notation of Lemma 3.2, r(t) = βt + O(t²) as t ↓ 0.
4 Convergence to a Cauchy distribution

Theorem 4.1. Let ψ be a characteristic function satisfying

ψ(t) = 1 + βt + o(t) as t ↓ 0    (4.1)

for some β = im − σ ∈ C, with m ∈ R and σ ≥ 0. Let ν be the corresponding probability measure. Then T_0^n ν =⇒ Cauchy(m, σ), where T_0 is the homogeneous analogue of the Quicksort transformation T, mapping distributions as follows (in obvious notation):

T_0(dist(Z)) := dist(U Z + (1 − U) Z*).    (4.2)

Proof. Let Z_1, Z_2, . . . ; U_1, U_2, . . . be independent random variables, with every Z_i ∼ ν and every U_j ∼ unif(0, 1). Then, using the definition of T_0 repeatedly,

W_n := Σ_{i=1}^{2^n} V_i^{(n)} Z_i ∼ T_0^n ν,

where we define the random variables V_i^{(n)} as follows. Using U_1 in the obvious fashion, split the unit interval into intervals of lengths U_1 and 1 − U_1. Now using U_2 and U_3, split the first interval into subintervals of lengths U_1 U_2 and U_1 (1 − U_2) and the second interval into subintervals of lengths (1 − U_1) U_3 and (1 − U_1)(1 − U_3). Continue in this way (using U_1, . . . , U_{2^n − 1}) until the unit interval has been divided overall into 2^n subintervals. Call their lengths, from left to right, V_1^{(n)}, . . . , V_{2^n}^{(n)}, and set L_n := max_{1 ≤ i ≤ 2^n} V_i^{(n)}.

We show that L_n converges in probability to 0 as n → ∞. Luckily, the complicated dependence structure of the variables V_i^{(n)} does not come into play; the only observation we need is that each V_i^{(n)} marginally has the same distribution as U_1 · · · U_n. Indeed, abbreviate V_1^{(n)} as V_n; briefly put, we derive a Chernoff bound for ln(1/V_n) and then simply use subadditivity. To spell things out, let x > 0 be fixed and let t ≥ 0. Then

P(V_n ≥ e^{−x}) = P(V_n^t ≥ e^{−tx}) ≤ e^{tx} E V_n^t = e^{tx} (1 + t)^{−n}.

Choosing the optimal t = n/x − 1 (valid for n ≥ x), this yields

P(V_n ≥ e^{−x}) ≤ exp[−(n ln(n/x) − n + x)] = exp[−(n(ln n − ln(ex)) + x)]

and thus

P(L_n ≥ e^{−x}) ≤ 2^n exp[−(n(ln n − ln(ex)) + x)] = exp[−(n(ln n − ln(2ex)) + x)] → 0 as n → ∞.
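The moment computation invoked in the Chernoff step is elementary:

```latex
% Markov/Chernoff step for V_n distributed as U_1 ... U_n:
\[
  E\,V_n^{\,t} \;=\; \bigl(E\,U_1^{\,t}\bigr)^{n}
  \;=\; \Bigl(\int_0^1 u^{t}\,du\Bigr)^{\!n}
  \;=\; (1+t)^{-n}, \qquad t \ge 0,
\]
\[
  P\bigl(V_n \ge e^{-x}\bigr)
  \;=\; P\bigl(V_n^{\,t} \ge e^{-tx}\bigr)
  \;\le\; e^{tx}\,(1+t)^{-n},
\]
% and minimizing tx - n ln(1+t) over t >= 0 gives t = n/x - 1 when n >= x.
```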
Since L_n converges in probability to 0, we can therefore choose ε_n → 0 so that P(L_n > ε_n) → 0. Because P(W_n ≠ W̃_n) ≤ P(L_n > ε_n) → 0, to prove the theorem it then suffices to prove that

W̃_n := 1(L_n ≤ ε_n) W_n =⇒ Cauchy(m, σ).
For this, we note that the characteristic function φ_n of W̃_n is given for t ∈ R by

φ_n(t) = P(L_n > ε_n) + E[ 1(L_n ≤ ε_n) Π_{i=1}^{2^n} ψ(V_i^{(n)} t) ].    (4.3)

We will show that φ_n(t) converges to e^{βt} = e^{imt−σt} for each fixed t ≥ 0, and [since, further, φ_n(−t) ≡ \overline{φ_n(t)}] this will complete the proof of the theorem.
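The product form in (4.3) comes from conditioning on the subinterval lengths, which are functions of U_1, . . . , U_{2^n − 1} alone and hence independent of the Z_i:

```latex
% Conditionally on V^{(n)}_1, ..., V^{(n)}_{2^n}, the variable W_n is a
% sum of independent scaled copies of Z_i:
\[
  E\Bigl[e^{itW_n} \,\Big|\, V^{(n)}_1, \dots, V^{(n)}_{2^n}\Bigr]
  \;=\; \prod_{i=1}^{2^n} E\Bigl[e^{\,i t V^{(n)}_i Z_i} \,\Big|\, V^{(n)}_i\Bigr]
  \;=\; \prod_{i=1}^{2^n} \psi\bigl(V^{(n)}_i t\bigr).
\]
```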
Indeed, since P(L_n > ε_n) → 0, we need only consider the second term in (4.3). For that, the calculus estimates outlined in the proof of the lemma preceding Theorem 7.1.2 in [1] demonstrate that, when L_n ≤ ε_n,

Π_{i=1}^{2^n} ψ(V_i^{(n)} t) = exp[(β + D_n) t]

(here we use Σ_i V_i^{(n)} = 1), for complex random variables D_n (depending on our fixed choice of t ≥ 0) satisfying |D_n| ≤ δ_n for a deterministic sequence δ_n [≡ δ(ε_n t)] → 0 [with δ(s) → 0 as s → 0].
[Leaving out the error estimates, the argument is

Π_i ψ(V_i^{(n)} t) ≈ Π_i exp(β V_i^{(n)} t) = exp(βt Σ_i V_i^{(n)}) = e^{βt}.]

It now follows easily that φ_n(t) → e^{βt}, as desired.
Both the next lemma and its immediate corollary (Lemma 4.3) will be used in our proof of Theorem 1.1.
Lemma 4.2. Let ν_1, ν_2 ∈ F, and suppose that there exist random variables X ∼ ν_1 and Y ∼ ν_2, defined on a common probability space, such that the characteristic function of X − Y satisfies (4.1). Then there exist random variables X_∞ ∼ ν_1 and Y_∞ ∼ ν_2, again on a common probability space, such that X_∞ − Y_∞ ∼ Cauchy(m, σ).

Proof. Extend T to a transformation T_2 on the class M_2 of probability measures on R² by mapping the distribution ξ ∈ M_2 of (X, Y) to the distribution T_2 ξ of

(U X + (1 − U) X* + g(U), U Y + (1 − U) Y* + g(U)),

where U, (X, Y), and (X*, Y*) are independent, with (X, Y) ∼ ξ, (X*, Y*) ∼ ξ, and U ∼ unif(0, 1), and where g is given by (1.1). (Note that we use the same uniform U for the Ys as for the Xs!) Of course, T_2 maps the marginal distributions ξ_1(·) = ξ(· × R) of X and ξ_2(·) = ξ(R × ·) of Y into T ξ_1 and T ξ_2, respectively; more importantly for our purposes, it maps the distribution, call it ξ̃, of X − Y into the distribution T_0 ξ̃, with T_0 defined at (4.2). Now let ν ∈ M_2 denote the joint distribution of (X, Y) in the statement of the lemma, with marginals ν_i ∈ F, i = 1, 2. Then (T_2^n ν)_{n≥1} has constant marginals (ν_1, ν_2) as n varies and so is a tight sequence. We then can find a weakly convergent subsequence, say, T_2^{n_k} ν =⇒ ν_∞ ∈ M_2; of course, the limit ν_∞ again has marginals ν_i, i = 1, 2. Moreover, the distribution of the difference of the two coordinates under T_2^{n_k} ν is T_0^{n_k} ν̃, so the difference distribution ν̃_∞ corresponding to ν_∞ equals lim_k T_0^{n_k} ν̃. But, by supposition, the characteristic function of ν̃ satisfies (4.1), so Theorem 4.1 implies that ν̃_∞ is Cauchy(m, σ). Thus ν_∞ ∈ M_2 supplies the desired coupling.

The following special case is the corollary we will use.

Lemma 4.3. Let ν_1, ν_2 ∈ F, and suppose that there exist random variables X ∼ ν_1 and Y ∼ ν_2, defined on a common probability space, such that the characteristic function of X − Y satisfies (4.1) with β = 0. Then ν_1 = ν_2. [Indeed, Cauchy(0, 0) is unit mass at 0, so Lemma 4.2 provides X_∞ ∼ ν_1 and Y_∞ ∼ ν_2 with X_∞ = Y_∞ almost surely.]
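The assertion that T_2 acts on the law of X − Y as the homogeneous transformation T_0 is just the cancellation of the common translation term g(U):

```latex
% The g(U) terms cancel in the difference of the two coordinates,
% leaving exactly the homogeneous recursion (4.2):
\[
  \bigl[UX + (1-U)X^{*} + g(U)\bigr] - \bigl[UY + (1-U)Y^{*} + g(U)\bigr]
  \;=\; U(X-Y) + (1-U)(X^{*}-Y^{*}).
\]
```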

5 The proof
We now complete the proof of Theorem 1.1.
Proof. As discussed in Section 1, it is simple to check that C ⊆ F (and that the elements of C are all distinct).
Conversely, suppose ν ∈ F, and let ψ ≡ 1 + r denote its characteristic function. By Lemma 3.2, r(t) = βt + o(t) as t ↓ 0 for some β = im − σ ∈ C with m ∈ R and σ ≥ 0 (that Re β ≤ 0, i.e., σ ≥ 0, follows from |ψ| ≤ 1). Since µ has zero mean and finite variance, its characteristic function satisfies ψ_µ(−t) = 1 + O(t²); hence if Z_1 ∼ ν and Z_2 ∼ µ are independent, the characteristic function of Z_1 − Z_2, namely ψ(t) ψ_µ(−t) = 1 + βt + o(t), satisfies (4.1). By Lemma 4.2 (with ν_1 = ν and ν_2 = µ) there exist Z̃_1 ∼ ν and Z̃_2 ∼ µ, on a common probability space, such that C := Z̃_1 − Z̃_2 ∼ Cauchy(m, σ). On a suitably enlarged probability space, let Y ∼ µ be independent of (Z̃_1, Z̃_2), hence of C, and set Z := Y + C. We know that the distribution ν_1 of Z̃_1 = Z̃_2 + C is a fixed point of T. But so is the distribution ν_1′ ∈ C of Z. Moreover, Z̃_1 − Z = Z̃_2 − Y has zero mean and finite variance, so its characteristic function is 1 + O(t²) and in particular satisfies (4.1) with β = 0. By Lemma 4.3 applied to (Z̃_1, Z), ν = ν_1 = ν_1′ ∈ C, as desired.