The number of bit comparisons used by Quicksort: an average-case analysis

The analyses of many algorithms and data structures (such as digital search trees) for searching and sorting are based on the representation of the keys involved as bit strings and so count the number of bit comparisons. On the other hand, the standard analyses of many other algorithms (such as Quicksort) are performed in terms of the number of key comparisons. We introduce the prospect of a fair comparison between algorithms of the two types by providing an average-case analysis of the number of bit comparisons required by Quicksort. Counting bit comparisons rather than key comparisons introduces an extra logarithmic factor into the asymptotic average total. We also provide a new algorithm, "BitsQuick", that reduces this factor to constant order by eliminating needless bit comparisons.


Introduction and summary
Algorithms for sorting and searching (together with their accompanying analyses) generally fall into one of two categories: either the algorithm is regarded as comparing items pairwise irrespective of their internal structure (and so the analysis focuses on the number of comparisons), or else it is recognized that the items (typically numbers) are represented as bit strings and that the algorithm operates on the individual bits. Typical examples of the two types are Quicksort and digital search trees, respectively; see [15].
In this paper (a substantial expansion of the extended abstract [7]) we take a first step towards bridging the gap between the two points of view, in order to facilitate run-time comparisons across the gap, by answering the following question posed many years ago by Bob Sedgewick [personal communication]: What is the bit complexity of Quicksort? (For a discussion of related work that has transpired in the time between [7] and this paper, see Remark 1.6 at the end of this section.) More precisely, we consider Quicksort (see Section 2 for a review) applied to n distinct keys (numbers) from the interval (0, 1). Many authors (Knuth [15], Régnier [19], Rösler [21], Knessl and Szpankowski [14], Fill and Janson [5] [6], Neininger and Rüschendorf [18], and others) have studied K_n, the (random) number of key comparisons performed by the algorithm. This is a natural measure of the cost (run-time) of the algorithm, if each comparison has the same cost. On the other hand, if comparisons are done by scanning the bit representations of the numbers, comparing their bits one by one, then the cost of comparing two keys is determined by the number of bits compared until a difference is found. We call this number the number of bit comparisons for the key comparison, and let B_n denote the total number of bit comparisons when n keys are sorted by Quicksort.
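As a concrete illustration of this cost measure, here is a minimal Python sketch (ours, not from the paper) of the function b(x, y) defined formally in Section 3: the index of the first bit at which two keys in (0, 1) differ, which is exactly the number of bit comparisons charged for comparing them.

```python
def b(x, y):
    """1-based index of the first bit at which the binary expansions
    of x, y in (0, 1) differ; comparing x and y costs b(x, y) bit comparisons."""
    k = 1
    while int(2 * x) == int(2 * y):   # k-th bits agree: shift both left
        x = 2 * x % 1.0
        y = 2 * y % 1.0
        k += 1
    return k

print(b(0.5, 0.25))    # .100... vs .010... differ at bit 1
print(b(0.625, 0.75))  # .101... vs .110... differ at bit 2
```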
We assume that the keys X_1, ..., X_n to be sorted are independent random variables with a common continuous distribution F over (0, 1). It is well known that the distribution of the number K_n of key comparisons does not depend on F. This invariance clearly fails to extend to the number B_n of bit comparisons, and so we need to specify F.
For simplicity, we study mainly the case that F is the uniform distribution, and, throughout, the reader should assume this as the default.But we also give a result valid for a general absolutely continuous distribution F over (0, 1) (subject to a mild integrability condition on the density).
In this paper we focus on the mean of B_n. One of our main results is the following Theorem 1.1, the concise version of which is the asymptotic equivalence
\[ \mathbb{E} B_n \sim n (\ln n)(\lg n). \]
Throughout, we use ln (respectively, lg) to denote natural (resp., binary) logarithm, and use log when the base doesn't matter (for example, in remainder estimates). The symbol ≐ is used to denote approximate equality, and γ ≐ 0.57722 is Euler's constant.
Theorem 1.1. If the keys X_1, ..., X_n are independent and uniformly distributed on (0, 1), then the number B_n of bit comparisons required to sort these keys using Quicksort has expectation given by the exact expression
\[ \mathbb{E} B_n = 2 \sum_{k=2}^{n} (-1)^k \binom{n}{k} \frac{1}{k(k-1)\left(1 - 2^{-(k-1)}\right)} \tag{1.1} \]
and, asymptotically,
\[ \mathbb{E} B_n = n (\ln n)(\lg n) + c_1\, n \ln n + c_2\, n + \pi_n\, n + O(\log n), \tag{1.2} \]
where c_1 and c_2 are explicit constants and π_n (defined by the Fourier expansion (1.3), whose frequencies are the βk with k ∈ Z \ {0} and β := 2π/ln 2) is periodic in lg n with period 1 and has amplitude smaller than 5 × 10^{−9}.
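The theorem can be probed numerically. The following Monte Carlo sketch (our illustration; the pivot choice and bit counting are as described above) estimates E B_n for n = 100, which, per the discussion in Section 4, should come out near 2295.

```python
import random

def first_diff_bit(x, y):
    """b(x, y): 1-based index of the first differing bit of x, y in (0, 1)."""
    k = 1
    while int(2 * x) == int(2 * y):
        x, y, k = 2 * x % 1.0, 2 * y % 1.0, k + 1
    return k

def bit_comparisons(keys):
    """Total bit comparisons used by Quicksort with a uniformly random pivot."""
    if len(keys) <= 1:
        return 0
    p = random.choice(keys)
    rest = [x for x in keys if x != p]
    cost = sum(first_diff_bit(x, p) for x in rest)
    return (cost + bit_comparisons([x for x in rest if x < p])
                 + bit_comparisons([x for x in rest if x > p]))

trials = 1000
est = sum(bit_comparisons([random.random() for _ in range(100)])
          for _ in range(trials)) / trials
print(est)  # should be near E B_100 ≐ 2295
```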
Small periodic fluctuations as in Theorem 1.1 come as a surprise to newcomers to the analysis of algorithms but in fact are quite common in the analysis of digital structures and algorithms; see, for example, Chapter 6 in [16].
For our further results, it is technically convenient to assume that the number of keys is no longer fixed at n, but rather Poisson distributed with mean λ and independent of the values of the keys. (In this paper, we shall not deal with the "de-Poissonization" that would be needed to transfer results back to the fixed-n model.) In obvious notation, the Poissonized version of (1.1)-(1.2) is
\[ \mathbb{E} B(\lambda) = 2 \sum_{k=2}^{\infty} (-1)^k \frac{\lambda^k}{k!}\, \frac{1}{k(k-1)\left(1 - 2^{-(k-1)}\right)} \tag{1.4} \]
and, asymptotically as λ → ∞,
\[ \mathbb{E} B(\lambda) = \lambda (\ln \lambda)(\lg \lambda) + c_1\, \lambda \ln \lambda + c_2\, \lambda + \pi_\lambda\, \lambda + O(\log \lambda), \tag{1.5} \]
with π_λ as in (1.3). The exact formula follows immediately from (1.1), and the asymptotic formula is established in Section 5 as Proposition 5.4. We will also see (Proposition 5.6) that Var B(λ) = O(λ²), so B(λ) is concentrated about its mean. Since the number K(λ) of key comparisons is likewise concentrated about its mean E K(λ) ∼ 2λ ln λ for large λ (see Lemmas 5.1 and 5.3), it follows that
\[ \frac{B(\lambda)}{K(\lambda)} \to \frac{\lg \lambda}{2} \quad \text{in the sense of ratios of means, as } \lambda \to \infty. \tag{1.6} \]
In other words, about ½ lg λ bits are compared per key comparison.

Remark 1.2. Further terms can be obtained in (1.2) and (1.5) by the methods used in the proofs below. In particular, the O(log λ) in (1.5) can be refined, for any fixed M, to a longer asymptotic expansion; the constant c_4 := 4 ln 2 + 2 + 2γ ≐ 5.927 appears in that refinement.
For a non-uniform distribution F, we have the same leading term for the asymptotic expansion of E B(λ), but the second-order term is larger. (Throughout, ln⁺ denotes the positive part of the natural logarithm function. We denote the uniform distribution by unif.)

Theorem 1.3. Let X_1, X_2, ... be independent with a common distribution F over (0, 1) having density f, and let N be independent and Poisson with mean λ. If ∫₀¹ f (ln⁺ f)⁴ < ∞, then the expected number of bit comparisons, call it μ_f(λ), required to sort the keys X_1, ..., X_N using Quicksort satisfies
\[ \mu_f(\lambda) = \mu_{\mathrm{unif}}(\lambda) + 2 H(f)\, \lambda \ln \lambda + o(\lambda \log \lambda) \]
as λ → ∞, where H(f) := ∫₀¹ f lg f ≥ 0 is the entropy (in bits) of the density f.

In applications, it may be unrealistic to assume that a specific density f is known. Nevertheless, even in such cases, Theorem 1.3 may be useful since it provides a measure of the robustness of the asymptotic estimate in Theorem 1.1.
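For intuition about the second-order correction, H(f) is easy to approximate numerically for any candidate density. A minimal sketch (the density f(x) = 2x is our arbitrary illustration, not an example from the paper):

```python
from math import log

def entropy_bits(f, m=100000):
    """Approximate H(f) = int_0^1 f lg f by the midpoint rule."""
    total = 0.0
    for i in range(m):
        x = (i + 0.5) / m
        fx = f(x)
        if fx > 0:
            total += fx * log(fx, 2) / m
    return total

print(entropy_bits(lambda x: 1.0))    # uniform density: H = 0
print(entropy_bits(lambda x: 2 * x))  # H = 1 - 1/(2 ln 2) ≐ 0.2787 bits
```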
Bob Sedgewick (among others who heard us speak on the material of this paper) suggested that the number of bit comparisons for Quicksort might be reduced substantially by not comparing bits that have to be equal according to the results of earlier steps in the algorithm. In the final section (Theorem 7.1), we note that this is indeed the case: for a fixed number n of keys, the average number of bit comparisons in the improved algorithm (which we dub "BitsQuick") is asymptotically equivalent to 2(1 + 3/(2 ln 2)) n ln n, only a constant (≐ 3.2) times the average number of key comparisons [see (2.2)]. A related algorithm is the digital version of Quicksort by Roura [22]; it too requires Θ(n log n) bit comparisons (we do not know the exact constant factor).
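The following Python sketch conveys the idea as we read it: every key in the current subarray is known to lie strictly between two previously seen values lo and hi, so the bits on which all numbers in (lo, hi) agree are skipped. This is our illustration of the principle, not the paper's exact formulation of BitsQuick.

```python
import random

def common_bits(lo, hi):
    """Number of leading bits shared by every number in the open interval (lo, hi)."""
    c = 0
    while int(2 * lo) == int(2 * hi):
        lo, hi, c = 2 * lo % 1.0, 2 * hi % 1.0, c + 1
    return c

def cost_after_skip(x, y, skip):
    """Bit comparisons to compare x, y when their first `skip` bits are known equal."""
    x, y = x * 2**skip % 1.0, y * 2**skip % 1.0
    k = 1
    while int(2 * x) == int(2 * y):
        x, y, k = 2 * x % 1.0, 2 * y % 1.0, k + 1
    return k

def bits_quick(keys, lo=0.0, hi=1.0):
    """Bit comparisons used by a BitsQuick-style Quicksort on keys in (lo, hi)."""
    if len(keys) <= 1:
        return 0
    skip = common_bits(lo, hi)          # bits already determined by earlier steps
    p = random.choice(keys)
    cost = sum(cost_after_skip(x, p, skip) for x in keys if x != p)
    return (cost + bits_quick([x for x in keys if x < p], lo, p)
                 + bits_quick([x for x in keys if x > p], p, hi))
```

Averaging bits_quick over random inputs and comparing with the plain bit-counting Quicksort sketched earlier exhibits the Θ(n log n) versus Θ(n log² n) gap.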
We may compare our results to those obtained for radix-based methods, for example radix exchange sorting; see [15, Section 5.2.2]. This method works by bit inspections, that is, by comparisons to constant bits, rather than by pairwise comparisons. In the case of n uniformly distributed keys, radix exchange sorting uses asymptotically n lg n bit inspections. Since radix exchange sorting is designed so that the number of bit inspections is minimal, it is not surprising that our results show that Quicksort uses more bit comparisons. More precisely, Theorem 1.1 shows that Quicksort uses about ln n times as many bit comparisons as radix exchange sorting. For BitsQuick, this is reduced to a small constant factor. This gives us a measure of the cost in bit comparisons of using these algorithms; Quicksort is often used because of other advantages, and our results open the possibility of seeing when they outweigh the increase in bit comparisons.
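For reference, here is a minimal sketch of radix exchange sorting in the same style, counting bit inspections; the n lg n behavior for uniform keys is easy to observe empirically. (The recursion cap and the comparison-based fallback at that cap are our simplifications.)

```python
def radix_exchange(keys, bit=1):
    """Sort keys in (0, 1) by partitioning on successive bits.
    Returns (sorted_keys, number_of_bit_inspections)."""
    if len(keys) <= 1 or bit > 50:           # 50: double-precision guard
        return sorted(keys), 0
    zeros = [x for x in keys if int(x * 2**bit) % 2 == 0]
    ones  = [x for x in keys if int(x * 2**bit) % 2 == 1]
    s0, c0 = radix_exchange(zeros, bit + 1)
    s1, c1 = radix_exchange(ones, bit + 1)
    return s0 + s1, len(keys) + c0 + c1      # one bit inspected per key
```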
In Section 2 we review Quicksort itself and basic facts about the number K_n of key comparisons. In Section 3 we derive the exact formula (1.1) for E B_n, and in Section 4 we derive the asymptotic expansion (1.2) from an alternative exact formula that is somewhat less elementary than (1.1) but much more transparent for asymptotics. In the transitional Section 5 we establish certain basic facts about the moments of K(λ) and B(λ) in the Poisson case with uniformly distributed keys, and in Section 6 we use martingale arguments to establish Theorem 1.3 for the expected number of bit comparisons for Poisson(λ) draws from a general density f. Finally, in Section 7 we study the improved BitsQuick algorithm discussed in the preceding paragraph.
Remark 1.4. The results can be generalized to bases other than 2. For example, base 256 would give corresponding results on the "byte complexity".

Remark 1.5. Cutting off and sorting small subfiles differently would affect the results in Theorems 1.1 and 1.3 by O(n log n) and O(λ log λ) only. In particular, the leading terms would remain the same.
Remark 1.6. In comparison with the extended abstract [7], new in this expanded treatment are Remark 5.2, Propositions 5.4 and 5.7, and Lemma 6.2, together with complete proofs of Theorem 1.3, Lemmas 5.1 and 5.3, and Remark 6.3. Section 7 has been substantially revised.
In the time between [7] and the present paper, the following developments have occurred:
• Fill and Nakama [8] followed the same sort of approach as in this paper to obtain certain exact and asymptotic expressions for the number of bit comparisons required by Quickselect, a close cousin of Quicksort.
• Vallée et al. [23] used analytic-combinatorial methods to extend the results of [7] and [8] by deriving asymptotic expressions for the expected number of symbol comparisons for both Quicksort and Quickselect. In their work, as in the present paper, the keys are assumed to be independent and identically distributed, but the authors allow for quite general probabilistic models (also known as "sources") for how each key is generated as a symbol string.
• Fill and Nakama [9] (see also [17]) obtained, for quite general sources, a limiting distribution for the (suitably scale-normalized) number of symbol comparisons required by Quickselect.
• Fill [4] obtained, for quite general sources, a limiting distribution for the (suitably center-and-scale-normalized) number of symbol comparisons required by Quicksort.
We were motivated to expand [7] to the present full-length paper in large part because this paper's Lemmas 5.1 and 5.3, and an extension of (the proof of) Proposition 5.7, play key roles in [4].

Review: number of key comparisons used by Quicksort
In this section we briefly review certain basic known results concerning the number K_n of key comparisons required by Quicksort for a fixed number n of keys uniformly distributed on (0, 1). (See, for example, [6] and the references therein for further details.) Quicksort, invented by Hoare [13], is the standard sorting procedure in Unix systems, and has been cited [3] as one of the ten algorithms "with the greatest influence on the development and practice of science and engineering in the 20th century." The Quicksort algorithm for sorting an array of n distinct keys is very simple to describe. If n = 0 or n = 1, there is nothing to do. If n ≥ 2, pick a key uniformly at random from the given array and call it the "pivot". Compare the other keys to the pivot to partition the remaining keys into two subarrays. Then recursively invoke Quicksort on each of the two subarrays.
With K_0 := 0 as initial condition, K_n satisfies the distributional recurrence relation
\[ K_n \overset{\mathcal{L}}{=} K_{U_n - 1} + K^*_{n - U_n} + n - 1, \qquad n \ge 1, \]
where \(\overset{\mathcal{L}}{=}\) denotes equality in law (i.e., in distribution), and where, on the right, U_n is distributed uniformly over the set {1, ..., n}, K^*_j has the same law as K_j for every j, and U_n; K_0, ..., K_{n−1}; K^*_0, ..., K^*_{n−1} are all independent. Passing to expectations we obtain the "divide-and-conquer" recurrence relation
\[ \mathbb{E} K_n = n - 1 + \frac{2}{n} \sum_{j=0}^{n-1} \mathbb{E} K_j, \]
which is easily solved to give
\[ \mathbb{E} K_n = 2(n+1) H_n - 4n \tag{2.1} \]
\[ \phantom{\mathbb{E} K_n} = 2 n \ln n + (2\gamma - 4) n + 2 \ln n + 2\gamma + 1 + O(1/n). \tag{2.2} \]
It is also routine to use a recurrence to compute explicitly the exact variance of K_n. In particular, the asymptotics are
\[ \operatorname{Var} K_n \sim \sigma^2 n^2, \qquad \text{where } \sigma^2 := 7 - \tfrac{2}{3}\pi^2 ≐ 0.4203. \]
Higher moments can be handled similarly. Further, the normalized sequence (K_n − E K_n)/n converges in distribution, with convergence of moments of each order, to K, where the law of K is characterized as the unique distribution over the real line with vanishing mean that satisfies a certain distributional identity; and the moment generating functions of (K_n − E K_n)/n converge pointwise to that of K.
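As a quick check on (2.1), the divide-and-conquer recurrence can be iterated in exact rational arithmetic and compared against the closed form; a minimal sketch:

```python
from fractions import Fraction

def mean_key_comparisons(nmax):
    """kappa_n = E K_n from kappa_n = n - 1 + (2/n) sum_{j<n} kappa_j, kappa_0 = 0."""
    kappa = [Fraction(0)]
    running = Fraction(0)                # sum of kappa_0 .. kappa_{n-1}
    for n in range(1, nmax + 1):
        kappa.append(n - 1 + 2 * running / n)
        running += kappa[n]
    return kappa

kappa = mean_key_comparisons(100)
H = sum(Fraction(1, i) for i in range(1, 101))
assert kappa[100] == 2 * 101 * H - 4 * 100   # closed form (2.1) with n = 100
print(float(kappa[100]))                     # ≐ 647.85
```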

Exact mean number of bit comparisons
In this section we establish the exact formula (1.1), repeated here for convenience as (3.1), for the expected number of bit comparisons required by Quicksort for a fixed number n of keys uniformly distributed on (0, 1):
\[ \mathbb{E} B_n = 2 \sum_{k=2}^{n} (-1)^k \binom{n}{k} \frac{1}{k(k-1)\left(1 - 2^{-(k-1)}\right)}. \tag{3.1} \]
Let X_1, ..., X_n denote the keys, and X_{(1)} < ··· < X_{(n)} their order statistics. Consider ranks 1 ≤ i < j ≤ n. Formula (3.1) follows readily from the following three facts, all either obvious or very well known:
• The event C_{ij} := {keys X_{(i)} and X_{(j)} are compared} and the random vector (X_{(i)}, X_{(j)}) are independent.
• P(C_{ij}) = 2/(j − i + 1). [Indeed, C_{ij} equals the event that the first pivot chosen from among X_{(i)}, ..., X_{(j)} is either X_{(i)} or X_{(j)}.]
• The joint density g_{n,i,j} of (X_{(i)}, X_{(j)}) is given by
\[ g_{n,i,j}(x, y) = \frac{n!}{(i-1)!\,(j-i-1)!\,(n-j)!}\; x^{i-1} (y-x)^{j-i-1} (1-y)^{n-j}, \qquad 0 < x < y < 1. \tag{3.2} \]
Let b(x, y) denote the index of the first bit at which the numbers x, y ∈ (0, 1) differ. (For definiteness, we take in this paper the terminating expansion with infinitely many zeros for dyadic rationals in [0, 1), but 1 = .111....) Then
\[ \mathbb{E} B_n = \sum_{1 \le i < j \le n} \frac{2}{j-i+1}\, \mathbb{E}\, b\bigl( X_{(i)}, X_{(j)} \bigr). \tag{3.3} \]
By a routine calculation, b admits the representation
\[ b(x, y) = \sum_{k=0}^{\infty} 1\!\left[ \lfloor 2^k x \rfloor = \lfloor 2^k y \rfloor \right], \tag{3.4} \]
and the contribution of each level k, after integration against (3.2), depends on x and y only through the difference y − x. Plugging (3.4) into (3.3), we find an alternating binomial sum which, by a routine (if somewhat lengthy) calculation, reduces to the desired (3.1).
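The second fact, P(C_{ij}) = 2/(j − i + 1), is easy to confirm by simulation; a small sketch (the parameter choices are arbitrary):

```python
import random

def compared_pairs(a):
    """Run randomized Quicksort on the list a; return the set of compared pairs."""
    pairs = set()
    def qs(sub):
        if len(sub) <= 1:
            return
        p = random.choice(sub)
        for x in sub:
            if x != p:
                pairs.add((min(x, p), max(x, p)))
        qs([x for x in sub if x < p])
        qs([x for x in sub if x > p])
    qs(a)
    return pairs

n, trials, i, j = 20, 20000, 3, 11
hits = sum((i, j) in compared_pairs(list(range(1, n + 1))) for _ in range(trials))
print(hits / trials, 2 / (j - i + 1))   # both ≐ 0.222
```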

Asymptotic mean number of bit comparisons
Formula (1.1), repeated at (3.1), is hardly suitable for numerical calculations or asymptotic treatment, due to excessive cancellations in the alternating sum. Indeed, if (say) n = 100, then the terms (including the factor 2, for definiteness) alternate in sign, with magnitude as large as 10^25, and yet E B_n ≐ 2295. Fortunately, there is a standard complex-analytic technique designed for precisely our situation (alternating binomial sums), namely, Rice's method. We will not review the idea behind the method here, but rather refer the reader to (for example) Section 6.4 of [16]. Let
\[ h(z) := \frac{2}{z(z-1)\left(1 - 2^{-(z-1)}\right)}, \]
so that (3.1) reads \(\mathbb{E} B_n = \sum_{k=2}^{n} (-1)^k \binom{n}{k} h(k)\), and let B(z, w) := Γ(z)Γ(w)/Γ(z + w) denote the meromorphic continuation of the classical beta function. According to Rice's method, E B_n equals the sum of the residues of the function B(n + 1, −z)h(z) at
• the triple pole at z = 1;
• the simple poles at z = 1 + iβk, for k ∈ Z \ {0};
• the double pole at z = 0.
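The cancellation claim is easy to witness directly: evaluating (3.1) in exact rational arithmetic avoids the catastrophic loss of precision that floating point suffers with terms of magnitude 10^25. A minimal sketch, using (3.1) as reconstructed above:

```python
from fractions import Fraction
from math import comb

def exact_E_Bn(n):
    """Evaluate the alternating sum (3.1) exactly."""
    total = Fraction(0)
    for k in range(2, n + 1):
        term = Fraction(2 * comb(n, k), k * (k - 1)) / (1 - Fraction(1, 2)**(k - 1))
        total += term if k % 2 == 0 else -term
    return total

print(float(exact_E_Bn(100)))   # ≐ 2295, despite terms of magnitude ~1e25
```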
The residues are easily calculated, especially with the aid of such symbolic-manipulation software as Mathematica or Maple. Corresponding to the above list, the residues can be expressed in terms of n and the harmonic numbers, where H_n^{(r)} := Σ_{i=1}^n i^{−r} denotes the nth harmonic number of order r and H_n := H_n^{(1)}. Summing the residue contributions gives an alternative exact formula for E B_n, from which the asymptotic expansion (1.2) (as well as higher-order terms) can be read off easily using standard asymptotics for H_n^{(r)} and Stirling's formula; we omit the details. This completes the proof of Theorem 1.1.

Remark 4.1. We can calculate E K_n in the same fashion (and somewhat more easily), by replacing the bit-index function b by the constant function 1. Following this approach, we obtain first the following analogue of (3.1):
\[ \mathbb{E} K_n = 2 \sum_{k=2}^{n} (-1)^k \binom{n}{k} \frac{1}{k(k-1)}. \]
Then the residue contributions using Rice's method are
• 2n(H_n − 2 − 1/n), at the double pole at z = 1;
• 2(H_n + 1), at the double pole at z = 0.
Summing the two contributions gives an alternative derivation of (2.1).

Poissonized model for uniform draws
As a warm-up for Section 6, we now suppose that the number of keys is Poisson with mean λ; throughout this section the keys themselves are still assumed to be uniformly distributed.
5.1. Key comparisons. We begin with a lemma which provides both the analogue of (2.1)-(2.2) and two other facts we will need in Section 6.

Lemma 5.1. In the setting of Theorem 1.3 with F uniform, the expected number of key comparisons is a strictly convex function of λ given by
\[ \mathbb{E} K(\lambda) = 2 \int_0^{\lambda} (\lambda - y)(e^{-y} - 1 + y)\, y^{-2}\, dy. \]
Asymptotically, as λ → ∞ we have
\[ \mathbb{E} K(\lambda) = 2 \lambda \ln \lambda + 2(\gamma - 2)\lambda + 2 \ln \lambda + 2\gamma + 2 + O(e^{-\lambda}), \tag{5.1} \]
and as λ → 0 we have
\[ \mathbb{E} K(\lambda) = \tfrac{1}{2}\lambda^2 + O(\lambda^3). \tag{5.2} \]
Comparing the n → ∞ expansion (2.2) with the corresponding expansion (5.1) for Poisson(λ) many keys, note the difference in constant terms and the much smaller error term in the Poisson case.
Proof. To obtain the exact formula, begin with E K(λ) = Σ_{n≥0} e^{−λ} (λ^n/n!) κ_n, where κ_n := E K_n is given by (2.1). To derive the result for λ → ∞, letting 1[A] denote 1 if A holds and 0 otherwise, one splits the integrand at y = 1 and uses the identity ∫₀^λ (1 − e^{−y}) y^{−1} dy = ln λ + γ + O(e^{−λ}/λ); we omit the routine details.

Remark 5.2. The error term in (5.1) can, using Lemma A.2, be refined to an asymptotic expansion; indeed, for any M ≥ 1 it can be written as e^{−λ} times an expansion in powers of λ^{−1} with remainder O(λ^{−M}).

To handle the number of bit comparisons, we will also need the following bounds on the moments of K(λ). Together with Lemma 5.1, these bounds also establish concentration of K(λ) about its mean when λ is large. For real 1 ≤ p < ∞, we let ‖W‖_p := (E|W|^p)^{1/p} denote L^p-norm and use E(W; A) as shorthand for the expectation of the product of W and the indicator of the event A.
Lemma 5.3. For every real p ≥ 1, there exists a constant c_p < ∞ such that
\[ \| K(\lambda) - \mathbb{E} K(\lambda) \|_p \le c_p\, \lambda \ \text{ for } \lambda \ge 1, \qquad \| K(\lambda) - \mathbb{E} K(\lambda) \|_p \le c_p\, \lambda^{2/p} \ \text{ for } 0 < \lambda \le 1. \]
In particular, Var K(λ) ≤ c₂² λ² for all λ > 0.

Proof. We use the notation of Theorem 1.3 with F uniform [so that K(λ) = K_N with N distributed Poisson(λ)] and write κ_n := E K_n for n ≥ 0.
(a) The first result is certainly true for λ ≥ 1 ranging over any bounded interval. For λ → ∞ the result can be established by Poissonizing standard Quicksort moment calculations, as we now sketch. (Although the following argument is valid for all p ≥ 1, the reader that so prefers may assume that p is an even integer.) We start with the triangle inequality
\[ \| K(\lambda) - \mathbb{E} K(\lambda) \|_p \le \| K_N - \kappa_N \|_p + \| \kappa_N - \mathbb{E} K(\lambda) \|_p \tag{5.7} \]
and proceed to argue that the first term on the right is asymptotically linear in λ while the second term is o(λ).
To handle the first term, observe that by the comments at the very end of Section 2 the conditional moment E|K_n − κ_n|^p equals (1 + o(1)) E|K|^p n^p as n → ∞ and so can be bounded for all n by a constant times n^p. Thus one need only observe that E N^p = (1 + o(1)) λ^p as λ → ∞ to complete the treatment of the first term on the right in (5.7).
To treat the second term in RHS(5.7) as λ → ∞, one can show using (2.2) and (5.1) and the normal approximation to the Poisson that
\[ \| \kappa_N - \mathbb{E} K(\lambda) \|_p = (2 + o(1))\, (\ln \lambda)\, \lambda^{1/2}\, \| Z \|_p = o(\lambda), \]
where Z has the standard normal distribution. We omit the details.
(b) For λ ≤ 1 we use the crude bound
\[ \| K(\lambda) - \mathbb{E} K(\lambda) \|_p^p \le \mathbb{E}\bigl[ \bigl( K(\lambda) + \mathbb{E} K(\lambda) \bigr)^p \bigr] = O(\lambda^2), \]
which holds since K(λ) vanishes unless N ≥ 2, an event of probability O(λ²); the claim follows provided c_p is taken to be at least the finite value implied by this estimate.
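As a sanity check on Lemma 5.1 (in the reconstructed form above), the exact integral, the Poisson mixture of the fixed-n means (2.1), and the asymptotic expansion (5.1) can all be compared numerically; a sketch:

```python
import math

GAMMA = 0.5772156649015329  # Euler's constant

def EK_integral(lam, m=200000):
    """E K(lambda) via the integral of Lemma 5.1, midpoint rule."""
    h = lam / m
    s = 0.0
    for i in range(m):
        y = (i + 0.5) * h
        s += (lam - y) * (math.exp(-y) - 1 + y) / y**2
    return 2 * h * s

def EK_mixture(lam, nmax=400):
    """E K(lambda) = sum_n P(N = n) E K_n with E K_n = 2(n+1)H_n - 4n."""
    p, H, s = math.exp(-lam), 0.0, 0.0
    for n in range(1, nmax):
        p *= lam / n
        H += 1.0 / n
        s += p * (2 * (n + 1) * H - 4 * n)
    return s

lam = 50.0
print(EK_integral(lam), EK_mixture(lam))   # agree: ≐ 259.9
print(2*lam*math.log(lam) + 2*(GAMMA - 2)*lam + 2*math.log(lam) + 2*GAMMA + 2)
```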

5.2. Bit comparisons.
We now turn our attention from K(λ) to the more interesting random variable B(λ), the total number of bit comparisons.We discuss first asymptotics for the mean µ unif (λ) and then the variability of B(λ) about the mean.
In our next proposition we will derive the asymptotic estimate (1.5) by applying standard asymptotic techniques to the exact formula (1.4).
Proposition 5.4. Asymptotically as λ → ∞, we have
\[ \mu_{\mathrm{unif}}(\lambda) = \lambda (\ln \lambda)(\lg \lambda) + c_1\, \lambda \ln \lambda + c_2\, \lambda + \pi_\lambda\, \lambda + O(\log \lambda). \]

Proof (outline). Recalling (1.4) and noting that for x > 0 the expected number of key comparisons is
\[ g(x) := \mathbb{E} K(x) = 2 \int_0^{x} (x - y)(e^{-y} - 1 + y)\, y^{-2}\, dy, \]
it follows [see (5.9) below] that μ(λ) ≡ μ_unif(λ) has the harmonic sum form
\[ \mu(\lambda) = \sum_{k=0}^{\infty} 2^k\, g(2^{-k} \lambda), \]
rendering it amenable to treatment by Mellin transforms; see, e.g., [10] or [11]. Indeed, it follows immediately that the Mellin transform μ* of μ is given for s in the fundamental strip {s ∈ C : −2 < Re s < −1} by
\[ \mu^*(s) = g^*(s)\, \Lambda(s), \qquad \Lambda(s) := \sum_{k=0}^{\infty} 2^{k(1+s)} = \frac{1}{1 - 2^{1+s}}, \]
in terms of the Mellin transform g* of g and the generalized Dirichlet series Λ. But it's also easy to check using the integral formula for g that
\[ g^*(s) = \frac{2\, \Gamma(s)}{(s+1)\, s}, \]
and so
\[ \mu^*(s) = \frac{2\, \Gamma(s)}{(s+1)\, s \left( 1 - 2^{1+s} \right)}. \]
The desired asymptotic expansion for μ(λ) (including the remainder term) can then be read off from the singular behavior of μ*(s) at its poles, located at s = −1 (triple pole), s = −1 − iβk for k ∈ Z \ {0} (simple poles), and s = 0 (double pole), paralleling the use of Rice's method for E B_n in Section 4.
In order to move beyond the mean of B(λ), we define
\[ I_{k,j} := \left( (j-1)\, 2^{-k},\; j\, 2^{-k} \right), \qquad k \ge 0, \ 1 \le j \le 2^k, \]
to be the jth dyadic rational interval of rank k, and consider
B_k(λ) := number of comparisons of (k + 1)st bits,
B_{k,j}(λ) := number of comparisons of (k + 1)st bits between keys in I_{k,j}.
Observe that
\[ B(\lambda) = \sum_{k=0}^{\infty} B_k(\lambda) = \sum_{k=0}^{\infty} \sum_{j=1}^{2^k} B_{k,j}(\lambda). \tag{5.8} \]
A simplification provided by our Poissonization is that, for each fixed k, the variables B_{k,j}(λ), 1 ≤ j ≤ 2^k, are independent. Further, the marginal distribution of B_{k,j}(λ) is simply that of K(2^{−k} λ).
Remark 5.5. Taking expectations in (5.8), we find
\[ \mu(\lambda) = \sum_{k=0}^{\infty} 2^k\, \mathbb{E} K(2^{-k} \lambda). \tag{5.9} \]
If one is satisfied with a remainder of O(λ) rather than O(log λ), then Proposition 5.4 can also be proved by means of (5.9). This is done by splitting the sum Σ_{k=0}^∞ there into Σ_{k=0}^{⌊lg λ⌋} and Σ_{k=⌊lg λ⌋+1}^∞ and utilizing (5.1) (to the needed order) for the first sum and (5.2) [or rather the simpler E K(λ) = O(λ²) as λ → 0] for the second. We omit the details. (See also Section 6, where this argument is used in a more general situation as part of the proof of Theorem 1.3.)
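The harmonic sum (5.9) is also convenient numerically; the following sketch evaluates it by quadrature and compares with the leading term of (1.5). (The truncation limits are ours.)

```python
import math

def EK(lam, m=20000):
    """E K(lambda) via the exact integral of Lemma 5.1 (midpoint rule)."""
    if lam <= 0:
        return 0.0
    h = lam / m
    return 2 * h * sum((lam - (i + 0.5) * h)
                       * (math.exp(-(i + 0.5) * h) - 1 + (i + 0.5) * h)
                       / ((i + 0.5) * h)**2 for i in range(m))

def mu(lam, kmax=60):
    """mu(lambda) = sum_k 2^k E K(2^{-k} lambda), truncated; see (5.9)."""
    return sum(2**k * EK(lam / 2**k) for k in range(kmax))

lam = 1000.0
print(mu(lam))                               # ≐ E B(lambda)
print(lam * math.log(lam) * math.log2(lam))  # leading term; gap is Theta(lambda log lambda)
```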
We are now in position to establish the concentration of B(λ) about μ_unif(λ) promised just prior to (1.6).

Proposition 5.6. There is a constant c < ∞ such that Var B(λ) ≤ c λ² for all 0 < λ < ∞.

Proof. For 0 < λ < ∞, we have by (5.8), the triangle inequality for ‖·‖₂, the independence (for each fixed k) of the B_{k,j}(λ), and the fact that B_{k,j}(λ) is distributed as K(2^{−k}λ),
\[ \| B(\lambda) - \mathbb{E} B(\lambda) \|_2 \le \sum_{k=0}^{\infty} \Bigl( \sum_{j=1}^{2^k} \operatorname{Var} K(2^{-k}\lambda) \Bigr)^{1/2} = \sum_{k=0}^{\infty} 2^{k/2}\, \| K(2^{-k}\lambda) - \mathbb{E} K(2^{-k}\lambda) \|_2 \le \sum_{k=0}^{\infty} 2^{k/2}\, c_2\, 2^{-k} \lambda = (2 + \sqrt{2})\, c_2\, \lambda, \]
using Lemma 5.3 at the final inequality.

Our next proposition extends the previous one from p = 2 to general p but is limited to λ ≥ 1: for every real p ≥ 1 there is a constant c′_p < ∞ with ‖B(λ) − E B(λ)‖_p ≤ c′_p λ for all λ ≥ 1 (Proposition 5.7).

Mean number of bit comparisons for keys drawn from an arbitrary density f
In this section we outline martingale arguments for proving Theorem 1.3 for the expected number of bit comparisons for Poisson(λ) draws from a rather general density f. (For background on martingales, see any standard measure-theoretic probability text, e.g., [2].) In addition to the notation above, we will use the following:
\[ p_{k,j} := \int_{I_{k,j}} f(x)\, dx, \qquad f_k(x) := 2^k\, p_{k,j} \ \text{ for } x \in I_{k,j}. \]
Note for each k ≥ 0 that Σ_j p_{k,j} = 1 and that f_k : (0, 1) → [0, ∞) is the smoothing of f to the rank-k rational intervals. From basic martingale theory we have immediately the following simple but key observation.
Lemma 6.1. Let U be uniformly distributed on (0, 1), and let D_k denote the σ-field generated by the dyadic intervals of rank k. Then (f_k(U), D_k)_{k≥0} is a martingale, and f_k → f almost surely (and in L¹).
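The smoothings f_k are simple to compute; the sketch below (the density f(x) = 2x again as an arbitrary illustration) shows both the martingale property, i.e. that each value of f_k is the average of the two rank-(k+1) values refining it, and the convergence f_k → f.

```python
def smoothing(f, k, m=200):
    """Rank-k smoothing f_k as a list of 2^k values: f_k = 2^k * int_{I_{k,j}} f."""
    out = []
    for j in range(2**k):
        a = j / 2**k
        p = sum(f(a + (i + 0.5) / (m * 2**k)) for i in range(m)) / (m * 2**k)
        out.append(2**k * p)
    return out

f = lambda x: 2 * x
for k in range(4):
    print(k, [round(v, 3) for v in smoothing(f, k)])
# k=0: [1.0]; k=1: [0.5, 1.5]; k=2: [0.25, 0.75, 1.25, 1.75]; ...
# each value is the mean of its two refinements, and f_k -> f pointwise
```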
Our proof of Theorem 1.3 will also utilize the following technical lemma.

Lemma 6.2. If (as assumed in Theorem 1.3) the probability density f on (0, 1) satisfies ∫₀¹ f (ln⁺ f)⁴ < ∞, then
\[ \int_0^1 \sup_{1 \le k < \infty} f_k\, (\ln^+ f_k)^3 < \infty. \tag{6.1} \]

Proof. This follows readily by applying one of the standard maximal inequalities for nonnegative submartingales, which asserts that for a nonnegative submartingale (Y_k)_{1≤k<∞} and Y* := sup_{1≤k<∞} Y_k we have
\[ \mathbb{E}\, Y^* \le \frac{e}{e-1} \Bigl( 1 + \sup_{1 \le k < \infty} \mathbb{E}\, Y_k \ln^+ Y_k \Bigr); \]
see, e.g., [12, Theorem 10.9.4]. The process (Y_k := f_k (ln⁺ f_k)³)_{1≤k<∞} is a submartingale by Lemma 6.1 and the convexity of the function x ↦ x(ln⁺ x)³, and for every 1 ≤ k < ∞ we have
\[ \mathbb{E}\, Y_k \ln^+ Y_k \le c \Bigl( 1 + \int_0^1 f\, (\ln^+ f)^4 \Bigr) < \infty \tag{6.2} \]
for a universal constant c; so (6.2) does indeed give the desired conclusion.
Before we begin the proof of Theorem 1.3, we remark that the asymptotic inequality μ_f(λ) ≥ μ_unif(λ) observed there in fact holds for every 0 < λ < ∞. Indeed,
\[ \mu_f(\lambda) = \sum_{k=0}^{\infty} \sum_{j=1}^{2^k} \mathbb{E} K(\lambda\, p_{k,j}) \ \ge\ \sum_{k=0}^{\infty} 2^k\, \mathbb{E} K(2^{-k} \lambda) \ =\ \mu_{\mathrm{unif}}(\lambda), \tag{6.3} \]
where the first equality appropriately generalizes (5.9), the inequality follows by the convexity of E K(λ) (recall Lemma 5.1), and the second equality follows by (5.9). Furthermore, strict inequality μ_f(λ) > μ_unif(λ) holds unless p_{k,j} = 2^{−k} for all k and j, i.e., unless the distribution F is uniform. (This argument is valid also if F does not have a density.)

Proof of Theorem 1.3. Assume λ ≥ 1 and, with m ≡ m(λ) := ⌈lg λ⌉, split the double sum in (6.3) as
\[ \mu_f(\lambda) = \sum_{k=0}^{m-1} \sum_{j=1}^{2^k} \mathbb{E} K(\lambda\, p_{k,j}) + R(\lambda), \tag{6.4} \]
with R(λ) := Σ_{k≥m} Σ_j E K(λ p_{k,j}) a remainder term. Our first aim is to show that R(λ) = O(λ). Since E K(·) is nondecreasing, we may bound each E K(λ p_{k,j}) by summing, over the integers n with 2^n ≤ λ p_{k,j}, terms proportional to 2^{−n} E K(2^{n+1}). Now if λ p_{k,j} ≥ 2^n with k ≥ m, then for x ∈ I_{k,j} we have
\[ f^*(x) := \sup_{l \ge 0} f_l(x) \ \ge\ f_k(x) = 2^k p_{k,j} \ \ge\ 2^{(k-m)+n}, \]
and therefore n ≤ ν(x, k − m), with ν(x, k) := ⌊lg f*(x)⌋ − k. We proceed to bound the sum on n here. If ν ≤ 0, then using the bound of (constant times λ²) on E K(λ) as λ → 0 from Lemma 5.1 we can bound the sum Σ_{n≤ν} 2^{−n} E K(2^{n+1}) by a constant (say, b′) times 2^ν, while if ν > 0 we can again use the estimates from Lemma 5.1 to bound, for some constants b₁, b₂, b″, the same sum by b₁ + b₂ν² ≤ b″(1 + ν)². From this we conclude that, for another constant b,
\[ R(\lambda) \le b\, \lambda \Bigl[ 1 + \int_0^1 \sup_k f_k\, (1 + \ln^+ f_k)^3 \Bigr]. \]
Using Lemma 6.2 we finally conclude R(λ) = O(λ).

Plugging R(λ) = O(λ) and the consequence
\[ \mathbb{E} K(\mu) = 2 \mu \ln^+ \mu + 2(\gamma - 2)\mu + O(1 + \ln^+ \mu), \]
which holds uniformly in 0 ≤ μ < ∞, of Lemma 5.1 into (6.4), we find
\[ \mu_f(\lambda) = \mu_{\mathrm{unif}}(\lambda) + 2 (\ln 2)\, \lambda \sum_{k=0}^{m-1} \int_0^1 f_k \lg f_k + o(\lambda \log \lambda), \]
where we have used the Cauchy-Schwarz inequality at the second equality and comparison with the uniform case (f ≡ 1) at the third.
But, by Lemma 6.1, (6.1), and the dominated convergence theorem,
\[ \int_0^1 f_k \lg f_k \ \to\ \int_0^1 f \lg f = H(f) \qquad \text{as } k \to \infty, \tag{6.5} \]
so the sum in the preceding display equals (lg λ)H(f) + o(log λ), and the second-order term is given by the expression 2(ln 2)λ(lg λ)H(f) = 2H(f)λ ln λ. This completes the proof of Theorem 1.3.


An improved algorithm: BitsQuick

In this final section we analyze BitsQuick, the modification of Quicksort described in Section 1 that avoids comparing bits whose values are already determined by the results of earlier steps of the algorithm. Our goal is to derive an exact expression (7.1) for the expected total savings in bit comparisons relative to standard Quicksort, from which Theorem 7.1 (the asymptotic equivalence, stated in Section 1, of the BitsQuick mean to 2(1 + 3/(2 ln 2)) n ln n) follows.

We use the order-statistics notation X_{(1)}, ..., X_{(n)} from Section 3. To derive (7.1), we will compute the (random) total savings for all comparisons with X_{(i)} as pivot, sum over i = 1, ..., n, and take the expectation. For convenience, we may assume that the algorithm chooses a pivot also in the case of a (sub)array with exactly one element, although it is not compared to anything; thus every key becomes a pivot. Observe that X_{(i)} is compared as pivot with the keys X_{(L)}, ..., X_{(R)} (except itself) and with no others, where L ≡ L(i) and R ≡ R(i) with L ≤ i ≤ R are the (random) values uniquely determined by the condition that X_{(i)} is the first pivot chosen from among X_{(L)}, ..., X_{(R)} but not (if L > 1) the first from among X_{(L−1)}, ..., X_{(R)} nor (if R < n) the first from among X_{(L)}, ..., X_{(R+1)}. Hence, X_{(i)} is compared as a pivot with R − L other keys. The comparisons with X_{(i)} as pivot are performed with the knowledge that all the keys X_{(L)}, ..., X_{(R)} have values in the interval (X_{(L−1)}, X_{(R+1)}), where if L = 1 we interpret X_{(0)} as 0 = .000... and if R = n we interpret X_{(n+1)} as 1 = .111.... The total savings gained by this knowledge is
\[ (R - L)\bigl[ b\bigl( X_{(L-1)}, X_{(R+1)} \bigr) - 1 \bigr], \]
where we recall that b(x, y) denotes the index of the first bit at which x and y differ (so that b − 1 is the number of initial bits they share).
Therefore the grand total savings is
\[ S_n := \sum_{i=1}^{n} \bigl( R(i) - L(i) \bigr)\bigl[ b\bigl( X_{(L(i)-1)}, X_{(R(i)+1)} \bigr) - 1 \bigr], \]
and so by independence (of the ranks L, R and the values of the bracketing order statistics) the expectation E S_n factors, for each i and each pair (l, r), into P(L(i) = l, R(i) = r) times (r − l) E[b(X_{(l−1)}, X_{(r+1)}) − 1]. The second expectation is easily computed from the joint density (3.2); abbreviating r − l to d and writing "xor" for "exclusive or", a symmetry argument together with the observation that b(0, 1) = 1 reduces the computation of E S_n to two sums, which we denote by D_n and E_n, respectively. The expectation E b(X_{(i)}, X_{(j)}) may be computed (for 1 ≤ i < j ≤ n) by recalling the joint density g_{n,i,j} of (X_{(i)}, X_{(j)}) given at (3.2); carrying out the resulting calculation and analyzing D_n and E_n asymptotically yields (7.1) and then Theorem 7.1.


Appendix

We conclude with the proof of the appendix lemma relating the functions γ₀, γ₁, and γ₂ to the Gamma function Γ and its derivatives.

Proof. The identity for γ₀ is the definition of the function Γ, and the identities for γ₁ and γ₂ follow by integration by parts. Since γ₁(z) is continuous in z for Re z > −2, it follows from the identity for γ₁(z) by passage to the limit that γ₁(−1) = Γ′(1) = −γ. Finally, we obtain the desired value of γ₂(−2) simply by plugging z = −2 into the identity for γ₂(z).

Date: February 10, 2012.
Research of the first author supported by NSF grants DMS-0104167 and DMS-0406104 and by the Acheson J. Duncan Fund for the Advancement of Research in Statistics.