Emergence of giant cycles and slowdown transition in random transpositions and $k$-cycles

Consider the random walk on the permutation group obtained when the step distribution is uniform on a given conjugacy class. It is shown that there is a critical time at which two phase transitions occur simultaneously. On the one hand, the random walk slows down abruptly (i.e., the acceleration drops from 0 to -\infty at this time as n tends to \infty). On the other hand, the largest cycle size changes from microscopic to giant. The proof of this last result is both considerably simpler and more general than in a previous result of Oded Schramm (2005) for random transpositions. It turns out that in the case of random k-cycles, this critical time is proportional to 1/[k(k-1)], whereas the mixing time is known to be proportional to 1/k.


Basic result
Let $n \ge 1$ and let $S_n$ be the group of permutations of $\{1, \dots, n\}$. Consider the random walk on $S_n$ obtained by performing random transpositions in continuous time, at rate 1. That is, let $\tau_1, \tau_2, \dots$ be a sequence of i.i.d. uniformly chosen transpositions among the $n(n-1)/2$ possible transpositions of the set $V = \{1, \dots, n\}$, and for all $t \ge 0$, set $$\sigma_t = \tau_1 \cdot \ldots \cdot \tau_{N_t},$$ where $(N_t, t \ge 0)$ is an independent Poisson process with rate 1. It is well known that the permutation $\sigma_t$ is approximately a uniform random permutation (in the sense of total variation distance) after time $(1/2) n \log n$ (see [10]). In particular, this means that at this time, most points belong to cycles which are of macroscopic size $O(n)$, while initially, in the permutation $\sigma_0$ which is the identity permutation, every cycle is microscopic (being of size 1). How long does it take for macroscopic cycles to emerge? Oded Schramm, in a remarkable paper [21], proved that the first giant cycles appear at time $n/2$. More precisely, answering a conjecture of David Aldous stated in [3], he proved that if $t = cn$ with $c > 1/2$, then there exists a (random) set $W \subset \{1, \dots, n\}$ satisfying $\sigma_t(W) = W$, such that $|W| \sim \theta n$ where $0 < \theta = \theta(c) < 1$, and furthermore, the cycle lengths of $\sigma_t|_W$, rescaled by $\theta n$, converge in the sense of finite-dimensional distributions towards a Poisson-Dirichlet random variable. (The Poisson-Dirichlet distribution describes the limiting cycle distribution of a uniform random permutation and will be described in more detail below.) In particular, this implies that $\sigma_t$ contains giant cycles with high probability. On the other hand, it is easy to see that no macroscopic cycle can occur if $c < 1/2$. His proof is separated into two main steps. The first step consists in showing that giant cycles do emerge prior to time $cn$ when $c > 1/2$.
The second step is a beautiful coupling argument which shows that once giant cycles exist they must quickly come close to equilibrium, thereby proving Aldous' conjecture. Of these two steps, the first is arguably the more technically involved. Our main purpose in this paper is to give an elementary and transparent new proof of this fact. Let $\Lambda(t)$ denote the size of the largest cycle of $\sigma_t$. For $\delta > 0$, define $$\tau_\delta = \inf\{t \ge 0 : \Lambda(t) > \delta n\}. \qquad (1)$$
Theorem 1. For any $c > 1/2$, we have $\tau_\delta < cn$ with high probability, where $\delta = \theta(c)^2/8$.
This proof is completely elementary and in particular requires almost no estimates. As a consequence, it is fairly robust and it can be hoped that it extends to further models. We illustrate this by applying it to more general random walks on $S_n$, whose step distribution is uniform on a given conjugacy class of the permutation group (definitions will be recalled below). We show that the emergence of giant cycles coincides with a phase transition in the speed of the random walk, as measured by the derivative of the distance (with respect to the graph metric) between the position of the random walk at time $t$ and its starting point. This phase transition in the speed is the analogue of the phase transition described in [3] for random transpositions.
We mention that Theorem 1 is the mean-field analogue of a question arising in statistical mechanics in the study of Bose condensation and the quantum ferromagnetic Heisenberg model (see Tóth [22]). Very few rigorous results are known about this model on graphs with non-trivial geometry, with the exception of the work of Angel [1] for the case of a d-regular tree with d sufficiently large. We believe that the proof of Theorem 1 proposed here opens up the challenging possibility of proving analogous results on graphs that are "sufficiently high-dimensional", such as a high-dimensional hypercube, for which the percolation picture has recently started to emerge: see, e.g., Borgs et al. [8].
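The statement of Theorem 1 can be explored numerically. The following is a minimal simulation sketch, not part of the proof: the function names `cycle_lengths` and `largest_cycle_at_time` are illustrative choices, and the Poisson clock is simulated with exponential waiting times from the Python standard library.

```python
import random


def cycle_lengths(perm):
    """Return the cycle lengths of a permutation given as a list (x -> perm[x])."""
    seen = [False] * len(perm)
    lengths = []
    for start in range(len(perm)):
        if seen[start]:
            continue
        length, x = 0, start
        while not seen[x]:
            seen[x] = True
            x = perm[x]
            length += 1
        lengths.append(length)
    return lengths


def largest_cycle_at_time(n, c, seed=0):
    """Run rate-1 random transpositions up to time c*n; return the largest cycle size."""
    rng = random.Random(seed)
    perm = list(range(n))
    t = rng.expovariate(1.0)  # first jump time of the rate-1 Poisson clock
    while t <= c * n:
        i, j = rng.sample(range(n), 2)  # uniform transposition (i j), i != j
        perm[i], perm[j] = perm[j], perm[i]  # compose perm with (i j)
        t += rng.expovariate(1.0)
    return max(cycle_lengths(perm))
```

Calling `largest_cycle_at_time(n, c) / n` for a large `n` and values of `c` on either side of 1/2 gives an empirical impression of the transition; the output is random and depends on the seed.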

Random walks based on conjugacy classes.
Fix a number $k \ge 2$, and call an element $\gamma \in S_n$ a k-cycle, or a cyclic permutation of length k, if there exist pairwise distinct elements $x_1, \dots, x_k \in \{1, \dots, n\}$ such that $\gamma(x) = x_{i+1}$ if $x = x_i$ (where $1 \le i \le k$ and $x_{k+1} := x_1$) and $\gamma(x) = x$ otherwise. Thus for k = 2, a 2-cycle is simply a transposition. If $\sigma$ is a permutation then $\sigma$ can be decomposed into a product of cyclic permutations with disjoint supports, $\sigma = \gamma_1 \cdot \ldots \cdot \gamma_r$, where · stands for the composition of permutations. (This decomposition is unique up to the order of the terms.) A conjugacy class $\Gamma \subset S_n$ is any set that is invariant by conjugacy $\sigma \mapsto \pi^{-1}\sigma\pi$, for all $\pi \in S_n$. It is easily seen that a conjugacy class of $S_n$ is exactly a set of permutations having a given cycle structure, say $(k_2, \dots, k_J)$, i.e., consisting of $k_2$ cycles of size 2, ..., $k_J$ cycles of size J in their cycle decomposition (and a number of fixed points which does not need to be explicitly stated). Note that if $\Gamma$ is a fixed conjugacy class of $S_n$, and $m > n$, $\Gamma$ can also be considered a conjugacy class of $S_m$ by simply adding $m - n$ fixed points to any permutation $\sigma \in \Gamma$.
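The invariance of the cycle structure under conjugation can be checked on a small example. This is an illustrative sketch only: permutations are stored as lists on {0, ..., n-1}, and the helper names `compose`, `inverse` and `cycle_type` are assumptions of the sketch, not notation from the paper.

```python
from collections import Counter


def compose(p, q):
    """Return the permutation x -> p[q[x]] (apply q first, then p)."""
    return [p[q[x]] for x in range(len(p))]


def inverse(p):
    """Return the inverse permutation of p."""
    inv = [0] * len(p)
    for x, y in enumerate(p):
        inv[y] = x
    return inv


def cycle_type(p):
    """Return a Counter mapping each cycle length to the number of such cycles."""
    seen = [False] * len(p)
    counts = Counter()
    for start in range(len(p)):
        if seen[start]:
            continue
        length, x = 0, start
        while not seen[x]:
            seen[x] = True
            x = p[x]
            length += 1
        counts[length] += 1
    return counts


sigma = [1, 0, 3, 4, 2]  # one 2-cycle (0 1) and one 3-cycle (2 3 4)
pi = [2, 0, 4, 1, 3]     # an arbitrary permutation used for conjugation
conjugate = compose(inverse(pi), compose(sigma, pi))  # pi^{-1} sigma pi
```

Here `cycle_type(conjugate)` agrees with `cycle_type(sigma)`, in line with the description of conjugacy classes by cycle structure.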
Let $\Gamma$ be a fixed conjugacy class, and consider the random walk in continuous time on $S_n$ where the step distribution is uniform on $\Gamma$. That is, let $(\gamma_i, i \ge 1)$ be an i.i.d. sequence of elements uniformly distributed on $\Gamma$, and let $(N_t, t \ge 0)$ be an independent rate 1 Poisson process. Define a random process $$\sigma_t = \gamma_1 \cdot \ldots \cdot \gamma_{N_t}, \qquad t \ge 0,$$ where · stands for the composition of two permutations. Thus the case where $\Gamma$ consists only of transpositions (i.e. $k_2 = 1$ and $k_j = 0$ if $j \ge 3$) corresponds to the familiar random process on $S_n$ obtained by performing random transpositions in continuous time, and the case where $\Gamma$ contains only one nontrivial cycle of size $k \ge 2$ will be referred to as the random k-cycles random walk. The process $(\sigma_t, t \ge 0)$ may conveniently be viewed as a random walk on $G_n$, the Cayley graph of $S_n$ generated by $\Gamma$. Note that if $|\Gamma| = \sum_{j=2}^J j k_j$ is even, the graph $G_n$ is connected but it is not when $|\Gamma|$ is odd: indeed, in that case, the product of random p-cycles must be an even permutation, and thus $\sigma_t$ is then a random walk on the alternating group $A_n$ of even permutations. This fact will be of no relevance in what follows.
In this paper we study the pre-equilibrium behaviour of such a random walk. Our main result for this process is that there is a phase transition which occurs at time $t_c n$, where $$t_c = \Big(\sum_{j=2}^J j(j-1)k_j\Big)^{-1}. \qquad (3)$$ This transition concerns two distinct features of the walk. On the one hand, giant cycles emerge at time $t_c n$ precisely, as in Theorem 1. On the other hand, the speed of the walk changes dramatically at this time, dropping below 1 in a non-differentiable way. We start with the emergence of giant cycles, which is analogous to Theorem 1. Recall the definition of $\tau_\delta$ in (1).
Theorem 2. Let t < t c . Then there exists β > 0 such that no cycle is greater than β log n with high probability. On the other hand for any t > t c there exists δ > 0 such that τ δ < tn with high probability.
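For concreteness, the critical time can be evaluated for a few conjugacy classes. The sketch below assumes the form $t_c = (\sum_j j(j-1)k_j)^{-1}$, which is the form consistent with $t_c = 1/[k(k-1)]$ for random k-cycles; the function name `critical_time` and the dictionary encoding of $(k_2, \dots, k_J)$ are illustrative choices.

```python
from fractions import Fraction


def critical_time(cycle_counts):
    """Return t_c for a conjugacy class.

    cycle_counts maps each cycle size j >= 2 to its multiplicity k_j.
    Assumes t_c = 1 / sum_j j (j - 1) k_j.
    """
    total = sum(j * (j - 1) * k for j, k in cycle_counts.items())
    return Fraction(1, total)
```

For instance, `critical_time({2: 1})` gives 1/2 (random transpositions) and `critical_time({3: 1})` gives 1/6 (random 3-cycles).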
We now state our result for the speed. Denote by $d(x, y)$ the graph distance between two vertices $x, y \in S_n$, and for $t \ge 0$, let $$d(t) = d(o, \sigma_t),$$ where $o$ is the identity permutation of $S_n$. Recall that a sequence of random functions $X_n(t)$ converges uniformly on compact sets of $S \subset \mathbb{R}$ in probability (u.c.p. for short) towards a random function $X(t)$ if $P(\sup_{t \in S, t \le T} |X_n(t) - X(t)| > \varepsilon) \to 0$ as $n \to \infty$ for all $\varepsilon > 0$ and $T > 0$.
Theorem 3. Fix a constant integer $J \ge 2$ and constant nonnegative integers $k_2, \dots, k_J$, and consider the conjugacy class $\Gamma$ of $S_n$ defined by $(k_2, \dots, k_J)$. Let $t_c$ be as in (3), and fix $t > 0$. Then there exists a compact interval $I \subset (t_c, \infty)$, and a nonrandom function $\phi(t)$ satisfying $\phi(t) = t$ for $t \le t_c$ and $\phi(t) < t$ for $t > t_c$, such that $$\frac{1}{n}\, d(tn) \to \phi(t)$$ uniformly on compact sets of $\mathbb{R}_+ \setminus I$ in probability as $n \to \infty$. Furthermore $\phi$ is $C^\infty$ everywhere except at $t = t_c$, where the acceleration satisfies $\phi''(t_c^+) = -\infty$. In the case of random k-cycles ($k \ge 2$), $I = \emptyset$ so the convergence holds uniformly on compact sets in $\mathbb{R}$.
Remark 4. We believe that I = ∅ in all cases, but our proof only guarantees this in the case of random k-cycles and a few other cases which we have not tried to describe precisely. Roughly speaking there is a combinatorial problem which arises when we try to estimate the distance to the identity in the case of conjugacy classes which contain several non-trivial cycles of distinct sizes (particularly when these are coprime). This is explained in more details in the course of the proof. Right now, the current result is enough to prove that there is a phase transition for d(tn) when t = t c , but does not prevent other phase transitions after that time.
In the case of random k-cycles, we have t c = 1/[k(k−1)] and the function ϕ has the following explicit expression: It is a remarkable fact that for t ≤ t c a cancellation takes place and ϕ(t) = t. The case k = 2 of random transpositions matches Theorem 4 from [3].
In the general conjugacy class case, $\phi$ may be described as the solution to a certain differential equation. For $t \ge 0$ and $z \in [0, 1]$, let $G_t(z) = \exp\big(-|\Gamma| t + t \sum_{j=2}^J j k_j z^{j-1}\big)$, and let $\rho = \rho(t)$ be the smallest solution of the equation (in z): $G_t(z) = z$. Then $\phi$ is defined through $\rho(t)$. Writing $\theta(t) = 1 - \rho(t)$, it is a fact that $\theta(t) > 0$ if and only if $t > t_c$, which explains why $\phi(t) = t$ for $t \le t_c$ and $\phi(t) < t$ for $t > t_c$.
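The smallest root $\rho(t)$ of $G_t(z) = z$ is easy to approximate numerically. The sketch below iterates $z \mapsto G_t(z)$ starting from $z = 0$; since $G_t$ is increasing on $[0,1]$ with $G_t(0) > 0$, the iterates increase towards the smallest fixed point. The function name, the tolerance and the iteration cap are illustrative assumptions, and convergence becomes slow for $t$ close to $t_c$.

```python
import math


def smallest_fixed_point(cycle_counts, t, tol=1e-12, max_iter=100000):
    """Approximate the smallest root rho(t) in [0, 1] of G_t(z) = z.

    cycle_counts maps each cycle size j >= 2 to its multiplicity k_j, and
    G_t(z) = exp(-|Gamma| t + t * sum_j j k_j z^(j-1)) with |Gamma| = sum_j j k_j.
    """
    size = sum(j * k for j, k in cycle_counts.items())  # |Gamma|

    def G(z):
        poly = sum(j * k * z ** (j - 1) for j, k in cycle_counts.items())
        return math.exp(-size * t + t * poly)

    z = 0.0
    for _ in range(max_iter):
        nxt = G(z)
        if abs(nxt - z) < tol:
            return nxt
        z = nxt
    return z
```

With random transpositions (`{2: 1}`), the value `1 - smallest_fixed_point({2: 1}, t)` is numerically zero for `t = 0.25` and strictly positive for `t = 1.0`, on either side of $t_c = 1/2$.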

Heuristics
The k-cycle random walk is a simple generalization of the random transpositions random walk on $S_n$, for which the phase transition in Theorem 3 was proved in [3]. Observe that any k-cycle $(x_1, \dots, x_k)$ may always be written as the product of $k - 1$ transpositions: $$(x_1, \dots, x_k) = (x_1, x_2)(x_2, x_3) \cdots (x_{k-1}, x_k).$$ This suggests that, qualitatively speaking, the k-cycle random walk should behave as "random transpositions sped up by a factor of $(k - 1)$", and thus one might expect that phase transitions occur at a time that is inversely proportional to k. This is for instance what happens with the mixing time for the total variation distance. (This was recently proved in [5] and was already known for $k \le 6$, the particular case k = 2 being the celebrated Diaconis-Shahshahani theorem [10]; see [16] and [9] for an excellent introduction to the general theory of mixing times, and [20] in particular for mixing times of random walks on groups.) It may therefore come as a surprise that $t_c = 1/[k(k-1)]$ rather than $t_c = 1/k$. As it emerges from the proof, the reason for this fact is as follows. We introduce a coupling of $(\sigma_t, t \ge 0)$ with a random hypergraph process $(H_t, t \ge 0)$ on $V = \{1, \dots, n\}$, which is the analogue of the coupling between random transpositions and Erdős-Rényi random graphs introduced in [3]. As we will see in more detail, hypergraphs are graphs where edges (or rather hyperedges) may connect several vertices at the same time. In this coupling, every time a cycle $(x_1, \dots, x_k)$ is performed in the random walk, $H_t$ gains a hyperedge connecting $x_1, \dots, x_k$. This is essentially the same as adding the complete graph $K_k$ on $\{x_1, \dots, x_k\}$ in the graph $H_t$. Thus the degree of a typical vertex grows at a speed which is $k(k-1)/2$ times faster than in the standard Erdős-Rényi random graph. This results in a giant component occurring $k(k-1)/2$ times faster as well. This explains the formula $t_c^{-1} = k(k-1)$, and an easy generalisation leads to (3).
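The decomposition of a k-cycle into $k - 1$ transpositions can be verified mechanically. The sketch below assumes that products are read right to left (the rightmost factor is applied first), under which $(x_1, \dots, x_k) = (x_1, x_2)(x_2, x_3) \cdots (x_{k-1}, x_k)$ with $x_i \mapsto x_{i+1}$; the helper names are illustrative.

```python
def compose(p, q):
    """Return the permutation x -> p[q[x]] (apply q first, then p)."""
    return [p[q[x]] for x in range(len(p))]


def transposition(n, a, b):
    """Return the transposition (a b) as a permutation of {0, ..., n-1}."""
    perm = list(range(n))
    perm[a], perm[b] = b, a
    return perm


def k_cycle(n, xs):
    """Return the cycle sending xs[i] to xs[i+1] (cyclically), fixing all other points."""
    perm = list(range(n))
    for a, b in zip(xs, xs[1:] + xs[:1]):
        perm[a] = b
    return perm


def product_of_transpositions(n, xs):
    """Return (x1 x2)(x2 x3)...(x_{k-1} x_k), composed right to left."""
    result = list(range(n))
    for a, b in zip(xs, xs[1:]):
        result = compose(result, transposition(n, a, b))
    return result
```

For example, `product_of_transpositions(7, [3, 0, 5, 2])` coincides with `k_cycle(7, [3, 0, 5, 2])`.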
Organisation of the paper: The rest of the paper is organised as follows. We first give the proof of Theorem 1. In the following section we introduce the coupling between $(\sigma_t, t \ge 0)$ and the random hypergraph process $(H_t, t \ge 0)$. In cases where the conjugacy class is particularly simple (e.g. random k-cycles), a combinatorial treatment analogous to the classical analysis of the Erdős-Rényi random graph is possible, leading to exact formulae. In cases where the conjugacy class is arbitrary, our method is more probabilistic in nature and the formulae take a different form ($H_t$ is then closer to the Molloy and Reed model of random graphs with prescribed degree distribution, [17] and [18]). The proof is thus slightly different in these two cases (dealt with in Sections 3 and 4 respectively), even though conceptually there are no major differences between them.

Emergence of giant cycles in random transpositions
In this section we give a full proof of Theorem 1. As the reader will observe, the proof is elementary and is based on well-known (and easy) results on random graphs. Consider the random graph process $(G_t, t \ge 0)$ on $V = \{1, \dots, n\}$ obtained by putting an edge between i and j if the transposition (i, j) has occurred prior to time t. Then every edge is independent and has probability $p_t = 1 - e^{-t/\binom{n}{2}}$, so $G_t$ is a realisation of the Erdős-Rényi random graph $G(n, p_t)$.
For $t \ge 0$ and $i \in V$, let $C_i$ denote the cycle that contains i. Recall that if $C_i = C_j$ then a transposition (i, j) yields a fragmentation of $C_i = C_j$ into two cycles, while if $C_i \ne C_j$ then the transposition (i, j) yields a coagulation of $C_i$ and $C_j$. It follows from this observation that every cycle of $\sigma_t$ is a subset of one of the connected components of $G_t$. Thus let $N(t)$ be the number of cycles of $\sigma_t$ and let $\bar N(t)$ denote the number of components of $G_t$. Then we obtain $$N(t) \ge \bar N(t), \qquad t \ge 0.$$ Now it is a classical and easy fact that the number $\bar N(t)$ has a phase transition at time $n/2$ (corresponding to the emergence of a giant component at this time). More precisely, let $\theta(c)$ be the asymptotic fraction of vertices in the giant component at time $cn$, so $\theta(c)$ is the survival probability of a Poisson Galton-Watson process with mean offspring $2c$ (in particular $\theta(c) = 0$ if $c < 1/2$). Let $c > 1/2$ and fix an interval of time $[t_1, t_2]$ such that $t_2 = cn$ and $t_1 = t_2 - n^{3/4}$. Our goal will be to prove that a cycle of size $\delta n$ occurs during the interval $I = [t_1, t_2]$, where $\delta = \theta(c)^2/8$.
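The coupling between the permutation and the random graph can be illustrated directly. In the sketch below (an illustration, with hypothetical helper names and a discrete number of steps instead of the Poisson clock), each transposition is applied to the permutation and recorded as an edge in a union-find structure; one can then check that every cycle sits inside a single component and that the number of cycles is at least the number of components.

```python
import random


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def run_coupling(n, steps, seed=0):
    """Apply `steps` uniform transpositions; return the permutation and the union-find forest."""
    rng = random.Random(seed)
    perm = list(range(n))
    parent = list(range(n))
    for _ in range(steps):
        i, j = rng.sample(range(n), 2)
        perm[i], perm[j] = perm[j], perm[i]       # compose with the transposition (i j)
        parent[find(parent, i)] = find(parent, j)  # add the edge {i, j} to the graph
    return perm, parent


def cycles_of(perm):
    """Return the cycles of perm as lists of points."""
    seen, cycles = set(), []
    for start in range(len(perm)):
        if start in seen:
            continue
        cycle, x = [], start
        while x not in seen:
            seen.add(x)
            cycle.append(x)
            x = perm[x]
        cycles.append(cycle)
    return cycles
```

The two deterministic properties hold for every seed and every number of steps, since a transposition only merges or splits the cycles containing its two points.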

Lemma 5. As $n \to \infty$, $$\bar N(t_1) - \bar N(t_2) \sim (t_2 - t_1)\,\big(1 - \theta(c)^2\big),$$ in the sense that the ratio of these two quantities tends to 1 in probability.
Proof. This lemma follows easily from the following observation. The total number of edges that are added during I is a Poisson random variable with mean $t_2 - t_1$. Now, each time an edge is added to $G_t$, this changes the number of components by -1 if and only if the two endpoints are in distinct components (otherwise the change is 0). Since the second largest component has size smaller than $\beta \log n$ with high probability, except on an event of probability tending to 0, throughout $[t_1, t_2]$ this occurs if and only if the two endpoints are not both in the giant component, which has probability uniformly close to $1 - \theta(c)^2$. The law of large numbers concludes the proof.
Proof of Lemma 6. We already know that $N(t) \ge \bar N(t)$ for all $t \ge 0$. It thus suffices to show that the excess number of cycles is never more than $4n^{1/2}$ in expectation. Note first that there can never be more than $n^{1/2}$ cycles of size greater than $n^{1/2}$. Thus it suffices to count the number $N^{ex}_\downarrow(t)$ of excess cycles of size $\le n^{1/2}$. These excess cycles of size $\le n^{1/2}$ at time t must have been generated by a fragmentation at some time $s \le t$ where one of the two pieces was smaller than $n^{1/2}$. But at each step, the probability of making such a fragmentation is smaller than $2n^{-1/2}$. Indeed, given the position of the first marker i, there are at most $2n^{1/2}$ possible choices for j which result in a fragmentation of size smaller than $n^{1/2}$. To see this, note that if a transposition (i, j) is applied to a permutation $\sigma$, and $C_i = C_j = C$, so that $\sigma^k(i) = j$ for some k, then the two pieces are precisely given by $(\sigma^0(i), \dots, \sigma^{k-1}(i))$ and $(\sigma^0(j), \dots, \sigma^{|C|-k-1}(j))$. Thus to obtain a piece of size k there are at most two possible choices, which are $j = \sigma^k(i)$ and $j = \sigma^{-k}(i)$. Since the number of steps by time $cn$ has mean $cn$, it follows that $E(F_\downarrow(cn)) \le cn \cdot 2n^{-1/2}$, where $F_\downarrow(cn)$ is the total number of fragmentation events where one of the pieces is smaller than $n^{1/2}$ by time $cn$. Since $\sup_{t \le cn} N^{ex}_\downarrow(t) \le F_\downarrow(cn)$, this finishes the proof.
Proof of Theorem 1. Applying Markov's inequality in Lemma 6, we see that since $n^{1/2} \ll n^{3/4} = t_2 - t_1$, we also have $$\frac{N(t_1) - N(t_2)}{t_2 - t_1} \to 1 - \theta(c)^2$$ in probability, by Lemma 5. On the other hand, $N(t)$ changes by -1 in the case of a coalescence and by +1 in the case of a fragmentation. Hence $$N(t_1) - N(t_2) = (N_{t_2} - N_{t_1}) - 2F(I),$$ where $N_{t_2} - N_{t_1}$ is the number of transpositions performed during I and $F(I)$ is the total number of fragmentations during the interval I. We therefore obtain, by the law of large numbers for Poisson random variables, $$\frac{F(I)}{t_2 - t_1} \to \frac{\theta(c)^2}{2}$$ in probability. But observe that if $F(I)$ is large, it cannot be the case that all cycles are small, since otherwise we would very rarely pick i and j in the same cycle. Hence consider the event $E = \{\tau_\delta < cn\}$. On $E^\complement$, the maximal cycle size throughout I is no more than $\delta n$. Hence at each transposition, the probability of making a fragmentation is no more than $\delta$. By the law of large numbers, on the event $E^\complement$, it must be that $F(I) \le 2\delta(t_2 - t_1)$ with high probability. Since $2\delta = \theta(c)^2/4 < \theta(c)^2/2$, it follows immediately that $P(E^\complement) \to 0$ as $n \to \infty$. This completes the proof.

Random hypergraphs and Theorem 3
We now start the proof of Theorem 3. We first review some relevant definitions and results from random hypergraphs.
A hypergraph is a graph where edges can connect several vertices at the same time. Formally, a hypergraph on a vertex set V is a collection of subsets of V of size at least 2, called hyperedges; it is d-regular if every hyperedge has exactly d vertices. For a given $d \ge 2$ and $0 < p < 1$, we call $G_d(n, p)$ the probability distribution on d-regular hypergraphs on $V = \{1, \dots, n\}$ where each hyperedge on d vertices is present independently of the other hyperedges with probability p. Observe that when d = 2 this is just the usual Erdős-Rényi random graph case, since a hyperedge connecting two vertices is nothing other than a usual edge. For basic facts on Erdős-Rényi random graphs, see e.g. [7].
The notion of a hypertree needs to be carefully formulated in what follows. We start with the d-regular case. The excess $ex(H)$ of a given d-regular hypergraph H is defined to be $$ex(H) = (d - 1)h - r,$$ where $r = |H|$ is the number of vertices of H and h is the number of hyperedges in H.
Observe that if H is connected then $ex(H) \ge -1$; if H is connected and $ex(H) = -1$, we say that H is a hypertree. Likewise if $ex(H) = 0$ and H is connected we will say that H is unicyclic, and if the excess is positive we will say that the component is complex.
Remark 8. This is the definition used by Karoński and Łuczak in [15], but differs from the definition in their older paper [13] where a hypertree is a connected hypergraph such that removing any hyperedge would make it disconnected.
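The excess is straightforward to compute on small examples. The sketch below uses the formula $\sum_i (|h_i| - 1) - r$, which reduces to $(d-1)h - r$ when all hyperedges have size d; this form and the function name `excess` are assumptions of the sketch, and the hypergraph is assumed to be connected with at least one hyperedge.

```python
def excess(hyperedges):
    """Return sum_i (|h_i| - 1) - r, where r is the number of vertices covered.

    hyperedges is a non-empty list of sets of vertices.
    """
    vertices = set().union(*hyperedges)
    return sum(len(h) - 1 for h in hyperedges) - len(vertices)
```

Two 3-hyperedges sharing one vertex form a hypertree (excess -1), while three 3-hyperedges arranged in a triangle, each pair sharing one vertex, form a unicyclic component (excess 0).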
In the case where H is not necessarily regular, the excess of a connected hypergraph H made up of the hyperedges $h_1, \dots, h_m$ is defined to be $ex(H) = \sum_{i=1}^m (|h_i| - 1) - |H|$, which agrees with the previous definition in the d-regular case.

Critical point for random hypergraphs
We start by recalling a theorem by Karoński and Łuczak [15] concerning the emergence of a giant connected component in a random hypergraph process $(H_t, t \ge 0)$ where random hyperedges of degree $d \ge 2$ are added at rate 1 (Theorem 9). Here $c_d = 1/[d(d-1)]$ denotes the critical time of this process. Note that if $c < c_d$ the number of unicyclic components is no more than $C' \log n$ for some $C' > 0$ which depends on c. Indeed, at each step the probability of creating a cycle is bounded above by $C \log n / n$ since the largest component is no more than $O(\log n)$ prior to time $cn$. Since there are $O(n)$ steps this proves the claim. We will need a result about the evolution of the number of components $\bar N(t)$ in $(H_t, t \ge 0)$.
Proposition 10. Let $t > 0$. Then as $n \to \infty$, $$\frac{1}{n}\,\bar N(tn) \to \sum_{h=0}^{\infty} \frac{((d-1)h+1)^{h-2}}{h!}\,(dt)^h\, e^{-dt((d-1)h+1)}$$ in probability.
Proof. Note first that, by monotonicity of the number of clusters and continuity of the function in the right-hand side, it suffices to establish this result when $t \ne 1/[d(d-1)]$. Moreover, by Theorem 9 and since there are no more than $C \log n$ unicyclic components, it is enough to count the number of hypertrees $\tilde N(s)$ smaller than $C \log n$ in $H_s$, where $s = tn$. We will first compute the expected value and then prove a law of large numbers using a second moment method.
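Let $h \ge 0$. We first compute the number of hypertrees with h hyperedges (h = 0 corresponds to isolated vertices). These have $r = (d-1)h + 1$ vertices. By Lemma 1 in Karoński-Łuczak [14], there are $$\frac{(r-1)!\, r^{h-1}}{h!\,[(d-1)!]^h}$$ hypertrees on $r = (d-1)h + 1$ labelled vertices (this is the analogue of Cayley's (1889) well-known formula that there are $k^{k-2}$ ways to draw a tree on k labelled vertices). If T is a given hypertree with h hyperedges whose vertices are labelled by elements of $V = \{1, \dots, n\}$, there are a certain number of conditions that must be fulfilled in order for T to be one of the components of $H_s$: (i) the h hyperedges of T must be open; (ii) the $\binom{r}{d} - h$ other hyperedges inside T must be closed; (iii) T must be disconnected from the rest of the graph, which requires closing $r\binom{n-r}{d-1}$ hyperedges. Now, remark that at time $s = tn$, because the individual Poisson clocks are independent, each hyperedge is present independently of the others with probability $p = 1 - \exp\big(-s/\binom{n}{d}\big) \sim d!\,t/n^{d-1}$. It follows that the probability that T is one of the components of $H_s$ is $$p^h (1-p)^{\binom{r}{d} - h + r\binom{n-r}{d-1}}.$$ Hence the expected number of hypertrees in $H_s$ with h hyperedges is $$E(\tilde N_h(s)) = \binom{n}{r}\,\frac{(r-1)!\, r^{h-1}}{h!\,[(d-1)!]^h}\; p^h (1-p)^{\binom{r}{d} - h + r\binom{n-r}{d-1}}. \qquad (12)$$ Write $\mathcal{C}$ for the set of connected components of $H_s$. Note that if $T_1$ and $T_2$ are two given hypertrees on V with disjoint vertex sets and with h hyperedges each, then $$P(T_1 \in \mathcal{C}, T_2 \in \mathcal{C}) = (1 + o(1))\, P(T_1 \in \mathcal{C})\, P(T_2 \in \mathcal{C}).$$ From this we deduce that $cov(1_{\{T_1 \in \mathcal{C}\}}, 1_{\{T_2 \in \mathcal{C}\}}) \to 0$ and that $var(\tilde N_h(s)) = o(n^2)$. Thus, by Chebyshev's inequality, for every fixed $h_0$, $$\frac{1}{n}\sum_{h=0}^{h_0} \tilde N_h(s) \to \sum_{h=0}^{h_0} \frac{((d-1)h+1)^{h-2}}{h!}\,(dt)^h\, e^{-dt((d-1)h+1)} \qquad (13)$$ in probability as $n \to \infty$. The end of the proof of the proposition now follows from (13) and the following bound: there exist constants $C', \lambda > 0$ such that $$\frac{1}{n}\, E(\tilde N_{>h_0}(s)) \le C' e^{-\lambda h_0}, \qquad (14)$$ where $\tilde N_{>h_0}(s) = \sum_{h=h_0+1}^{C \log n} \tilde N_h(s)$. Indeed, for every $\varepsilon > 0$ and $\eta > 0$, we can choose $h_0$ large enough such that the finite sum in the right-hand side of (13) lies within $\varepsilon$ of the infinite series. We then choose n large enough so that $E(\tilde N_{>h_0}(s))/n \le \varepsilon\eta$, whence by Markov's inequality $$P\big(\tilde N_{>h_0}(s) > \varepsilon n\big) \le \eta.$$ We now conclude using (13). To obtain the bound (14) we use (12), from which it follows (using $\binom{n}{r} \le n^r/r!$ and $1 - e^{-x} \le x$) that $$E(\tilde N_h(s)) \le \frac{n^r}{r!}\,\frac{(r-1)!\, r^{h-1}}{h!\,[(d-1)!]^h}\,\Big(\frac{s}{\binom{n}{d}}\Big)^h (1-p)^{\binom{r}{d} - h + r\binom{n-r}{d-1}}.$$ But since $r = (d-1)h + 1 \le C \log n$, we see that $$(1-p)^{\binom{r}{d} - h + r\binom{n-r}{d-1}} = (1 + o(1))\, e^{-dtr},$$ where the term $o(1)$ is uniform in $r \le C \log n$. Using Stirling's formula we obtain a uniform exponential bound for $E(\tilde N_h(s)/n)$ provided that $t \ne 1/[d(d-1)]$. (14) now follows.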
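The hypertree count used in this proof can be cross-checked by brute force for small parameters. The sketch below assumes the count takes the form $(r-1)!\,r^{h-1} / (h!\,[(d-1)!]^h)$ with $r = (d-1)h + 1$, and compares it with an exhaustive enumeration: a set of h hyperedges of size d on r vertices is a hypertree exactly when it is connected. The function names are illustrative.

```python
from itertools import combinations
from math import factorial


def hypertree_count(d, h):
    """Assumed closed form for the number of d-regular hypertrees with h hyperedges."""
    if h == 0:
        return 1
    r = (d - 1) * h + 1
    return factorial(r - 1) * r ** (h - 1) // (factorial(h) * factorial(d - 1) ** h)


def is_connected(r, edges):
    """Check whether the hyperedges connect all r vertices (union-find)."""
    parent = list(range(r))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for edge in edges:
        root = find(edge[0])
        for v in edge[1:]:
            parent[find(v)] = root
    return len({find(v) for v in range(r)}) == 1


def brute_force_hypertree_count(d, h):
    """Enumerate all sets of h hyperedges of size d on r = (d-1)h+1 vertices."""
    r = (d - 1) * h + 1
    all_edges = list(combinations(range(r), d))
    return sum(1 for edges in combinations(all_edges, h) if is_connected(r, edges))
```

For d = 2 the closed form reduces to Cayley's formula, and the two functions agree on the small cases that can be enumerated quickly, such as (d, h) = (3, 2) and (3, 3).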

Bounds for the Cayley distance on the symmetric group
In the case of random transpositions we had the convenient formula that if σ ∈ S n then d(o, σ) = n−#cycles, a formula originally due to Cayley. In the case of random k-cycles with k ≥ 3, unfortunately there is to our knowledge no exact formula to work with. However this formula stays approximately true, as shown by the following proposition.
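Cayley's formula for the transposition distance can be confirmed on a small symmetric group by breadth-first search on the Cayley graph. This is an illustrative check only; the function names are assumptions, and permutations are stored as tuples on {0, ..., n-1}.

```python
from collections import deque
from itertools import combinations


def num_cycles(p):
    """Return the number of cycles of p, fixed points included."""
    seen, count = set(), 0
    for start in range(len(p)):
        if start not in seen:
            count += 1
            x = start
            while x not in seen:
                seen.add(x)
                x = p[x]
    return count


def transposition_distances(n):
    """Breadth-first search from the identity in the Cayley graph generated by transpositions."""
    gens = []
    for a, b in combinations(range(n), 2):
        g = list(range(n))
        g[a], g[b] = b, a
        gens.append(tuple(g))
    identity = tuple(range(n))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        p = queue.popleft()
        for g in gens:
            q = tuple(p[g[x]] for x in range(n))  # p composed with the transposition g
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist
```

For n = 5 the search visits all 120 permutations, and the distance of each permutation to the identity equals n minus its number of cycles.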
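Proposition 11. Let $k \ge 3$ and let $\sigma \in S_n$. (If k is odd, assume further that $\sigma \in A_n$.) Then $$\frac{1}{k-1}\,(n - |\sigma|) \le d(o, \sigma) \le \frac{1}{k-1}\,(n - |\sigma|) + C(k)\,|R_k(\sigma)|,$$ where $|\sigma|$ is the number of cycles of $\sigma$, $C(k)$ is a universal constant depending only on k, and $R_k(\sigma)$ is the set of cycles of $\sigma$ whose length $\ell$ satisfies $\ell \not\equiv 1 \mod (k-1)$.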
Proof. For simplicity we consider only the case k = 3. Thus let $\sigma \in A_n$. For each cycle of odd length $(i_1, \dots, i_{2r+1})$ we can write $$(i_1, \dots, i_{2r+1}) = (i_1, i_2, i_3)(i_3, i_4, i_5) \cdots (i_{2r-1}, i_{2r}, i_{2r+1}),$$ which has exactly r 3-cycle factors. Now, because $\sigma \in A_n$, the number of cycles of even length must be even. So let $(i_1, \dots, i_{2r})(j_1, \dots, j_{2m})$ be a pair of even cycles. Then we start by building $(i_1, i_2)(j_1, j_2)$ in two moves and then completing each of the two cycles in the same way as above. The total number of moves to build this pair of cycles is thus $2 + (r - 1) + (m - 1) = r + m$. It follows that $\sigma$ can be made up of at most $$\frac{n - |\sigma|}{2} + \frac{|R_3(\sigma)|}{2}$$ 3-cycles. This gives the upper bound. On the other hand, multiplying $\sigma$ by a 3-cycle can create at most two new cycles. Hence, after p multiplications the resulting permutation cannot have more than $|\sigma| + 2p$ cycles. Therefore the distance must be at least that $k_0$ for which $|\sigma| + 2k_0 \ge n$, since the identity permutation has exactly n cycles. The lower bound follows.
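The two bounds for k = 3 can be checked exhaustively on a small alternating group. The sketch below runs a breadth-first search on the Cayley graph generated by all 3-cycles and tests $n - |\sigma| \le 2 d(o, \sigma) \le n - |\sigma| + (\text{number of even cycles})$, which is the form the construction in the proof gives for k = 3; the function names are illustrative.

```python
from collections import deque
from itertools import permutations


def cycle_lengths(p):
    """Return the cycle lengths of p, fixed points included."""
    seen, lengths = set(), []
    for start in range(len(p)):
        if start in seen:
            continue
        length, x = 0, start
        while x not in seen:
            seen.add(x)
            x = p[x]
            length += 1
        lengths.append(length)
    return lengths


def three_cycle_distances(n):
    """Breadth-first search from the identity in the Cayley graph generated by 3-cycles."""
    gens = set()
    for a, b, c in permutations(range(n), 3):
        g = list(range(n))
        g[a], g[b], g[c] = b, c, a  # the 3-cycle a -> b -> c -> a
        gens.add(tuple(g))
    identity = tuple(range(n))
    dist = {identity: 0}
    queue = deque([identity])
    while queue:
        p = queue.popleft()
        for g in gens:
            q = tuple(p[g[x]] for x in range(n))
            if q not in dist:
                dist[q] = dist[p] + 1
                queue.append(q)
    return dist


def bounds_hold(p, d):
    """Check n - |sigma| <= 2 d <= n - |sigma| + (number of even cycles)."""
    lengths = cycle_lengths(p)
    gap = len(p) - len(lengths)
    even = sum(1 for length in lengths if length % 2 == 0)
    return gap <= 2 * d <= gap + even
```

For n = 6 the search reaches exactly the 360 even permutations, and both bounds hold for each of them.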

Phase transition for the 3-cycle random walk
We now finish the proof of Theorem 3 in the case of random k-cycles.
Proof of Theorem 3 if $k_j = \delta_{k,j}$. The proof follows the lines of Lemma 6. Let $N(t)$ be the number of cycles of $\sigma_t$ and let $\bar N(t)$ be the number of components in $H_t$, where $(H_t, t \ge 0)$ is the random k-regular hypergraph process obtained by adding the hyperedge $\{x_1, \dots, x_k\}$ whenever the k-cycle $(x_1, \dots, x_k)$ is performed. Then note again that every cycle of $\sigma_t$ is a subset of a connected component of $H_t$, so $N(t) \ge \bar N(t)$. (Indeed, this property is a deterministic statement for transpositions, and a sequence of random k-cycles can be decomposed as a sequence of transpositions that is $(k - 1)$ times as long.) Repeating the argument in Lemma 6, we see that $$n^{-3/4}\,\big(N(tn) - \bar N(tn)\big) \to 0$$ in probability. This is proved in greater generality (i.e., for arbitrary conjugacy classes) in Lemma 14. Moreover, any cycle $c \in R_k(\sigma_t)$ must have been generated by a fragmentation at some point (otherwise the length of cycles only increases by $k - 1$ each time). Thus $|R_k(\sigma_t)| \le N(t) - \bar N(t)$, and Theorem 3 now follows.

Proofs for general conjugacy classes

Random graph estimates
Let $\Gamma = (k_2, \dots, k_J)$ be our fixed conjugacy class. A first step in the proof of Theorems 2 and 3 in this general case is again to associate a certain random graph model to the random walk. As usual, we put a hyperedge connecting $x_1, \dots, x_k$ every time a cycle $(x_1 \dots x_k)$ is performed as part of a step of the random walk. Let $H_s$ be the random hypergraph on n vertices that is obtained at time s. A first step will be to prove properties of this random graph $H_s$ when $s = tn$ for some constant $t > 0$. Recall our definition of $t_c$: $$t_c = \Big(\sum_{j=2}^J j(j-1)k_j\Big)^{-1}, \qquad (16)$$ and that $1 - \theta(t)$ is the smallest solution of the equation (in z): $$G_t(z) = z, \qquad \text{where } G_t(z) = \exp\Big(-|\Gamma| t + t\sum_{j=2}^J j k_j z^{j-1}\Big).$$
Lemma 12. If $t < t_c$ then there exists $\beta > 0$ such that all clusters of $H_{tn}$ are smaller than $\beta \log n$ with high probability. If $t > t_c$, then there exists $\beta > 0$ such that all but one of the clusters are smaller than $\beta \log n$ and the size $L_n(t)$ of the largest cluster satisfies $$\frac{L_n(t)}{n} \to \theta(t)$$ in probability.
Proof. We first consider a particular vertex, say $v \in V$, and ask what is its degree distribution in $H_{tn}$. Write $\sigma_t = \gamma_1 \cdots \gamma_{N_t}$ where $(\gamma_i, i \ge 1)$ is a sequence of i.i.d. permutations uniformly distributed on $\Gamma$, and $(N_t, t \ge 0)$ is an independent Poisson process. Note that for $t \ge 0$, $\#\{i \le N_t : v \in Supp(\gamma_i)\}$ is a Poisson random variable with mean $t\sum_{j=2}^J j k_j / n$. Thus by time $tn$, the number of times v has been touched by one of the $\gamma_i$ is a Poisson random variable with mean $t\sum_{j=2}^J j k_j$. For each such $\gamma_i$, the probability that v was involved in a cycle of size exactly $\ell$ is precisely $\ell k_\ell / \sum_{j=2}^J j k_j$. Thus, the number of hyperedges of size j that contain v in $H_{tn}$ is $P_j$, where $(P_j, j = 2, \dots, J)$ are independent Poisson random variables with parameter $t j k_j$. Since each hyperedge of size j connects v to $j - 1$ other vertices, we see that the degree $D_v$ of v in $H_{tn}$ has a distribution given by $$D_v = \sum_{j=2}^J (j-1) P_j. \qquad (18)$$ Now, note that by definition of $t_c$ (see (16)), $$E(D_v) = t\sum_{j=2}^J j(j-1)k_j > 1 \iff t > t_c.$$ The proof of Theorem 3.2.2 in Durrett [11] may be adapted almost verbatim to show that there is a giant component if and only if $E(D_v) > 1$, and that the fraction of vertices in the giant component is the survival probability of the associated branching process. Note that the generating function associated with the progeny (18) is $$G_t(z) = E(z^{D_v}) = \exp\Big(-t\sum_{j=2}^J j k_j + t\sum_{j=2}^J j k_j z^{j-1}\Big),$$ and that $\rho(t) = 1 - \theta(t)$ is the smallest root of the equation $G_t(z) = z$. From the same result one also gets that the second largest cluster is of size no more than $\beta \log n$ with high probability, for some $\beta > 0$.
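Let $\bar N(s)$ be the number of clusters at time s in $H_s$, and let $u_n(t) = \frac{1}{n} E(\bar N(tn))$. Define a function $u(t)$ by putting $$u(t) = 1 - K\int_0^t \big(1 - \theta(s)^2\big)\,ds,$$ where $K := \sum_{j=2}^J k_j (j - 1)$, and note that $u(0) = 1$, that $u(t) = 1 - Kt$ for $t < t_c$, and that $u(t) > 1 - Kt$ for $t > t_c$.
Lemma 13. As $n \to \infty$, $\frac{1}{n}\bar N(tn) \to u(t)$ u.c.p.
Proof. For a hypergraph H, write $|H|$ for its number of clusters, let $H'$ be the hypergraph obtained from H after one step of the random walk, and let $f$ denote the expected change $E(|H'| - |H|)$, viewed as a function of the normalized cluster sizes of H. Then $$M_t = \frac{1}{n}\,\bar N(tn) - \int_0^t f(x_1(sn), \dots, x_n(sn))\,ds \qquad (20)$$ is a martingale, if $(x_1(s), \dots, x_n(s))$ denote the ordered normalized cluster sizes of $H(s)$. (Note that $M_0 = 1$.) Thus, taking expectations, $$u_n(t) = 1 + \int_0^t E\big(f(x_1(sn), \dots, x_n(sn))\big)\,ds.$$ We claim that, as $n \to \infty$, for every s fixed, $$E\big(f(x_1(sn), \dots, x_n(sn))\big) \to -K\big(1 - \theta(s)^2\big), \qquad (21)$$ where $K = \sum_{j=2}^J k_j (j - 1)$. To see this, note that for every hyperedge $h = \{i_1, \dots, i_j\}$ of size $2 \le j \le J$ which is added to the graph, the increase in the number of clusters is the same as if we successively add the edges $\{i_1, i_2\}, \{i_2, i_3\}, \dots, \{i_{j-1}, i_j\}$. Each such edge decreases the number of clusters by 1 unless its two endpoints already lie in the same cluster, so the expected change is $-1 + \sum_i x_i(sn)^2$ up to a term $o(1)$. As $n \to \infty$, by Lemma 12, this converges to $-1 + \theta(s)^2$. (Note in particular that this limit is independent of what happened during the earlier edges added to H.) Since there are $(j - 1)$ edges to add for a hyperedge of size j and $k_j$ such hyperedges, (21) follows. Using the Lebesgue convergence theorem, we deduce that $u_n(t) \to u(t)$. To obtain convergence in the u.c.p. sense (uniform on compacts in probability), we note that $$var(|H'| - |H|) \le C \qquad (22)$$ for some constant C which depends only on $(k_2, \dots, k_J)$, since $|H'|$ may differ from $|H|$ only by a bounded amount. Now, by Doob's inequality, if $\tilde M_s = n(M_s - 1)$: $$P\Big(\sup_{s \le t} |M_s - 1| > \varepsilon\Big) = P\Big(\sup_{s \le t} |\tilde M_s| > \varepsilon n\Big) \le \frac{var(\tilde M_t)}{\varepsilon^2 n^2} \le \frac{Ct}{\varepsilon^2 n}. \qquad (23)$$ The last inequality is obtained by conditioning on the number of steps N between times 0 and $tn$, noting that after each step, the variance of $\tilde M_t$ increases by at most C by (22). Hence, to conclude the proof of Lemma 13, it suffices to show that we have the convergence $$\int_0^t f(x_1(sn), \dots, x_n(sn))\,ds \to -K\int_0^t \big(1 - \theta(s)^2\big)\,ds$$ u.c.p. This is a direct consequence of the fact that, as $n \to \infty$, $\sum_i x_i(sn)^2 \to \theta(s)^2$ uniformly on compact sets in probability, which itself follows from pointwise convergence in probability, monotonicity in s, and the fact that the limiting function is continuous. (Monotonicity comes from a simple coupling argument, using the fact that $H(t)$ is a purely coalescing process.)
Lemma 14. Let $N(t)$ be the number of cycles of $\sigma(tn)$. Then we have, as $n \to \infty$: $$n^{-3/4}\,\big(N(t) - \bar N(tn)\big) \to 0, \qquad \text{u.c.p.}$$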

Random walk estimates
Proof. This is very similar to Lemma 6. Say that a cycle is large or small, depending on whether it is bigger or smaller than $\sqrt{n}$. To start with, observe that there can never be more than $\sqrt{n}$ large cycles. As usual, we have that $N(t) \ge \bar N(t)$, and we let $N^{ex}(t) = N(t) - \bar N(t)$ be the excess number of cycles. This in turn can be decomposed as $N^{ex}(t) = N^{ex}_\uparrow(t) + N^{ex}_\downarrow(t)$, where the subscripts ↑ and ↓ refer to the fact that the cycles are respectively large or small. Thus we have $N^{ex}_\uparrow(t) \le \sqrt{n}$, and the problem is to control $N^{ex}_\downarrow(t)$. Writing every cycle of size j as a product of $j - 1$ transpositions, we may thus write $\sigma_t = \prod_{i=1}^{m_t} \tau_i$, for a sequence of transpositions having a certain distribution (they are not independent). Then $N^{ex}_\downarrow(t) \le F_\downarrow(t)$, where $F_\downarrow(t)$ is the number of times $1 \le i \le m_t$ that the transposition $\tau_i$ yields a fragmentation event for which one of the fragments is small. However, conditionally on $\tau_1, \dots, \tau_{i-1}$, the conditional probability that $\tau_i$ yields such a fragmentation is still bounded by $4n^{-1/2}$. Since $m_t = K N_t$, where $K = \sum_{j=2}^J (j-1)k_j \ge 1$ and $N_t$ is a Poisson random variable with mean t, it follows that $$E\Big(\sup_{s \le tn} N^{ex}_\downarrow(s)\Big) \le E\big(F_\downarrow(tn)\big) \le 4Kt\,n^{1/2}.$$ Thus by Markov's inequality, $$P\Big(\sup_{s \le tn} N^{ex}_\downarrow(s) > \varepsilon n^{3/4}\Big) \le \frac{4Kt}{\varepsilon}\,n^{-1/4} \to 0$$ for every $\varepsilon > 0$. Hence, $n^{-3/4}|N(tn) - \bar N(tn)|$ converges to 0 u.c.p., which concludes the proof of Lemma 14.
Note in particular that by combining Lemma 13 with Lemma 14, we get that $\frac{1}{n} N(tn) \to u(t)$, u.c.p.
Proof of Theorem 2. The idea is to say that, since we know that the number of cycles is approximately the number of clusters in the random graphs, this implies a nonlinearity in the behaviour of this number. In turn, this means there are many fragmentations and thus that there are some large cycles.
To formalize this, assume that a permutation σ has a cycle structure (C 1 , . . . , C r ) and that x 1 , . . . , x r are the normalized cycle sizes, i.e., x i = |C i |/n. Define a function g(x 1 , . . . , x r ) by putting where σ ′ = σ ·γ and γ is a uniform random element from Γ, while |σ| denotes the number of cycles of σ. Then if we define a process then (M ′ t , t ≥ 0) is a martingale started from M ′ 0 = 1. Moreover, writing τ = τ 1 · . . . · τ K , where τ i are transpositions, and if we let σ i = σ · τ 1 . . . τ i , so that σ 0 = σ and σ K = σ ′ , then Recall that the transposition τ i can only cause a coalescence or a fragmentation, in which case the number of cycles decreases or increases by 1. If the relative cycle sizes of σ i−1 are given by (y 1 , . . . , y r ), it follows that where y * i = max(y 1 , . . . , y r ). Moreover, y * i ≤ 2 i y * 0 . From this we obtain directly that with high probability (uniformly on compact sets) where x * (s) = max(x 1 (s), . . . , x r (s)). On the other hand, using Doob's inequality in the same way as (23), we also have: Combining this information with (27), we obtain, with high probability uniformly on compact sets: From this we get, since r i=1 x i (sn) 2 ≤ Λ n (t), with high probability i.e., τ δ ≤ tn.

Distance estimates
We are now ready to prove that uniformly on compact sets in probability as $n \to \infty$ except possibly on some compact interval $I$ in $(t_c, \infty)$, where The proof is analogous to, but more complicated than, that of Proposition 11. Note that if $\sigma$ is a permutation, every transposition can increase the number of cycles by at most 1. Hence if $\sigma$ has $N(\sigma)$ cycles, then after one step $s \in \Gamma$, which is a product of $K$ transpositions, $\sigma \cdot s$ has at most $N(\sigma) + K$ cycles. Thus after $p$ steps, the number of cycles is at most $N(\sigma) + Kp$. Since the identity permutation has exactly $n$ cycles, we conclude that at least $(n - N(\sigma))/K$ steps are needed to reach the identity from $\sigma$. Together with Lemma 14 and the definition of $\varphi(t)$, this proves the lower bound in Theorem 3. Note that this bound would be sharp if we could find a path to the identity that makes a fragmentation at each step.

We now work our way towards the upper bound, which shows that such a path may indeed be found, except that we may have to add $o(n)$ coagulation steps. Call a component of $H_t$ good if it is a hypertree and bad otherwise; a hyperedge is good if its component is good. Likewise, call a cycle $C$ of $\sigma(t)$ good if its associated component $\bar C$ in $H_t$ is a hypertree. Therefore, a good cycle is one which has never been involved in fragmentations, i.e., its history consists only of coagulation events. Fix $t > 0$ and write $\sigma(tn) = \sigma^g \cdot \sigma^b$, where $\sigma^g$ is the product of all good cycles of $\sigma(tn)$ while $\sigma^b$ is the product of all bad cycles. Thus Note that by (26), and recalling that there can never be more than $\sqrt{n}$ cycles of size greater than or equal to $\sqrt{n}$, we have $r(b) \le n^{3/4}$ say, and the total mass of cycles where $o(1)$ stands for a term that converges to 0 in probability, u.c.p. Assume for simplicity that $\Gamma$ is an odd conjugacy class that generates all of $S_n$ (the arguments below can easily be adapted otherwise). To start with, note that in $o(n)$ moves, we can transform $\sigma(tn)$ into $\sigma'$ where all the cycles $c^b_1, \ldots, c^b_{r(b)}$ have been coagulated to form one large bad cycle, leaving the good cycles unchanged.
Thus $\sigma' = \sigma^g \cdot \sigma'^b$, where $\sigma'^b$ has only one nontrivial cycle, whose size is $|\sigma^b|$. By the triangle inequality, it then suffices to find a path between $\sigma'$ and the identity of length approximately given by (34).
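The counting argument behind the lower bound (each step adds at most $K$ cycles, and the identity has $n$ cycles) is easy to evaluate on examples. The sketch below assumes that the conjugacy class is described by a dictionary mapping each cycle length $j \ge 2$ to its multiplicity $k_j$; this representation and the names `step_cost` and `distance_lower_bound` are hypothetical choices made for the illustration. The bound is rounded up, since the number of steps is an integer.

```python
def count_cycles(perm):
    """Number of cycles (fixed points included) of a permutation given as a list."""
    seen = [False] * len(perm)
    cycles = 0
    for start in range(len(perm)):
        if not seen[start]:
            cycles += 1
            x = start
            while not seen[x]:
                seen[x] = True
                x = perm[x]
    return cycles


def step_cost(cycle_type):
    """K = sum over j of (j - 1) * k_j: transpositions needed to write one step."""
    return sum((j - 1) * k for j, k in cycle_type.items())


def distance_lower_bound(perm, cycle_type):
    """Smallest integer p with N(perm) + K * p >= n."""
    n = len(perm)
    deficit = n - count_cycles(perm)
    big_k = step_cost(cycle_type)
    return -(-deficit // big_k)  # ceiling division on non-negative integers


if __name__ == "__main__":
    nine_cycle = [(x + 1) % 9 for x in range(9)]
    # With 3-cycles as steps, K = 2 and the deficit is 9 - 1 = 8.
    print(distance_lower_bound(nine_cycle, {3: 1}))
```

This only evaluates the counting bound; it does not check whether the permutation is reachable at all (for instance, parity constraints are ignored).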
Roughly speaking, our strategy for constructing a path between $\sigma'$ and the identity using steps from the conjugacy class $\Gamma$ is to systematically destroy the good cycles as much as possible before destroying the bad cycles, as the good cycles are slightly harder to destroy than the bad cycles. Indeed, consider a cycle $C$ such that $|C| > |\Gamma|$. Then note that by applying a judicious permutation $s \in \Gamma$ to $C$ we can transform $C$ into $C'$, where the elements of $C \setminus C'$ are now fixed points and $|C'| = |C| - K$. Therefore, an arbitrary cycle $C$ can be destroyed in at most $|C|/K + O(1)$ steps, where the term $O(1)$ is nonrandom and uniformly bounded in $C$ and $n$. This bound is useful for the large bad cycle that makes up $\sigma'^b$, but does not help for small (good) cycles, of which there are of order $n$. However, if $C$ is a good cycle and $e_1, \ldots, e_j$ are the hyperedges associated with the component of $C$ in $G(tn)$ (corresponding to the application of certain cycles as part of a step prior to time $tn$, say $\gamma_1, \ldots, \gamma_j$, which we will call the subcycles of $C$), then $C$ can be destroyed by applying successively $j$ cycles $\gamma'_1, \ldots, \gamma'_j$ of respective lengths $|e_1|, \ldots, |e_j|$, in some specified order. Unfortunately, it may not always be possible to perform exactly the sequence $\gamma'_1, \ldots, \gamma'_j$, as there are some arithmetic constraints on the sizes of the cycles that can be performed (indeed, each application of a cycle must be part of the application of a permutation $s \in \Gamma$). A problem may thus arise because, among good components, the smaller hyperedges tend to be over-represented. This is made precise by Lemma 17 below.
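The shrinking move described above can be made concrete in the special case where $\Gamma$ consists of $k$-cycles, so that $K = k - 1$. The following sketch is an illustration under that assumption, with composition taken as $(\sigma \cdot \gamma)(x) = \sigma(\gamma(x))$; the helper names `compose`, `shrinking_k_cycle` and `destroy_cycle` are hypothetical. Given $k$ consecutive points $a_0 \to a_1 \to \cdots \to a_{k-1}$ on a cycle of $\sigma$, the $k$-cycle $\gamma$ sending $a_i$ to $a_{i-1}$ (and $a_0$ to $a_{k-1}$) turns $a_1, \ldots, a_{k-1}$ into fixed points of $\sigma \cdot \gamma$.

```python
def compose(sigma, gamma):
    """Return the permutation x -> sigma(gamma(x))."""
    return [sigma[gamma[x]] for x in range(len(sigma))]


def cycle_through(perm, start):
    """List the points of the cycle of `perm` containing `start`, in order."""
    cycle = [start]
    x = perm[start]
    while x != start:
        cycle.append(x)
        x = perm[x]
    return cycle


def cycle_lengths(perm):
    """Lengths of all cycles of `perm`, fixed points included."""
    seen = set()
    lengths = []
    for start in range(len(perm)):
        if start not in seen:
            cycle = cycle_through(perm, start)
            seen.update(cycle)
            lengths.append(len(cycle))
    return lengths


def shrinking_k_cycle(sigma, start, k):
    """A k-cycle gamma such that sigma . gamma fixes k - 1 points of the cycle of `start`."""
    arc = cycle_through(sigma, start)[:k]
    assert len(arc) == k, "the cycle through `start` must have length at least k"
    gamma = list(range(len(sigma)))
    for i in range(1, k):
        gamma[arc[i]] = arc[i - 1]
    gamma[arc[0]] = arc[-1]
    return gamma


def destroy_cycle(sigma, start, k):
    """Apply shrinking k-cycles while the cycle of `start` has length >= k.

    Returns (number of steps used, resulting permutation).
    """
    steps = 0
    while len(cycle_through(sigma, start)) >= k:
        sigma = compose(sigma, shrinking_k_cycle(sigma, start, k))
        steps += 1
    return steps, sigma


if __name__ == "__main__":
    ten_cycle = [(x + 1) % 10 for x in range(10)]
    gamma = shrinking_k_cycle(ten_cycle, 0, 3)
    print(sorted(cycle_lengths(compose(ten_cycle, gamma))))
```

When the cycle length is not of the form $1 + m(k-1)$, a short remainder is left over after the loop; this is the source of the bounded additive term in the step count.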
Lemma 17. Fix $t > 0$. Let $j \ge 2$ be such that $k_j > 0$. Then the number $U_j(tn)$ of good hyperedges of size $j$ in $H_{tn}$ satisfies $U_j(tn)/n \to t k_j (1 - \theta(t))^j$ in probability as $n \to \infty$.

Proof. The number of $j$-edges that have been added to $G(tn)$ is a Poisson random variable with mean $tnk_j$. For each such edge, the probability that it is not in the giant component $W$ converges to $(1 - \theta(t))^j$. [To see this, note that by Lemma 12, it suffices to check that none of the $j$ points are in a cluster of size greater than $\beta \log n$ for $\beta > 0$ large enough. This involves checking a neighbourhood of these $j$ points, so that no more than $j\beta \log n$ vertices' connections are revealed. Since this is much smaller than the $n^{1/2}$ scale of the birthday problem, with high probability the explorations from the $j$ points do not intersect, so that the $j$ events are asymptotically independent.] Thus $E(U_j(tn)) \sim tnk_j (1 - \theta)^j$, while if $e$ and $e'$ are two randomly chosen $j$-edges, $P(e \subset W^c, e' \subset W^c)$ converges for the same reasons to $(1 - \theta(t))^{2j}$, so that $\mathrm{cov}(1_{\{e \subset W^c\}}, 1_{\{e' \subset W^c\}}) \to 0$. Thus the lemma follows from the second moment method.
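The second moment step at the end of the proof of Lemma 17 can be spelled out as follows. This is only a sketch: it assumes, as the proof does, that up to an error of order $o(n)$ the good $j$-edges are those that avoid the giant component $W$, and it conditions on the number $M$ of $j$-edges added by time $tn$. The symbols $M$, $X_e$, $p_n$ and $c_n$ are notation introduced here for the illustration and do not appear in the original argument.

```latex
\begin{align*}
U_j(tn) &= \sum_{e} X_e, \qquad X_e = \mathbf{1}_{\{e \subset W^c\}},\\
E\big(U_j(tn) \mid M\big) &= M p_n, \qquad p_n \to (1-\theta(t))^j,\\
\operatorname{Var}\big(U_j(tn) \mid M\big) &= M p_n (1 - p_n) + M(M-1)\, c_n,
  \qquad c_n = \operatorname{cov}(X_e, X_{e'}) \to 0,\\
P\Big( \big| U_j(tn) - M p_n \big| > \varepsilon n \,\Big|\, M \Big)
  &\le \frac{M p_n (1 - p_n) + M(M-1)\, c_n}{\varepsilon^2 n^2}.
\end{align*}
```

Since $M$ is Poisson with mean $tnk_j$, we have $M/n \to tk_j$ in probability, so the right-hand side tends to 0 and $U_j(tn)/n$ is close to $t k_j (1-\theta(t))^j$ with high probability.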
Recall that $J$ is the maximal size of a cycle for a permutation $s \in \Gamma$, so that the subcycles of size $J$ are the most under-represented among good cycles. Consider the path that leads from $\sigma_J := \sigma'$ to $\sigma_{J-1}$ in $d_J = U_J(tn)/k_J$ steps, where $\sigma_{J-1}$ is the permutation obtained by destroying from $\sigma_J$ all the subcycles of size $J$ from all good cycles, and completing each step by removing $k_j$ subcycles of size $j$ for $2 \le j \le J - 1$ among good cycles. At this point we may write $\sigma_{J-1} = \sigma^g_{J-1} \cdot \sigma'^b_{J-1}$, where $\sigma'^b_{J-1} = \sigma'^b$ (so the bad part is unchanged) and $\sigma^g_{J-1}$ is the same as $\sigma^g$ except that all subcycles of size $J$ have been destroyed.
If $\Gamma$ consists only of $k$-cycles, then the estimate (36) with (35) finishes the proof of the theorem in that case. Otherwise, we still call the cycles of $\sigma^g_{J-1}$ good, and note that they may still be decomposed into subcycles of size $j \le J - 1$. We similarly construct inductively $\sigma_{J-2}, \ldots, \sigma_1$, where $\sigma_{j-1}$ is obtained from $\sigma_j$ by removing from it all good subcycles of size $j$. Each time a step $s = c_1 \cdots c_L$ is performed, where $L = \sum_{\ell=2}^{J} k_\ell$, we take $c_\ell$ from the good subcycles of $\sigma^g_j$ if $\ell \le j$, while otherwise we use for $c_\ell$ vertices from $\sigma^b_j$. This construction is possible so long as $\sigma^b_j$ does not "run out of mass". However, by Lemma 17, for every $\varepsilon > 0$, with high probability the total mass that is required from bad cycles in this procedure is no more than