RANDOM WALKS ON TREES AND MATCHINGS

We give sharp rates of convergence for a natural Markov chain on the space of phylogenetic trees and, dually, for the natural random walk on the set of perfect matchings in the complete graph on 2n vertices. Roughly, the results show that (1/2) n log n steps are necessary and sufficient to achieve randomness. The proof depends on the representation theory of the symmetric group and a bijection between trees and matchings.

In this paper, we analyze a natural random walk on M_n, the set of perfect matchings on 2n points, along with the isomorphic walk on trees. For matchings, a step in the walk is obtained by picking two matched pairs at random, a random entry of each pair, and transposing these entries. Thus, switching 2 and 3 moves {1, 2}{3, 4} · · · to {1, 3}{2, 4} · · · . The Markov chain (1) has the uniform distribution π(x) = 2^n n!/(2n)! as its unique stationary distribution. Our main result determines sharp rates of convergence to stationarity.
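For concreteness, a single step of the walk can be sketched in a few lines of Python (the representation of a matching as a list of sorted pairs, and the function name, are our own choices):

```python
import random

def step(matching, rng=random):
    """One step of the walk (1): pick two distinct matched pairs at
    random, a random entry of each, and transpose these entries.
    `matching` is a list of 2-tuples partitioning {1, ..., 2n}."""
    new = [list(p) for p in matching]
    i, j = rng.sample(range(len(new)), 2)   # two distinct matched pairs
    a = rng.randrange(2)                    # random entry of the first
    b = rng.randrange(2)                    # random entry of the second
    new[i][a], new[j][b] = new[j][b], new[i][a]
    return [tuple(sorted(p)) for p in new]
```

Iterating `step` from any starting matching simulates the chain whose mixing time Theorem 1 pins down.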
Theorem 1  For the Markov chain K(x, y) of (1) on M_n, the space of perfect matchings on 2n points, and any starting state x, if m = (1/2) n(log n + c) with c > 0, then

(2)    ||K_x^m − π|| ≤ a e^{−c},

where K_x^m(y) = K^m(x, y) = Σ_z K^{m−1}(x, z) K(z, y) and a is a universal constant. The result is sharp: if m = (1/2) n(log n − c) for c positive, there are a starting state x* and a positive ε = ε(c) such that

(3)    ||K_{x*}^m − π|| ≥ ε    for all n.
In (2) and (3) the norm is the total variation distance, ||K_x^m − π|| = max_{A ⊆ M_n} |K_x^m(A) − π(A)| = (1/2) Σ_y |K^m(x, y) − π(y)|. Section 2 contains background on the analytic theory of Markov chains and needed tools from representation theory. Theorem 1 is proved in Section 3 by bounding the total variation norm by the L^2 norm. This is expressed exactly in terms of symmetric group characters. Then standard calculus estimates finish the job. To conclude this introduction, we discuss background on phylogenetic trees, random matchings, diffusion problems, zonal polynomials, and the Metropolis algorithm. Each gives a different interpretation of Theorem 1.

Phylogenetic Trees
Leaf-labeled trees are a mainstay of modern genomics, depicting family trees or parental relations for ℓ populations, species or genes. An overview of the history and statistical developments is in Holmes (1999). See Page & Holmes (2000) for a good book-length treatment of trees and Aldous (1996, 2001) for a probabilistic view. Markov chain Monte Carlo methods for computing with trees, and a variety of scenarios for natural probability distributions on trees, lean on random walks such as (1) on the space of all trees. Aldous (2000) and Schweinsberg (2001) study a different walk, for which they have given coupling and eigenvalue bounds, quite different from (1).
In hope of duplicating the achievements of comparison theory in the analysis of random walk on groups (Diaconis & Saloff-Coste (1993a), Diaconis & Saloff-Coste (1993b), Aldous & Fill (2002)) we searched for a Markov chain on trees which permits a complete analysis. The present matching chain, carried over to trees, offers a candidate which we hope will prove useful.
We briefly describe the correspondence between matchings and trees. Begin with a tree with ℓ = n + 1 labeled leaves. Label the internal vertices sequentially with ℓ + 1, ℓ + 2, . . . , 2ℓ − 1, choosing at each stage the ancestor which has both children labeled and whose descendant has the lowest possible available label (the youngest child).
When all nodes are labeled, create a matching on the 2n = 2(ℓ − 1) non-root vertices by grouping siblings. To go backward, given a perfect matching of 2n points, note that at least one matched pair has both entries from {1, 2, 3, . . . , n + 1}. All such labels are leaves; if there are several leaf-labeled pairs, choose the pair with the smallest label. Give the next available label (n + 2 = ℓ + 1) to its parent node. There is then a new set of available labeled pairs. Again choose the pair with the smallest label, give the next available label to its parent, and so on. Theorem 1 holds as stated for trees: after (1/2) n(log n + c) steps this walk is close to the uniform distribution.
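The backward construction is mechanical enough to code directly; the sketch below (function name and representation are ours) decodes a perfect matching on {1, . . . , 2n} into the parent map of the corresponding tree:

```python
def matching_to_tree(matching):
    """Decode a perfect matching on {1, ..., 2n} into a leaf-labeled
    binary tree, following the rule in the text: repeatedly take the
    labeled pair containing the smallest label and give its parent
    the next available label (starting at n + 2)."""
    n = len(matching)                    # n matched pairs
    labeled = set(range(1, n + 2))       # leaves 1, ..., n+1 start labeled
    next_label = n + 2
    parent = {}
    remaining = [tuple(p) for p in matching]
    while remaining:
        # pairs whose two entries are both already labeled
        ready = [p for p in remaining if p[0] in labeled and p[1] in labeled]
        a, b = min(ready, key=min)       # pair with the smallest label
        parent[a] = parent[b] = next_label
        labeled.add(next_label)          # the parent becomes available
        next_label += 1
        remaining.remove((a, b))
    return parent    # maps each non-root vertex to its parent's label
```

For the identity matching {1, 2}{3, 4}{5, 6} (n = 3), the decoder pairs leaves 1, 2 under internal vertex 5, leaves 3, 4 under 6, and matches 5 with 6 under the root 7.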

Random Matchings
Let G be a graph with vertex set V and edge set E. A perfect matching is a set of disjoint edges containing all vertices. Matchings have evolved as an important tool in graph algorithms. See Lovász & Plummer (1985) or (Pulleyblank, 1995, chapter 4).
The nearest-neighbor process of Theorem 1 is a procedure for generating a random matching. A similar procedure can be run on more general graphs. Such procedures form the basis of an interesting new stochastic algorithm for approximating the number of perfect matchings in the graph. This last is a #P-complete problem, and these stochastic algorithms offer the only currently feasible approach. Following work by Jerrum & Sinclair (1989), Jerrum et al. (2000) showed that the analogous random walk on bipartite graphs is rapidly mixing. The actual bound they found, while polynomial in n (the number of vertices), is probably far from the truth. Theorem 1 gives the only example where the sharp bounds are known. We note further that our analysis determines all the eigenvalues of the matching graph.

A Diffusion Problem
The original motivation for considering random matchings came from their association with the Bernoulli-Laplace diffusion model. There, one considers two urns, the left containing n red balls, the right containing n black balls. At each time, a ball is chosen at random in each urn and the two balls are switched. Diaconis & Shahshahani (1987) show that it takes (1/4) n log n + cn switches to mix up the urns. An analysis for three urns is developed in Scarabotti (1997).
It is natural to consider problems with more urns. The present paper can be considered as involving n urns, each containing 2 balls. As explained below, the approach we use here works for any number of urns, provided all contain the same number of balls. However with three urns with a different number of balls in each, the problem is open.

Zonal Polynomials
Our original proof of Theorem 1 used the machinery of Gelfand pairs and the correspondence between the zonal spherical functions associated to the space of matchings and the zonal polynomials of Alan James. The symmetric group S_{2n} acts transitively on matchings by permuting coordinates. The subgroup fixing the matching {1, 2}{3, 4} . . . {2n − 1, 2n} is isomorphic to B_n, the group of symmetries of an n-dimensional cube (|B_n| = 2^n n!). The permutation representation of S_{2n} is multiplicity free, and the eigenvalues of the random walk are the values of the spherical functions of the representation (Diaconis (1988)).
Using a form of the Schur-Weyl duality, these spherical functions may be identified with the coefficients of zonal polynomials expanded in the power sum symmetric functions.
The random walk led to new formulae for zonal polynomials and substantial new mathematics in joint work with Eric Lander. This has been brilliantly exposited in (Macdonald, 1995, Chapter 7). We recently realized that all of this machinery could be avoided. The simple proof which follows leans on early work (Diaconis & Shahshahani, 1981) and needs only basic representation-theoretic tools. The new approach can be used backward to give different proofs of spherical function formulae. Using an extension of the random transposition results due to Roussel (2000) leads to new formulae for spherical functions and zonal polynomials.

A Metropolis walk on partitions; coagulation and fragmentation
Let P_n be the partitions of n, e.g. P_3 = {3; 2, 1; 1, 1, 1}. Partitions are written λ ⊢ n. Let z(λ) = Π_i i^{a_i} a_i!, where a_i is the number of parts of λ equal to i; then π(λ) = 1/z(λ) defines a probability distribution on P_n. To see this, recall that the partitions index the conjugacy classes of the symmetric group S_n, and the size of the conjugacy class corresponding to λ is n!/z(λ). Since the conjugacy classes partition S_n, Σ_{λ ⊢ n} n!/z(λ) = n!. Repeated random transpositions induce a random walk on S_n with a uniform stationary distribution. If the walk is lumped to conjugacy classes, we get a Markov chain on partitions with 1/z(λ) as its stationary distribution. When n = 3, the stationary distribution is π(3) = 1/3, π(2, 1) = 1/2, π(1, 1, 1) = 1/6. This walk on partitions is a special case of a large class of models studied by chemists and physicists as a process of coagulation and fragmentation. See the reviews by Aldous (1998, 1999), Durrett et al. (1999) or Mayer-Wolf et al. (2001). It does not seem to have been noticed that the eigenvalues of this walk were given in Diaconis & Shahshahani (1981) as the numbers in (13) below and that the eigenvectors are the characters of the symmetric group. We will call this walk on partitions the conjugacy walk.
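The bookkeeping behind π(λ) = 1/z(λ) is easy to verify by machine; a minimal sketch (function names ours):

```python
from math import factorial
from collections import Counter

def partitions(n, max_part=None):
    """Generate the partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

def z(lam):
    """z(lambda) = prod_i i^{a_i} a_i!, where a_i is the number of
    parts of lambda equal to i.  The conjugacy class of cycle type
    lambda in S_n has n!/z(lambda) elements."""
    out = 1
    for i, a in Counter(lam).items():
        out *= i ** a * factorial(a)
    return out
```

Summing n!/z(λ) over all λ ⊢ n recovers n!, which is exactly the statement that the classes partition S_n, and hence that the 1/z(λ) sum to 1.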
The walk (1) on matchings also induces a walk on the partitions P_n. To see this, let the identity matching be {1, 2}{3, 4} . . . {2n − 1, 2n}. Given two matchings x, y ∈ M_n, form a graph on 2n points, with blue edges for the matched pairs in x and red edges for the matched pairs in y. This graph decomposes into disjoint cycles, each of even length, with alternating red/blue edges. The cycle lengths divided by 2 form a partition of n, which we call the distance between x and y. See Macdonald (1995) for more details. Start the random walk of (1) at the identity matching, and report only the distance from the identity at each step. Thus, when n = 3, the walk might move {1, 2}{3, 4}{5, 6} → {1, 3}{2, 4}{5, 6} → {1, 3}{2, 5}{4, 6}, reporting the partitions (1, 1, 1), (2, 1), (3). This distance walk on partitions has stationary distribution π(λ) = c/(2^{ℓ(λ)} z(λ)), with c = 2^{2n}/\binom{2n}{n} and ℓ(λ) the number of parts of λ. To see this, note that the distance of a matching from the identity is in one-to-one correspondence with its double coset in B_n\S_{2n}/B_n. The double coset containing x of distance λ has size |B_n|^2/(2^{ℓ(λ)} z(λ)) by Macdonald (1995). The stationary distribution is the special case of Ewens' sampling formula (with θ = 1/2) used in population genetics (see Ewens (1972)). When n = 3 the distance walk has stationary distribution π(1, 1, 1) = 1/15, π(2, 1) = 6/15, π(3) = 8/15. We find the following relation between the conjugacy chain and the distance chain curious: if the conjugacy chain is changed to have stationary distribution π(λ) by the Metropolis algorithm (Hammersley & Handscomb, 1964), the resulting metropolized chain is the distance chain. To explain, let the conjugacy chain have transition matrix P(λ, µ). The Metropolis procedure constructs the acceptance ratio A(λ, µ) = π(µ)P(µ, λ)/(π(λ)P(λ, µ)). Then, define the Metropolis chain by M(λ, µ) = P(λ, µ) min(1, A(λ, µ)) for µ ≠ λ, with M(λ, λ) = 1 − Σ_{µ ≠ λ} M(λ, µ). Our observation is that, for all n, M is the distance chain. This is closely related to other examples: Diaconis & Ram (2000) and Diaconis & Hanlon (1992) show how the Metropolis algorithm sometimes results in group-theoretically natural deformations.
We thus can see the present paper as giving a sharp analysis of an instance of the Metropolis algorithm.
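The distance statistic defined above is straightforward to compute; the sketch below (our own function) superimposes the two matchings and halves the alternating cycle lengths:

```python
def distance(x, y):
    """Cycle-type distance between two perfect matchings on
    {1, ..., 2n}: superimpose the matchings, walk the resulting
    alternating cycles, and return half the cycle lengths, sorted
    into a partition of n."""
    nbr_x, nbr_y = {}, {}
    for a, b in x:
        nbr_x[a], nbr_x[b] = b, a
    for a, b in y:
        nbr_y[a], nbr_y[b] = b, a
    seen, parts = set(), []
    for start in nbr_x:
        if start in seen:
            continue
        length, v, use_x = 0, start, True
        while v not in seen:                  # trace one alternating cycle
            seen.add(v)
            v = nbr_x[v] if use_x else nbr_y[v]
            use_x = not use_x
            length += 1
        parts.append(length // 2)             # cycle length is even
    return tuple(sorted(parts, reverse=True))
```

A fixed pair common to x and y contributes a part of size 1, and two matchings that disagree everywhere on one long alternating cycle contribute the single part n.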

Background and Needed Tools
In broad outline, the proof of Theorem 1 is similar to the analysis of random transpositions in S_{2n}. The expository account in Diaconis (1988) develops the needed background from first principles. Splendid treatments of the more recent developments in Markov chain theory appear in Aldous & Fill (2002) and Saloff-Coste (1997).

Markov Chain Background
Let X be a finite set and K a transition matrix indexed by X × X. Throughout, π is a stationary distribution for which K is ergodic and reversible: π(x)K(x, y) = π(y)K(y, x). Because of reversibility, K has an orthonormal basis of eigenvectors f_i in L^2(π):

Σ_y K(x, y) f_i(y) = β_i f_i(x),    Σ_x f_i(x) f_j(x) π(x) = δ_{ij}.

Here β_i is the associated eigenvalue, and both f_i and β_i are real. Arguing as in Diaconis & Saloff-Coste (1993a), we may express the chi-square distance as

(9)    Σ_y (K_x^m(y) − π(y))^2 / π(y) = Σ' β_i^{2m} f_i(x)^2,

where the second sum is over all i with β_i ≠ 1. If a finite group G acts transitively on X and preserves K, then the distance (9) does not depend on the initial state x. Multiplying both sides of (9) by π(x) and summing over x, the orthonormality of the f_i yields

(10)    Σ_y (K_x^m(y) − π(y))^2 / π(y) = Σ' β_i^{2m}.

For Theorem 1, X is the space of matchings, K is defined by (1), and π is uniform. The chain is symmetric and so reversible. The group G is S_{2n}, the symmetric group on 2n letters. Here σ ∈ S_{2n} acts on matchings coordinatewise:

σ((i_1, i_2), (i_3, i_4), . . . , (i_{2n−1}, i_{2n})) = ((σ(i_1), σ(i_2)), . . . , (σ(i_{2n−1}), σ(i_{2n}))).
This is a transitive action so (10) is in force.
Using the Cauchy-Schwarz inequality shows that the chi-square distance is an upper bound for the total variation norm of (4):

(11)    4 ||K_x^m − π||^2 ≤ Σ_y (K_x^m(y) − π(y))^2 / π(y).

The bounds (10), (11) and an explicit determination of the eigenvalues constitute the backbone of the proof.

Group Theory Background
For background in representation theory, we recommend Serre (1977) or Diaconis (1988). For characters of the symmetric group, see Sagan (2001) or Macdonald (1995). The basic decomposition (Theorem 2) is

L(M_n) = ⊕_{λ ⊢ n} S^{2λ},

where the direct sum is over all partitions λ of n, 2λ = (2λ_1, 2λ_2, . . . , 2λ_k), and S^{2λ} is the associated irreducible representation of the symmetric group S_{2n}.
This result was used extensively by James (1968, 1982), who credits it to Littlewood or Thrall. A proof appears in (James & Kerber, 1981, page 224); see also Saxl (1981) or Inglis et al. (1990). The final step of preparation relates the matching chain K of (1) to the random transposition chain on all of S_{2n}. Consider the formal sum of all transpositions in S_{2n}, weighted by 1/\binom{2n}{2}:

T = \binom{2n}{2}^{−1} Σ_{1 ≤ i < j ≤ 2n} (i, j).

This operates on L(M_n) by left multiplication. Let T_n be the matrix of this linear map on L(M_n), the space having as basis the delta functions of the matchings.

Proposition 1
The transition matrix K of (1) and T_n satisfy

T_n = (1/(2n − 1)) I + ((2n − 2)/(2n − 1)) K.

Proof: This is best seen combinatorially. Operating by T_n corresponds to picking a random pair of indices 1 ≤ i < j ≤ 2n and transposing these to get a new matching. This fixes the matching exactly when i and j are matched to each other; the chance of this is n/\binom{2n}{2} = 1/(2n − 1). Deleting these diagonal elements from T_n and renormalizing gives the result.
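Proposition 1 can be checked exhaustively for small n. The sketch below (representations and helper names ours) builds T_n from all transpositions, builds K directly from the walk of (1), and verifies the relation with exact arithmetic:

```python
from fractions import Fraction
from itertools import combinations

def all_matchings(points):
    """Enumerate perfect matchings of a sorted list of points."""
    if not points:
        yield ()
        return
    a = points[0]
    for b in points[1:]:
        rest = [p for p in points if p not in (a, b)]
        for m in all_matchings(rest):
            yield ((a, b),) + m

def swap(m, u, v):
    """Transpose the entries u and v throughout the matching m."""
    table = {u: v, v: u}
    return tuple(sorted(tuple(sorted(table.get(x, x) for x in p)) for p in m))

n = 3
M = list(all_matchings(list(range(1, 2 * n + 1))))
idx = {m: k for k, m in enumerate(M)}
N = len(M)                                   # (2n)!/(2^n n!) = 15 matchings

# T_n: apply a uniformly chosen transposition (u v), u < v
T = [[Fraction(0)] * N for _ in range(N)]
w = Fraction(1, (2 * n) * (2 * n - 1) // 2)  # 1 / binom(2n, 2)
for k, m in enumerate(M):
    for u, v in combinations(range(1, 2 * n + 1), 2):
        T[k][idx[swap(m, u, v)]] += w

# K: pick two matched pairs, one entry of each, and transpose (walk (1))
K = [[Fraction(0)] * N for _ in range(N)]
for k, m in enumerate(M):
    for p, q in combinations(range(n), 2):
        for u in m[p]:
            for v in m[q]:
                K[k][idx[swap(m, u, v)]] += Fraction(1, 2 * n * (n - 1))

# Proposition 1: T_n = (1/(2n-1)) I + ((2n-2)/(2n-1)) K
a, b = Fraction(1, 2 * n - 1), Fraction(2 * n - 2, 2 * n - 1)
assert all(T[k][l] == (a if k == l else 0) + b * K[k][l]
           for k in range(N) for l in range(N))
```

The assertion passing for n = 3 confirms both the diagonal value 1/(2n − 1) and that the non-fixing transpositions are exactly the moves of the walk (1).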

Corollary 1
The transition matrix K of (1) has an eigenvalue β_λ for each partition λ = (λ_1, λ_2, . . . , λ_k) of n, given by

β_λ = (1/(2(n − 1))) [ (1/n) Σ_{i=1}^{k} λ_i(2λ_i − 2i + 1) − 1 ].

The multiplicity of β_λ is determined by µ = 2λ:

(12)    mult(β_λ) = (2n)! / Π_{(i,j)} h(i, j),

with the product being over the cells of the shape µ, and h(i, j) = µ_i + µ'_j − i − j + 1 the hook length, where µ' is the transposed diagram.
Proof of Corollary 1: If ρ_µ denotes the irreducible matrix representation of S_{2n} corresponding to µ, the Fourier transform of the element T is

T̂(µ) = \binom{2n}{2}^{−1} Σ_{1 ≤ i < j ≤ 2n} ρ_µ((i, j)).

Using Schur's lemma, this matrix is a constant times the identity: T̂(µ) = c(µ) I. Taking the trace of both sides shows c(µ) = χ_µ((1, 2))/χ_µ(id), with χ_µ the character of this representation. This character ratio was determined by Frobenius; see Ingram (1950) for a modern treatment. This gives

c(µ) = (1/(2n(2n − 1))) Σ_j µ_j(µ_j − 2j + 1).

For the present application µ = 2λ. Since T̂(µ) is a multiple of the identity, the multiplicity of this eigenvalue is the dimension of ρ_µ, namely χ_µ(id). This multiplicity is given by the hook length formula (12) (see Sagan (2001)). Finally, using Proposition 1 and simple algebra completes the proof.
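Corollary 1 invites numerical sanity checks: the multiplicities must sum to |M_n| = (2n)!/(2^n n!), and since K has zero diagonal, Σ_λ mult(λ) β_λ = trace K = 0. A sketch (helper names ours):

```python
from fractions import Fraction
from math import factorial

def partitions(n, max_part=None):
    """Partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

def mult(lam):
    """dim S^{2 lambda} via the hook length formula (12), with
    h(i, j) = mu_i + mu'_j - i - j + 1 for the shape mu = 2 lambda."""
    mu = tuple(2 * l for l in lam)
    conj = [sum(1 for part in mu if part > j) for j in range(mu[0])]
    hooks = 1
    for i, row in enumerate(mu):
        for j in range(row):
            hooks *= (row - j) + (conj[j] - i) - 1
    return factorial(sum(mu)) // hooks

def beta(lam, n):
    """Eigenvalue beta_lambda of the matching chain (Corollary 1)."""
    s = sum(l * (2 * l - 2 * i + 1) for i, l in enumerate(lam, start=1))
    return (Fraction(s, n) - 1) / (2 * (n - 1))

n = 4
lams = list(partitions(n))
assert sum(mult(l) for l in lams) == factorial(2 * n) // (2 ** n * factorial(n))
assert sum(mult(l) * beta(l, n) for l in lams) == 0    # trace K = 0
```

For n = 2 this recovers the spectrum computed by hand: eigenvalue 1 once (λ = (2)) and eigenvalue −1/2 with multiplicity 2 (λ = (1, 1)).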
Remark Corollary 1 determines the eigenvalues of the matching graph. This has vertices M n and an edge from x to y if they differ by a single switch. There has been a healthy development on the combinatorial side, determining the eigenvalues for a variety of matching 'complexes'; see Wachs (2001) for a survey of closely related results.

Proof of Theorem 1
We begin by giving a direct proof of the lower bound (3). The lower bound is proved by finding a set A ⊂ M_n and x* ∈ M_n such that π(A) is large and K_{x*}^m(A) is small. To produce A, define a function T on matchings by T(x) = Σ_{i=1}^n T_i(x), where T_i(x) = 1 if (2i − 1, 2i) is a matched pair of x and T_i(x) = 0 otherwise. Under the uniform distribution, E_u(T_i) = 1/(2n − 1) and E_u(T) = n/(2n − 1) ∼ 1/2; this T is the analog of the number of fixed points of a permutation. Arguing as in Barbour et al. (1992), a straightforward use of Stein's method shows that for n large, T has an approximate Poisson(1/2) distribution. We choose A = {x : T(x) = 0}, whence π(A) ∼ e^{−1/2}. To bound K_{x*}^m(A), where x* = (1, 2)(3, 4) · · · (2n − 1, 2n), we show that there is a good chance that some pair (2i − 1, 2i) has not been hit in the first m steps. Toward this end, let (I_1, J_1), · · · , (I_m, J_m) be the entries transposed in the first m steps of the walk. For 1 ≤ i ≤ n let Y_i = 1 if neither 2i − 1 nor 2i occurs among I_1, J_1, . . . , I_m, J_m, and let Y = Σ_{i=1}^n Y_i. For m = (1/2) n(log n − c) with c positive, a standard computation shows that Y has an approximate Poisson(e^c) distribution; again, this can be quantified using Stein's method as in Barbour et al. (1992). Of course, if Y ≥ 1, the Markov chain started at x* certainly has T ≥ 1, so K_{x*}^m(A) ≤ P(Y = 0) ∼ e^{−e^c}. Combining bounds, for m = (1/2) n(log n − c) with c positive, ||K_{x*}^m − π|| ≥ π(A) − K_{x*}^m(A) ≥ e^{−1/2} − e^{−e^c} − o(1), which proves (3). Remark: A slight refinement of this argument shows that ||K_{x*}^m − π|| → 1 if m = (1/2) n(log n − c_n) with c_n tending slowly to infinity. This completes the proof of the lower bound.
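The exact value E_u(T) = n/(2n − 1) used above can be confirmed by enumeration for small n (helper names ours):

```python
from fractions import Fraction

def all_matchings(points):
    """Enumerate perfect matchings of a sorted list of points."""
    if not points:
        yield ()
        return
    a = points[0]
    for b in points[1:]:
        rest = [p for p in points if p not in (a, b)]
        for m in all_matchings(rest):
            yield ((a, b),) + m

def T(m):
    """Count the 'home' pairs (2i - 1, 2i) present in the matching m."""
    return sum(1 for (a, b) in m if a % 2 == 1 and b == a + 1)

def mean_T(n):
    """Exact average of T under the uniform distribution on M_n."""
    M = list(all_matchings(list(range(1, 2 * n + 1))))
    return Fraction(sum(T(m) for m in M), len(M))
```

For n = 4 this averages T over all 105 matchings and returns 4/7, in agreement with n/(2n − 1).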
To prove the upper bound, we use Corollary 1, the bound (11) and the equality (10) to see that it suffices to bound

(14)    Σ_{λ ⊢ n, λ ≠ (n)} mult(λ) β_λ^{2m}.

For this, we use the following bounds on mult(λ) and β_λ:

(15)    Σ_{λ ⊢ n} mult(λ) = |M_n| = (2n)!/(2^n n!) ≤ √2 (2n/e)^n,

(16)    mult(λ) ≤ \binom{2n}{2j} (2j)!/(2^j j!) ≤ (2n)^{2j}/(2^j j!)    if λ_1 = n − j,

(17)    −1/2 ≤ β_λ ≤ 1 − j(2n − 2j + 1)/(n(n − 1))    if λ_1 = n − j.

The bound (15) follows from the decomposition of Theorem 2 and Stirling's formula. For (16), we bound the number of standard tableaux of shape 2λ with λ_1 = n − j by noting that there are at most \binom{2n}{2j} ways of picking the elements not in the first row, at most (2j)!/(2^j j!) ways of arranging these in a standard tableau of shape (2λ_2, 2λ_3, . . .), and at most one way to fill in the first row with the remaining elements (cf. (15)). The final inequality in (16) is elementary. The inequalities in (17) are a direct consequence of the monotonicity results of Diaconis & Shahshahani (1987), together with a direct computation of the right-hand side of the formula in Corollary 1.
Completion of the proof of Theorem 1: The sum in (14) is bounded in three zones. Begin with Zone I, bounded using (17); then (15) gives the sum over Zone I bounded above by

2^{−2m} √2 (2n/e)^n = √2 exp{−2(log 2)m + n log(2n/e)}.

For n ≥ 10, the term in the exponent is bounded above by −2c − .2 n(log n + O(1)). It follows that the sum over Zone I is bounded above by A e^{−2c}, for a universal constant A, for n ≥ 10.
We show that the sum

(20)    Σ_{1 ≤ j ≤ θn} (2^j / j!) n^{j^2/n}

is uniformly bounded, which will complete the proof. Call the general term t_j. For 2 ≤ j ≤ n/2, the ratios are

t_{j+1}/t_j = (2/(j + 1)) n^{(2j+1)/n},

and a short computation with these ratios shows that the sum (20) is uniformly bounded in n. Combining the three zones, the sum in (14) is bounded above by A e^{−2c} for a universal constant A, and (11) then gives the upper bound (2).
Remark: Of course, for moderate n, the bound (14) can be computed numerically.
For n smaller than 10^6 or so, the intermediate bounds (19) and (20) would give explicit, accurate error bounds. The less explicit form a e^{−c} is simply easier to look at and think about.
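In that spirit, the sum (14) can be evaluated directly from Corollary 1 for moderate n; the sketch below (helper names ours; m is rounded up to an integer) computes it with the hook length formula:

```python
from math import factorial, log, ceil

def partitions(n, max_part=None):
    """Partitions of n as non-increasing tuples."""
    if max_part is None:
        max_part = n
    if n == 0:
        yield ()
        return
    for k in range(min(n, max_part), 0, -1):
        for rest in partitions(n - k, k):
            yield (k,) + rest

def mult(lam):
    """dim S^{2 lambda} via the hook length formula (12)."""
    mu = tuple(2 * l for l in lam)
    conj = [sum(1 for part in mu if part > j) for j in range(mu[0])]
    hooks = 1
    for i, row in enumerate(mu):
        for j in range(row):
            hooks *= (row - j) + (conj[j] - i) - 1
    return factorial(sum(mu)) // hooks

def beta(lam, n):
    """Eigenvalue beta_lambda of the matching chain (Corollary 1)."""
    s = sum(l * (2 * l - 2 * i + 1) for i, l in enumerate(lam, start=1))
    return (s / n - 1) / (2 * (n - 1))

def chi_square_bound(n, c):
    """The sum (14) for m = ceil((1/2) n (log n + c))."""
    m = ceil(0.5 * n * (log(n) + c))
    return sum(mult(lam) * beta(lam, n) ** (2 * m)
               for lam in partitions(n) if lam != (n,))
```

For example, chi_square_bound(20, 2.0) is already far below 1, reflecting that the walk on M_20 is well mixed after (1/2) n(log n + 2) steps.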