Discrepancy convergence for the drunkard's walk on the sphere

We analyze the drunkard's walk on the unit sphere with step size $\theta$ and show that the walk converges in order $C/\sin^2\theta$ steps in the discrepancy metric ($C$ a constant). This is an application of techniques we develop for bounding the discrepancy of random walks on Gelfand pairs generated by bi-invariant measures. In such cases, Fourier analysis on the acting group admits tractable computations involving spherical functions. We advocate the use of discrepancy as a metric on probabilities for state spaces with isometric group actions.


Introduction
Fix $\theta \in (0, \pi)$. Consider the following random walk on the unit sphere $S^2$ in $\mathbb{R}^3$, whose steps are geodesic arcs of length $\theta$. (Such arcs subtend an angle of $\theta$ at the center of the sphere.) The random walk starts at the north pole. At each step a direction is chosen uniformly at random, and the walk moves a geodesic distance $\theta$ in that direction. We refer to this walk as the drunkard's walk on the sphere.
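The step just described can be sketched numerically. This Python sketch is only an illustration. It represents points of $S^2$ as unit vectors in $\mathbb{R}^3$ and follows the great circle through the current point in a uniformly random tangent direction. The helper names (`drunkard_step`, `drunkard_walk`) are our own.

```python
import math
import random


def cross(u, v):
    """Cross product of two 3-vectors given as tuples."""
    return (u[1] * v[2] - u[2] * v[1],
            u[2] * v[0] - u[0] * v[2],
            u[0] * v[1] - u[1] * v[0])


def normalize(v):
    """Scale a nonzero 3-vector to unit length."""
    length = math.sqrt(sum(c * c for c in v))
    return tuple(c / length for c in v)


def drunkard_step(p, theta, rng):
    """Move the unit vector p a geodesic distance theta in a uniformly random direction."""
    # Orthonormal basis (e1, e2) of the tangent plane at p; the helper vector
    # is chosen so that it is never parallel to p.
    helper = (1.0, 0.0, 0.0) if abs(p[0]) < 0.9 else (0.0, 1.0, 0.0)
    e1 = normalize(cross(p, helper))
    e2 = cross(p, e1)
    # Uniformly random unit tangent vector u at p.
    phi = rng.uniform(0.0, 2.0 * math.pi)
    u = tuple(math.cos(phi) * a + math.sin(phi) * b for a, b in zip(e1, e2))
    # Follow the great circle through p with initial direction u for arc length theta.
    return tuple(math.cos(theta) * a + math.sin(theta) * b for a, b in zip(p, u))


def drunkard_walk(theta, k, rng):
    """Position after k steps of the drunkard's walk started at the north pole."""
    p = (0.0, 0.0, 1.0)
    for _ in range(k):
        p = drunkard_step(p, theta, rng)
    return p
```

For example, `drunkard_walk(0.3, 50, random.Random(0))` returns the position after 50 steps of length 0.3.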
The purpose of this paper is to develop techniques for bounding the discrepancy metric for random walks on Gelfand pairs, using the drunkard's walk as an example. Our bounds are sharp enough to give a rate of convergence. Let $D(k)$ denote the discrepancy distance (defined later) between the $k$-th step probability distribution of the drunkard's walk and the uniform (rotation-invariant) measure on $S^2$. We show the following:

Theorem 1. For the drunkard's walk on the unit sphere $S^2$ with step size $\theta$, the discrepancy of the walk after $k$ steps satisfies, for $k = C/\sin^2\theta$,
$$0.4330\, e^{-C/(2\cos^2\theta)} \le D(k) \le 4.442\, e^{-C/8},$$
where the left side is read as $0$ when $\theta = \pi/2$.

Thus order $C/\sin^2\theta$ steps are both necessary and sufficient to make the discrepancy distance of this walk from its limiting distribution uniformly small. The result makes intuitive sense: the number of steps needed to randomize should be large when $\theta$ is close to $0$ or $\pi$, and small when $\theta$ is close to $\pi/2$. Moreover, if $\theta \approx 1/n$ for large $n$, this result shows that order $n^2$ steps are necessary and sufficient. This is similar to nearest-neighbor random walks on $\mathbb{Z}/n\mathbb{Z}$ (e.g., see [4]). We also note that, for fixed $\theta$, this walk does not exhibit a sharp cutoff phenomenon.
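The step count in Theorem 1 is simple arithmetic. This sketch (helper names are our own) does three things:

- it computes $k = C/\sin^2\theta$;
- it illustrates the $n^2$ scaling for $\theta = 1/n$;
- it inverts the upper bound $4.442\,e^{-C/8}$ to find a $C$ that guarantees $D(k) \le \varepsilon$, namely $C = 8\ln(4.442/\varepsilon)$.

```python
import math


def steps_for(C, theta):
    """Number of steps k = C / sin^2(theta) used in Theorem 1."""
    return C / math.sin(theta) ** 2


def upper_bound(C):
    """The upper bound 4.442 * exp(-C/8) on D(k)."""
    return 4.442 * math.exp(-C / 8)


def c_for_accuracy(eps):
    """Smallest C for which the upper bound 4.442 * exp(-C/8) is at most eps."""
    return 8 * math.log(4.442 / eps)


# With theta = 1/n, k / n^2 tends to C (here C = 1) as n grows.
for n in (10, 100, 1000):
    print(n, steps_for(1.0, 1.0 / n) / n ** 2)
```

For $\varepsilon = 0.01$ this gives $C \approx 48.8$, that is, roughly $48.8/\sin^2\theta$ steps.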
We frame our analysis in the context of a random walk on a homogeneous space, i.e., a space with a transitive group action. $S^2$ is a homogeneous space under the action of $SO(3)$. Although the drunkard's walk is not generated by a group action, we show that it is equivalent to a walk that is.
For random walks on groups, Fourier analysis is often used to obtain rates of convergence. On homogeneous spaces, we can lift the walk to the acting group and do Fourier analysis there, although for non-commutative groups the group representations can be quite complicated. However, when the homogeneous space comes from a Gelfand pair (as in this case), Fourier transforms of bi-invariant measures and functions on the group simplify greatly. This allows tractable computations involving the spherical functions. In our example, the generating measure on $SO(3)$ can be made bi-invariant, and the spherical functions are Legendre polynomials.
Much of the literature on random walks on Gelfand pairs is limited to discrete homogeneous spaces. Diaconis [4] presents a survey and an annotated bibliography; applications include walks on subspaces of vector spaces over finite fields [11] and walks on r-sets of an n-set [5]. Rates are given in the total variation metric.
Continuous examples have been addressed by Voit, who studied families of isotropic random walks on spheres [23,25] and other homogeneous spaces [24]. Central limit theorems are obtained using convergence in distribution or total variation as the dimension n → ∞. Such results differ from ours in that: (1) we work with a specific walk rather than a family, e.g., we obtain explicit bounds for a specific n rather than asymptotic results for large n, (2) we focus on rates of convergence of the walk on the homogeneous space, rather than convergence of a central limit theorem on the double coset space as in [23,24], and (3) we use the discrepancy metric to measure convergence. We argue that it is a natural metric to use for walks on homogeneous spaces, and develop techniques to bound it. While we only illustrate our methods on the 2-sphere, similar methods can be used to give explicit discrepancy bounds for walks on high-dimensional spheres and other Gelfand pairs.
For walks on continuous groups, we mention the work of Rosenthal [19] and Porod [17,18]. They obtain total variation rates of convergence for random walks on $SO(n)$ and other compact groups where the generating measures are conjugate-invariant; this is another situation where the representations simplify enough to get Fourier bounds.

This paper is organized as follows. Section 2 gives background on the discrepancy metric and justifies its use over other common metrics on probabilities. Section 3 develops several equivalent formulations for the drunkard's walk. Section 4 gives a formulation with a bi-invariant generating measure. This simplifies the Fourier analysis in Section 5, where matching upper and lower bounds for the convergence rate of the drunkard's walk are derived (Theorems 9 and 10). We summarize our methods for handling random walks on arbitrary Gelfand pairs in Section 6.
For the uninitiated, Appendix A collects relevant background on Fourier analysis on groups, Gelfand pairs, and representations of SO(3) that are needed to make this paper self-contained. Appendix B contains proofs of technical results that are not central to the development of the ideas in this paper.

The discrepancy metric
Let $(X, d)$ be a metric space. Given any two probability measures $P, Q$ on $X$, define the discrepancy distance between $P$ and $Q$ by
$$D(P, Q) = \sup_{B} |P(B) - Q(B)|.$$
Here the supremum is taken over all balls $B$, and a "ball" in $X$ denotes any subset of the form $\{x : d(x, x_0) \le r\}$ for some $x_0 \in X$ and real number $r \ge 0$. It is easy to check that the discrepancy is a metric on probability measures.
When $X$ is the unit cube in $\mathbb{R}^n$ and $Q$ is Lebesgue measure on $X$, this definition reduces to the notion of discrepancy commonly used by number theorists to study uniform distribution of sequences in the unit cube (e.g., see [8,16]). Diaconis [4] was perhaps the first to suggest the use of discrepancy to measure rates of convergence of random walks. Su [20,21,22] explored properties of this metric and obtained sharp rates of convergence for certain random walks on the hypercube, circle, and torus.
We shall be concerned with the case where $X = S^2$ and the metric on $S^2$ is inherited from its inclusion in $\mathbb{R}^3$. Thus balls may be visualized as spherical "caps" on the sphere. As noted later, the group of rotations $SO(3)$ acts on $S^2$ in a natural way, and the metric on $S^2$ is invariant under this action. Hence images of balls under this action are still balls, so the discrepancy metric on measures inherits this rotation invariance.
Unlike the total variation metric (which is in more frequent use among probabilists), the discrepancy metric recognizes both the topology and the group action on the underlying space. For infinite compact state spaces this can be important. For instance, let $P$ be a probability measure on $S^2$ supported on a finite set of points, and let $Q$ be the uniform (rotation-invariant) probability measure. Then the total variation distance between $P$ and $Q$ remains equal to 1 no matter how the points are arranged. On the other hand, the discrepancy $D(P, Q)$ will capture how "well-distributed" the points in $P$ are. Another example is a simple random walk on the circle generated by an irrational rotation (see [21]), which converges weak-* to Haar measure; discrepancy captures this convergence, but total variation is blind to it. Thus the discrepancy metric is well-suited to studying random walks on continuous state spaces generated by isometric group actions.
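To make the finite-support example concrete, here is a sketch that estimates the discrepancy of an empirical measure on $S^2$ against the uniform measure. It uses the chordal metric inherited from $\mathbb{R}^3$.

- For a chordal ball of radius $r \le 2$, the uniform measure is $r^2/4$. This follows from Archimedes' theorem: the normalized area of the cap $\{x : x \cdot x_0 \ge t\}$ is $(1-t)/2$.
- The sketch only searches a finite list of centers and radii, so its result is a lower estimate of $D(P, U)$.
- The Fibonacci point set is our own choice of a well-distributed example.

```python
import math


def cap_measure(r):
    """Uniform (normalized) area of a chordal ball {x : |x - x0| <= r} on S^2."""
    return min(r, 2.0) ** 2 / 4.0


def empirical_cap_discrepancy(points, centers, radii):
    """max |P(B) - U(B)| over chordal balls B with the given centers and radii.

    P is the empirical measure of `points` (unit 3-vectors). Only finitely many
    balls are examined, so this is a lower estimate of D(P, U).
    """
    m = len(points)
    best = 0.0
    for c in centers:
        # For unit vectors, |x - c| = sqrt(2 - 2 x.c).
        dists = [math.sqrt(max(0.0, 2.0 - 2.0 * sum(a * b for a, b in zip(x, c))))
                 for x in points]
        for r in radii:
            frac = sum(1 for d in dists if d <= r) / m
            best = max(best, abs(frac - cap_measure(r)))
    return best


def fibonacci_sphere(m):
    """m roughly evenly spread points (the 'Fibonacci lattice')."""
    golden = math.pi * (3.0 - math.sqrt(5.0))
    pts = []
    for i in range(m):
        z = 1.0 - (2.0 * i + 1.0) / m
        rho = math.sqrt(1.0 - z * z)
        pts.append((rho * math.cos(golden * i), rho * math.sin(golden * i), z))
    return pts
```

Both a clumped set (all points at the north pole) and the Fibonacci set are at total variation distance 1 from $U$. Even so, the discrepancy estimate separates them clearly.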
We favor the use of discrepancy over other common metrics (e.g., the Prokhorov, Wasserstein metrics) because there are tractable bounding techniques for discrepancy involving Fourier coefficients. In fact, one of the main goals of this paper is to show that we can develop upper and lower bounds for discrepancy which give sharp rates of convergence in many cases because the dominant terms in each expression match.
We remark that discrepancy bounds can be used to bound other metrics by exploiting known relationships between them [10]; for instance, the discrepancy is bounded above by total variation, so discrepancy lower bounds also offer a way to obtain total variation lower bounds.
A property that will be needed later is that, on groups, discrepancy does not increase under convolution:

Theorem 2. If $P, Q, \nu$ are arbitrary probability measures on a compact group $G$, then $D(P * \nu, Q * \nu) \le D(P, Q)$. Hence, when $Q = U$ is the uniform (Haar) measure, we have $D(P * \nu, U) \le D(P, U)$, since $U * \nu = U$. See [21] for a proof.
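Theorem 2 can be checked numerically on a finite group. The sketch below assumes the cyclic group $\mathbb{Z}/n$ with the cyclic distance, under which translates of balls are again balls. It is only an illustration, not the proof in [21]. It also checks that the uniform measure is unchanged by convolution, which gives the special case $Q = U$.

```python
import random


def circ_dist(a, b, n):
    """Cyclic distance on Z/n."""
    d = abs(a - b) % n
    return min(d, n - d)


def all_balls(n):
    """Every ball {x : d(x, c) <= r} in Z/n, as a list of elements."""
    return [[x for x in range(n) if circ_dist(x, c, n) <= r]
            for c in range(n) for r in range(n // 2 + 1)]


def discrepancy(P, Q, balls):
    """max over balls B of |P(B) - Q(B)| for measures given as probability vectors."""
    return max(abs(sum(P[x] for x in B) - sum(Q[x] for x in B)) for B in balls)


def convolve(P, nu, n):
    """(P * nu)(x) = sum_h P(x - h) nu(h) on the abelian group Z/n."""
    return [sum(P[(x - h) % n] * nu[h] for h in range(n)) for x in range(n)]


def random_prob(n, rng):
    """A random probability vector on Z/n."""
    w = [rng.random() for _ in range(n)]
    total = sum(w)
    return [v / total for v in w]
```

The inequality holds exactly here because $(P*\nu)(B) = \sum_h \nu(h) P(B - h)$ and each $B - h$ is again a ball.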

The Drunkard's Walk and Equivalent Formulations
The drunkard's walk on the sphere is not a random walk generated by a group action. However, we show in this section that it is equivalent to one that is, in the sense that the two random walks generate the same k-th step probability distribution even though their observed behaviors may appear quite different.
Readers familiar with hypergroups may not be surprised by the equivalence and the ensuing analysis, since the associated double coset space of this walk is a commutative hypergroup, and much of our analysis can be framed in that language. We have avoided it; interested parties are referred to [2].
Let $N$ be the isotropy subgroup of $SO(3)$ fixing $n$, the north pole. Let $E \subset SO(3)$ denote the set of all rotations which fix a point on the equator and move the north pole by geodesic distance $\theta$ along the surface of the sphere. Let $Q$ denote the probability distribution supported on $E$ that is left $N$-invariant.

Formulation 1. The Drunkard's Walk. This is the walk considered at the opening of this paper: a drunkard starts at the north pole and at each step picks a uniformly random direction and advances along the sphere in that direction by geodesic distance $\theta$.
Let $Y_k$ for $k = 0, 1, 2, \ldots$ denote random variables which describe the location of the drunkard at time $k$. Thus $Y_0 = n$. Let $g_1, g_2, \ldots$ be independent $SO(3)$-valued random variables with values in $E$ and distribution $Q$. Then the position of the drunkard at time $k$ is given by $Y_k = g_1 g_2 \cdots g_k n$. This walk is therefore not one in which the next position is obtained by applying a randomly chosen group element to the current position. However, the following random walk is:

Formulation 2. The Potted Plant. Consider a potted plant initially at the north pole. At each step, a rotation is chosen randomly from $E$ according to $Q$ and performed on the sphere. Thus the point currently over the north pole moves a distance $\theta$ in a random direction. This induces a motion of the potted plant, wherever it currently is.
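Note that with the given generating set $E$, the potted plant is moved a geodesic distance less than or equal to $\theta$ at every step. Each rotation in $E$ turns the sphere through angle $\theta$ about an axis through two antipodal points of the equator. The north pole lies on the great circle orthogonal to that axis, and is therefore moved the farthest.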
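The step-size claim can be tested directly. In this sketch we model an element of $E$ as the rotation through angle $\theta$ about an axis through a uniformly chosen point of the equator. We apply it with Rodrigues' rotation formula. The north pole then moves exactly $\theta$, and every other point moves at most $\theta$.

```python
import math
import random


def rotate(v, axis, angle):
    """Rodrigues' formula: rotate vector v about the unit vector axis by angle."""
    c, s = math.cos(angle), math.sin(angle)
    ax, ay, az = axis
    vx, vy, vz = v
    dot = ax * vx + ay * vy + az * vz
    # (cx, cy, cz) = axis x v
    cx, cy, cz = ay * vz - az * vy, az * vx - ax * vz, ax * vy - ay * vx
    return (vx * c + cx * s + ax * dot * (1.0 - c),
            vy * c + cy * s + ay * dot * (1.0 - c),
            vz * c + cz * s + az * dot * (1.0 - c))


def random_equatorial_axis(rng):
    """A uniformly random point on the equator, used as the fixed axis of an element of E."""
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return (math.cos(phi), math.sin(phi), 0.0)


def geodesic(u, v):
    """Geodesic (angular) distance between unit vectors u and v."""
    d = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, d)))
```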
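Let $g_1, g_2, \ldots$ be independent $SO(3)$-valued random variables with distribution $Q$. Then the position of the potted plant at time $k$ is given by $Y_k = g_k g_{k-1} \cdots g_1 n$. Since the $g_i$ are independent and identically distributed, the tuples $(g_1, \ldots, g_k)$ and $(g_k, \ldots, g_1)$ have the same joint distribution. Hence the products $g_1 \cdots g_k$ and $g_k \cdots g_1$ are identically distributed. This shows that Formulations 1 and 2 are equivalent and generate the same $k$-th step probability distribution on the sphere. This may be surprising in light of the fact that the steps of the random walk in Formulation 2 can be smaller than those in Formulation 1.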
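As a sanity check on this equivalence (not a proof), the sketch below simulates both walks and compares the mean height $E[z]$ of the $k$-th position. Formulation 1 is simulated directly. Formulation 2 uses rotations about random equatorial axes; modeling $Q$ by a uniformly random equatorial axis is our assumption. Both mean heights should be close to $\cos^k\theta$. One step from a point at height $z$ has expected height $z\cos\theta$, because the tangent direction (respectively the equatorial axis) averages to zero.

```python
import math
import random


def rotate(v, axis, angle):
    """Rodrigues' formula: rotate vector v about the unit vector axis by angle."""
    c, s = math.cos(angle), math.sin(angle)
    ax, ay, az = axis
    vx, vy, vz = v
    dot = ax * vx + ay * vy + az * vz
    cx, cy, cz = ay * vz - az * vy, az * vx - ax * vz, ax * vy - ay * vx
    return (vx * c + cx * s + ax * dot * (1.0 - c),
            vy * c + cy * s + ay * dot * (1.0 - c),
            vz * c + cz * s + az * dot * (1.0 - c))


def cross(u, v):
    return (u[1] * v[2] - u[2] * v[1], u[2] * v[0] - u[0] * v[2], u[0] * v[1] - u[1] * v[0])


def drunkard_step(p, theta, rng):
    """Formulation 1: move distance theta from p in a uniformly random direction."""
    helper = (1.0, 0.0, 0.0) if abs(p[0]) < 0.9 else (0.0, 1.0, 0.0)
    w = cross(p, helper)
    length = math.sqrt(sum(c * c for c in w))
    e1 = tuple(c / length for c in w)
    e2 = cross(p, e1)
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return tuple(math.cos(theta) * a
                 + math.sin(theta) * (math.cos(phi) * b + math.sin(phi) * c)
                 for a, b, c in zip(p, e1, e2))


def plant_step(p, theta, rng):
    """Formulation 2: rotate the sphere by theta about a uniformly random equatorial axis."""
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return rotate(p, (math.cos(phi), math.sin(phi), 0.0), theta)


def mean_height(step, theta, k, trials, rng):
    """Monte Carlo estimate of E[z-coordinate of the k-th position], started at the north pole."""
    total = 0.0
    for _ in range(trials):
        p = (0.0, 0.0, 1.0)
        for _ in range(k):
            p = step(p, theta, rng)
        total += p[2]
    return total / trials
```

Agreement of mean heights is only a necessary condition for equal distributions. The argument in the text is what actually establishes the equivalence.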
The next random walk is not essential in what follows. It also generates the same $k$-th step probability distribution, and we mention it for the sake of interest.

Formulation 3. Rotate and Spin. Fix any rotation $R_\theta$ which displaces the current north pole by geodesic distance $\theta$. Start the random walk at the north pole, and at each step perform $R_\theta$ followed by a uniform spin around the north-south axis. (The uniform spin moves the random walk to a random point anywhere on the same latitude.) Although $R_\theta$ is not necessarily contained in the set $E$ defined earlier, it does yield the same $k$-th step probability distribution as the previous formulations. This may be seen as follows.
Consider the double coset space $SO(3)//N$. Each double coset is characterized by the latitude to which it sends the north pole. Now $R_\theta$ and any $g_0 \in E$ both send the north pole to the latitude at geodesic distance $\theta$ from it. They therefore lie in the same double coset, so $R_\theta = n' g_0 n''$ for some $n', n'' \in N$. Then $E = N g_0$. Let $n_i$ denote an $N$-valued random variable distributed according to Haar measure on $N$. The walk description shows that at the $i$-th step, $n_i n' g_0 n''$ acts on the random walk's current position. Therefore its position at time $k$ is given by
$$Y_k = (n_k n' g_0 n'')(n_{k-1} n' g_0 n'') \cdots (n_1 n' g_0 n'')\, n.$$
Since $n_i n' g_0$ and $n'' n_i n' g_0$ are identically distributed according to $Q$, and $n'' n = n$, the above random variable has the same $k$-th step distribution as the other formulations above.
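Formulation 3 is also easy to simulate. In this sketch $R_\theta$ is taken, as an arbitrary choice, to be the rotation through $\theta$ about the $x$-axis. It is followed by a uniform spin about the north-south ($z$) axis. If the equivalence holds, averages of $P_1(z) = z$ and $P_2(z) = (3z^2-1)/2$ over the $k$-th position should approach $P_1(\cos\theta)^k$ and $P_2(\cos\theta)^k$. These are the values the product formula for Legendre polynomials predicts for the drunkard's walk.

```python
import math
import random


def rotate_x(p, theta):
    """R_theta: rotation by theta about the x-axis (it moves the north pole a distance theta)."""
    x, y, z = p
    c, s = math.cos(theta), math.sin(theta)
    return (x, c * y - s * z, s * y + c * z)


def spin_z(p, angle):
    """Rotation by angle about the north-south (z) axis."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def legendre2(z):
    """P_2(z) = (3 z^2 - 1) / 2."""
    return 0.5 * (3.0 * z * z - 1.0)


def rotate_and_spin_averages(theta, k, trials, rng):
    """Monte Carlo averages of P_1(z) = z and P_2(z) over the k-th position of Formulation 3."""
    s1 = s2 = 0.0
    for _ in range(trials):
        p = (0.0, 0.0, 1.0)
        for _ in range(k):
            p = spin_z(rotate_x(p, theta), rng.uniform(0.0, 2.0 * math.pi))
        s1 += p[2]
        s2 += legendre2(p[2])
    return s1 / trials, s2 / trials
```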
Our original goal was to study Formulation 1, the drunkard's walk. Via the above equivalence we choose instead to study Formulation 2, because it is a random walk generated by a group action. However, the generating measure $Q$, while left $N$-invariant, is not bi-invariant. In light of Theorem 12, a bi-invariant generating measure would greatly simplify the ensuing Fourier analysis. (See Appendix A for background material on Fourier analysis on compact groups and bi-invariant measures.) In the next section, we remedy this problem by introducing a fourth random walk (Formulation 4). It is equivalent to Formulation 2, and its generating measure is bi-invariant.

A Bi-invariant Formulation
We are interested in the discrepancy distance between the $k$-th step distribution of the drunkard's walk and $U_{S^2}$, the uniform (rotation-invariant) distribution on $S^2$. To simplify notation, we write
$$D(k) = D\big(\mathcal{L}(Y_k),\, U_{S^2}\big), \qquad (2)$$
where $\mathcal{L}(Y_k)$ denotes the distribution of the random variable $Y_k$ in Formulation 2. We investigate the behavior of $D(k)$ as a function of the number of steps $k$.
Recall that the homogeneous space $S^2$ can be regarded as the space of left cosets of $N$ in $SO(3)$. The quotient map $SO(3) \to S^2$ then sends a rotation $g$ to the point $gn \in S^2$. A random walk on $S^2$ generated by an $SO(3)$-action (such as Formulation 2) may then be regarded as a random walk "upstairs" on $SO(3)$. Its initial distribution is $U_N$, Haar measure on $N$ (the pre-image of the starting point $n$). The probability distribution upstairs evolves as usual for a random walk on a group: after one step the distribution is $Q * U_N$, and after $k$ steps it is $Q^{*k} * U_N$. The probability of finding the original walk in a ball $B \subset S^2$ is the same as that of finding the lifted walk on $SO(3)$ in $\tilde B = BN \subset SO(3)$. Hence
$$D(k) = \sup_{\tilde B} \big| Q^{*k} * U_N(\tilde B) - U(\tilde B) \big|, \qquad (3)$$
where $U$ is Haar measure on $SO(3)$ and the supremum is taken over all $\tilde B$, pre-images of balls under the quotient map $SO(3) \to S^2$.
At this point we would appeal to Fourier analysis to deal with the convolutions above. However, Q is left N -invariant but not bi-invariant; recall that we desire bi-invariance to simplify the Fourier analysis.
The following proposition shows that for random walks on groups, averaging the generating measure $Q$ to make it bi-invariant affects the rate of convergence in discrepancy by at most one step. This result is the analogue of a result of Greenhalgh [11], who obtained a similar result for the total variation distance.

Proposition 3. Let $Q$ denote any left $N$-invariant probability measure on a group $G$, and let $U$ and $U_N$ denote Haar measure on $G$ and $N$, respectively. If

Proof. Left invariance for $Q$ means $U_N * Q = Q$. We use this to establish bi-invariance for $\tilde Q$, which means $U_N * \tilde Q * U_N = \tilde Q$. This follows from

For the second assertion, note that , which with the above equations yields the desired conclusion.
Thus a random walk on a group with generating measure $Q$ differs by no more than one step from a random walk proceeding according to $\tilde Q$, which may be viewed as the average of the measure $Q$ over the left cosets of $N$.
However, for a random walk on a homogeneous space, even more can be said if the walk begins at the point fixed by the isotropy subgroup:

Proposition 4. Suppose $X$ is a homogeneous $G$-space with isotropy subgroup $N$ fixing $x_0 \in X$, and $Q$ is a left $N$-invariant probability measure on $G$. Let $D(k)$ denote the discrepancy of the random walk starting at $x_0$ and evolving via a group action with elements chosen according to $Q$. Let $\tilde D(k)$ denote the discrepancy of the random walk starting at $x_0$, but evolving according to $\tilde Q$. Then $\tilde D(k) = D(k)$ for all $k$.

Proof. This follows from the fact, shown in the previous proof, that $\tilde Q^{*k} = Q^{*k} * U_N$. The right side, regarded as a measure on $X$, describes the location of the $Q$-generated walk. By the right invariance of $\tilde Q$, the left side equals $\tilde Q^{*k} * U_N$. Regarded as a measure on $X$, this describes the location of the $\tilde Q$-generated walk.
This shows that the following is equivalent to Formulation 2. Observe that we are able to throw extra rotations in "for free" and still obtain the same k-th step probability distribution. This may be surprising because with the extra generating elements the step size of the potted plant is no longer bounded by θ, as it was in Formulation 2. In fact, the potted plant could be moved around rather wildly at each step.
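The extra rotations allowed by $\tilde Q$ can be seen in simulation. This sketch assumes the convention that $\mu * \nu$ is the law of the product $gh$, with $g \sim \mu$ and $h \sim \nu$ independent. Under that convention, a step of $Q * U_N$ first applies a uniform spin about the north-south axis and then a rotation from $E$. We model the rotation from $E$ by a uniformly random equatorial axis. Started at the north pole, where the spin does nothing, a step moves the walk exactly $\theta$. A point on the equator, however, can be carried much farther than $\theta$.

```python
import math
import random


def rotate(v, axis, angle):
    """Rodrigues' formula: rotate vector v about the unit vector axis by angle."""
    c, s = math.cos(angle), math.sin(angle)
    ax, ay, az = axis
    vx, vy, vz = v
    dot = ax * vx + ay * vy + az * vz
    cx, cy, cz = ay * vz - az * vy, az * vx - ax * vz, ax * vy - ay * vx
    return (vx * c + cx * s + ax * dot * (1.0 - c),
            vy * c + cy * s + ay * dot * (1.0 - c),
            vz * c + cz * s + az * dot * (1.0 - c))


def spin_z(p, angle):
    """Rotation by angle about the north-south (z) axis."""
    x, y, z = p
    c, s = math.cos(angle), math.sin(angle)
    return (c * x - s * y, s * x + c * y, z)


def q_tilde_step(p, theta, rng):
    """One step driven by Q * U_N: a uniform spin about the z-axis, then a rotation
    by theta about a uniformly random equatorial axis (assumed model of E)."""
    p = spin_z(p, rng.uniform(0.0, 2.0 * math.pi))
    phi = rng.uniform(0.0, 2.0 * math.pi)
    return rotate(p, (math.cos(phi), math.sin(phi), 0.0), theta)


def geodesic(u, v):
    """Geodesic (angular) distance between unit vectors u and v."""
    d = sum(a * b for a, b in zip(u, v))
    return math.acos(max(-1.0, min(1.0, d)))
```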
Exploiting this equivalence, we shall work with Formulation 4 in the sequel. To save notation we write $Q$ for the bi-invariant measure $\tilde Q$. Right-invariance for $Q$ yields $Q = Q * U_N$, which when substituted into (3) gives
$$D(k) = \sup_{\tilde B} \big| Q^{*k}(\tilde B) - U(\tilde B) \big|, \qquad (4)$$
where the supremum is taken over all ball pre-images $\tilde B$. Hence the discrepancy $D(k)$ as defined in (2) can now be analyzed using expression (4).

A Rate of Convergence
We now proceed to derive a rate of convergence for the drunkard's walk on the sphere. Several calculations require the facts reviewed in Appendices A and B; we alert the reader with references.
Let $B_{y,r}$ denote a ball of geodesic radius $r$ centered at $y \in S^2$. Such balls look like spherical "caps" on $S^2$. Let $\tilde B_{y,r}$ denote its pre-image "upstairs" in $SO(3)$. To reduce notation, write $\tilde B_r = \tilde B_{n,r}$ for the pre-image of a ball centered at $n$. Let $\delta_r$ denote the indicator function of $\tilde B_r$ on $SO(3)$.
From (4) we have

We wish to use Fourier inversion to derive bounds for these expressions in terms of the Fourier coefficients. We need continuity of $Q^{*k} * \delta_r$ for $k \ge 2$:

Proposition 5. Let $Q$ be defined as in Formulation 4, and let $\delta_r$ denote the indicator function of $\tilde B_r$. Then $Q^{*k} * \delta_r$ is continuous for $k \ge 2$.

This is proved in Appendix B. Hereafter, assume $k \ge 2$. We shall also assume for the moment that $Q^{*k} * \delta_r$ has an absolutely convergent Fourier series; this will be verified later in the course of our computations. Since $Q^{*k} * \delta_r$ is a continuous function for $k \ge 2$, it is exactly equal to its Fourier series (Theorem 11). Hence from (6) and (16) we have

where $\rho_n$ is the irreducible representation of $SO(3)$ of dimension $2n + 1$. The trivial representation $\rho_0$ does not appear here since it was cancelled in (6) by $U(\tilde B_{y,r})$.
Remark 6. Since $Q$ and $\delta_r$ are both $N$-bi-invariant on $SO(3)$, Theorem 12 gives a basis for the representations in which their transforms are identically zero except in the (1,1)-th entry. In any such basis (e.g., one built from the spherical harmonics), the first basis element of the representation space of $\rho_n$ is the zonal function $P_n(\cos\gamma)$, where $P_n$ is the $n$-th Legendre polynomial. The Legendre polynomials are the spherical functions for the Gelfand pair $(SO(3), N)$.
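A concrete way to see why the Legendre polynomials play this role is the averaged addition theorem. Suppose a point at geodesic distance $\gamma$ from $n$ takes a step of length $\theta$ in a uniformly random direction. Its new height is $\cos\gamma\cos\theta + \sin\gamma\sin\theta\cos\phi$ with $\phi$ uniform. The average of $P_n$ of that height equals $P_n(\cos\gamma)P_n(\cos\theta)$, so one step multiplies the $n$-th Legendre component by $P_n(\cos\theta)$. The sketch below checks this identity numerically. The equally spaced rule in $\phi$ is exact up to rounding, because the integrand is a trigonometric polynomial of degree $n$ in $\phi$.

```python
import math


def legendre(n, x):
    """P_n(x) via the recurrence (m+1) P_{m+1} = (2m+1) x P_m - m P_{m-1}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, x
    for m in range(1, n):
        prev, cur = cur, ((2 * m + 1) * x * cur - m * prev) / (m + 1)
    return cur


def averaged_step(n, gamma, theta, samples=2000):
    """Average of P_n(new height) over a step of length theta taken in a uniformly
    random direction from a point at geodesic distance gamma from the north pole."""
    total = 0.0
    for j in range(samples):
        phi = 2.0 * math.pi * j / samples
        # Spherical law of cosines for the new colatitude.
        z = (math.cos(gamma) * math.cos(theta)
             + math.sin(gamma) * math.sin(theta) * math.cos(phi))
        total += legendre(n, z)
    return total / samples
```

With $n = 1$ this reduces to the statement that the expected height after one step is $\cos\gamma\cos\theta$.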
Hence $\rho_n(\tilde y)_{(1,1)} = P_n(\cos\gamma)$, where $\gamma$ is the geodesic distance of $y$ from $n$. The product of the transforms of $Q^{*k}$ and $\delta_r$ is identically zero except for the (1,1)-th entry. Therefore the trace of its product with $\rho_n(\tilde y)$ involves only the (1,1)-th entry of $\rho_n(\tilde y)$. Hence the trace (7) reduces to

where the second inequality follows from (22). Notice that the sum in (8) is precisely the sum in Theorem 11 that needs to be checked for convergence in verifying that $Q^{*k} * \delta_r$ has an absolutely convergent Fourier series. Hence when we bound the above expression, we will also have validated our use of Fourier inversion in our computations.
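From (20), we have
$$\hat Q(\rho_n)_{(1,1)} = P_n(\cos\theta), \qquad (9)$$
since $P_n(\cos\gamma)$ is constant, equal to $P_n(\cos\theta)$, on the support of $Q$. Also, for a ball $B_r$ of geodesic radius $r$ and $n \ge 1$, formula (21) gives
$$\hat\delta_r(\rho_n)_{(1,1)} = \frac{1}{2}\int_{\cos r}^{1} P_n(x)\,dx = \frac{P_{n-1}(\cos r) - P_{n+1}(\cos r)}{2(2n+1)}, \qquad (10)$$
$$\big|\hat\delta_r(\rho_n)_{(1,1)}\big| \le \frac{1}{2n+1}. \qquad (11)$$
The integral of $P_n$ follows from (23) and the fact that $P_n(1) = 1$, and the inequality follows from (22).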
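The cap coefficient can be checked numerically. This sketch assumes Haar measure is normalized so that the height $\cos\gamma$ of the image point is uniform on $[-1, 1]$ (Archimedes' theorem). Under that assumption the coefficient is $\frac12\int_{\cos r}^1 P_n(x)\,dx$. The sketch compares the closed form $(P_{n-1}(\cos r) - P_{n+1}(\cos r))/(2(2n+1))$ with Simpson's rule. It also confirms the bound $1/(2n+1)$.

```python
import math


def legendre(n, x):
    """P_n(x) via the recurrence (m+1) P_{m+1} = (2m+1) x P_m - m P_{m-1}."""
    if n == 0:
        return 1.0
    prev, cur = 1.0, x
    for m in range(1, n):
        prev, cur = cur, ((2 * m + 1) * x * cur - m * prev) / (m + 1)
    return cur


def cap_coefficient_closed(n, r):
    """(1/2) * integral of P_n over [cos r, 1], via (P_{n-1} - P_{n+1}) / (2(2n+1)); n >= 1."""
    c = math.cos(r)
    return (legendre(n - 1, c) - legendre(n + 1, c)) / (2 * (2 * n + 1))


def cap_coefficient_quadrature(n, r, m=2000):
    """The same quantity by composite Simpson's rule (m even)."""
    a, b = math.cos(r), 1.0
    h = (b - a) / m
    total = legendre(n, a) + legendre(n, b)
    for i in range(1, m):
        total += (4 if i % 2 else 2) * legendre(n, a + i * h)
    return 0.5 * total * h / 3.0
```

The closed form rests on the identity $(2n+1)P_n = P_{n+1}' - P_{n-1}'$, integrated from $\cos r$ to $1$.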
Substitution of (9) and (11) into (8) yields
$$D(k) \le \sum_{n=1}^{\infty} |P_n(\cos\theta)|^k. \qquad (12)$$
To bound the Legendre polynomials, we use the following well-known bound (see Jackson [14, p.63]):

Proposition 7. For $P_n$, the $n$-th Legendre polynomial, and any $\theta$,

We derive an alternate bound, suitable for small $\theta$:

Proposition 8. For $P_n$, the $n$-th Legendre polynomial, and $n \sin^2\theta \le 0.9$,

This bound is better than Proposition 7 when $n \sin^2\theta < 2 - \sqrt{4 - 8/\pi} \approx 0.794$. It is proved in Appendix B. Using Propositions 7 and 8 and the bound $1 - x \le e^{-x}$, the sum in (12) can be estimated:

Note that $(2/(\pi B))^{1/2} < e^{-1/8} \le e^{-\sin^2\theta/8}$. For $k = C/\sin^2\theta$ and $C \ge 4$, one sees that $(k - 2)\sin^2\theta \ge 2$ and $k \ge 4$, so that

The above bound, together with (12), proves the following theorem. (The $C \ge 4$ restriction above is not needed below, because the discrepancy $D(k)$ never exceeds 1.)

Theorem 9. For the drunkard's walk on the sphere with step size $\theta$, the discrepancy after $k$ steps satisfies, for $k = C/\sin^2\theta$,
$$D(k) \le 4.442\, e^{-C/8}.$$
Thus order $C/\sin^2\theta$ steps are sufficient to make the discrepancy uniformly small. The following lower bound confirms that the order is correct.
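The sketch below checks, on a grid, two candidate forms of the Legendre bounds. Both forms are our assumption, not quoted statements. They were chosen to be consistent with the threshold $2 - \sqrt{4 - 8/\pi}$ and with the factor $e^{-\sin^2\theta/8}$ in the estimate of (12).

- Proposition 7 candidate: $|P_n(\cos\theta)| \le (2/(\pi n\sin^2\theta))^{1/2}$. This follows from Bernstein's inequality $|P_n(\cos\theta)| \le (2/(\pi n\sin\theta))^{1/2}$.
- Proposition 8 candidate: $|P_n(\cos\theta)| \le (1 - n\sin^2\theta/4)^{1/2}$ for $n\sin^2\theta \le 0.9$.

The two candidates coincide exactly at the stated threshold. The sketch also evaluates a truncation of $\sum_{n\ge1}|P_n(\cos\theta)|^k$, the right side of (12) as we read it.

```python
import math


def legendre_all(n_max, x):
    """[P_0(x), ..., P_{n_max}(x)] via (m+1) P_{m+1} = (2m+1) x P_m - m P_{m-1}."""
    vals = [1.0, x]
    for m in range(1, n_max):
        vals.append(((2 * m + 1) * x * vals[m] - m * vals[m - 1]) / (m + 1))
    return vals[: n_max + 1]


def bound_prop7(n, theta):
    # Assumed form of Proposition 7 (a weakening of Bernstein's inequality).
    return math.sqrt(2.0 / (math.pi * n * math.sin(theta) ** 2))


def bound_prop8(n, theta):
    # Assumed form of Proposition 8, meant for n sin^2(theta) <= 0.9.
    return math.sqrt(1.0 - n * math.sin(theta) ** 2 / 4.0)


THRESHOLD = 2.0 - math.sqrt(4.0 - 8.0 / math.pi)  # about 0.794


def series_12(theta, k, n_max=400):
    """Truncation of sum_{n>=1} |P_n(cos theta)|^k, the assumed right side of (12)."""
    vals = legendre_all(n_max, math.cos(theta))
    return sum(abs(v) ** k for v in vals[1:])
```

At the threshold both candidates equal $(2/(\pi x))^{1/2} = (1 - x/4)^{1/2}$, since $x^2 - 4x + 8/\pi = 0$ there.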
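Theorem 10. For the drunkard's walk on the sphere with step size $\theta$, the discrepancy after $k$ steps satisfies, for $k \ge 2$,
$$D(k) \ge \frac{\sqrt{3}}{4}\,|\cos\theta|^k \approx 0.4330\,|\cos\theta|^k.$$
For $k = C/\sin^2\theta$ and $\theta \ne \pi/2$, we have
$$D(k) \ge 0.4330\, e^{-C/(2\cos^2\theta)}.$$
Thus order $C/\sin^2\theta$ steps are needed to make the discrepancy distance uniformly small. Together, Theorems 9 and 10 prove Theorem 1.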
One way to obtain a lower bound for discrepancy is to evaluate the difference of $Q^{*k}$ and $U$ on a well-chosen ball. The same idea can be used for total variation. One way to choose such a set (see [4, p.29]) is to take a set cut out by a random variable consisting of the dominant terms in the Fourier series of $Q^{*k}$. The mean and variance of the random variable and an appeal to Chebyshev's inequality yield an estimate for $Q^{*k}$ on that set.
However, the proof of Theorem 10 illustrates a different approach, using ideas similar to those used in [22] for bounds on the torus. We construct a "local discrepancy" function which, at each point, evaluates the discrepancy of the measure on a set of geodesic radius $r$ centered at that point. Its absolute value is bounded above by the discrepancy $D(k)$. As before, it can be rewritten in terms of a convolution of the original measure and the indicator function of the set. An appeal to Plancherel's identity gives a sum with only non-negative terms, so the dominant term can be pulled out as a lower bound for discrepancy.
We remark that since discrepancy is a lower bound for total variation, this lower bounding technique can also be used to obtain lower bounds for random walks under total variation.
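Proof. Define, for $g \in SO(3)$,
$$\Delta_r(g) = Q^{*k}(\tilde B_{y,r}) - U(\tilde B_{y,r}),$$
where $y$ is the image of $g$ under the quotient map from $SO(3)$ to $S^2$. From (4), we see that $|\Delta_r(g)| \le D(k)$. Hence, for all $r$, Plancherel's identity gives
$$D(k)^2 \ge \int_{SO(3)} |\Delta_r(g)|^2\, dU(g) = \sum_{n \ge 0} (2n+1)\, \mathrm{Tr}\big[\hat\Delta_r(\rho_n)\, \hat\Delta_r(\rho_n)^*\big], \qquad (13)$$
where $*$ denotes the conjugate transpose (here only). Notice that $\Delta_r$ may be rewritten as $\Delta_r = Q^{*k} * \delta_r - U * \delta_r$, so that
$$\hat\Delta_r(\rho_n) = \big(\hat Q(\rho_n)^k - \hat U(\rho_n)\big)\, \hat\delta_r(\rho_n). \qquad (14)$$
For $n = 0$, a trivial computation shows $\hat\Delta_r(\rho_0) = 0$. For $n \ne 0$, $\hat U(\rho_n) = 0$, and thus $\hat\Delta_r(\rho_n) = \hat Q(\rho_n)^k\, \hat\delta_r(\rho_n)$. Substituting Remark 6 and Equations (9) and (10) into (14), and combining with (13), gives
$$D(k)^2 \ge \sum_{n \ge 1} (2n+1)\, |P_n(\cos\theta)|^{2k} \left( \frac{P_{n-1}(\cos r) - P_{n+1}(\cos r)}{2(2n+1)} \right)^2, \qquad (15)$$
where $r$ may be chosen arbitrarily. Take only the dominant term ($n = 1$) in the above expression, and let $r = \pi/2$. Then $\cos r = 0$, $P_0(0) = 1$, $P_2(0) = -1/2$, and $P_1(x) = x$. It follows that
$$D(k)^2 \ge 3\cos^{2k}\theta \left(\frac{3/2}{6}\right)^2 = \frac{3}{16}\cos^{2k}\theta, \qquad \text{i.e.,} \qquad D(k) \ge \frac{\sqrt{3}}{4}\,|\cos\theta|^k,$$
as was to be shown. For the second inequality in the theorem, write $|\cos\theta|^k = e^{\frac{k}{2}\ln(1 - \sin^2\theta)}$ and use $\ln(1 - x) \ge -x/(1 - x)$ for $0 \le x < 1$. This gives $|\cos\theta|^k \ge e^{-\frac{k}{2}\sin^2\theta/\cos^2\theta}$, which equals $e^{-C/(2\cos^2\theta)}$ when $k = C/\sin^2\theta$.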
For a tighter lower bound, one may use more terms in (15) and adjust the choice of r; however, the dominant term sufficed to obtain matching upper and lower bounds for this random walk.
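The lower bound (15) is easy to evaluate numerically. In this sketch we assume (15) has the form
$$D(k)^2 \ge \sum_{n\ge1} (2n+1)\,|P_n(\cos\theta)|^{2k}\left[\frac{P_{n-1}(\cos r) - P_{n+1}(\cos r)}{2(2n+1)}\right]^2.$$
This form reproduces the constant $\sqrt{3}/4 \approx 0.4330$ for $n = 1$, $r = \pi/2$. Because every term is nonnegative, adding terms and maximizing over a grid of radii $r$ can only improve the bound.

```python
import math


def legendre_all(n_max, x):
    """[P_0(x), ..., P_{n_max}(x)] via (m+1) P_{m+1} = (2m+1) x P_m - m P_{m-1}."""
    vals = [1.0, x]
    for m in range(1, n_max):
        vals.append(((2 * m + 1) * x * vals[m] - m * vals[m - 1]) / (m + 1))
    return vals[: n_max + 1]


def lower_bound_15(theta, k, r, n_terms):
    """Square root of the first n_terms terms of the assumed right side of (15)."""
    pc = legendre_all(n_terms + 1, math.cos(r))
    pt = legendre_all(n_terms, math.cos(theta))
    total = 0.0
    for n in range(1, n_terms + 1):
        coeff = (pc[n - 1] - pc[n + 1]) / (2 * (2 * n + 1))
        total += (2 * n + 1) * pt[n] ** (2 * k) * coeff ** 2
    return math.sqrt(total)


def best_lower_bound(theta, k, n_terms, grid=400):
    """Maximize the truncated bound over radii r = pi * j / grid, 0 < j < grid."""
    return max(lower_bound_15(theta, k, math.pi * j / grid, n_terms)
               for j in range(1, grid))
```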

Conclusion
A similar analysis can be carried out for the discrepancy convergence of any random walk on a Gelfand pair, when the spherical functions are known. Proposition 4 shows that making a generating measure bi-invariant will not affect the rate of convergence. The upper bound is obtained via (5), Fourier inversion to yield (12), and bounds on the appropriate spherical function (e.g., Prop. 8). The lower bound is obtained via Plancherel's identity applied to the square of the local discrepancy function, e.g., equations (13) and (14), then choosing as many terms as needed.
We remark that our pair of strategies often works well for obtaining matching upper and lower bounds. The reason is that a dominant Fourier coefficient, if there is one, appears to the same order in both bounds. In our example, $\hat Q(\rho_1)$ was the dominant term; compare the upper bound (12) and the lower bound (15). See [12,21,22] for more examples of this phenomenon in discrepancy bounds for random walks on groups.

Appendix A.
This appendix reviews material on harmonic analysis, homogeneous spaces, Gelfand pairs, representations of SO(3), and Legendre polynomials.
Fourier Analysis on a Compact Group. A standard reference is the encyclopedic account by Hewitt and Ross [13]. Diaconis [4] gives a concise introduction to Fourier analysis on finite groups. Dym and McKean [9] is a readable introduction to Fourier series on $SO(3)$.
We assume henceforth that all compact groups are separable and metrizable. For any compact group G there is a unique measure µ on G, called (normalized) Haar measure, such that µ is G-invariant and µ(G) = 1.
Let $V$ be a finite-dimensional vector space over $\mathbb{C}$, the complex numbers. Recall that a representation of a group $G$ on $V$ is a homomorphism $\rho : G \to GL(V)$. If $V$ has dimension $n$, then $\rho$ is said to have dimension $n$. For compact $G$, a basis for $V$ can be chosen so that the matrices of $\rho$ with respect to this basis are unitary. If there is no non-trivial proper subspace of $V$ invariant under the action of $G$, then $\rho$ is said to be irreducible. Otherwise (for compact $G$), $\rho$ decomposes as a direct sum of irreducible representations. (One can similarly define a representation $\rho$ of $G$ on a Hilbert space; if $G$ is compact and $\rho$ is unitary, it decomposes into a direct sum of irreducible unitary representations of finite dimension.) Two representations $\rho$ on $V$ and $\rho'$ on $V'$ are equivalent if there is an isomorphism $\tau : V \to V'$ such that $\tau \circ \rho(g) = \rho'(g) \circ \tau$ for all $g \in G$. Let $\Sigma$ denote the set of equivalence classes of irreducible representations of $G$. For a compact group, $\Sigma$ is countable, and all the irreducible representations are finite-dimensional.
Definition 1. The Fourier transform of a complex-valued function $f$ on a compact group $G$ at a representation $\rho$ of $G$ is defined by
$$\hat f(\rho) = \int_G f(g)\, \rho(g)\, d\mu(g).$$
Similarly, the Fourier transform of a measure $\nu$ on $G$ at $\rho$ is defined by
$$\hat\nu(\rho) = \int_G \rho(g)\, d\nu(g).$$
We show how a function may be recovered from its Fourier transforms at irreducible representations. Let $d_\rho$ denote the dimension of a representation $\rho$. For any operator $A$, let $\mathrm{Tr}[A]$ denote the trace of $A$, and let $\|A\|_{\phi_1}$ denote the sum of the eigenvalues of the operator square root of $AA^*$. (Here, $*$ denotes conjugate transpose.)

Definition 2. For any $f \in L^1(G, \mu)$, the series
$$\sum_{\rho \in \Sigma} d_\rho\, \mathrm{Tr}\big[\hat f(\rho)\, \rho(g^{-1})\big]$$
is called the Fourier series of $f$. (There is mild abuse of notation here: by $\rho \in \Sigma$, we really mean to choose a representative $\rho$ from each class of irreducible representations in $\Sigma$.) If
$$\sum_{\rho \in \Sigma} d_\rho\, \|\hat f(\rho)\|_{\phi_1} < \infty,$$
then $f$ is said to have an absolutely convergent Fourier series [13, (34.4)].
Theorem 11 (Fourier inversion). If a function $f$ on $G$ has an absolutely convergent Fourier series, then the Fourier series of $f(g)$ converges uniformly to a continuous function $\tilde f(g)$, and $f(g) = \tilde f(g)$ almost everywhere on $G$ with respect to Haar measure $\mu$.
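Theorem 11 and the Plancherel identity can be illustrated on a finite abelian group, which is compact and has all $d_\rho = 1$. The sketch below makes these choices, which are our assumption, picked to agree with Diaconis-style notation:

- $G = \mathbb{Z}/n$ with normalized counting (Haar) measure;
- characters $\rho_m(x) = e^{2\pi i m x/n}$;
- $\hat f(\rho) = \int_G f(g)\rho(g)\,d\mu(g)$.

It then checks $f(g) = \sum_\rho d_\rho \mathrm{Tr}[\hat f(\rho)\rho(g^{-1})]$ and $\int_G |f|^2\,d\mu = \sum_\rho d_\rho \mathrm{Tr}[\hat f(\rho)\hat f(\rho)^*]$.

```python
import cmath
import math
import random


def fourier(f):
    """hat f(rho_m) = (1/n) * sum_x f(x) * rho_m(x), with rho_m(x) = exp(2 pi i m x / n)."""
    n = len(f)
    return [sum(f[x] * cmath.exp(2j * math.pi * m * x / n) for x in range(n)) / n
            for m in range(n)]


def fourier_series(fhat):
    """Evaluate sum_m d_rho * hat f(rho_m) * rho_m(-x) (all d_rho = 1, traces trivial)."""
    n = len(fhat)
    return [sum(fhat[m] * cmath.exp(-2j * math.pi * m * x / n) for m in range(n))
            for x in range(n)]
```

On a finite group every function has an absolutely convergent Fourier series, so the inversion holds at every point.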
Proof. This theorem is embedded in Hewitt and Ross [13], but obscured by their exotic notation. We briefly indicate how to "prove" this theorem from results cited in [13].
The set of functions with absolutely convergent Fourier series is denoted in [13] by a symbol that resembles $R(G)$, defined in (34.4). Theorem (34.6) in [13] shows that any $f \in R(G)$ is equal almost everywhere to its Fourier series. Theorem (34.5.ii) shows that this Fourier series converges uniformly to a continuous function that we have denoted $\tilde f$.
We remark that the notation in Hewitt and Ross [13] is cumbersome and tough to wade through. For the sake of probabilists, we have therefore simplified it by following the notation of Diaconis [4]. To aid the reader wishing to follow the results quoted above, we provide a "dictionary" between the two sets of notation. In Hewitt and Ross [13, (27.3)], $\sigma$ denotes a class of equivalent irreducible representations in $\Sigma$ and $U$ is a representative of that class. We avoid reference to $\sigma$ (to eliminate an unnecessary layer of notation) and use $\rho$ instead of $U$. Hewitt and Ross denote an arbitrary element of a group $G$ by $x \in G$; we use $g \in G$. Their notations $A_\sigma$ and $U^{(\sigma)}_x$ refer to operators that correspond to our $\hat f(\rho)$ and $\rho(g)$, respectively (see [13, (34…

Note that if $f$ is continuous and has an absolutely convergent Fourier series, then Theorem 11 implies that $f$ equals its Fourier series at every point.
Homogeneous Spaces and Gelfand Pairs. Diaconis [4, Chap. 3F] provides an introduction to Gelfand pairs on finite groups and an annotated bibliography. Dieudonne [7] is a concise introduction to Gelfand pairs on compact and locally compact groups.
Definition 3. Let G be a compact group and X be a topological space. An action of G on X is a continuous mapping from G × X → X denoted by (s, x) → s · x = sx such that id · x = x and s · (t · x) = (st) · x.
If G acts transitively on X, that is, if for any x, y ∈ X there exists an s such that sx = y, we call X a homogeneous space.
Given a point $x_0 \in X$, let $N$ denote the isotropy subgroup of $G$ with respect to $x_0$, i.e., the set of group elements which fix $x_0$. By construction, $N$ is a closed subgroup of $G$. The canonical identification $gN \mapsto g x_0$ of $G/N$, the space of left cosets of $N$, with $X$ respects the action of $G$: an element $h \in G$ sends $gN$ to $(hg)N$.
Let $\mu_X$ denote the $G$-invariant measure on $X$ induced by Haar measure on $G$. Let $L^2(X)$ denote the space of all complex-valued square-integrable functions on $X$ with respect to $\mu_X$. The action of $G$ on $X$ induces an action of $G$ on $L^2(X)$ by $(g \cdot f)(x) = f(g^{-1}x)$. Each $g$ acts as an invertible linear map of $L^2(X)$ onto itself (indeed a unitary one, since $\mu_X$ is $G$-invariant), so this action defines a representation of $G$.
In this paper, bi-invariance of functions and measures on $G$ will be understood to mean invariance under left and right multiplication by the isotropy subgroup $N$. Note that bi-invariant functions on $G$ are constant on double cosets $NgN$. They may therefore be viewed as functions on the double coset space (denoted $G//N$), or as left $N$-invariant functions on $X$ via its identification with $G/N$.

Theorem 12. If $(G, N)$ is a Gelfand pair, then for every irreducible representation $\rho : G \to GL(V)$ there is a basis of $V$ with the following property. For all functions $f$ (resp. measures $\nu$) bi-invariant with respect to $N$, the Fourier transform $\hat f(\rho)$ (resp. $\hat\nu(\rho)$) in that basis contains only zeroes except possibly for the (1,1)-th entry.
Proof. Dieudonne [6, (22.5.6)] shows that the algebra $L^2(G//N)$ is commutative if and only if, for every irreducible representation $\rho$, the number of times the trivial representation appears in $\rho|_N$, the restriction of $\rho$ to $N$, is zero or one.
If one, this trivial representation corresponds to a one-dimensional subspace of $V$ fixed by $N$, i.e., the left $N$-invariant functions on $X$. Choose the unique function $s(x)$ in this subspace normalized so that $s(x_0) = 1$. This is sometimes called the spherical function of $(G, N)$ corresponding to the representation $\rho$. Complete $s$ to a basis for $V$ so that the matrices of $\rho|_N$ break into irreducible "blocks". Then for a right $N$-invariant function $f$:

where $\mu_N$ is Haar measure on the subgroup $N$. The second equality is obtained by choosing a coset representative $x$ from each coset in $G/N$ and expressing $g = xn$ for some $n \in N$. It also uses the fact that Haar measure $\mu$ decomposes as a product measure $\mu_X \cdot \mu_N$. A similar argument holds for a right-invariant measure $\nu$, noting that $\nu$ decomposes as a product measure $\nu_X \cdot \mu_N$ because of right-invariance.
By the orthogonality relations for matrix entries [6, (21.2.5.c)] of irreducible representations of $N$, the left-most integral of (18) produces a matrix whose entries are all zero except possibly the $(1,1)$-th entry. Thus $\hat f(\rho)$ (resp. $\hat\nu(\rho)$) has zero entries except possibly in the first row. A similar argument using the left-invariance of $f$ (resp. $\nu$) shows that $\hat f(\rho)$ (resp. $\hat\nu(\rho)$) has zero entries except possibly in the first column. Together, these statements imply that the only entry that could possibly be non-zero is the $(1,1)$-th entry.
If the trivial representation does not appear in the restriction of $\rho$ to $N$, the argument above holds without the distinguished vector $s$ in the choice of basis for $V$. Orthogonality then shows that the left-most integral of (18) is the zero matrix, so $\hat f(\rho) = 0$ (resp. $\hat\nu(\rho) = 0$).
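The orthogonality step can be checked concretely with $G = SO(3)$, $N$ the rotations about the $x_3$-axis, and the standard 3-dimensional representation $\rho(g) = g$ of SO(3), which is irreducible. Averaging $\rho$ over $N$ against Haar measure should give the projection onto the $N$-fixed vector $e_3$. With the basis ordered $(e_3, e_1, e_2)$, that projection has a single non-zero $(1,1)$-th entry.

The sketch below approximates Haar measure on $N$ by an $m$-point uniform grid. This is exact here, because the entries are trigonometric polynomials of degree 1. The helper names are hypothetical.

```python
import math

def rot_z(a):
    """Rotation by angle a about the x3-axis, i.e. an element of N."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

# Reorder the standard basis as (e3, e1, e2) so the N-fixed vector e3 comes first.
PERM = [2, 0, 1]

def reorder(g):
    """Matrix of g in the basis (e3, e1, e2)."""
    return [[g[PERM[i]][PERM[j]] for j in range(3)] for i in range(3)]

def average_over_n(m=64):
    """Average of rho(n) = n over N = SO(2) on an m-point uniform grid."""
    total = [[0.0] * 3 for _ in range(3)]
    for k in range(m):
        g = reorder(rot_z(2.0 * math.pi * k / m))
        for i in range(3):
            for j in range(3):
                total[i][j] += g[i][j] / m
    return total

avg = average_over_n()
for row in avg:
    print([round(v, 12) for v in row])  # only the (1,1) entry is non-zero
```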
There is thus one spherical function $s_i(x)$ for every irreducible representation $\rho_i$ appearing in $L^2(X)$. These induce $N$-bi-invariant functions $\tilde s_i$ on $G$. In Theorem 12, $\tilde s_i(g)$ appears as the $(1,1)$-th entry of $\rho_i(g)$ for an appropriate basis. Hence for any measurable function $f$ on $G$, the $(1,1)$-th entry of the Fourier transform at $\rho_i$ is given by (19), and an analogous identity holds for a measure $\nu$ on $G$.
Dieudonne [7] is a readable introduction to the general theory of spherical functions; Letac [15] computes them in several examples.
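As an illustration of $\tilde s_i$, take SO(3) with its standard representation $\rho(g) = g$ and the basis ordered $(e_3, e_1, e_2)$. The $(1,1)$-th entry of $\rho(g)$ is then $g_{33}$, the $x_3$-coordinate of $gn$ for $n = (0,0,1)$.

The sketch below checks two things numerically:
- the entry is unchanged when $g$ is replaced by $n_1 g n_2$ with $n_1, n_2 \in N$, i.e. it is $N$-bi-invariant;
- it equals the cosine of the geodesic distance from $gn$ to $n$.

The rotations and helper names are illustrative.

```python
import math

def rot_z(a):
    """Rotation about the x3-axis (an element of N)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def rot_y(a):
    """Rotation about the x2-axis; moves n = (0, 0, 1) to (sin a, 0, cos a)."""
    c, s = math.cos(a), math.sin(a)
    return [[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]]

def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def s_tilde(g):
    """(1,1) entry of g in the basis (e3, e1, e2): the x3-coordinate of g n."""
    return g[2][2]

# An arbitrary rotation that moves n to geodesic distance 1.1 from n.
g = matmul(rot_z(0.3), matmul(rot_y(1.1), rot_z(-0.8)))
n1, n2 = rot_z(2.0), rot_z(-0.45)  # two elements of N

before = s_tilde(g)
after = s_tilde(matmul(n1, matmul(g, n2)))  # same double coset N g N
print(before, after, math.cos(1.1))
```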
The sphere as a Gelfand pair. The rotation group SO(3) acts on the unit sphere $S^2$ via the natural inclusion of $S^2$ in $\mathbb{R}^3$. This action is clearly transitive on $S^2$, so $S^2$ is a homogeneous space. In fact, $S^2$ arises from the Gelfand pair $(G, N)$, where $G$ is the rotation group SO(3) and $N$ is the isotropy subgroup of rotations fixing $n$, the north pole. By restriction to the plane orthogonal to $n \in \mathbb{R}^3$, we see that $N$ is isomorphic to the group SO(2). The sphere $S^2$ may then be regarded as the space SO(3)/SO(2). More generally, for all $m \ge 2$, Dieudonne [7] shows that $(SO(m+1), SO(m))$ is a Gelfand pair, where $S^m \cong SO(m+1)/SO(m)$.
Representations of SO(3). For a good reference on representations of SO(3) and other compact Lie groups, see Brocker and tom Dieck [3]. Let
$$\Delta = \frac{\partial^2}{\partial x_1^2} + \frac{\partial^2}{\partial x_2^2} + \frac{\partial^2}{\partial x_3^2}$$
be the Laplace operator on $\mathbb{R}^3$. The harmonic polynomials of degree $n$ are the complex-valued homogeneous polynomials $f$ in $x_1, x_2, x_3$ of degree $n$ such that $\Delta f = 0$. The restrictions of these functions to the sphere $S^2$ form a space $V_n$, the spherical harmonics of degree $n$ (one of which is the spherical function $s_n$).
The action of SO(3) on $V_n$ is induced by its action on $\mathbb{R}^3$ in the manner described earlier: $g \cdot f(x) = f(g^{-1}x)$. Moreover, $V_n$ is irreducible and finite-dimensional, and every irreducible representation of SO(3) arises in this way. The dimension of $V_n$ is $2n + 1$.
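The count $2n+1$ follows from a short dimension argument. The sketch below assumes two standard facts:
- $\Delta$ maps homogeneous polynomials of degree $n$ onto those of degree $n-2$;
- a homogeneous polynomial is determined by its restriction to $S^2$.

Here $\mathcal{P}_n$ (our notation) is the space of complex homogeneous polynomials of degree $n$ in $x_1, x_2, x_3$, with $\mathcal{P}_{-1} = \mathcal{P}_{-2} = 0$. The $n = 2$ example shows a zonal harmonic explicitly.

```latex
\dim \mathcal{P}_n = \binom{n+2}{2} = \frac{(n+1)(n+2)}{2},
\qquad
\dim V_n = \dim \mathcal{P}_n - \dim \mathcal{P}_{n-2}
         = \frac{(n+1)(n+2)}{2} - \frac{(n-1)n}{2} = 2n + 1.

% Example, n = 2: h(x) = 2x_3^2 - x_1^2 - x_2^2 is harmonic, and on S^2
% (where x_1^2 + x_2^2 = 1 - x_3^2) it is a multiple of a Legendre polynomial:
\Delta h = 4 - 2 - 2 = 0,
\qquad
h\big|_{S^2} = 3x_3^2 - 1 = 2\,P_2(x_3).
```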
Legendre polynomials. The spherical functions $s_i$ on $S^2$ are given by the well-known Legendre polynomials $P_i$ as follows: for $y \in S^2$, $s_i(y) = P_i(x)$, where $x = \cos\theta_y \in [-1, 1]$ and $\theta_y$ is the geodesic distance between $y$ and $n$ on $S^2$. Just as $S^2$ is (isomorphic to) the space of left cosets of $N$ in $G$, the interval $[-1, 1]$ can be identified with the double coset space of this Gelfand pair.
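The spherical functions can be evaluated numerically with the standard three-term (Bonnet) recurrence for Legendre polynomials, one reasonable choice among several. The sketch below assumes $n = (0, 0, 1)$, so that $\cos\theta_y = y_3$ for a unit vector $y$. The normalization $s_i(n) = P_i(1) = 1$ matches the condition $s(x_0) = 1$ used for spherical functions. The function names are hypothetical.

```python
def legendre(i, x):
    """P_i(x) via Bonnet's recurrence (k+1) P_{k+1} = (2k+1) x P_k - k P_{k-1}."""
    if i == 0:
        return 1.0
    p_prev, p = 1.0, x  # P_0 and P_1
    for k in range(1, i):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def spherical_function(i, y):
    """s_i(y) = P_i(cos theta_y); for unit y and n = (0, 0, 1), cos theta_y = y[2]."""
    return legendre(i, y[2])

print(legendre(2, 0.5))                              # P_2(1/2) = -1/8
print(spherical_function(5, [0.0, 0.0, 1.0]))        # normalized to 1 at n
```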
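Haar measure on SO(3) induces the uniform (rotation-invariant) probability measure on $S^2$, which in turn induces the uniform probability measure on $[-1, 1]$ via $y \mapsto \cos\theta_y$. We can therefore compute (19) by reducing the integral over SO(3) first to an integral over $S^2$ and then to one over $[-1, 1]$. Here $\mu$ denotes normalized Haar measure on SO(3), $dy$ the uniform measure on $S^2$, and $dx$ Lebesgue measure on $\mathbb{R}$. See [9, p. 239].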
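The pushforward of the uniform measure on $S^2$ under $y \mapsto \cos\theta_y = y_3$ is uniform on $[-1, 1]$ (Archimedes' hat-box theorem). This can be checked by Monte Carlo. The sketch below samples uniform points on $S^2$ by normalizing Gaussian vectors, a standard method. It checks $E[x] = 0$, $E[x^2] = 1/3$ and $P(0 \le x \le 1/2) = 1/4$. It also checks that $s_2$ and $s_3$ integrate to zero against the uniform measure, as $\frac12\int_{-1}^{1} P_i(x)\,dx = 0$ for $i \ge 1$. Sample size, seed and tolerances are arbitrary choices.

```python
import math
import random

def legendre(i, x):
    """P_i(x) via Bonnet's recurrence."""
    if i == 0:
        return 1.0
    p_prev, p = 1.0, x
    for k in range(1, i):
        p_prev, p = p, ((2 * k + 1) * x * p - k * p_prev) / (k + 1)
    return p

def uniform_on_sphere(rng):
    """Uniform point on S^2: normalize a standard Gaussian vector in R^3."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(3)]
        r = math.sqrt(sum(c * c for c in v))
        if r > 0.0:
            return [c / r for c in v]

rng = random.Random(12345)
N = 100_000
xs = [uniform_on_sphere(rng)[2] for _ in range(N)]  # x = cos(theta_y) = y_3

mean_x = sum(xs) / N                                # uniform on [-1, 1]: 0
mean_x2 = sum(x * x for x in xs) / N                # uniform on [-1, 1]: 1/3
frac_0_half = sum(1 for x in xs if 0.0 <= x <= 0.5) / N   # 1/4
mean_p2 = sum(legendre(2, x) for x in xs) / N       # (1/2) int P_2 = 0
mean_p3 = sum(legendre(3, x) for x in xs) / N       # (1/2) int P_3 = 0
print(mean_x, mean_x2, frac_0_half, mean_p2, mean_p3)
```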

Appendix B.
This appendix contains the proofs of some technical results (Proposition 5 and Proposition 8) that are not central to the development above.
Proof of Proposition 5. To show that $Q^{*k} * \delta_r(x)$ is continuous for $k \ge 2$, we first require the following technical lemma.
Lemma 13. Let $\nu$ be a positive measure on a compact metric group $G$, and let $f$ be a bounded measurable function whose set of discontinuities is $D_f$. Let $xD_f^{-1}$ denote the set $\{xd^{-1} : d \in D_f\}$. Given $x$, if $\nu(xD_f^{-1}) = 0$, then the convolution
$$h(x) = \nu * f(x) = \int_G f(z^{-1}x)\, d\nu(z)$$
is continuous at $x$.
Proof. To show that $h$ is continuous at $x$, consider any sequence $x_n \in G$ such that $x_n \to x$. It must be shown that $h(x_n) \to h(x)$.
Let $w_n(z) = f(z^{-1}x_n)$ and $w(z) = f(z^{-1}x)$. Since $f$ is bounded, all the $w_n$ and $w$, being translates of $f$, are uniformly bounded by some constant function. This constant function is in $L^1(G, \nu)$, since $G$ is compact.
Also, $w_n(z) \to w(z)$ for all $z \notin xD_f^{-1}$. For such $z$ the point $z^{-1}x$ lies outside $D_f$, so $f$ is continuous there, and $z^{-1}x_n \to z^{-1}x$. Since $\nu(xD_f^{-1}) = 0$ by assumption, we have pointwise convergence $\nu$-almost everywhere. By Lebesgue's dominated convergence theorem, $\int_G w_n(z)\, d\nu(z) \to \int_G w(z)\, d\nu(z)$, which is precisely the statement $h(x_n) \to h(x)$. This completes the proof of Lemma 13.
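A toy case, not from the text, shows the role of the hypothesis $\nu(xD_f^{-1}) = 0$. Take the circle group $\mathbb{R}/2\pi\mathbb{Z}$, written additively, so that the convolution becomes $h(x) = \int f(x - z)\,d\nu(z)$. Let $f$ be the indicator of an arc, which is discontinuous at its endpoints.
- An atomless $\nu$ (uniform on a short arc) gives a continuous $h$.
- A point mass at $0$ charges $x - D_f$ when $x \in D_f$, and so reproduces the jump of $f$.

The sketch approximates the integral by a midpoint rule. The names and parameters (`r`, `a`, `m`) are illustrative.

```python
import math

TWO_PI = 2.0 * math.pi

def f(t, r=1.0):
    """Indicator of the arc [0, r) on R / 2*pi*Z; discontinuous at 0 and r."""
    return 1.0 if (t % TWO_PI) < r else 0.0

def conv_uniform_arc(x, a=0.5, m=20_000):
    """h(x) = integral of f(x - z) d nu(z), nu uniform on the arc [0, a]
    (no atoms), approximated by an m-node midpoint rule."""
    return sum(f(x - a * (j + 0.5) / m) for j in range(m)) / m

def conv_point_mass(x):
    """h(x) = f(x - 0): nu is the point mass at 0."""
    return f(x)

eps = 1e-4
x0 = 0.0  # a discontinuity of f
jump_uniform = abs(conv_uniform_arc(x0 + eps) - conv_uniform_arc(x0 - eps))
jump_point = abs(conv_point_mass(x0 + eps) - conv_point_mass(x0 - eps))
print(jump_uniform)  # small: of order eps / a
print(jump_point)    # the full jump of f
```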
We can now prove Proposition 5.
Proof. We apply Lemma 13 with $f(x) = \delta_r(x)$ and $\nu = Q$. By inspection, $\delta_r$ is bounded by 1. The lemma implies that $Q * \delta_r(x)$ is continuous everywhere except possibly at $x = \mathrm{id}$. To see this, observe that the discontinuity set of $\delta_r$ is $\partial\tilde B_r$, the boundary of $\tilde B_r$, which is the preimage of a circle on $S^2$. On the other hand, $Q$, regarded as a measure on $S^2$, is uniformly distributed on the circle at geodesic distance $\theta$ from the north pole. Any two circles on $S^2$ intersect in at most two points unless they are identical, and such a finite set has $Q$-measure zero. Hence $Q(\partial\tilde B_r) = 0$ unless the support of $Q$ coincides with $\partial\tilde B_r$, which occurs only when $\tilde B_r = \tilde B_{n,\theta}$. This corresponds to a possible discontinuity of $Q * \delta_r(x)$ at $x = \mathrm{id}$ when $r = \theta$.
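The description of $Q$ on $S^2$ can be checked by simulating the walk from the Introduction. The sketch below uses a hypothetical helper `step`, not the authors' code. It moves from $p$ to $\cos\theta\, p + \sin\theta\, u$, where $u$ is a uniformly random unit tangent vector at $p$. One step from the north pole always lands at height $\cos\theta$, i.e. on the circle at geodesic distance $\theta$. After two steps the height is spread over $[\cos 2\theta, 1]$, consistent with $Q^{*2}$ no longer being concentrated on a single circle. The seed, $\theta$ and sample sizes are arbitrary.

```python
import math
import random

def step(p, theta, rng):
    """One drunkard's-walk step from the unit vector p: move geodesic
    distance theta in a uniformly random tangent direction."""
    # Orthonormal basis (e1, e2) of the tangent plane at p.
    a = [1.0, 0.0, 0.0] if abs(p[0]) < 0.9 else [0.0, 1.0, 0.0]
    d = sum(ac * pc for ac, pc in zip(a, p))
    e1 = [ac - d * pc for ac, pc in zip(a, p)]
    norm = math.sqrt(sum(c * c for c in e1))
    e1 = [c / norm for c in e1]
    e2 = [p[1] * e1[2] - p[2] * e1[1],   # e2 = p x e1
          p[2] * e1[0] - p[0] * e1[2],
          p[0] * e1[1] - p[1] * e1[0]]
    phi = rng.uniform(0.0, 2.0 * math.pi)  # uniformly random direction
    u = [math.cos(phi) * b1 + math.sin(phi) * b2 for b1, b2 in zip(e1, e2)]
    # Point at distance theta along the great circle through p in direction u.
    return [math.cos(theta) * pc + math.sin(theta) * uc for pc, uc in zip(p, u)]

rng = random.Random(2024)
theta = 0.4
north = [0.0, 0.0, 1.0]
one_step = [step(north, theta, rng) for _ in range(1000)]
two_steps = [step(step(north, theta, rng), theta, rng) for _ in range(1000)]

z1 = [y[2] for y in one_step]
z2 = [y[2] for y in two_steps]
print(min(z1), max(z1))  # both cos(theta): one step stays on a single circle
print(min(z2), max(z2))  # spread within [cos(2 * theta), 1]
```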
We now apply Lemma 13 again to show that $Q^{*k} * \delta_r(x)$ is continuous for $k = 2$. The preceding observations show that $Q * \delta_r(x)$ is continuous everywhere except possibly at the identity, and it is bounded by 1. Its discontinuity set $D$ is therefore contained in $\{\mathrm{id}\}$, so $xD^{-1} \subseteq \{x\}$. This set has $Q$-measure zero because $Q$ has no atoms. Applying the lemma with $f(x) = Q * \delta_r(x)$ and $\nu = Q$ shows that $Q^{*2} * \delta_r(x)$ is continuous everywhere. Now proceed by induction on $k$. For $k \ge 3$, let $\nu = Q$ and let $f(x) = Q^{*(k-1)} * \delta_r(x)$, which is continuous by the induction hypothesis. Then Lemma 13 shows that $\nu * f(x) = Q^{*k} * \delta_r(x)$ is continuous.
Proof of Proposition 8. This proves a Legendre bound for small θ.