ASYMPTOTIC DISTRIBUTION OF COORDINATES ON HIGH DIMENSIONAL SPHERES

The coordinates x_i of a point x = (x_1, x_2, . . . , x_n) chosen at random according to a uniform distribution on the ℓ_2(n)-sphere of radius n^{1/2} have approximately a normal distribution when n is large. The coordinates x_i of points uniformly distributed on the ℓ_1(n)-sphere of radius n have approximately a double exponential distribution. In these and all the ℓ_p(n) cases, 1 ≤ p ≤ ∞, the convergence of the distribution of coordinates as the dimension n increases is at the rate √n and is described precisely in terms of weak convergence of a normalized empirical process to a limiting Gaussian process, the sum of a Brownian bridge and a simple normal process.


Introduction
If Y_n = (Y_{1n}, . . . , Y_{nn}) is chosen according to a uniform distribution on the sphere in n dimensions of radius √n then, computing the ratio of the surface area of a polar cap to that of the whole sphere, one finds that the marginal probability density of Y_{jn}/√n is f_n(s) = κ_n (1 − s²)^{(n−3)/2} I_{(−1,1)}(s). Stirling's approximation shows κ_n/√n → 1/√(2π), so the density n^{−1/2} f_n(t/√n) of Y_{jn} converges pointwise to the standard normal density φ(t) = (2π)^{−1/2} e^{−t²/2}; appealing to Scheffé's theorem (see [3]), one has that Y_{jn} is asymptotically standard normal as the dimension increases. This is an elementary aspect of a more comprehensive result attributed to Poincaré: that the joint distribution of the first k coordinates of a vector uniformly distributed on the sphere S_{2,n}(√n) is asymptotically that of k independent normals as the dimension increases. Extensions have been made by Diaconis and Freedman [9], Rachev and Rüschendorf [13], and Stam [16] to convergence in variation norm, allowing also k to grow with n. In [9] the authors study k = o(n) and relate some history of the problem. Attribution of the result to Poincaré was not supported by their investigations; the first reference they found to the theorem on convergence of the first k coordinates was in the work of Borel [4]. Borel's interest, like ours, centers on the empiric distribution function (edf)

F_n(t) = #{i ∈ {1, . . . , n} : Y_{in} ≤ t}/n.    (1)

The proportion of coordinates Y_{jn} less than or equal to t ∈ (−∞, ∞) is F_n(t). As pointed out in [9], answers to Borel's questions about Maxwell's theorem are easy using modern methods. If Z_1, Z_2, . . . are iid N(0, 1) and R_n = (1/n) Σ_{i=1}^n Z_i², then it is well known that R_n^{−1/2}(Z_1, . . . , Z_n) is uniform on S_{2,n}(n^{1/2}), so that F_n(t) has the same distribution as G_n(t R_n^{1/2}), where G_n is the edf of Z_1, Z_2, . . . , Z_n, and

F_n(t) − Φ(t) = (G_n(t R_n^{1/2}) − Φ(t R_n^{1/2})) + (Φ(t R_n^{1/2}) − Φ(t)).

Since nG_n(t) is binomial, the weak law of large numbers shows that G_n(t) →_p Φ(t). By continuity of the square root and of Φ, and since R_n →_p 1, it follows, as indicated, that the right-most term on the right hand side converges to 0 in probability.
Finally, by the Glivenko-Cantelli lemma (see equation (13.3) of [3]), it follows that the left-most term on the right hand side tends to zero in probability. The argument yields asymptotic normality and, assuming continuity, an affirmative answer to the classical statistical mechanical question of equivalence of ensembles: does one have equality of the expectations E_G[k(Y)] = ∫ k(y) dG(y) and E_U[k(Y)] = ∫ k(y) dU(y), where, corresponding to the micro-canonical ensemble, U is the uniform distribution on {y : H(y) = c²}, G is the Gibbs distribution satisfying dG(y) = e^{−aH(y)} dy with a such that E_G[H(Y)] = ∫ H(y) dG(y) = c², and H(y) is the Hamiltonian? For H(x) = cx², if the functional g_k(F) = ∫ k(y) dF(y) is continuous, then the two are equivalent modulo the choice of constants.
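The Gaussian-normalization construction just described is easy to check numerically. The following pure-Python sketch (the sample size n, the point t, and all names are illustrative choices of mine, not from the text) draws one point uniform on S_{2,n}(√n) by normalizing an iid normal vector and compares the edf (1) with Φ(t):

```python
import math
import random

random.seed(1)

n = 50_000   # dimension (illustrative choice)
t = 0.5      # evaluation point

# One point uniform on S_{2,n}(sqrt(n)): normalize an iid standard normal
# vector Z by R_n^{1/2}, where R_n = (1/n) * sum(Z_i^2).
Z = [random.gauss(0.0, 1.0) for _ in range(n)]
R_n = sum(z * z for z in Z) / n
Y = [z / math.sqrt(R_n) for z in Z]   # now (1/n) * sum(Y_i^2) == 1

# Empirical distribution function F_n(t) of the coordinates, as in (1).
F_n = sum(1 for y in Y if y <= t) / n

# Standard normal cdf Phi(t).
Phi = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))

print(F_n, Phi)
```

With n this large the two printed values typically agree to two or three decimal places, consistent with the √n rate discussed below.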
More generally, what can be said about the error in approximating the functional g(F)'s value by g(F_n)? In the case of independence there are ready answers to questions about the rate of convergence and the form of the error; for the edf Q_n determined from n independent and identically distributed univariate observations from Q, it is well known that the empiric process D_n(t) = √n(Q_n(t) − Q(t)), t ∈ (−∞, ∞), converges weakly (D_n ⇒ B∘Q) to a Gaussian process as the sample size n increases. Here B is a Brownian bridge and it is seen that the rate of convergence is √n with a Gaussian error. If the functional g is differentiable (see Serfling [15]), then √n(g(Q_n) − g(Q)) ⇒ Dg(L), where Dg is the differential of g and L = B∘Q is the limiting error process. The key question in the case of coordinates constrained to the sphere is: does the process √n(F_n(t) − Φ(t)) converge weakly to a Gaussian process? The answer will be shown here to be yes, as will the answers to the analogous questions in each of the spaces ℓ_p(n) if Φ is replaced in each case by an appropriate distribution. Even though the random variables are dependent, convergence to a Gaussian process will occur at the rate √n. The limiting stochastic process L(t) = B(F_p(t)) + (t f_p(t)/√p) Z differs from the limit in the iid case. To state our result, for 1 ≤ p < ∞ let 1/p + 1/q = 1 and introduce the family of distributions F_p on (−∞, ∞) whose probability densities with respect to Lebesgue measure are

f_p(t) = e^{−|t|^p/p} / (2 p^{1/p} Γ(1 + 1/p)), −∞ < t < ∞.

The space ℓ_p(n) is R^n with the norm ‖x‖_p = (Σ_{j=1}^n |x_j|^p)^{1/p}, where x = (x_1, . . . , x_n). The sphere of "radius" r is S_{p,n}(r) = {x ∈ R^n : ‖x‖_p = r}. The ball of radius r is B_{p,n}(r) = {x ∈ R^n : ‖x‖_p ≤ r}. The convergence indicated by D_n ⇒ D is the so-called weak convergence of probability measures, defined by lim_{n→∞} E[h(D_n)] = E[h(D)] for all bounded continuous h and studied in, for example, [3].
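The densities f_p(t) = e^{−|t|^p/p}/(2p^{1/p}Γ(1 + 1/p)) can be coded directly; this is the standard normalization consistent with the two special cases named in the abstract, and the short sketch below (function names are mine) checks that p = 2 recovers the standard normal density and p = 1 the double exponential density e^{−|t|}/2:

```python
import math

def f_p(t: float, p: float) -> float:
    """Density f_p(t) = exp(-|t|^p / p) / (2 p^(1/p) Gamma(1 + 1/p))."""
    norm = 2.0 * p ** (1.0 / p) * math.gamma(1.0 + 1.0 / p)
    return math.exp(-abs(t) ** p / p) / norm

def std_normal(t: float) -> float:
    return math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

def double_exp(t: float) -> float:
    return math.exp(-abs(t)) / 2.0

print(f_p(0.7, 2.0), std_normal(0.7))   # p = 2: standard normal
print(f_p(0.7, 1.0), double_exp(0.7))   # p = 1: double exponential
```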
The following will be proven, where uniformly distributed in the statement refers to σ_{p,n} defined in section 3.

Theorem 1. Let p ∈ [1, ∞) and Y_n = (Y_{1n}, . . . , Y_{nn}) be uniformly distributed according to σ_{p,n} on the sphere S_{p,n}(n^{1/p}). There is a probability space on which are defined a Brownian bridge B and a standard normal random variable Z so that if F_n is as defined in (1) then as n → ∞,

√n(F_n(t) − F_p(t)) ⇒ B(F_p(t)) + (t f_p(t)/√p) Z,

where the indicated sum on the right hand side is a Gaussian process and cov(B(F_p(t)), Z) = −t f_p(t)/√p.
2 Idea of the proof of the theorem

Let X_n = (X_1, . . . , X_n), where {X_1, X_2, . . . } are iid F_p random variables. Then the uniform random vector Y_n on the n-sphere of radius n^{1/p} has the same distribution as n^{1/p} X_n/‖X_n‖_p. Let

ψ_p(X_n) = ‖X_n‖_p/n^{1/p} = ((1/n) Σ_{j=1}^n |X_j|^p)^{1/p}    (4)

and let G_n be the usual empirical distribution formed from the n iid random variables {X_i}_{i=1}^n. Then the process of interest concerning (1) can be expressed probabilistically as

√n(F_n(t) − F_p(t)) =_d √n((G_n(tψ_p(X_n)) − F_p(tψ_p(X_n))) + (F_p(tψ_p(X_n)) − F_p(t))).    (5)
It is well known that the process √n(G_n(t) − F_p(t)) converges weakly to B(F_p(t)), where B is a Brownian bridge process. Noting that ψ_p(X_n) →_p 1 as n → ∞ and that a simple Taylor expansion of the second term yields that √n(F_p(tψ_p(X_n)) − F_p(t)) converges weakly to the simple process (t f_p(t)/√p) V, where V is a standard normal random variable, it can be seen that the process in question, the empirical process based on an observation uniform on the n^{1/p}-sphere in ℓ_p(n) (the emspherical process, defined by the left hand side of (5)), converges weakly to a zero mean Gaussian process as the dimension n increases. The covariance of the two Gaussian summands will be shown to be

cov(B(F_p(s)), (t f_p(t)/√p) V) = −s t f_p(s) f_p(t)/p.

Details of the uniform distribution σ_{p,n} of Theorem 1 on the spheres in ℓ_p(n) are given next.
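The distributional identity Y_n =_d n^{1/p} X_n/‖X_n‖_p can be simulated directly. The sketch below assumes the standard fact (my addition, not stated in the text) that if X ~ F_p then |X|^p/p has a Gamma(1/p, 1) distribution, which yields a sampler for F_p; it then checks that the normalized vector lands on S_{p,n}(n^{1/p}) and that, by symmetry, about half of its coordinates are nonpositive:

```python
import random

random.seed(2)

p, n = 1.5, 10_000   # illustrative choices

def sample_F_p(p: float) -> float:
    # If X ~ F_p then |X|^p / p ~ Gamma(1/p, 1): draw G, set |X| = (p*G)^(1/p),
    # and attach an independent random sign.
    g = random.gammavariate(1.0 / p, 1.0)
    x = (p * g) ** (1.0 / p)
    return x if random.random() < 0.5 else -x

X = [sample_F_p(p) for _ in range(n)]

# Y = n^{1/p} X / ||X||_p lies on the sphere S_{p,n}(n^{1/p}).
norm_p = sum(abs(x) ** p for x in X) ** (1.0 / p)
Y = [n ** (1.0 / p) * x / norm_p for x in X]

# ||Y||_p^p = n exactly, and roughly half the coordinates are <= 0.
F_n0 = sum(1 for y in Y if y <= 0) / n
print(sum(abs(y) ** p for y in Y), F_n0)
```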

3 Uniform distribution and F_p
The measure σ_{p,n} of Theorem 1 assigns to measurable subsets of S_{p,n}(1) their Minkowski surface area, an intrinsic area in that it depends on geodesic distances on the surface. See [6]. The measure σ_{p,n} coincides on S_{p,n}(1) with measures which have appeared in the literature (see [2], [13], and [14]) in conjunction with the densities f_p. In particular, it is shown that it coincides with the measure µ_{p,n} defined below (see (11)), which arose for Rachev and Rüschendorf [13] in the disintegration of V_n.

The isoperimetric problem and solution
Let K ⊂ R n be a centrally symmetric closed bounded convex set with 0 as an internal point.
The only reasonable (Busemann [6]) n-dimensional volume measure in this Minkowski space is translation invariant and must coincide with the (Lebesgue) volume measure V_n. One choice for surface area is the Minkowski surface area σ_K, defined for smooth convex bodies D by

σ_K(∂D) = lim_{ε→0+} (V_n(D + εK) − V_n(D))/ε.

For a more general class of sets M (see, for example, equation (18) of [11] for details) the Minkowski surface area can be shown to satisfy

σ_K(∂M) = ∫_{∂M} ‖u‖_{K^0} dσ_2,    (7)

where σ_2 is Euclidean surface area, u is the (Euclidean) unit normal to the surface ∂M, and ‖·‖_{K^0} is the norm in the dual space, also a Minkowski normed space, in which the unit ball is the polar reciprocal K^0 = {x* ∈ R^n : ⟨x*, x⟩ ≤ 1 for all x ∈ K} of K. Here ⟨x, y⟩ = Σ_{i=1}^n x_i y_i. It follows from the work of Busemann [7] that among all solids M for which the left hand side of (7) is fixed, the solid maximizing the volume V_n is the polar reciprocal C^0 of the set C of points u/‖u‖_{K^0}. The latter is the unit sphere S_{K^0}(1) of the dual space (see also [8]). It follows from (∂K^0)^0 = K that C^0 = B_K(1) = K, the unit ball. This solution also agrees, in the case of smooth convex sets, with that from Minkowski's first inequality (see (15) of [11]); the solution is the unit ball B_K(1). In the case of interest here, ℓ_p(n), 1 ≤ p < ∞, take K = B_{p,n}(1) and denote σ_K by σ_p. For the sphere S_{p,n}(r) the Minkowski surface area then satisfies

σ_p(S_{p,n}(r)) = dV_n(B_{p,n}(r))/dr.

By homogeneity V_n(B_{p,n}(r)) = r^n V_n(B_{p,n}(1)), so one has σ_p(S_{p,n}(r)) = V_n(B_{p,n}(1)) dr^n/dr. By a formula due to Dirichlet (see [1]), V_n(B_{p,n}(1)) = (2Γ(1 + 1/p))^n/Γ(1 + n/p), so that

σ_p(S_{p,n}(r)) = n r^{n−1} (2Γ(1 + 1/p))^n/Γ(1 + n/p).    (8)

The simple formula (8) for σ_p(S_{p,n}(r)) should be contrasted with the Euclidean surface area σ_2(S_{p,n}(r)), for which there is no simple closed form. See [5].
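Dirichlet's volume formula and the surface-area formula (8) are easy to evaluate; the sketch below (function names are mine) checks them against the familiar p = 2 values (unit disc area π, unit 3-ball volume 4π/3 and surface area 4π, since for p = 2 the Minkowski and Euclidean surface areas agree) and against the planar unit ℓ_1 ball, a square of area 2:

```python
import math

def vol_ball(p: float, n: int, r: float = 1.0) -> float:
    """Dirichlet's formula: V_n(B_{p,n}(r)) = r^n (2 Gamma(1 + 1/p))^n / Gamma(1 + n/p)."""
    return r ** n * (2.0 * math.gamma(1.0 + 1.0 / p)) ** n / math.gamma(1.0 + n / p)

def mink_area(p: float, n: int, r: float = 1.0) -> float:
    """Formula (8): sigma_p(S_{p,n}(r)) = d/dr V_n(B_{p,n}(r)) = n r^(n-1) V_n(B_{p,n}(1))."""
    return n * r ** (n - 1) * vol_ball(p, n)

print(vol_ball(2, 2), math.pi)            # unit disc
print(vol_ball(2, 3), 4 * math.pi / 3)    # unit 3-ball
print(mink_area(2, 3), 4 * math.pi)       # unit 2-sphere
print(vol_ball(1, 2))                     # planar unit l1 ball: area 2
```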

Disintegration of V_n and Minkowski surface area
If f is smooth and D = {x : f(x) ≤ c} is a compact convex centrally symmetric set with 0 as an internal point, and if g is a measurable function on ∂D, then by (7)

∫_{∂D} g(x) dσ_K(x) = ∫_{∂D} g(x) ‖u(x)‖_{K^0} dσ_2(x).    (9)

For r > 0 fixed, define the mapping T_r : S_{p,n}(1) → S_{p,n}(r) by T_r(v) = rv. Expressing, on the portion of S_{p,n}(1) with all v_i > 0, the normal in terms of the (n−1)-vectors e as e_{1,2,…,n−1} + c_1 e_{n,2,3,…,n−1} + c_2 e_{1,n,3,…,n−1} + ⋯ + c_{n−1} e_{1,2,…,n−2,n}, it is seen that

dσ_p = (1 − Σ_{i=1}^{n−1} v_i^p)^{(1−p)/p} dv_1 ⋯ dv_{n−1}.    (10)

From (10) and (9) it follows that the measure σ_{p,n} coincides with Rachev and Rüschendorf's [13] measure µ_{p,n}, defined (see their equation (3.1)) on the portion of S_{p,n}(1) with all v_i > 0, and analogously elsewhere, by

µ_{p,n}(A) = ∫_{P(A)} (1 − Σ_{i=1}^{n−1} v_i^p)^{(1−p)/p} dv_1 ⋯ dv_{n−1},    (11)

where P(A) = {(v_1, . . . , v_{n−1}) : v ∈ A} is the projection onto the first n − 1 coordinates, and A is any measurable subset of S_{p,n}(1).

Minkowski uniformity under F_p
The probability P is uniform with respect to µ if P is absolutely continuous with respect to µ and the Radon-Nikodym derivative f = dP/dµ is constant. The probability measure P is uniform on the sphere S_{p,n}(1) if f is constant and the measure µ is surface area. If X_1, . . . , X_n are iid F_p and

R = X_n/‖X_n‖_p,    (12)

then n^{1/p}R is distributed uniformly with respect to Minkowski surface area on the sphere S_{p,n}(n^{1/p}). This follows from the literature and our calculations above, but for a self-contained proof consider, for measurable g : S_{p,n}(1) → R, the expectation E[g(R)]. In particular, if f is the joint density of X_1, . . . , X_n with respect to V_n and M is a measurable subset of S_{p,n}(1), then letting A = R^{−1}(M), one has the probability P[R ∈ M] = P[X_n ∈ A]. Therefore, if X_1, . . . , X_n are iid F_p and R is given in (12), then the density of R is uniform with respect to σ_{p,n}.
4 Proof of the theorem for ℓ_p(n), 1 ≤ p < ∞

The techniques of Billingsley [3] on weak convergence of probability measures and uniform integrability will be employed to prove Theorem 1. Let (Ω, A, P) denote a probability space on which is defined the sequence U_j ~ U(0, 1), j = 1, 2, . . . , of independent random variables, identically distributed uniformly on the unit interval. Fixing p ∈ [1, ∞), the iid F_p-distributed sequence of random variables X_1, X_2, . . . can be expressed as X_j = F_p^{−1}(U_j). The usual empirical distribution based on the iid X_j is then G_n(t) = U_n(F_p(t)), where U_n is the empirical distribution, edf, of the iid uniforms. Suppressing the dependence on ω ∈ Ω for both, define for each n = 1, 2, . . . the empirical process ∆_n(u) = √n(U_n(u) − u) for u ∈ [0, 1], and recall (see also (4)) that ψ_p(X_n) = ((1/n) Σ_{j=1}^n |X_j|^p)^{1/p}. The metric d_0 of [3] (see Theorem 14.2) on D[0, 1] is employed. It is equivalent to the Skorohod metric, generating the same sigma field D, and D[0, 1] is a complete separable metric space under d_0.
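The quantile transformation X_j = F_p^{−1}(U_j) has no closed form for general p, but F_p and its inverse are easy to compute numerically. In the sketch below the quadrature step count, the bracketing interval [−20, 20], and the function names are all my own choices:

```python
import math

def f_p(t: float, p: float) -> float:
    return math.exp(-abs(t) ** p / p) / (2.0 * p ** (1.0 / p) * math.gamma(1.0 + 1.0 / p))

def F_p_cdf(t: float, p: float, steps: int = 4_000, lo: float = -20.0) -> float:
    """cdf of F_p by the trapezoid rule on [lo, t]; the tail below lo is negligible."""
    if t <= lo:
        return 0.0
    h = (t - lo) / steps
    s = 0.5 * (f_p(lo, p) + f_p(t, p)) + sum(f_p(lo + k * h, p) for k in range(1, steps))
    return s * h

def F_p_inv(u: float, p: float) -> float:
    """Quantile function by bisection on [-20, 20]."""
    a, b = -20.0, 20.0
    for _ in range(60):
        m = 0.5 * (a + b)
        if F_p_cdf(m, p) < u:
            a = m
        else:
            b = m
    return 0.5 * (a + b)

# Symmetry forces the median to be 0, and u -> F_p^{-1}(u) -> F_p round-trips.
print(F_p_inv(0.5, 1.5))
print(F_p_cdf(F_p_inv(0.9, 1.5), 1.5))
```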
The processes of basic interest are √n(F_n(t) − F_p(t)), t ∈ (−∞, ∞). As commonly utilized in the literature, the alternative parametrization relative to u ∈ [0, 1] is sometimes adopted below, in terms of which the basic process is expressed as

√n(F_n(F_p^{−1}(u)) − u), 0 ≤ u ≤ 1.    (13)

In terms of this parametrization the processes concerning us are E_n(u) = √n(G_n(F_p^{−1}(u)ψ_p(X_n)) − F_p(F_p^{−1}(u))); these generate the same measures on (D[0, 1], D) as the processes (13). Weak convergence of the processes E_n will be proven. Introduce for c > 0 the mappings φ(c, ·) defined by φ(c, u) = F_p(cF_p^{−1}(u)), 0 < u < 1, φ(c, 1) = 1, and φ(c, 0) = 0. Then if

E_n^{(1)}(u) = ∆_n(φ(ψ_p(X_n), u))    (14)

and

E_n^{(2)}(u) = √n(φ(ψ_p(X_n), u) − u),    (15)

one observes that E_n(u) = E_n^{(1)}(u) + E_n^{(2)}(u). The following concerning product spaces will be used repeatedly. Take the metric d on the product space M_1 × M_2 as

d((x_1, y_1), (x_2, y_2)) = max{d_1(x_1, x_2), d_2(y_1, y_2)},    (16)

where d_i is the metric on M_i.

Proposition 1. If (X_n(ω), Y_n(ω)) are (Ω, A, P) to (M_1 × M_2, M_1 × M_2) measurable random elements in a product M_1 × M_2 of two complete separable metric spaces, then weak convergence X_n ⇒ X and Y_n ⇒ Y entails relative sequential compactness of the measures ν_n(·) = P[(X_n, Y_n) ∈ ·] on (M_1 × M_2, M_1 × M_2) with respect to weak convergence.
Proof: By assumption and Prohorov's theorem (see Theorem 6.2 of [3]) it follows that the sequences of marginal measures ν_n^X, ν_n^Y are both tight. Let ε > 0 be arbitrary, let K_X ∈ M_1 be compact and satisfy P[ω ∈ Ω : X_n(ω) ∈ K_X] ≥ 1 − ε/2 for all n, and let K_Y ∈ M_2 be compact and such that P[ω ∈ Ω : Y_n(ω) ∈ K_Y] ≥ 1 − ε/2 for all n. Then K_X × K_Y ∈ M_1 × M_2 is compact (since it is clearly complete and totally bounded under the metric (16) when, as they do here, those properties of the sets K_X and K_Y hold), and

ν_n(K_X × K_Y) = P[X_n ∈ K_X, Y_n ∈ K_Y] ≥ 1 − ε.

Thus the sequence of measures ν_n is tight, and by Prohorov's theorem (see Theorem 6.1 of [3]) it follows that there is a probability measure ν̄ on (M_1 × M_2, M_1 × M_2) and a subsequence n′ so that ν_{n′} ⇒ ν̄. It is shown next (see (5)) that √n(G_n(tψ_p(X_n)) − F_p(tψ_p(X_n))) ⇒ B(F_p(t)).
The fact that if sup_n E[|V_n|^{1+ε}] < ∞ for some ε > 0 then {V_n} is uniformly integrable (ui) will be employed, as will Theorem 5.4 of [3], which states that if {V_n} is ui and V_n ⇒ V then lim_{n→∞} E[V_n] = E[V]. It is well known that in a Hilbert space (L²(Ω, A, P) here) a set is weakly sequentially compact if and only if it is bounded and weakly closed (see Theorem 4.10.8 of [10]). In the following it is more convenient to deal with the original X_j. It is assumed, without loss of generality and for ease of notation, that the subsequence is the original n, so µ_n ⇒ µ̄.
Proof: Fix t ∈ (−∞, ∞) and let C_n = √n(G_n(t) − F_p(t)) and D_n = √n(W_n − 1), where W_n = (1/n) Σ_{j=1}^n |X_j|^p. The expectations E[|C_n D_n|²] will be computed and it will be shown that the supremum over n is finite. In particular, it will be demonstrated that E[C_n² D_n²] = n^{−2}(K_1 n² + K_2 n), so that C_n D_n is ui.
Let A_i = I_{(−∞,t]}(X_i) − F_p(t) and B_j = |X_j|^p − 1, so that C_n = n^{−1/2} Σ_{i=1}^n A_i and D_n = n^{−1/2} Σ_{j=1}^n B_j; one has for i, j = 1, . . . , n that A's for different indexes are independent and the same applies to B's. Furthermore, n² C_n² D_n² = (Σ_i A_i)²(Σ_j B_j)² is the sum of four terms S_1, S_2, S_3, S_4, where

S_1 = Σ_i A_i² Σ_j B_j², S_2 = Σ_i A_i² Σ_{u≠v} B_u B_v, S_3 = Σ_j B_j² Σ_{i≠u} A_i A_u, S_4 = Σ_{i≠u} A_i A_u Σ_{j≠v} B_j B_v.

Consider first S_2. A typical term in the expansion will be A_i² B_u B_v with u ≠ v. Only the ones for which i equals u or v have expectations possibly differing from 0, but if i = u then since B_v is independent and 0 mean it too has expectation 0. Thus E[S_2] = 0. The same argument applies to E[S_3]. In S_4 we'll have, using similar arguments, E[S_4] = 2n(n − 1)(E[A_1 B_1])². In the case of S_1 one has E[S_1] = n E[A_1² B_1²] + n(n − 1) E[A_1²] E[B_1²], and it is seen that E[C_n² D_n²] = n^{−2}(K_1 n² + K_2 n) with K_1 = E[A_1²] E[B_1²] + 2(E[A_1 B_1])² and K_2 = E[A_1² B_1²] − K_1, so C_n D_n is ui. A Taylor expansion shows also that √n(F_p(tW_n^{1/p}) − F_p(t)) − (t f_p(t)/p) √n(W_n − 1) → 0 in L². It follows now from E[C_n²] = F_p(t)(1 − F_p(t)) and weak sequential compactness, by passing to subsequences, that

cov(B(F_p(t)), Z) = lim_{n→∞} E[C_n D_n/√p] = E[A_1 B_1]/√p.

On the other hand, by a direct computation,

E[A_1 B_1] = ∫_{−∞}^t (|x|^p − 1) f_p(x) dx,

so that letting u = x and dv = x^{p−1} e^{−x^p/p} dx one has

cov(B(F_p(t)), Z) = −t f_p(t)/√p.    (17)

Figure 1: Comparison of covariance functions (emspheric, p = 2, and empiric); empiric is Brownian bridge.

A plot of a portion of the covariance function close to 0 appears in Figure 1 and a comparison of variances on the same scale in Figure 2.
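The integration by parts in this section rests on the identity ∫_{−∞}^t (|x|^p − 1) f_p(x) dx = −t f_p(t), i.e. cov(I(X ≤ t), |X|^p) = −t f_p(t) for X ~ F_p. A numerical check (the density normalization, quadrature parameters, and names are my own choices):

```python
import math

def f_p(t: float, p: float) -> float:
    return math.exp(-abs(t) ** p / p) / (2.0 * p ** (1.0 / p) * math.gamma(1.0 + 1.0 / p))

def cov_indicator_power(t: float, p: float, steps: int = 100_000, lo: float = -30.0) -> float:
    """Trapezoid-rule value of int_{-inf}^t (|x|^p - 1) f_p(x) dx."""
    g = lambda x: (abs(x) ** p - 1.0) * f_p(x, p)
    h = (t - lo) / steps
    s = 0.5 * (g(lo) + g(t)) + sum(g(lo + k * h) for k in range(1, steps))
    return s * h

for p, t in [(2.0, 1.0), (1.3, 0.8)]:
    print(cov_indicator_power(t, p), -t * f_p(t, p))   # the two columns should match
```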
Consider E_n(u) = E_n^{(1)}(u) + E_n^{(2)}(u), 0 ≤ u ≤ 1 (see equations (14) and (15)). It is claimed that there is a Gaussian process E(u) = B(u) + (φ_c(1, u)/√p) Z such that E_n ⇒ E.

Proof: From what has been done so far it follows that for an arbitrary subsequence n′ of n the measures µ_{n′} on D × D which are the joint distributions of (E_n^{(1)}, E_n^{(2)}) have a further subsequence n′′ and there is a probability measure µ̄ on D × D for which µ_{n′′} ⇒ µ̄. This measure has marginals (B, (φ_c(1, ·)/√p) Z), and the covariance of B(u) and Z is given by (17). Since µ̄ concentrates on C × C and θ(x, y) = x + y is continuous thereon, one has a probability measure η̄ on D defined for A ∈ D by η̄(A) = µ̄(θ^{−1}A), and the support of η̄ is contained in C. It will now be argued that this measure η̄ is Gaussian. It is convenient to do this in terms of the original X_j's. Let X_1, X_2, . . . be iid F_p, fix −∞ < t_1 < t_2 < · · · < t_k < ∞, and consider the random vectors W^{(n)}(t) = (W_n(t_1), . . . , W_n(t_k)), where W_n(t) = √n((1/n) Σ_{v=1}^n (I_{(−∞,t]}(X_v/ψ_p(X_n)) − F_p(t))).
Since W^{(n′′)} =_d (E_{n′′}(F_p(t_1)), . . . , E_{n′′}(F_p(t_k))) →_L (E(F_p(t_1)), . . . , E(F_p(t_k))) = W, and since E is continuous wp 1 and ψ_p(X_n) → 1, one has also W^{(n′′)}(t/ψ_p(X_{n′′})) →_d W. Noting that

W_n(t/ψ_p(X_n)) = n^{−1/2} Σ_{v=1}^n (I_{(−∞,t]}(X_v) − F_p(t) + (t f_p(t)/p)(|X_v|^p − 1)) + o_p(1),

it is seen that W, being the limit in law of sums of iid well-behaved vectors, is a multivariate normal. Furthermore, the limiting finite dimensional marginals do not depend on the subsequence. Therefore the measure η̄ is unique and Gaussian and the claim has been proven.
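A Monte Carlo sanity check of the theorem for p = 2 is possible: using the covariance value −tφ(t)/√2 (my computation, consistent with the integration by parts in this section), the variance of the limit B(Φ(t)) + tφ(t)Z/√2 is Φ(t)(1 − Φ(t)) − t²φ(t)²/2, strictly smaller than the iid Brownian-bridge value Φ(t)(1 − Φ(t)). The sketch below (all simulation parameters are arbitrary choices of mine) estimates Var[√n(F_n(t) − Φ(t))] over many spheres and compares it with both values:

```python
import math
import random

random.seed(7)

n, m, t = 400, 4000, 1.0   # dimension, replications, evaluation point

Phi = 0.5 * (1.0 + math.erf(t / math.sqrt(2.0)))
phi = math.exp(-t * t / 2.0) / math.sqrt(2.0 * math.pi)

vals = []
for _ in range(m):
    # A point uniform on S_{2,n}(sqrt(n)) via normalized iid normals.
    Z = [random.gauss(0.0, 1.0) for _ in range(n)]
    r = math.sqrt(sum(z * z for z in Z) / n)
    F_n = sum(1 for z in Z if z <= t * r) / n   # edf of the sphere coordinates at t
    vals.append(math.sqrt(n) * (F_n - Phi))

mean = sum(vals) / m
var_hat = sum((v - mean) ** 2 for v in vals) / (m - 1)

var_limit = Phi * (1 - Phi) - t * t * phi * phi / 2.0   # claimed emspherical limit
var_iid = Phi * (1 - Phi)                               # iid (Brownian bridge) value
print(var_hat, var_limit, var_iid)
```

The estimate should land near var_limit and visibly below var_iid, illustrating that the sphere constraint reduces the fluctuation of the edf relative to the independent case.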

5 ℓ_∞(n)
Convergence also holds in the case p = ∞, where one can arrive at the correct statement and conclusion purely formally by taking the limit as p → ∞ in the statement of Theorem 1; F_∞ is the uniform distribution on [−1, 1], the random vector Y_n = (Y_{1n}, . . . , Y_{nn}) ∈ S_{∞,n}(1), and for t ∈ [−1, 1]

√n(F_n(t) − F_∞(t)) ⇒ B(((1 + t)/2) I_{[−1,1]}(t)).

This follows since, if ψ_∞(X_n) = max{|X_1|, . . . , |X_n|}, then ψ_∞(X_n) ∈ [0, 1] and one has, for 1 > v > 0,

P[n(1 − ψ_∞(X_n)) > v] = (1 − v/n)^n → e^{−v},

so that √n(1 − ψ_∞(X_n)) →_p 0: the term in the limit process additional to the Brownian bridge part (the right-most term in (5)) washes out and one has as limit simply the Brownian bridge B(((1 + t)/2) I_{[−1,1]}(t)).
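The p = ∞ statement is the easiest to simulate: following the construction pattern of section 2, with X_i iid uniform on [−1, 1] and ψ_∞(X_n) = max |X_i|, the vector X_n/ψ_∞(X_n) lies on S_{∞,n}(1) and its coordinate edf approaches F_∞(t) = (1 + t)/2. A short sketch (parameters are illustrative choices of mine):

```python
import random

random.seed(4)

n = 40_000
X = [random.uniform(-1.0, 1.0) for _ in range(n)]
m = max(abs(x) for x in X)          # psi_inf(X_n); very close to 1 for large n
Y = [x / m for x in X]              # a point on the sphere S_{inf,n}(1)

t = 0.25
F_n = sum(1 for y in Y if y <= t) / n
print(F_n, (1 + t) / 2)             # edf vs. the uniform limit F_inf(t)
```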

Acknowledgment
Leonid Bunimovich introduced me to the question of coordinate distribution in ℓ_2. Important modern references resulted from some of Christian Houdré's suggested literature on the isoperimetric problem in ℓ_p. Thanks also are hereby expressed to the referees and editors of this journal for their careful attention to my paper and valuable comments.