On Euclidean random matrices in high dimension

In this note, we study the n × n random Euclidean matrix whose entry (i, j) is equal to f(‖X_i − X_j‖²) for some function f, where the X_i's are i.i.d. isotropic random vectors in R^p. In the regime where n and p both grow to infinity and are proportional, we give sufficient conditions for the empirical distribution of the eigenvalues to converge weakly. We illustrate our result on log-concave random vectors.


1. Introduction
Let Y be an isotropic random vector in R^p, i.e. EY = 0 and E[Y Y^T] = I/p, where I is the identity matrix. Let (X_1, ..., X_n) be independent copies of Y. We define the n × n matrix A by, for all 1 ≤ i, j ≤ n,

A_ij = f( ‖X_i − X_j‖² ),

where f : [0, ∞) → R is a measurable function and ‖·‖ denotes the Euclidean norm. The matrix A is a random Euclidean matrix. It has already attracted some attention, see e.g. Mézard, Parisi and Zee [16], Vershik [18] or Bordenave [7] and the references therein.
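The following Python sketch (not part of the original argument; the choice Y ~ N(0, I/p), the function f(t) = exp(−t) and the sizes n, p are illustrative assumptions) builds the matrix A numerically.

import numpy as np

rng = np.random.default_rng(0)
n, p = 400, 800                        # the ratio n/p plays the role of y in (1)
f = lambda t: np.exp(-t)               # any measurable f : [0, infinity) -> R

X = rng.standard_normal((p, n)) / np.sqrt(p)   # columns X_1, ..., X_n with E[X_i X_i^T] = I/p
G = X.T @ X                                    # Gram matrix, G_ij = X_i^T X_j
sq_norms = np.diag(G)
d2 = sq_norms[:, None] + sq_norms[None, :] - 2.0 * G   # d2_ij = ||X_i - X_j||^2
A = f(np.maximum(d2, 0.0))                     # clip tiny negative values due to roundoff

eig = np.linalg.eigvalsh(A)                    # real spectrum of the symmetric matrix A
print("three largest eigenvalues:", np.round(np.sort(eig)[-3:], 3))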
If B is a symmetric matrix of size n, then its eigenvalues, say λ_1(B), ..., λ_n(B), are real. The empirical spectral distribution (ESD) of B is classically defined as

µ_B = (1/n) Σ_{i=1}^n δ_{λ_i(B)},

where δ_x is the Dirac delta function at x. In this note, we are interested in the asymptotic convergence of µ_A as p and n converge to +∞. This regime has notably been considered previously in El Karoui [10] and Do and Vu [9]. More precisely, we fix a sequence p(n) such that

lim_{n→∞} n / p(n) = y ∈ (0, ∞).   (1)

Throughout this note, we consider, on a common probability space, an array of random variables (X_1(n), ..., X_n(n)), n ≥ 1. For each n, we define the associated Euclidean matrix A(n). For ease of notation, we will often drop the explicit dependence on n and write A = A(n), X_i = X_i(n). The Marcenko-Pastur probability distribution with parameter 1/y is given by

ν_MP = (1 − 1/y)_+ δ_0 + (1/(2π x y)) √( (b − x)(x − a) ) 1_{[a,b]}(x) dx,

where a = (1 − √y)², b = (1 + √y)², and dx denotes the Lebesgue measure. Since the celebrated paper of Marcenko and Pastur [15], this distribution is known to be closely related to empirical covariance matrices in high dimension.
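As a quick illustration (not from the note; the Gaussian columns and the parameters below are our assumptions), the ESD of X^T X can be compared with the Marcenko-Pastur law through its first two moments, which equal 1 and 1 + n/p under the parametrization above.

import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 1000
c = n / p                              # aspect ratio of X^T X, i.e. y in (1)

X = rng.standard_normal((p, n)) / np.sqrt(p)
s = np.linalg.eigvalsh(X.T @ X)        # eigenvalues of the n x n Gram matrix

print("empirical mean      :", s.mean(), "  (Marcenko-Pastur: 1)")
print("empirical 2nd moment:", (s ** 2).mean(), "  (Marcenko-Pastur:", 1 + c, ")")
print("empirical support   :", s.min(), s.max(),
      "  (edges:", (1 - np.sqrt(c)) ** 2, (1 + np.sqrt(c)) ** 2, ")")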
We say that Y has a log-concave distribution if Y has a density on R^p which is log-concave. Log-concave random vectors are of increasing importance in convex geometry, probability and statistics (see e.g. Barthe [5]). We will prove the following result.
Theorem 1. If Y has a log-concave distribution and f is three times differentiable at 2, then, almost surely, as n → ∞, µ_A converges weakly to µ, the law of f(0) − f(2) + 2f′(2)(1 − S), where S has distribution ν_MP.

Under the weaker assumption that f is differentiable at 2, Theorem 1 is conjectured in Do and Vu [9]. Their conjecture has motivated this note. It would follow from the thin-shell hypothesis, which asserts that there exists c > 0 such that for any isotropic log-concave vector Y in R^p, E( ‖Y‖ − 1 )² ≤ c/p (see Anttila, Ball and Perissinaki [3] and Bobkov and Koldobsky [6]). Klartag [14] has proved the thin-shell hypothesis for isotropic unconditional log-concave vectors.
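A numerical sanity check of Theorem 1 (illustrative only: the function f(t) = exp(−t), the Gaussian choice of Y and the sizes are ours; the eigenvalues of X^T X serve as a finite-n proxy for ν_MP):

import numpy as np

rng = np.random.default_rng(2)
n, p = 600, 1200
f  = lambda t: np.exp(-t)
fp = lambda t: -np.exp(-t)                     # f'

X = rng.standard_normal((p, n)) / np.sqrt(p)
G = X.T @ X
d2 = np.diag(G)[:, None] + np.diag(G)[None, :] - 2.0 * G
A = f(np.maximum(d2, 0.0))

eig_A = np.sort(np.linalg.eigvalsh(A))
S = np.sort(np.linalg.eigvalsh(G))             # approximate Marcenko-Pastur sample
predicted = np.sort(f(0.0) - f(2.0) + 2.0 * fp(2.0) * (1.0 - S))

# Kolmogorov distance between the two empirical distributions; it should be
# small and shrink as n and p grow.
grid = np.sort(np.concatenate([eig_A, predicted]))
F1 = np.searchsorted(eig_A, grid, side="right") / n
F2 = np.searchsorted(predicted, grid, side="right") / n
print("Kolmogorov distance between ESD(A) and the predicted law:", np.abs(F1 - F2).max())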
The proof of Theorem 1 will rely on two recent results on log-concave vectors. Let X = X(n) be the p × n matrix with columns (X_1(n), ..., X_n(n)). Pajor and Pastur have proved the following.

Theorem 2 ([17]). If Y has a log-concave distribution, then, in probability, as n → ∞, µ_{X^T X} converges weakly to ν_MP.
We will also rely on a theorem due to Guédon and Milman.

Theorem 3 ([12]). There exist positive constants c_0, c_1 such that if Y is an isotropic log-concave vector in R^p, then for any t ≥ 0,

P( | ‖Y‖ − 1 | ≥ t ) ≤ c_0 exp( −c_1 √p min(t, t³) ).

With Theorems 2 and 3 in hand, the heuristic behind Theorem 1 is simple. Theorem 3 implies that ‖X_i‖² ≃ 1 with high probability. Hence, since

‖X_i − X_j‖² = ‖X_i‖² + ‖X_j‖² − 2 X_i^T X_j ≃ 2 − 2 X_i^T X_j,

a Taylor expansion gives f( ‖X_i − X_j‖² ) ≃ f(2) − 2 f′(2) X_i^T X_j for i ≠ j. In other words, the matrix A is close to the matrix

M = f(2) J + ( f(0) − f(2) + 2 f′(2) ) I − 2 f′(2) X^T X,   (2)

where I is the identity matrix and J is the matrix with all entries equal to 1. From Theorem 2, µ_{X^T X} converges weakly to ν_MP. Moreover, since J has rank one, it is negligible for the weak convergence of the ESD. It follows that µ_M is close to µ. The actual proof of Theorem 1 is elementary and follows this heuristic. We shall use some standard perturbation inequalities for eigenvalues. The idea of performing a Taylor expansion was already central in [10, 9].
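A companion numerical sketch of this heuristic (illustrative assumptions as above): build the linearized matrix M of (2) explicitly and compare the spectra of A and M. The Kolmogorov distance between the two ESDs should be small; in particular, the rank-one term f(2) J moves it by at most 1/n.

import numpy as np

rng = np.random.default_rng(3)
n, p = 500, 1000
f, fp = (lambda t: np.exp(-t)), (lambda t: -np.exp(-t))

X = rng.standard_normal((p, n)) / np.sqrt(p)
G = X.T @ X
d2 = np.diag(G)[:, None] + np.diag(G)[None, :] - 2.0 * G
A = f(np.maximum(d2, 0.0))
M = f(2.0) * np.ones((n, n)) + (f(0.0) - f(2.0) + 2.0 * fp(2.0)) * np.eye(n) - 2.0 * fp(2.0) * G

eA, eM = np.sort(np.linalg.eigvalsh(A)), np.sort(np.linalg.eigvalsh(M))
grid = np.sort(np.concatenate([eA, eM]))
ks = np.abs(np.searchsorted(eA, grid, side="right")
            - np.searchsorted(eM, grid, side="right")).max() / n
print("Kolmogorov distance between ESD(A) and ESD(M):", ks)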
Beyond Theorems 2 and 3, the proof of Theorem 1 is not specific to log-concave vectors. In fact, it is nearly always possible to linearize f as soon as the norms of the vectors concentrate around their mean. More precisely, let us say that two sequences of probability measures (µ_n), (ν_n) are asymptotically weakly equal if for any bounded continuous function f, ∫ f dµ_n − ∫ f dν_n converges to 0.

Theorem 4. Assume that there exists an integer ℓ ≥ 1 such that E| ‖Y‖² − 1 |^{2ℓ} = O(p^{−1}), and that for any ε > 0,

lim_{n→∞} n P( | ‖X_1‖² − 1 | ≥ ε ) + n² P( | ‖X_1 − X_2‖² − 2 | ≥ ε ) = 0.   (3)

Then, if f is ℓ times differentiable at 2, almost surely, µ_A is asymptotically weakly equal to the law of f(0) − f(2) + 2f′(2)(1 − S), where S has distribution Eµ_{X^T X}.
The case ℓ = 1 of the above statement is contained in Do and Vu [9, Theorem 5]. Besides Theorem 2, some general conditions on the matrix X guarantee the convergence of µ_{X^T X}; see Yin and Krishnaiah [19], Götze and Tikhomirov [11] or Adamczak [1].

2. Proofs
2.1. Perturbation inequalities. We first recall some basic perturbation inequalities for eigenvalues and introduce a convenient notion of distance between ESDs. For µ, ν two real probability measures, the Kolmogorov-Smirnov distance can be defined as

d_KS(µ, ν) = sup { | ∫ f dµ − ∫ f dν | : ‖f‖_BV ≤ 1 },

where, for f : R → R, the bounded variation norm is

‖f‖_BV = sup Σ_{k∈Z} | f(x_{k+1}) − f(x_k) |,

the supremum being over all real increasing sequences (x_k)_{k∈Z}. The following inequality is a classical consequence of the interlacing of eigenvalues (see e.g. Bai and Silverstein [4, Theorem A.43]).

Lemma 5. For any n × n Hermitian matrices B and C,

d_KS( µ_B, µ_C ) ≤ rank(B − C) / n.

For p ≥ 1, let µ, ν be two real probability measures such that ∫ |x|^p dµ and ∫ |x|^p dν are finite. We define the L_p-Wasserstein distance as

W_p(µ, ν) = inf_π ( ∫ |x − y|^p dπ(x, y) )^{1/p},

where the infimum is over all couplings π of µ and ν (i.e. π is a probability measure on R × R whose first marginal is equal to µ and whose second marginal is equal to ν). Hölder's inequality implies that for 1 ≤ p ≤ q, W_p ≤ W_q. Moreover, the Kantorovich-Rubinstein duality gives a variational expression for W_1:

W_1(µ, ν) = sup { | ∫ f dµ − ∫ f dν | : ‖f‖_Lip ≤ 1 },

where ‖f‖_Lip denotes the Lipschitz constant of f. The next inequality is the Hoffman-Wielandt inequality.

Lemma 6. For any n × n Hermitian matrices B and C,

W_2( µ_B, µ_C )² ≤ (1/n) Tr (B − C)².

We finally introduce the distance

d(µ, ν) = sup { | ∫ f dµ − ∫ f dν | : ‖f‖_BV ≤ 1 and ‖f‖_Lip ≤ 1 }.

By Lemmas 5 and 6, we obtain that for any n × n Hermitian matrices B and C,

d( µ_B, µ_C ) ≤ min( rank(B − C)/n , ( (1/n) Tr (B − C)² )^{1/2} ).   (4)

Notice that d(µ_n, µ) → 0 implies that µ_n converges weakly to µ.
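A numerical illustration of the two lemmas behind (4) (the matrices below are arbitrary test cases, not objects from the note): a rank-one perturbation moves the ESD by at most 1/n in Kolmogorov-Smirnov distance, and a small dense perturbation moves it by at most its normalized Frobenius norm in W_2.

import numpy as np

rng = np.random.default_rng(4)
n = 400
B = rng.standard_normal((n, n)); B = (B + B.T) / np.sqrt(2 * n)   # a Wigner-type matrix

def esd_distances(B, C):
    eB, eC = np.sort(np.linalg.eigvalsh(B)), np.sort(np.linalg.eigvalsh(C))
    grid = np.sort(np.concatenate([eB, eC]))
    ks = np.abs(np.searchsorted(eB, grid, side="right")
                - np.searchsorted(eC, grid, side="right")).max() / len(eB)
    w2 = np.sqrt(np.mean((eB - eC) ** 2))      # W_2 between two n-atom empirical measures
    return ks, w2

u = rng.standard_normal(n)
C1 = B + np.outer(u, u) / n                         # rank-one perturbation (Lemma 5)
C2 = B + 0.05 * np.diag(rng.standard_normal(n))     # small full-rank perturbation (Lemma 6)

ks1, _ = esd_distances(B, C1)
_, w2_2 = esd_distances(B, C2)
print("rank inequality  : d_KS =", ks1, " <= rank/n =", 1.0 / n)
print("Hoffman-Wielandt : W_2  =", w2_2, " <=", np.sqrt(np.sum((B - C2) ** 2) / n))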

2.2. Concentration inequality. For x = (x_1, ..., x_n) ∈ M_{p,n}(R), define a(x) as the Euclidean matrix obtained from the columns of x: a(x)_ij = f( ‖x_i − x_j‖² ). In particular, we have A = a(X).

Let x, x′ ∈ M_{p,n}(R) and assume that x′_j = x_j for all j ≠ i. Then a(x) and a(x′) have all entries equal except those on the i-th row and column, so that rank( a(x) − a(x′) ) ≤ 2. It thus follows from Lemma 5 that, for any function f with ‖f‖_BV < ∞,

| ∫ f dµ_{a(x)} − ∫ f dµ_{a(x′)} | ≤ 2 ‖f‖_BV / n.

Using the Azuma-Hoeffding inequality, it is then straightforward to check that for any t ≥ 0,

P( | ∫ f dµ_A − E ∫ f dµ_A | ≥ t ) ≤ 2 exp( − n t² / ( 2 ‖f‖_BV² ) ).   (5)

(For a proof, see [8, proof of Lemma C.2] or Guntuboyina and Leeb [13].) Using the Borel-Cantelli lemma, this shows that for any such function f, almost surely,

lim_{n→∞} ( ∫ f dµ_A − E ∫ f dµ_A ) = 0.
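A small Monte Carlo illustration of this concentration (parameters and the statistic are illustrative): the fraction of eigenvalues of A below a fixed threshold is a statistic of bounded variation norm 1, so its fluctuations are at most of order 1/√n.

import numpy as np

rng = np.random.default_rng(5)
n, p, reps, t = 200, 400, 40, 0.8

def statistic(rng):
    X = rng.standard_normal((p, n)) / np.sqrt(p)
    G = X.T @ X
    d2 = np.diag(G)[:, None] + np.diag(G)[None, :] - 2.0 * G
    A = np.exp(-np.maximum(d2, 0.0))                 # kernel f(t) = exp(-t)
    return np.mean(np.linalg.eigvalsh(A) <= t)       # value of the ESD of A at t

vals = np.array([statistic(rng) for _ in range(reps)])
print("observed std of the statistic:", vals.std(), " vs the Azuma scale 1/sqrt(n) =", 1 / np.sqrt(n))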
Now, recall that M was defined by (2). Since the matrix J has rank one, from Theorem 2 and Lemma 5, Eµ_M converges weakly to µ. Hence, our Theorem 1 is a corollary of the following proposition.
Proposition 7. Under the assumptions of Theorem 1, we have lim_{n→∞} d( Eµ_A, Eµ_M ) = 0.

2.3. Proof of Proposition 7. The idea is to perform a multiple Taylor expansion which takes the best out of (4).
Step 1: concentration of norms. By assumption, there exists δ > 0 such that f is differentiable on the open interval (2 − 2δ, 2 + 2δ). For any i ≠ j, (X_i − X_j)/√2 is an isotropic log-concave vector. Define the sequence ε(n) = n^{−κ} ∧ (δ/2) with 0 < κ < 1/6. It follows from Theorem 3 and the union bound that the event

E = { max_i | ‖X_i‖² − 1 | ≤ ε(n) and max_{i≠j} | ‖X_i − X_j‖²/2 − 1 | ≤ ε(n) }

has probability tending to 1 as n goes to infinity.
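A quick numerical check of this step (the Gaussian columns and the sizes below are illustrative): all squared norms and all normalized squared distances concentrate near 1, with bounds that shrink as p grows.

import numpy as np

rng = np.random.default_rng(6)
n, p = 300, 600
X = rng.standard_normal((p, n)) / np.sqrt(p)
G = X.T @ X
sq = np.diag(G)
d2 = sq[:, None] + sq[None, :] - 2.0 * G
off = ~np.eye(n, dtype=bool)

print("max_i    | ||X_i||^2         - 1 | =", np.abs(sq - 1.0).max())
print("max_i!=j | ||X_i - X_j||^2/2 - 1 | =", np.abs(d2[off] / 2.0 - 1.0).max())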
Step 2: Taylor expansion around ‖X_i‖² + ‖X_j‖². We consider the matrix B defined by B_ii = f(0) and, for i ≠ j,

B_ij = f( ‖X_i‖² + ‖X_j‖² ) − 2 f′( ‖X_i‖² + ‖X_j‖² ) X_i^T X_j.

On the event E, the points ‖X_i − X_j‖² and ‖X_i‖² + ‖X_j‖² lie in (2 − 2δ, 2 + 2δ), and a first-order Taylor expansion gives, for i ≠ j,

| A_ij − B_ij | ≤ δ(n) | X_i^T X_j |,

where δ(n) is a sequence going to 0. Since d ≤ 1, we have d(Eµ_A, Eµ_B) ≤ E[ d(µ_A, µ_B) 1_E ] + P(E^c). From (4) and Jensen's inequality, we get

E[ d(µ_A, µ_B) 1_E ] ≤ ( δ(n)² (1/n) Σ_{i≠j} E ( X_i^T X_j )² )^{1/2}.

Now, from the assumption that X_1 and X_2 are independent and isotropic, we find

E ( X_1^T X_2 )² = E[ X_2^T E[ X_1 X_1^T ] X_2 ] = (1/p) E ‖X_2‖² = 1/p.

By assumption (1), we deduce that lim_{n→∞} d( Eµ_A, Eµ_B ) = 0. It thus remains to compare Eµ_B and Eµ_M.
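A small check (illustrative, Gaussian case) of the identity E(X_1^T X_2)² = 1/p used above:

import numpy as np

rng = np.random.default_rng(7)
p, reps = 500, 20000
X1 = rng.standard_normal((reps, p)) / np.sqrt(p)     # reps independent copies of X_1
X2 = rng.standard_normal((reps, p)) / np.sqrt(p)     # reps independent copies of X_2
dots = np.einsum("ij,ij->i", X1, X2)                 # the scalar products X_1^T X_2
print("empirical E (X_1^T X_2)^2 =", np.mean(dots ** 2), " vs 1/p =", 1.0 / p)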
Step 3: Taylor expansion around 2. We define the matrix C by C_ii = f(0) and, for i ≠ j,

C_ij = f( ‖X_i‖² + ‖X_j‖² ) − 2 f′(2) X_i^T X_j.

We now use the fact that f′ is locally Lipschitz at 2 (f is three times differentiable at 2). It follows that if E holds, for i ≠ j,

| B_ij − C_ij | = 2 | f′( ‖X_i‖² + ‖X_j‖² ) − f′(2) | | X_i^T X_j | ≤ c ε(n) | X_i^T X_j |

for some constant c. The argument of step 2 then implies that lim_{n→∞} d( Eµ_B, Eµ_C ) = 0. It thus remains to compare Eµ_C and Eµ_M.

Step 4: Taylor expansion around 2 again. We now consider the matrix D defined by D_ii = f(0) and, for i ≠ j,

D_ij = f(2) + f′(2) ( ‖X_i‖² + ‖X_j‖² − 2 ) + (f″(2)/2) ( ‖X_i‖² + ‖X_j‖² − 2 )² + (f‴(2)/6) ( ‖X_i‖² + ‖X_j‖² − 2 )³ − 2 f′(2) X_i^T X_j.

We are going to prove that

lim_{n→∞} d( Eµ_C, Eµ_D ) = 0.   (6)

We perform a Taylor expansion of order 3 of f( ‖X_i‖² + ‖X_j‖² ) around 2. It follows that if E holds, for i ≠ j,

| C_ij − D_ij | ≤ δ(n) | ‖X_i‖² + ‖X_j‖² − 2 |³,

where δ(n) is a sequence going to 0. Using (4) and arguing as in step 2, in order to prove (6), it thus suffices to show that (1/n) Σ_{i≠j} E | ‖X_i‖² + ‖X_j‖² − 2 |⁶ remains bounded. To this end, for any integer ℓ ≥ 1, we write

E | ‖X_1‖² − 1 |^{2ℓ} = E [ ( ‖X_1‖ − 1 )^{2ℓ} ( ‖X_1‖ + 1 )^{2ℓ} ].

Then, Theorem 3 implies that there exists c_ℓ such that

E | ‖X_1‖² − 1 |^{2ℓ} ≤ c_ℓ p^{−ℓ}.

This proves (6). It finally remains to compare Eµ_D and Eµ_M.
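A Monte Carlo check (Gaussian case, illustrative) of the moment bound used in this step: E | ‖X_1‖² − 1 |^{2ℓ} decays like p^{−ℓ}, up to a constant depending on ℓ.

import numpy as np

rng = np.random.default_rng(8)
l, reps = 3, 200000
for p in (100, 400, 1600):
    z = rng.chisquare(p, size=reps) / p - 1.0        # ||X_1||^2 - 1 when X_1 ~ N(0, I/p)
    print("p =", p, " E|z|^6 =", np.mean(np.abs(z) ** (2 * l)), " p^-3 =", p ** (-l))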
Step 5: end of proof. We set z_i = ‖X_i‖² − 1, so that ‖X_i‖² + ‖X_j‖² − 2 = z_i + z_j. We note that, for i ≠ j,

D_ij = Σ_{k+ℓ≤3} c_{kℓ} z_i^k z_j^ℓ − 2 f′(2) X_i^T X_j,

for some coefficients c_{kℓ} depending on f(2), f′(2), f″(2), f‴(2). Note that c_{00} = f(2) and c_{10} = c_{01} = f′(2). Similarly, the entries of M can be written as M_ij = f(2) − 2 f′(2) X_i^T X_j + ( f(0) − f(2) + 2 f′(2) ) 1_{i=j}. Define the matrix E, for all 1 ≤ i, j ≤ n, by

E_ij = Σ_{k+ℓ≤3} c_{kℓ} z_i^k z_j^ℓ − 2 f′(2) X_i^T X_j + ( f(0) − f(2) + 2 f′(2) ) 1_{i=j}.

The matrices D and E coincide off the diagonal. If E holds, then max_i |z_i| ≤ ε(n), and we find that the diagonal entries of D − E are bounded by c ε(n)² for some constant c. It follows from (4) that

d( Eµ_D, Eµ_E ) ≤ ( (1/n) E[ Tr (D − E)² 1_E ] )^{1/2} + P(E^c) ≤ c ε(n)² + P(E^c).

We deduce that lim_{n→∞} d( Eµ_D, Eµ_E ) = 0.
We finally notice that the matrix E − M is equal to

Σ_{1 ≤ k+ℓ ≤ 3} c_{kℓ} Z_k Z_ℓ^T,

where Z_k is the vector with coordinates (z_i^k)_{1≤i≤n} (with the convention that Z_0 is the all-ones vector). It implies in particular that rank(E − M) ≤ 9; indeed, the rank is subadditive and rank( Z_k Z_ℓ^T ) ≤ 1. In particular, it follows from (4) that

d( Eµ_E, Eµ_M ) ≤ 9/n.

This concludes the proof of Proposition 7 and of Theorem 1.

2.4. Proof of Theorem 4. The concentration inequality (5) still holds. It is thus sufficient to prove the analog of Proposition 7. If ℓ ≥ 2, the proof is essentially unchanged. In step 1, the assumption (3) implies the existence of a sequence ε = ε(n) going to 0 such that P(E) → 1.
Then, in step 4, it suffices to extend the Taylor expansion up to order ℓ.
For the case ℓ = 1: in step 2, we perform the Taylor expansion directly around 2, writing, for i ≠ j, f( ‖X_i − X_j‖² ) = f(2) − 2 f′(2) X_i^T X_j (1 + o(1)). We then move directly to step 5. (As already pointed out, this case is treated in [9].)