On Hadamard powers of random Wishart matrices

A famous result of Horn and Fitzgerald is that the $\beta$-th Hadamard power of any $n\times n$ positive semi-definite (p.s.d.) matrix with non-negative entries is p.s.d. for all $\beta\geq n-2$ and is not necessarily p.s.d. for $\beta<n-2$, $\beta\notin \mathbb{N}$. In this article, we study this question for the random Wishart matrix $A_n:=\frac{X_nX_n^T}{n}$, where $X_n$ is an $n\times n$ matrix with i.i.d. standard Gaussian entries. It is shown that, applying $x\mapsto |x|^{\alpha}$ entrywise to $A_n$, the resulting matrix is p.s.d., with high probability, for $\alpha>1$ and is not p.s.d., with high probability, for $\alpha<1$. It is also shown that if $X_n$ is an $\lfloor n^{s}\rfloor\times n$ matrix, for any $s<1$, then the transition of positivity occurs at the exponent $\alpha=s$.


1. INTRODUCTION
Entrywise powers of matrices preserving positive semi-definiteness have been a topic of active research. An important theorem in this field is the result of Horn and Fitzgerald [3]. Let $P_n^+$ denote the set of $n \times n$ p.s.d. matrices with non-negative entries. The Schur product theorem gives us that the $m$-th Hadamard power $A^{\circ m} := [a_{ij}^m]$ of any p.s.d. matrix $A = [a_{ij}] \in P_n^+$ is again p.s.d. for every positive integer $m$. Horn and Fitzgerald proved that $n-2$ is the 'critical exponent' for such matrices, i.e., $n-2$ is the least number for which $A^{\circ\alpha} \in P_n^+$ for every $A \in P_n^+$ and for every real number $\alpha \geq n-2$. They considered the matrix $A \in P_n^+$ with $(i,j)$-th entry $1 + \varepsilon ij$ and showed that if $\alpha$ is not an integer and $0 < \alpha < n-2$, then $A^{\circ\alpha}$ is not positive semi-definite for a sufficiently small positive number $\varepsilon$ (see [6]).
We consider a random matrix version of this problem. Let $X := [X_{ij}]$ be an $n \times n$ matrix, where the $X_{ij}$ are i.i.d. standard normal random variables. Define $A_n := \frac{XX^T}{n}$ and let $|A_n|^{\circ\alpha}$ be the matrix obtained by applying the function $x \mapsto |x|^{\alpha}$ entrywise to $A_n$. Let $B_{n,\alpha} := |A_n|^{\circ\alpha}$. We are interested in the values of real $\alpha > 0$ for which the matrix $B_{n,\alpha}$ is positive semi-definite, with high probability. Simulations show that for large values of $n$, if $\alpha > 1$ then, with high probability, $B_{n,\alpha}$ is positive semi-definite, and for $\alpha < 1$, with high probability, $B_{n,\alpha}$ is not positive semi-definite (as shown in Table 1).
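These simulations are straightforward to reproduce. Below is a minimal NumPy sketch (the function name and the choice of $n$ are ours, not from the original simulations); for moderate $n$ the sign of the reported eigenvalue already tends to reflect the transition at $\alpha = 1$.

```python
import numpy as np

rng = np.random.default_rng(0)

def smallest_eigenvalue(n, alpha):
    """Sample X with i.i.d. N(0,1) entries, form B = |X X^T / n| to the entrywise
    alpha-th power, and return the smallest eigenvalue of B."""
    X = rng.standard_normal((n, n))
    A = X @ X.T / n
    B = np.abs(A) ** alpha           # entrywise absolute alpha-th power
    return np.linalg.eigvalsh(B)[0]  # eigvalsh returns eigenvalues in ascending order

for alpha in (0.8, 1.0, 1.2):
    print(f"alpha = {alpha}: smallest eigenvalue ~ {smallest_eigenvalue(500, alpha):.4f}")
```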
We state and prove a theorem showing that these observations from simulations are indeed true. In fact, we prove a stronger result. Fix any $s \leq 1$ and let $m = \lfloor n^s \rfloor$. Let $X_n := [X_{ij}]$ be an $m \times n$ matrix, where the $X_{ij}$ are i.i.d. standard normal random variables. Define $A_{n,s} := \frac{X_nX_n^T}{n}$ and $B_{n,\alpha,s} := |A_{n,s}|^{\circ\alpha}$.
Let $\lambda_1(A)$ denote the smallest eigenvalue of $A$. We prove the following main result.
Theorem 1. There exists $\varepsilon_s > 0$ such that, as $n \to \infty$,
$$P\left(\lambda_1(B_{n,\alpha,s}) \geq \varepsilon_s\right) \to 1 \ \text{ for } \alpha > s, \qquad P\left(\lambda_1(B_{n,\alpha,s}) < 0\right) \to 1 \ \text{ for } \alpha < s.$$

Remark 2. Simulations show that Theorem 1 holds if the i.i.d. Gaussians are replaced by other i.i.d. random variables with finite second moment, like Uniform(0,1) and Exp(1), and even by heavy-tailed distributions like the Cauchy distribution or distributions with densities $f(x) = bx^{-1-b}$ for $x \geq 1$, all with the transition of positivity at exponent $\alpha = s$. Note that in the last case one does not even have a finite mean if $b$ is small. This suggests that the transition of matrix positivity happens for a large family of distributions. In this direction, we prove the proposition below, where we show that $B_{n,\alpha,s}$ is p.s.d. in the range $\alpha > 2s$ when $X_n$ has sub-Gaussian entries.

Proposition 3. Let the entries of $X_n$ be i.i.d. sub-Gaussian random variables with mean 0 and unit variance. Fix $\alpha > 2s$ and $\varepsilon > 0$. Define $B_{n,\alpha,s}$ as before. Then, as $n \to \infty$,
$$P\left(\lambda_1(B_{n,\alpha,s}) \geq 1 - 2\varepsilon\right) \to 1.$$
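A sketch of the kind of simulation Remark 2 refers to, with several entry distributions; the samplers and the parameters $n = 2000$, $s = 0.5$ are our choices, and $\alpha$ is taken on either side of the conjectured transition at $s$.

```python
import numpy as np

rng = np.random.default_rng(1)

def smallest_eig(sampler, n, s, alpha):
    """Smallest eigenvalue of B_{n,alpha,s} with entries of X_n drawn from `sampler`."""
    m = int(np.floor(n ** s))
    X = sampler((m, n))
    B = np.abs(X @ X.T / n) ** alpha
    return np.linalg.eigvalsh(B)[0]

samplers = {
    "gaussian": rng.standard_normal,
    "uniform(0,1)": lambda size: rng.uniform(0.0, 1.0, size),
    "exp(1)": lambda size: rng.exponential(1.0, size),
    "cauchy": lambda size: rng.standard_cauchy(size),
}
n, s = 2000, 0.5
for name, sample in samplers.items():
    below = smallest_eig(sample, n, s, alpha=0.8 * s)  # below the transition
    above = smallest_eig(sample, n, s, alpha=1.2 * s)  # above the transition
    print(f"{name:>12}: alpha={0.8*s:.2f} -> {below:+.3e}, alpha={1.2*s:.2f} -> {above:+.3e}")
```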

Remark 4. Although Theorem 1 and Proposition 3 hold for $m = \Theta(n^s)$, for definiteness we fix $m = \lfloor n^s \rfloor$. For $m = an$ with fixed $a > 0$, the transition of positivity is at exponent 1. For the critical exponent to be less than 1, we need $m = \Theta(n^s)$ with $s < 1$, which is much smaller, unlike in the study of the spectrum of Wishart matrices.
A standard way to study the distribution of eigenvalues of a random matrix is to look at the limit of the empirical spectral distributions using the method of moments. For example, Wigner's proof of the semicircle law for the Gaussian ensemble uses this method (for more, see [1]). In our case, the entries of the matrix $B_{n,\alpha,s}$ are built from sums of products of random variables, and entries on the same row or column are correlated. The entrywise absolute fractional power makes this problem intractable if we try to use the method of moments. As we are interested only in the existence of negative eigenvalues, we manage to avoid computing all the moments.

Outline of the paper:
First we prove Proposition 3 in Section 2. This is done using the Gershgorin circle theorem and Bernstein's inequality for sub-exponential random variables. Note that this proposition is not needed to prove Theorem 1.
The proof of Theorem 1 is divided into two parts. In the first part of the proof, we consider the range $\alpha < s$. Let $C_{n,\alpha,s} := \sqrt{\frac{n^{\alpha}}{m}}\, B_{n,\alpha,s}$. For ease of notation, we write $C_m := C_{n,\alpha,s}$, and we write $E_m$ for the matrix obtained from $C_m$ by setting the diagonal to zero and subtracting $\frac{c_\alpha}{\sqrt m}$ from every off-diagonal entry, where $c_\alpha$ is as defined in Subsection 1.2. We use the following lemma, whose proof is given in Section 3, to conclude that the EESD of $B_{n,\alpha,s}$ has positive weight on the negative reals.
Lemma 5. Let $\bar\mu_{E_m}$ be the EESD of $E_m$. Then
i) the limit of the first moment of $\bar\mu_{E_m}$ is 0;
ii) the limit of the second moment of $\bar\mu_{E_m}$ is a positive constant;
iii) the fourth moments of $\bar\mu_{E_m}$ are uniformly bounded.
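Lemma 5 can be sanity-checked numerically. The sketch below estimates the first, second and fourth moments of the ESD of $E_m$ averaged over independent samples, under the normalization of $E_m$ used above; it uses the closed form $c_\alpha = E|Z|^{\alpha} = 2^{\alpha/2}\Gamma((\alpha+1)/2)/\sqrt{\pi}$, and all parameter choices are ours.

```python
import numpy as np
from math import gamma, sqrt, pi

rng = np.random.default_rng(2)

def c_alpha(alpha):
    """c_alpha = E|Z|^alpha for a standard normal Z."""
    return 2 ** (alpha / 2) * gamma((alpha + 1) / 2) / sqrt(pi)

def eesd_moments(n, s, alpha, reps=20):
    """Monte Carlo estimates of the 1st, 2nd and 4th moments of the EESD of E_m."""
    m, ca = int(n ** s), c_alpha(alpha)
    moments = np.zeros(3)
    for _ in range(reps):
        X = rng.standard_normal((m, n))
        B = np.abs(X @ X.T / n) ** alpha
        E = sqrt(n ** alpha / m) * (B - ca / n ** (alpha / 2))
        np.fill_diagonal(E, 0.0)        # the diagonal entries of E_m are zero
        ev = np.linalg.eigvalsh(E)
        moments += [ev.mean(), (ev ** 2).mean(), (ev ** 4).mean()]
    return moments / reps

# alpha < s: first moment 0, second moment positive, fourth moment bounded
print(eesd_moments(n=4000, s=0.5, alpha=0.4))
```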
Using a concentration of measure result, we show that, with high probability, $B_{n,\alpha,s}$ has negative eigenvalues. This is done in Section 3.
In the second part of the proof, we consider the range $s < \alpha$. We further divide this range by looking at $\frac{k+1}{k}s < \alpha$, where $k$ is an integer greater than 1, and let $k \to \infty$. For $\frac{k+1}{k}s < \alpha$, we consider $C_m$, a modification of $B_{n,\alpha,s}$ whose EESD has $2k$-th moment converging to 0, to conclude that the probability of $B_{n,\alpha,s}$ having a negative eigenvalue converges to 0. We then let $k$ be arbitrarily large. This is done in Section 4.
1.2. Notation. The following notation is used in the rest of the article.
1) $R_i$ denotes the $i$-th row of $X_n$ and $\langle \cdot, \cdot \rangle$ the standard inner product on $\mathbb{R}^n$.
2) $\lambda_1(A)$ and $\lambda_m(A)$ denote the smallest and largest eigenvalues of $A$ respectively.
3) $c_\alpha := E|Z|^{\alpha}$, where $Z$ is a standard normal random variable.
4) $J_n$ = the all-ones matrix of size $n \times n$ and $I_n$ = the $n \times n$ identity matrix.
5) $F_{i,j}$ = the sigma algebra generated by the $i$th row and the $j$th row of $X_n$.

2. PROOF OF PROPOSITION 3
In this section we prove Proposition 3.
Proof of Proposition 3. For ease of notation, we write $B_{n,\alpha,s}$ as $B_n$. The diagonal entries of $B_n$ are of the form $\frac{\langle R_i, R_i\rangle^{\alpha}}{n^{\alpha}}$ and the off-diagonal entries are of the form $\frac{|\langle R_i, R_j\rangle|^{\alpha}}{n^{\alpha}}$. Note that all the off-diagonal entries are identically distributed and all the diagonal entries are identically distributed.
First we give an upper bound for the probability that $\sum_{i=2}^{m}(B_n)_{1i} > \varepsilon$.
Note that $(B_n)_{12}$ is a function of a sum of $n$ independent sub-exponential random variables (a product of independent Gaussians is sub-exponential; see Lemma 2.7.7 of [7]). We now recall the Bernstein inequality for sub-exponential random variables from [7].
Theorem 6 (Bernstein's inequality). Let $X_1, \ldots, X_N$ be independent, mean zero, sub-exponential random variables. Then, for every $t \geq 0$, we have
$$P\left(\left|\sum_{i=1}^{N} X_i\right| \geq t\right) \leq 2\exp\left(-c\min\left(\frac{t^2}{\sum_{i=1}^{N}\|X_i\|_{\psi_1}^2},\ \frac{t}{\max_i \|X_i\|_{\psi_1}}\right)\right),$$
where $c > 0$ is an absolute constant and $\|X\|_{\psi_1}$ is the sub-exponential norm of $X$.
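For illustration, the inner products $\langle R_1, R_2\rangle$ sampled below are sums of $n$ i.i.d. products of independent standard Gaussians, so Theorem 6 applies to them. Since the absolute constant $c$ in Theorem 6 is not explicit, the sketch compares the empirical tail only with an indicative Gaussian-scale curve (all parameters are our choices).

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps, chunk = 1000, 100_000, 5_000
thresholds = (1.0, 2.0, 3.0, 4.0)
tail_counts = {t: 0 for t in thresholds}
done = 0
while done < reps:
    b = min(chunk, reps - done)
    # each entry of S is <R_1, R_2> for an independent pair of Gaussian vectors
    S = (rng.standard_normal((b, n)) * rng.standard_normal((b, n))).sum(axis=1)
    for t in thresholds:
        tail_counts[t] += int(np.sum(np.abs(S) >= t * np.sqrt(n)))
    done += b
for t in thresholds:
    emp = tail_counts[t] / reps
    print(f"t={t}: empirical tail {emp:.2e}, indicative 2*exp(-t^2/2) = {2*np.exp(-t**2/2):.2e}")
```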
Bernstein's inequality and the fact that $m = \lfloor n^s \rfloor$ give us that
$$P\left((B_n)_{12} > \frac{\varepsilon}{m}\right) = P\left(|\langle R_1, R_2\rangle| > n^{1 - s/\alpha}\,\varepsilon^{1/\alpha}\right) \leq 2\exp\left(-c_1 n^{1 - 2s/\alpha}\right),$$
for a constant $c_1 = c_1(\varepsilon, \alpha) > 0$. Using the identical distribution of the off-diagonal entries, we get that
$$P\left(\sum_{i=2}^{m}(B_n)_{1i} > \varepsilon\right) \leq 2m\exp\left(-c_1 n^{1 - 2s/\alpha}\right). \qquad (3)$$
For the diagonal entry $(B_n)_{11}$, we have
$$P\left(|(B_n)_{11} - 1| > \varepsilon\right) \leq 2\exp(-c_2 n),$$
for a constant $c_2 = c_2(\varepsilon, \alpha)$. Here we have used Theorem 6 in the last inequality, as $\langle R_1, R_1\rangle - n$ is a sum of $n$ mean zero, i.i.d., sub-exponential random variables and $t = n\left((1+\varepsilon)^{1/\alpha} - 1\right) \wedge n\left(1 - (1-\varepsilon)^{1/\alpha}\right)$. This implies that
$$P\left(\min_{1 \leq i \leq m}(B_n)_{ii} < 1 - \varepsilon\right) \leq 2m\exp(-c_2 n). \qquad (4)$$
Similarly,
$$P\left(\max_{1 \leq i \leq m}\sum_{j \neq i}(B_n)_{ij} > \varepsilon\right) \leq 2m^2\exp\left(-c_1 n^{1 - 2s/\alpha}\right). \qquad (5)$$
Applying the Gershgorin circle theorem (Theorem 6.1.1 of [5]) to $B_n$, using (3), (4), (5), gives us that, with probability at least $1 - \exp(-c_3 n^{1 - 2s/\alpha})$ for large $n$,
$$\lambda_1(B_n) \geq 1 - 2\varepsilon.$$
Here $c_3 > 0$ depends on $\varepsilon$ and $\alpha$. As $\alpha > 2s$, this completes the proof of Proposition 3.

3. THE RANGE α < s
In this section we prove Theorem 1 for the range $\alpha < s$. We define a few terms here which will be used in the rest of the article. The empirical spectral distribution (ESD) of an $n \times n$ symmetric random matrix $A_n$ is the random probability measure $\mu_{A_n} := \frac{1}{n}\sum_{i=1}^{n}\delta_{\lambda_i}$, where the $\lambda_i$ are the eigenvalues of $A_n$. The expected empirical spectral distribution (EESD) of $A_n$ is the probability measure $\bar\mu_{A_n}$ such that $\int_{\mathbb{R}} f\, d\bar\mu_{A_n} = E\int_{\mathbb{R}} f\, d\mu_{A_n}$ for all bounded continuous functions $f$ (for more, see [1]).
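These definitions translate directly into code. Below is a minimal sketch (the helper names `esd` and `eesd_integral` are ours), illustrated on a Wigner matrix, for which the second moment of the EESD is close to 1.

```python
import numpy as np

rng = np.random.default_rng(4)

def esd(A):
    """ESD of a symmetric matrix: its eigenvalues, each carrying mass 1/n."""
    ev = np.linalg.eigvalsh(A)
    return ev, np.full(len(ev), 1.0 / len(ev))

def eesd_integral(sample_matrix, f, reps=50):
    """Approximate the EESD integral E ∫ f dμ_{A_n} by Monte Carlo over `reps` samples."""
    total = 0.0
    for _ in range(reps):
        ev, w = esd(sample_matrix())
        total += float(np.sum(f(ev) * w))
    return total / reps

def wigner(n=300):
    G = rng.standard_normal((n, n))
    return (G + G.T) / np.sqrt(2 * n)   # a standard Wigner normalization

print(eesd_integral(wigner, lambda x: x ** 2))   # close to 1 for the semicircle law
```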
We prove the following lemma; as $C_m$ is a positive multiple of $B_{n,\alpha,s}$, it implies Theorem 1 for the range $\alpha < s$.

Lemma 7. Fix $\alpha < s$. Then $P(\lambda_1(C_m) < 0) \to 1$ as $n \to \infty$.

Proof of Lemma 7. We complete the proof of Lemma 7 assuming Lemma 5, and then provide the proof of Lemma 5. For the sake of contradiction, assume that $P(\lambda_1(C_m) < 0)$ does not converge to 1. Then, by passing to a subsequence, we may assume that there exists $\varepsilon > 0$ such that $P(\lambda_1(C_m) > 0) > \varepsilon$ and that $\bar\mu_{E_m}$ converges weakly to some probability distribution $\mu$ (using (ii) of Lemma 5 we get the tightness of $\bar\mu_{E_m}$). Now $\mu$ must have mean 0 and positive variance. Indeed, if a sequence of probability distributions $\bar\mu_{E_m}$ converges weakly to $\mu$, then by Skorokhod's theorem, on some probability space there exist random variables $T_m \sim \bar\mu_{E_m}$ and $T \sim \mu$ such that $T_m$ converges almost surely to $T$. Now, as the $\bar\mu_{E_m}$ have a uniform bound on second moments, the $T_m$ are uniformly integrable. This implies that the first moment of $T$ is the limit of the first moments of the $T_m$. Similarly, as the fourth moments of the $T_m$ are uniformly bounded, the second moment of $T$ is the limit of the second moments of the $T_m$. Thus $\mu$ has mean 0 and positive variance.
As $\mu$ has zero mean and positive variance, $\mu(-\infty, -\omega) \geq \eta$ for some $\eta, \omega > 0$. This gives us that
$$\bar\mu_{E_m}(-\infty, -\omega) \geq \frac{\eta}{2} \qquad (6)$$
for large enough $n$. We would like to say that, with high probability, the empirical spectral distributions of $E_m$ also have positive weight on the negative reals. This would imply the existence of negative eigenvalues, with high probability. Here we make use of the following McDiarmid-type concentration result due to Guntuboyina and Leeb [4]. For an $n \times n$ symmetric matrix $A$, let $\mu_A$ denote the probability measure $\mu_A := \frac{1}{n}\sum_{i=1}^{n}\delta_{\lambda_i}$, where the $\lambda_i$ are the eigenvalues of $A$, and let $F_{\mu_A}$ denote the cumulative distribution function of $\mu_A$. The Kolmogorov-Smirnov distance between two probability measures $\mu, \mu'$ is defined as $\|\mu - \mu'\|_{KS} := \sup_{x \in \mathbb{R}}|F_{\mu}(x) - F_{\mu'}(x)|$. Let $V_g([a,b])$ denote the total variation of the function $g$ on an interval $[a,b]$.

Theorem 8 ([4]). Let $M = M(Y_1, \ldots, Y_m)$ be an $n \times n$ symmetric matrix which is a function of independent random variables $Y_1, \ldots, Y_m$, and let $M^{(i)}$ denote the matrix obtained from $M$ after replacing $Y_i$ by an independent copy, i.e., assume that
$$\left\|\mu_M - \mu_{M^{(i)}}\right\|_{KS} \leq \frac{r}{n}$$
holds (almost surely) for each $i = 1, 2, \ldots, m$ and for some (fixed) integer $r$. Finally, assume that $g : \mathbb{R} \to \mathbb{R}$ is of bounded variation on $\mathbb{R}$. For each $\varepsilon > 0$, we then have
$$P\left(\left|\int g\, d\mu_M - E\int g\, d\mu_M\right| \geq \varepsilon\right) \leq 2\exp\left(-\frac{2n^2\varepsilon^2}{m\, r^2\, V_g(\mathbb{R})^2}\right).$$
We apply Theorem 8 with $E_m$ as the matrix $M$, which is a function of the $\lfloor n^s \rfloor$ (independent) rows of $X_n$. In order to apply Theorem 8, we need to show that
$$\left\|\mu_{E_m} - \mu_{E_{m(i)}}\right\|_{KS} \leq \frac{2}{m} \qquad (7)$$
holds almost surely. Here $E_{m(i)}$ is the matrix obtained when the $i$th row of $X_n$ is replaced by an independent and identical copy. To show (7), we use that $\operatorname{rank}(E_m - E_{m(i)}) \leq 2$ and the standard rank inequality (Lemma 2.5 of [2]), which gives us
$$\left\|\mu_{E_m} - \mu_{E_{m(i)}}\right\|_{KS} \leq \frac{\operatorname{rank}(E_m - E_{m(i)})}{m} \leq \frac{2}{m}.$$
Note that $V_f(\mathbb{R})$ is finite and independent of $n$. We can now apply Theorem 8 to the matrices $E_m$. Using the function $f = 1_{(-\infty, -\omega)}$ as the bounded variation function and applying Theorem 8, we get
$$P\left(\left|\int f\, d\mu_{E_m} - E\int f\, d\mu_{E_m}\right| \geq \frac{\eta}{4}\right) \leq 2\exp(-cm) \qquad (8)$$
for some $c > 0$. Using (6) and (8), we get that, for large enough $n$,
$$P\left(\mu_{E_m}(-\infty, -\omega) \geq \frac{\eta}{4}\right) \geq 1 - 2\exp(-cm). \qquad (9)$$
Recall that $E_m$ is almost $C_m$: its diagonal is made 0 and then the off-diagonal entries are reduced by $c_\alpha/\sqrt m$.
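The bound (7) is easy to check numerically: replacing one row of $X_n$ changes the matrix only in one row and one column, a perturbation of rank at most 2. The sketch below verifies the resulting KS bound on $B_{n,\alpha,s}$ itself (parameter choices and helper names are ours).

```python
import numpy as np

rng = np.random.default_rng(5)

def ks_distance(ev1, ev2):
    """Kolmogorov-Smirnov distance between the ESDs supported on ev1 and ev2."""
    grid = np.sort(np.concatenate([ev1, ev2]))
    F1 = np.searchsorted(np.sort(ev1), grid, side="right") / len(ev1)
    F2 = np.searchsorted(np.sort(ev2), grid, side="right") / len(ev2)
    return float(np.max(np.abs(F1 - F2)))

n, s, alpha = 2000, 0.5, 0.4
m = int(n ** s)
X = rng.standard_normal((m, n))
B = np.abs(X @ X.T / n) ** alpha
Xi = X.copy()
Xi[0] = rng.standard_normal(n)        # replace the 1st row by an independent copy
Bi = np.abs(Xi @ Xi.T / n) ** alpha   # differs from B only in one row and one column
d = ks_distance(np.linalg.eigvalsh(B), np.linalg.eigvalsh(Bi))
print(f"KS distance {d:.4f} <= 2/m = {2 / m:.4f}")
```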
Using (4), it can be seen that the diagonal matrix $D_m := \operatorname{diag}(C_m) - \frac{c_\alpha}{\sqrt m}I_m$ satisfies
$$P\left(\|D_m\| \geq 2n^{(\alpha - s)/2}\right) \to 0. \qquad (10)$$
Weyl's inequality (Theorem 4.3.1 of [5]) bounds the amount of perturbation of the eigenvalues due to a perturbation of the matrix. Using Weyl's inequality along with (10) gives that, with high probability, $E_m + D_m$ has at least $\frac{\eta m}{4}$ eigenvalues less than $-\frac{\omega}{2}$. As $\operatorname{rank}(E_m + D_m - C_m) = 1$ and $\alpha < s$, using the rank inequality (Lemma 2.5 of [2]) again, we get that
$$P\left(\text{all the eigenvalues of } C_m \text{ are non-negative}\right) \to 0,$$
which contradicts the earlier assumption. This completes the proof of Lemma 7.
We now prove Lemma 5.
Proof of Lemma 5. Computation of the moments of $\bar\mu_{E_m}$: Before we start the computations, we make a note of the form of the entries of $E_m$. The diagonal entries of $E_m$ are zero, and the off-diagonal entries are of the form
$$(E_m)_{ij} = \sqrt{\frac{n^{\alpha}}{m}}\left(\frac{|\langle R_i, R_j\rangle|^{\alpha}}{n^{\alpha}} - \frac{c_\alpha}{n^{\alpha/2}}\right) = \frac{1}{\sqrt m}\left(\left|\frac{\langle R_i, R_j\rangle}{\sqrt n}\right|^{\alpha} - c_\alpha\right).$$
We first prove that the limits of the first and second moments of $\bar\mu_{E_m}$ are 0 and a positive value respectively.

Limit of first moments: Since the diagonal entries of $E_m$ are zero, the first moment of $\bar\mu_{E_m}$ is $\frac{1}{m}E\left[\operatorname{Tr}(E_m)\right] = 0$.
Limit of second moments: As the off-diagonal entries are identically distributed, it is enough to look at the limit of
$$\frac{1}{m}E\left[\operatorname{Tr}(E_m^2)\right] = (m-1)\,E\left[(E_m)_{12}^2\right] = \frac{m-1}{m}\,E\left[\left(\left|\frac{\langle R_1, R_2\rangle}{\sqrt n}\right|^{\alpha} - c_\alpha\right)^2\right].$$
Using the central limit theorem, a uniform bound on $E\left|\frac{\langle R_1, R_2\rangle}{\sqrt n}\right|^{4\alpha}$ and $m = \lfloor n^s \rfloor$, it is easy to see that the limit is $E|Z|^{2\alpha} - c_\alpha^2$, a positive constant. We now prove that the fourth moments of $\bar\mu_{E_m}$ are uniformly bounded.

Uniform bound of fourth moments:
The fourth moment of $\bar\mu_{E_m}$ is
$$\frac{1}{m}E\left[\operatorname{Tr}(E_m^4)\right] = \frac{1}{m}\sum_{i_1, i_2, i_3, i_4} E\left[(E_m)_{i_1 i_2}(E_m)_{i_2 i_3}(E_m)_{i_3 i_4}(E_m)_{i_4 i_1}\right].$$
This is a sum of expectations, with each term corresponding to a closed walk of length 4 on the complete graph $K_m$. It is enough to look at closed walks starting and ending at vertex 1. Such walks can visit 2, 3 or 4 different vertices, including the vertex 1.
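This classification can be verified by brute force for small $m$. The sketch below enumerates all closed walks of length 4 on $K_m$ starting at vertex 1 that never repeat a vertex consecutively (consecutive repeats would pick up the zero diagonal entries) and groups them by the number of distinct vertices visited.

```python
from itertools import product

def classify_closed_walks(m, length=4):
    """Closed walks of the given length on K_m starting/ending at vertex 0,
    grouped by the number of distinct vertices visited."""
    counts = {}
    for interior in product(range(m), repeat=length - 1):
        walk = (0,) + interior + (0,)
        if any(walk[i] == walk[i + 1] for i in range(length)):
            continue  # each step must use an off-diagonal entry
        v = len(set(walk))
        counts[v] = counts.get(v, 0) + 1
    return counts

print(classify_closed_walks(10))  # keys 2, 3, 4: distinct vertices visited
```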
The four terms in the above equation correspond to four different types of walks, as shown below. Using the fact that the off-diagonal entries of $E_m$ are identically distributed, a uniform bound on $E\left|\frac{\langle R_1, R_2\rangle}{\sqrt n}\right|^{8\alpha}$ and the central limit theorem, it can be seen that
$$\lim_{n\to\infty}\ (m-1)\,E\left[(E_m)_{12}^4\right] = 0. \qquad (11)$$
Using a similar argument as above, it can be seen that
$$\lim_{n\to\infty}\ 2(m-1)(m-2)\,E\left[(E_m)_{12}^2(E_m)_{23}^2\right] = 2\,E\left[(|Z_1|^{\alpha} - c_\alpha)^2(|Z_2|^{\alpha} - c_\alpha)^2\right], \qquad (12)$$
where $Z_1, Z_2$ are i.i.d. standard Gaussians.

If we prove that
$$\lim_{n\to\infty}\ (m-1)(m-2)(m-3)\,E\left[(E_m)_{12}(E_m)_{23}(E_m)_{34}(E_m)_{41}\right] = 0, \qquad (13)$$
then, using (11), (12), (13), we would have proved that the fourth moments of $\bar\mu_{E_m}$ are uniformly bounded, and we would be done with the proof of Lemma 5. Note that
$$E\left[(E_m)_{12}(E_m)_{23}(E_m)_{34}(E_m)_{41}\right] = E\left[\,E\left[(E_m)_{12}(E_m)_{23}(E_m)_{34}(E_m)_{41} \mid F_{1,3}\right]\right]. \qquad (14)$$
Let $F_{1,3}$ denote the sigma algebra generated by the 1st row and the 3rd row of $X_n$, and let
$$Y_{1,3} := E\left[\left(\left|\frac{\langle R_1, R_2\rangle}{\sqrt n}\right|^{\alpha} - c_\alpha\right)\left(\left|\frac{\langle R_2, R_3\rangle}{\sqrt n}\right|^{\alpha} - c_\alpha\right) \Big|\ F_{1,3}\right],$$
so that $E[(E_m)_{12}(E_m)_{23} \mid F_{1,3}] = \frac{1}{m}Y_{1,3}$. Note that, using the independence of the 2nd row and the 4th row of $X_n$, the RHS of (14) can be written as
$$E\left[\,E\left[(E_m)_{12}(E_m)_{23} \mid F_{1,3}\right] E\left[(E_m)_{34}(E_m)_{41} \mid F_{1,3}\right]\right] = \frac{1}{m^2}E\left[Y_{1,3}^2\right].$$
We prove the below lemma, from which it follows that $\lim_{n\to\infty} mE[Y_{1,3}^2] = 0$, and hence the fourth moments of $\bar\mu_{E_m}$ are uniformly bounded.

Lemma 9. Let $Y := Y_{1,3}$ be as above. Then, for every $k \in \mathbb{N}$,
$$E\left[|nY|^k\right] \leq M_k,$$
where $M_k > 0$ are some constants dependent on $k$. Proof.
Define a function of the correlation coefficient as below,
$$I(\rho) := \int_{\mathbb{R}}\int_{\mathbb{R}} |x|^{\alpha}|y|^{\alpha}\, \frac{1}{2\pi\sqrt{1-\rho^2}}\exp\left(-\frac{x^2 + y^2 - 2xy\rho}{2(1-\rho^2)}\right) dx\, dy \;-\; c_\alpha^2.$$
We now show that $I(\rho)/\rho^2$ is a bounded function. Fix $t > 0$. For $|\rho| > t$, note that $I(\rho)$ is a Gaussian expectation and therefore $I(\rho)/\rho^2$ is bounded. We use L'Hôpital's rule to get a bound on $I(\rho)/\rho^2$ when $|\rho| < t$. Using differentiation under the integral sign, and using L'Hôpital's rule twice, it can be seen that $I(\rho)/\rho^2$ is a bounded function. Hence we can write
$$|I(\rho)| \leq C\rho^2 \ \text{ for all } |\rho| \leq 1,$$
for some constant $C > 0$. Writing $a := \|R_1\|/\sqrt n$, $b := \|R_3\|/\sqrt n$ and $\rho := \frac{\langle R_1, R_3\rangle}{\|R_1\|\,\|R_3\|}$, conditioning on $F_{1,3}$ gives
$$Y = a^{\alpha} b^{\alpha}\, I(\rho) + c_\alpha^2\,(a^{\alpha} - 1)(b^{\alpha} - 1).$$
As $\alpha < 2$, on the event $\{a^2 \geq 1/2\}$ we have $|a^{\alpha} - 1| \leq 2|a^2 - 1|$, while the complementary event has exponentially small probability. As a result we can write
$$|nY| \leq C\, \frac{a^{\alpha} b^{\alpha}}{a^2 b^2}\left(\frac{\langle R_1, R_3\rangle}{\sqrt n}\right)^2 + c_\alpha^2\ \sqrt n\,|a^{\alpha} - 1|\ \sqrt n\,|b^{\alpha} - 1|.$$
It is easy to see that the $k$th moments of $\frac{\langle R_1, R_3\rangle}{\sqrt n}$, $a^{-1}$, $b^{-1}$, $\sqrt n\,|a^{\alpha} - 1|$ and $\sqrt n\,|b^{\alpha} - 1|$ are uniformly bounded by some constant for all large $n$, and hence the $k$th moments of $nY$ are also uniformly bounded. This completes the proof of Lemma 9.
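The boundedness of $I(\rho)/\rho^2$ can also be observed numerically. A Monte Carlo sketch (the value of $\alpha$ and the sample sizes are our choices; the estimate becomes noisy for very small $\rho$):

```python
import numpy as np
from math import gamma, sqrt, pi

rng = np.random.default_rng(6)
alpha = 0.4
c_a = 2 ** (alpha / 2) * gamma((alpha + 1) / 2) / sqrt(pi)  # c_alpha = E|Z|^alpha

def I(rho, samples=2_000_000):
    """Monte Carlo estimate of E[|X|^a |Y|^a] - c_alpha^2 for correlation-rho Gaussians."""
    x = rng.standard_normal(samples)
    y = rho * x + sqrt(1.0 - rho ** 2) * rng.standard_normal(samples)
    return float(np.mean(np.abs(x) ** alpha * np.abs(y) ** alpha)) - c_a ** 2

for rho in (0.6, 0.3, 0.15):
    print(f"rho = {rho}: I(rho)/rho^2 ~ {I(rho) / rho ** 2:.4f}")
```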
This proves that the fourth moments are uniformly bounded. This completes the proof of Lemma 5.

4. THE RANGE α > s
In this section we prove Theorem 1 for the range $\alpha > s$. We prove the below lemma, which implies Theorem 1 for this range of $\alpha$.

Lemma 10. Fix $\alpha > s$ and $\varepsilon \in (0, 1/2)$. Then, as $n \to \infty$, $P\left(\lambda_1(B_{n,\alpha,s}) \geq \varepsilon\right) \to 1$.
Proof of Lemma 10. For ease of notation, we write $B_{n,\alpha,s}$ as $B_m$. Define a diagonal matrix $D_m$ by $(D_m)_{ii} := (B_m)_{ii} - \frac{c_\alpha}{n^{\alpha/2}}$ and set $C_m := B_m - D_m - \frac{c_\alpha}{n^{\alpha/2}}J_m$, so that the off-diagonal entries of $C_m$ are $(B_m)_{ij} - \frac{c_\alpha}{n^{\alpha/2}}$ and the diagonal entries of $C_m$ are zero.
We first show that $P(\lambda_1(C_m) \leq -1 + 2\varepsilon) \to 0$; this will complete the proof of Lemma 10. Indeed, using Theorem 6 as in the proof of Proposition 3, we have
$$P\left(\lambda_1(D_m) \leq 1 - \varepsilon\right) \leq 2m\exp(-c_3 n)$$
for some constant $c_3 > 0$ depending on $\varepsilon$ and $\alpha$. To get the matrix $B_m$, we add to $C_m$ the matrix $D_m + \frac{c_\alpha}{n^{\alpha/2}}J_m$. Using Weyl's inequality (Theorem 4.3.1 of [5]) and the fact that $J_m$ is p.s.d., we get
$$\lambda_1(B_m) \geq \lambda_1(C_m) + \lambda_1(D_m).$$
The above inequality shows that the eigenvalues of $B_m$ are at least $1 - \varepsilon$ more than those of $C_m$, with high probability. This completes the proof, provided we prove that $P(\lambda_1(C_m) \leq -1 + 2\varepsilon) \to 0$.

We prove that $E\left[\operatorname{Tr}(C_m^{2k})\right] \to 0$ when $\alpha > \frac{k+1}{k}s$. By Markov's inequality,
$$P\left(\lambda_1(C_m) \leq -1 + 2\varepsilon\right) \leq P\left(\operatorname{Tr}(C_m^{2k}) \geq (1 - 2\varepsilon)^{2k}\right) \leq \frac{E\left[\operatorname{Tr}(C_m^{2k})\right]}{(1 - 2\varepsilon)^{2k}},$$
so this completes the proof of the lemma.
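The decay of $E[\operatorname{Tr}(C_m^{2k})]$ is visible in simulation. A sketch for $k = 2$, where the condition reads $\alpha > \frac{3}{2}s$ (the grid of $n$ values and all parameters are our choices):

```python
import numpy as np
from math import gamma, sqrt, pi

rng = np.random.default_rng(7)

def mean_trace_power(n, s, alpha, k, reps=10):
    """Monte Carlo estimate of E[Tr(C_m^{2k})]."""
    m = int(n ** s)
    ca = 2 ** (alpha / 2) * gamma((alpha + 1) / 2) / sqrt(pi)   # c_alpha
    total = 0.0
    for _ in range(reps):
        X = rng.standard_normal((m, n))
        C = np.abs(X @ X.T / n) ** alpha - ca / n ** (alpha / 2)
        np.fill_diagonal(C, 0.0)   # the diagonal entries of C_m are zero
        total += float(np.trace(np.linalg.matrix_power(C, 2 * k)))
    return total / reps

s, alpha, k = 0.5, 0.9, 2        # alpha > (k+1)s/k = 0.75
for n in (500, 2000, 8000):
    print(n, mean_trace_power(n, s, alpha, k))
```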
We state a lemma below which generalizes Lemma 9 to conditional expectations along paths with $p$ edges, for $p \in \mathbb{N}_{\geq 3}$ (Lemma 11). Expanding $\operatorname{Tr}(C_m^{2k})$, each expectation term corresponds to a closed walk of length $2k$ on $K_m$. By leaf vertices, we mean the vertices, like "3" and "1", which are of degree 2 as shown below (in the graph generated by the closed walk, such vertices are leaves). Closed walks visiting at most $k + 1$ vertices have total contribution of order at most $m^{k+1}(c/n^{\alpha})^{k} = O(n^{s(k+1) - \alpha k})$, bounding each expectation by Hölder's inequality, and this tends to 0 since $\alpha > \frac{k+1}{k}s$. So it is enough to look at walks visiting at least $k + 2$ vertices.
Closed walks of length $2k$ visiting $k + l$ vertices, $l \geq 2$, must have at least $2l$ vertices of degree 2 (none of which are leaf vertices), as shown below. This is due to the fact that, since it is a closed walk, the degree of every vertex is even, and the sum of the degrees of the vertices must equal twice the total number of edges.
A factor $C_{i,j}C_{j,k}$ appears when expanding $\operatorname{Tr}(C_m^{2k})$ as a sum of products of entries of $C_m$ whenever the vertex $j$ has degree 2. We would like to condition on the rows $i, k$ of $X_n$ and use Lemma 9.
It could happen that more than one, say $t$, degree-2 vertices come together in series, as shown below. In such a case, we condition as shown in the example below.


FIGURE 5. The vertices 1, 3 are leaf vertices.

Suppose there is a path traversing vertices $a$ through $e$, as shown above, where the degrees of both $a$ and $e$ are at least 4 and $b, c, d$ are all degree-2 vertices. Here degrees are calculated in the graph generated by the closed walk of length $2k$. In such a case, we will have the factor $C_{a,b}C_{b,c}C_{c,d}C_{d,e}$ in the expansion of $\operatorname{Tr}(C_m^{2k})$ corresponding to that path. In the expectation term corresponding to such a path, we condition on rows $a, c, e$ and use independence to get the two conditional expectations $Y_{a,c}, Y_{c,e}$ of the kind introduced in Section 3. The 'x' mark denotes the rows on which we are going to condition. If an even number of degree-2 vertices come together, we condition as shown below: in the case shown above, vertices $a, d$ have degree at least 4 and $b, c$ are degree-2 vertices, and we condition on rows $a, c, d$. All other rows corresponding to vertices with degree greater than 2 will also be conditioned on (Lemma 11 is not needed here). As $G := E\left(\left|\frac{\langle R_1, R_2\rangle}{\sqrt n}\right|^{\alpha} - c_\alpha\right)$ is of the order of $1/n$ and $\alpha > \frac{k+1}{k}s$,
$$\frac{n^{s(g + t + l)}}{n^{(g + t)\alpha}}\ E\left(\left|\frac{\langle R_1, R_2\rangle}{\sqrt n}\right|^{\alpha} - c_\alpha\right)\cdots \to 0.$$
This shows that $E\left[\operatorname{Tr}(C_m^{2k})\right] \to 0$ as $n \to \infty$. Taking $k$ arbitrarily large completes the proof of Lemma 10.

TABLE 1. Table of smallest eigenvalues for varying $\alpha$ and $s$ with $n = 5000$.
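The data of Table 1 are not reproduced here, but a table of the same kind can be regenerated with a sketch like the following (the grids of $s$ and $\alpha$ are our choices; reduce $n$ for a much quicker run).

```python
import numpy as np

rng = np.random.default_rng(8)

def min_eig(n, s, alpha):
    """Smallest eigenvalue of B_{n,alpha,s} for one sample of X_n."""
    m = int(n ** s)
    X = rng.standard_normal((m, n))
    return np.linalg.eigvalsh(np.abs(X @ X.T / n) ** alpha)[0]

n = 5000   # as in Table 1; smaller n is much faster
for s in (0.4, 0.6, 0.8, 1.0):
    cells = ", ".join(
        f"alpha={a:.2f}: {min_eig(n, s, a):+.2e}" for a in (0.8 * s, s, 1.2 * s)
    )
    print(f"s = {s}: {cells}")
```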