The Spectral Norm of Random Lifts of Matrices

We study the spectral norm of matrix random lifts A^(k,π) for a given n × n matrix A and k ≥ 2. The lift A^(k,π) is a random symmetric kn × kn matrix whose k × k blocks are obtained by multiplying A_ij by a k × k matrix drawn independently from a distribution π supported on k × k matrices with spectral norm at most 1. Assuming that E_π X = 0, we prove that

E‖A^(k,π)‖ ≲ max_i √(∑_j A_ij²) + max_ij |A_ij| √(log(kn)).

This result can be viewed as an extension of existing spectral bounds on random matrices with independent entries, providing further instances where the multiplicative √(log n) factor in the Non-Commutative Khintchine inequality can be removed. We also present an application to random k-lifts of graphs (each vertex of the graph is replaced with k vertices, and each edge is replaced with a random bipartite matching between the two sets of k vertices). We prove an upper bound of O(√∆ + √(log(kn))) on the new eigenvalues of random k-lifts of a fixed G = (V, E) with |V| = n and maximum degree ∆, which improves the previous result of O(√(∆ log(kn))) by Oliveira [Oli10a].

Email: bandeira@math.ethz.ch. Part of this work was done while ASB was with the Department of Mathematics at the Courant Institute of Mathematical Sciences and the Center for Data Science at New York University, and was partially supported by NSF grants DMS-1712730 and DMS-1719545, and by a grant from the Sloan Foundation. Email: yding@nyu.edu. Partially supported by NSF grant DMS-1712730.


The Non-Commutative Khintchine inequality
The Non-Commutative Khintchine (NCK) inequality, originally introduced by Lust-Piquard and Pisier [Pis03], is one of the simplest tools for understanding the spectrum of matrix series of the form

X = ∑_{i=1}^N γ_i A_i,   (1.1)

where the A_i (i = 1, 2, ..., N) are n × n real symmetric matrices and the γ_i (i = 1, 2, ..., N) are i.i.d. random variables, usually assumed gaussian or Rademacher. The inequality is stated as follows.
Theorem 1.1 (Non-Commutative Khintchine (NCK) inequality). Let A_1, A_2, ..., A_N be n × n symmetric matrices and γ_1, γ_2, ..., γ_N be i.i.d. N(0, 1) random variables. Then

E‖∑_{i=1}^N γ_i A_i‖ ≲ √(log n) · ‖∑_{i=1}^N A_i²‖^{1/2}.   (1.2)

The NCK inequality and other matrix concentration phenomena have been proven under various settings and extensively studied in [Oli10b, Tro12, Tro15]. One particularly important application of matrix concentration is to the spectra of random matrices with independent entries: such random matrices can be represented as matrix series via a direct entry-wise decomposition, as we show below.
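As a quick numerical sanity check of Theorem 1.1 (ours, not part of the paper), one can sample the series (1.1) and compare a Monte Carlo estimate of E‖X‖ against σ √(log n); the particular choice of the A_i below is an arbitrary illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, N, trials = 50, 30, 200

# Arbitrary fixed symmetric coefficient matrices A_i (illustrative choice).
A = rng.standard_normal((N, n, n))
A = (A + A.transpose(0, 2, 1)) / 2

# sigma = || sum_i A_i^2 ||^{1/2}, the matrix variance in the NCK bound (1.2).
sigma = np.linalg.norm(np.einsum('ijk,ikl->jl', A, A), ord=2) ** 0.5

# Monte Carlo estimate of E||X|| for X = sum_i gamma_i A_i, gamma_i ~ N(0,1).
norms = [np.linalg.norm(np.tensordot(rng.standard_normal(N), A, axes=1), ord=2)
         for _ in range(trials)]
mean_norm = float(np.mean(norms))

nck_bound = sigma * np.sqrt(np.log(n))  # right-hand side of (1.2), constant omitted
```

For gaussian series the estimate typically lands near 2σ, comfortably below σ √(log n) times a modest constant.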

Random matrices with independent entries
The study of random matrices with independent entries traces back to the seminal work of Wigner [Wig58]. For Wigner matrices (real symmetric or Hermitian random matrices with independent mean-zero, unit-variance entries), a long line of work has established a comprehensive understanding of their spectral properties over the past decades (see, for example, [FK81, BY88, AGZ10, Tao12]). One of the most important results is the Wigner semicircle law: for an n × n Wigner matrix X, E‖X‖/√n → 2 and the empirical spectral distribution of X/√n converges to the semicircle distribution. Random matrices with different variances on each of the independent entries, say X_ij ~ N(0, b_ij²) independently for 1 ≤ i ≤ j ≤ n, have also attracted considerable attention [RV10, Ver10]. With the NCK inequality, the following estimate can be obtained:

E‖X‖ ≲ σ √(log n),  where σ := max_i √(∑_j b_ij²).   (1.3)

Here A ≲ B (respectively, A ≳ B) refers to A ≤ CB (respectively, A ≥ CB) for some absolute positive constant C. The definition of σ is consistent with (1.2) upon writing X as the following matrix series:

X = ∑_{1≤i≤j≤n} γ_ij b_ij (E_ij + E_ji).   (1.4)

Here E_ij := e_i e_j^⊤. One may immediately notice that the bound (1.3) is not sharp for Wigner matrices with i.i.d. standard gaussian entries, since it gives E‖X‖ ≲ √(n log n) rather than E‖X‖ ~ √n. In fact, a recent improvement for matrices with independent gaussian entries, due to [BvH16], states

E‖X‖ ≲ σ + σ_* √(log n),  where σ_* := max_ij |b_ij|.   (1.5)

This upper bound is sharp, as a matching lower bound E‖X‖ ≳ σ + σ_* √(log n) is also given in [BvH16] under mild assumptions on the b_ij's. Further refinements, which hold for general b_ij's, have recently been obtained [vH17, LvHY18, BGBK+20].
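The semicircle prediction E‖X‖/√n → 2, and the gap between the bounds (1.3) and (1.5) in the homogeneous case b_ij ≡ 1, can be illustrated numerically; the following sketch (ours, not from the paper) uses a single moderate-size sample.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400

# Wigner matrix: symmetric, independent standard gaussian entries.
U = np.triu(rng.standard_normal((n, n)), 1)
X = U + U.T + np.diag(rng.standard_normal(n))

norm = np.linalg.norm(X, ord=2)
ratio = norm / np.sqrt(n)          # semicircle law predicts ratio close to 2

# For b_ij = 1: sigma = sqrt(n) and sigma_* = 1.
sigma, sigma_star = np.sqrt(n), 1.0
nck = sigma * np.sqrt(np.log(n))                # right side of (1.3), constant omitted
bvh = sigma + sigma_star * np.sqrt(np.log(n))   # right side of (1.5), constant omitted
```

Already at n = 400, the (1.5)-type quantity tracks the true norm far more closely than the NCK prediction (1.3).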

Improving the NCK inequality
The gap between the NCK bound (1.3) and its improvement (1.5) demonstrates the sub-optimality of the NCK inequality in some settings. In fact, many improvements to bounds obtained via the NCK inequality are known under various settings. Seginer [Seg00] improved the NCK bound for random matrices with independent, uniformly bounded entries. Exploiting the non-commutativity among the A_i's, Tropp [Tro18] proved the following upper bound for the series (1.1), which improves the multiplicative factor on σ from √(log n) to (log n)^{1/4}:

E‖X‖ ≲ σ (log n)^{1/4} + ω √(log n),

where the alignment parameter ω is defined in [Tro18] via a supremum over the group U_n of n × n unitary matrices (the paper [Tro18] considers Hermitian A_i's, while in this paper we focus on the real symmetric setting). In fact, a bound which replaces the multiplicative √(log n) factor in the NCK inequality by an additive factor has been hypothesized in many different forms [Tro12, Ban15, BvH16, vH17, LvHY18, Tro18]. For the matrix series (1.1), define the "weak variance" as

σ_* := sup_{‖v‖=‖w‖=1} (∑_{i=1}^N (v^⊤ A_i w)²)^{1/2},   (1.6)

and the conjectured bound reads

E‖X‖ ≲ σ + σ_* √(log n).   (1.7)

Note that σ_* ≤ σ by a simple application of the Cauchy-Schwarz inequality. For random matrices with independent X_ij ~ N(0, b_ij²) for 1 ≤ i ≤ j ≤ n, writing X as the matrix series (1.4), it is not hard to show that the σ_* defined in (1.6) is consistent with the quantity defined in (1.5), in that they differ only by a multiplicative constant; and the proposed improvement (1.7) is indeed true in the case of random matrices with independent entries, by [BvH16].
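The inequality σ_* ≤ σ can be spot-checked numerically. The sketch below (ours) lower-bounds the supremum in (1.6) by a crude search over random unit vectors; every sampled value must stay below σ.

```python
import numpy as np

rng = np.random.default_rng(5)
n, N = 30, 10

# Arbitrary symmetric A_i's for illustration.
A = rng.standard_normal((N, n, n))
A = (A + A.transpose(0, 2, 1)) / 2

# sigma = || sum_i A_i^2 ||^{1/2}.
sigma = np.linalg.norm(np.einsum('ijk,ikl->jl', A, A), ord=2) ** 0.5

# Crude lower estimate of the weak variance sigma_* from (1.6),
# via random unit vectors v, w.
best = 0.0
for _ in range(500):
    v = rng.standard_normal(n); v /= np.linalg.norm(v)
    w = rng.standard_normal(n); w /= np.linalg.norm(w)
    val = np.sqrt(np.sum(np.einsum('j,ijk,k->i', v, A, w) ** 2))
    best = max(best, val)
```

Indeed, for any unit v, w, one has ∑_i (v^⊤ A_i w)² ≤ w^⊤ (∑_i A_i²) w ≤ σ², which is the Cauchy-Schwarz step mentioned above.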
In this paper, we exhibit another class of examples in which we improve the bound given by the NCK inequality (1.2), replacing the multiplicative log factor with an additive one, as in the conjectured bound (1.7). In what follows, all matrices we consider are real. As an extension of random matrices with independent entries, we consider the operation of matrix lifts, in which each entry of an underlying deterministic matrix is multiplied by a random k × k matrix, as described in the following definition.
Definition 1.2 (Matrix lifts). Let A be an n × n symmetric matrix with zero diagonal entries, and let π be a measure supported on k × k matrices. Let {Π_ij}_{1≤i<j≤n} be drawn i.i.d. from π, and set Π_ji := Π_ij^⊤. The (k, π) lift of A, denoted A^(k,π), is the kn × kn symmetric random matrix whose (i, j)-th k × k block equals A_ij Π_ij. It can be written as

A^(k,π) = ∑_{1≤i<j≤n} A_ij (E_ij ⊗ Π_ij + E_ji ⊗ Π_ij^⊤),   (1.8)

where the symbol "⊗" on the RHS denotes the Kronecker product of matrices.
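Definition 1.2 can be implemented directly; the following sketch (our illustration, not the paper's code) builds A^(k,π) block by block, taking π to be the Haar measure on the orthogonal group O(k), which is centered and supported on matrices of spectral norm 1.

```python
import numpy as np

def haar_orthogonal(k, rng):
    """Sample from the Haar measure on O(k) via QR with sign correction."""
    q, r = np.linalg.qr(rng.standard_normal((k, k)))
    return q * np.sign(np.diag(r))

def matrix_lift(A, k, sample_pi, rng):
    """(k, pi) lift of a symmetric A with zero diagonal (Definition 1.2):
    the (i, j) block is A_ij * Pi_ij, with Pi_ji = Pi_ij^T."""
    n = A.shape[0]
    L = np.zeros((k * n, k * n))
    for i in range(n):
        for j in range(i + 1, n):
            Pi = sample_pi(k, rng)      # k x k, spectral norm <= 1
            L[i*k:(i+1)*k, j*k:(j+1)*k] = A[i, j] * Pi
            L[j*k:(j+1)*k, i*k:(i+1)*k] = A[i, j] * Pi.T
    return L

rng = np.random.default_rng(2)
n, k = 6, 3
A = rng.standard_normal((n, n))
A = (A + A.T) / 2
np.fill_diagonal(A, 0.0)      # Definition 1.2 requires a zero diagonal
L = matrix_lift(A, k, haar_orthogonal, rng)
```

The result is symmetric with vanishing diagonal blocks, as required by the definition.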
The main theorem of this paper is the following bound.

Theorem 1.3. Let A be a symmetric n × n matrix (n ≥ 2) with zero diagonal entries, and set σ := max_i √(∑_j A_ij²) and σ_* := max_ij |A_ij|. Suppose π is a centered measure supported on k × k matrices with spectral norm at most 1. Then there exists a universal constant C such that for any ε ∈ (0, 1/2],

E‖A^(k,π)‖ ≤ 2(1 + ε) σ + (C/√(log(1 + ε))) σ_* √(log(kn)).   (1.9)

Note that Definition 1.2 and Theorem 1.3 only apply to base matrices A with zero diagonal entries. A base matrix with possibly non-zero diagonal entries can be handled by splitting it into its diagonal and off-diagonal parts and applying the triangle inequality to the resulting random matrices.

Remark 1.4. Upon taking A = {b_ij}, k = 1 and π = Uniform{±1}, Definition 1.2 and Theorem 1.3 include as a special case the real symmetric random matrix X ∈ R^{n×n} with X_ij = ε_ij b_ij, where the {b_ij} are given and the ε_ij are independent Rademacher random variables for 1 ≤ i ≤ j ≤ n, that is, P[ε_ij = 1] = P[ε_ij = −1] = 1/2. Since [BvH16] showed that the bound O(σ + σ_* √(log n)) captures the optimal scaling of E‖X‖ with respect to σ and σ_* √(log n) and is in general unimprovable, the same holds for our bound (1.9) on E‖A^(k,π)‖. However, Theorem 1.3 does not directly imply the bound (1.5), since gaussian random variables are not compactly supported.
Besides the k = 1 case, Theorem 1.3 is also interesting for natural choices of π, such as the Haar measure on the orthogonal group O(k) or on the special orthogonal group SO(k). One particular application is an estimate on the spectrum of random lifts of graphs, which we discuss below.

Application: random lifts of graphs
Given an undirected graph G = (V, E) and an integer k ≥ 2, the random k-lift of G, denoted G^(k), is obtained by replacing each vertex v ∈ V with k new vertices, and each edge e = (v_1, v_2) with a random bipartite perfect matching between the k new vertices corresponding to v_1 and those corresponding to v_2. Here "random" refers to a uniform choice among all k! possible bipartite matchings. We denote by A and A^(k) the adjacency matrices of G and G^(k), respectively. The quantity of interest, referred to below as (1.10), is the largest absolute value of the new eigenvalues of A^(k), that is, of those eigenvalues not inherited from A.
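The construction of G^(k) can be sketched directly (our illustration): each edge of G is replaced by a uniformly random perfect matching, encoded as a random k × k permutation matrix.

```python
import numpy as np

def random_k_lift(A, k, rng):
    """Adjacency matrix of a random k-lift of the graph with adjacency A:
    each edge (u, v) becomes a uniformly random perfect matching between
    the k copies of u and the k copies of v."""
    n = A.shape[0]
    L = np.zeros((k * n, k * n), dtype=int)
    for u in range(n):
        for v in range(u + 1, n):
            if A[u, v]:
                # Uniformly random k x k permutation matrix.
                P = np.eye(k, dtype=int)[rng.permutation(k)]
                L[u*k:(u+1)*k, v*k:(v+1)*k] = P
                L[v*k:(v+1)*k, u*k:(u+1)*k] = P.T
    return L

rng = np.random.default_rng(3)
# Base graph: the 5-cycle C_5.
n, k = 5, 4
A = np.zeros((n, n), dtype=int)
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1
L = random_k_lift(A, k, rng)
```

A basic sanity check: lifting preserves degrees, so every copy of a vertex has the same degree as the original vertex.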
Previous studies of k-lifts of graphs, in the setting of a fixed base graph G and k → ∞, have revealed many properties of the resulting random graph, such as connectivity. We should note that much of this line of work adopted the asymptotic regime k → ∞ together with the setting that the base graph is taken randomly over all d-regular graphs on n vertices. In fact, when the base graph is a fixed d-regular graph and k ∈ N_+ is fixed, (1.10) is not always upper bounded by O(√d). As a counterexample (see [BvH16], Remark 4.8), consider G the union of n/s cliques of s vertices each, with no edges between different cliques; here s = ⌈√(log n)⌉, and we assume for simplicity that n/s is an integer. Seginer [Seg00] showed that for this graph the expected value of (1.10) for a random 2-lift is of order √(log n), whereas the O(√d) bound would incorrectly predict that it is O(log^{1/4}(n)).
Another line of work considers a fixed base graph G with maximum degree ∆, without assuming any randomness of the base graph. Making use of matrix concentration, Oliveira [Oli10a] obtained a high-probability upper bound of O(√(∆ log(kn))) on (1.10). The most recent advance, by Bordenave and Collins [BC19], treats the k-lift problem in a much more general framework and proves that (1.10) is 2√(d − 1) + o(1) for any d-regular base graph G as k → ∞, finally settling Friedman's conjecture even without assuming randomness of the base graph.
In what follows, we improve the bound in [Oli10a] by removing the multiplicative √(log(kn)) factor, replacing it with an additive one. We also improve the constant factor in front of √∆ down to 2, consistent with Friedman's theorem and [BC19]. In the large-k limit, our bound is weaker than that of [BC19] by an additive √(log(kn)) term. For k = 2, an additive √(log n) term is needed, as illustrated by the counterexample due to Seginer [Seg00] discussed above. However, our result does not capture the correct dependence on k, namely the concentration arising from large k. Note that we only use a slight modification of the moment method, compared with the sophisticated combinatorial technique of [BC19].
Theorem 1.5. Let A be the adjacency matrix of G = (V, E) with |V| = n and maxdeg(G) = ∆, and let A^(k) be the adjacency matrix of the corresponding random k-lift. Then there exists a universal constant C such that for any ε ∈ (0, 1/2],

E‖A^(k) − EA^(k)‖ ≤ 2(1 + ε) √∆ + (C/√(log(1 + ε))) √(log(kn)).   (1.11)

Our bound is essentially 2(1 + ε)√∆ as long as ∆ ≳ log(kn), i.e., as long as the base graph G is not too sparse. The proof of Theorem 1.5 will follow from our main result, Theorem 1.3.
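Theorem 1.5 can be checked numerically on a small example. The sketch below (ours, with an arbitrary generous constant standing in for C and ε absorbed) lifts the complete graph K_20 and compares ‖A^(k) − EA^(k)‖ with a bound of the shape (1.11).

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 20, 5

# Base graph: complete graph K_n, so Delta = n - 1.
A = np.ones((n, n)) - np.eye(n)
Delta = n - 1

# Random k-lift: each edge carries an independent uniform permutation matrix.
L = np.zeros((k * n, k * n))
for u in range(n):
    for v in range(u + 1, n):
        P = np.eye(k)[rng.permutation(k)]
        L[u*k:(u+1)*k, v*k:(v+1)*k] = P
        L[v*k:(v+1)*k, u*k:(u+1)*k] = P.T

# E A^(k) = A  (Kronecker)  J_k / k.
EL = np.kron(A, np.ones((k, k)) / k)
dev = np.linalg.norm(L - EL, ord=2)

# A bound of the shape (1.11), with an arbitrary generous constant 4.
bound = 2 * np.sqrt(Delta) + 4 * np.sqrt(np.log(k * n))
```

On this example the deviation is of order 2√∆, well within the (1.11)-type bound.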

Notation
In this paper, for positive quantities A and B, A ≲ B and A ≳ B respectively refer to A ≤ CB and A ≥ CB for some absolute positive constant C. For x ∈ R, ⌈x⌉ denotes the minimum integer that is larger than or equal to x.

Proof of main results
In this section, we carry out the proofs of Theorems 1.3 and 1.5. We begin with the following comparison argument, which links A^(k,π) to an auxiliary Wigner-type matrix. This argument is a modification of Proposition 2.1 in [BvH16], and the auxiliary matrix Y_r is the same as in the proof of Theorem 4.8 in [LvHY18].
Proposition 2.1. Let Y_r be the r × r symmetric random matrix with zero diagonal and (Y_r)_ij ~ |N(0, 1)| independently for all 1 ≤ i < j ≤ r. Under the setting of Theorem 1.3, suppose σ_* ≤ 1. Then for every p ∈ N_+ there holds

E Tr[(A^(k,π))^{2p}] ≤ (kn/r) E Tr[Y_r^{2p}],  where r := ⌈σ²⌉ + p.

To carry out the proof of Proposition 2.1, we start with a set of standard notations adopted from [FK81] and [BvH16]. Following the representation (1.8), a direct expansion of (A^(k,π))^{2p} yields

E Tr[(A^(k,π))^{2p}] = ∑_{u∈[n]^{2p}} (∏_{j=1}^{2p} A_{u_j u_{j+1}}) E Tr[∏_{j=1}^{2p} Π_{u_j u_{j+1}}].   (2.1)

Let G_n = ([n], E_n) be the complete graph on n vertices. A cycle u_1 → u_2 → ⋯ → u_{2p} → u_1 of length 2p, where u_i ∈ [n] for all 1 ≤ i ≤ 2p (u_{2p+1} := u_1), is identified with u = (u_1, ..., u_{2p}) ∈ [n]^{2p}. Since E[Π_ij] = 0 for any 1 ≤ i, j ≤ n, in the sum of (2.1) we only need to consider cycles in which each edge appears at least twice.
The shape s(u) of a cycle u is obtained by relabeling its vertices in order of first appearance, so that the first vertex visited is labeled 1, the second distinct vertex 2, and so on. The following set collects all shapes of cycles that contribute to the sum in (2.1): S_2p := {s(u) : u is a cycle of length 2p in which each edge appears at least twice}.
For the sake of convenience, we also define the set of cycles with a fixed shape and starting point as Γ_{s,u} := {u ∈ [n]^{2p} : s(u) = s, u_1 = u}.
The span of a shape s, denoted m(s), is the largest index in its representation, which equals the number of distinct vertices visited by any cycle of shape s. A direct observation is that m(s) ≤ p + 1 for any s ∈ S_2p.
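The combinatorial notions above admit a direct computation; the helper names below are ours. The shape relabels vertices in order of first appearance, and the span m(s) is simply the largest label.

```python
def shape(u):
    """Shape s(u) of a cycle u = (u_1, ..., u_{2p}): relabel the vertices
    in order of first appearance, starting from 1."""
    labels = {}
    s = []
    for v in u:
        if v not in labels:
            labels[v] = len(labels) + 1
        s.append(labels[v])
    return tuple(s)

def span(s):
    """m(s): the largest label, i.e. the number of distinct vertices visited."""
    return max(s)

# Example: the cycle 7 -> 3 -> 7 -> 3 (-> 7) of length 2p = 4 uses each
# edge at least twice, so its shape belongs to S_4.
s = shape((7, 3, 7, 3))
```

In the example, m(s) = 2 ≤ p + 1 = 3, matching the observation above.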
Proof of Proposition 2.1. Following the expansion (2.1), we have

E Tr[(A^(k,π))^{2p}] ≤ k ∑_{s∈S_2p} ∑_{u=1}^n ∑_{u∈Γ_{s,u}} ∏_{j=1}^{2p} |A_{u_j u_{j+1}}| ≤ kn ∑_{s∈S_2p} σ^{2(m(s)−1)},

where the first inequality follows from ‖∏_{j=1}^{2p} Π_{u_j u_{j+1}}‖ ≤ 1 and therefore |Tr[∏_{j=1}^{2p} Π_{u_j u_{j+1}}]| ≤ k; and the second inequality owes to the fact that, under σ_* ≤ 1, for any u ∈ [n] and s ∈ S_2p, Lemma 2.5 in [BvH16] gives

∑_{u∈Γ_{s,u}} ∏_{j=1}^{2p} |A_{u_j u_{j+1}}| ≤ σ^{2(m(s)−1)}.

Meanwhile, for any positive integer r > p, the auxiliary random matrix Y_r satisfies

E Tr[Y_r^{2p}] ≥ ∑_{s∈S_2p} r(r−1)⋯(r−m(s)+1) ≥ r ∑_{s∈S_2p} (r−p)^{m(s)−1}.

The first inequality follows from the observation that E[(Y_r)_ij^m] ≥ 1 for all m ≥ 2, so that every cycle in [r]^{2p} with each edge appearing at least twice contributes at least 1 to the trace. Now choosing r = ⌈σ²⌉ + p and noting that m(s) ≤ p + 1 for all s ∈ S_2p, we have r − p ≥ σ², hence

E Tr[Y_r^{2p}] ≥ r ∑_{s∈S_2p} σ^{2(m(s)−1)} ≥ (r/(kn)) E Tr[(A^(k,π))^{2p}],

which completes the proof.
To complete the proof of Theorem 1.3, by Proposition 2.1 we only need to show that there exists an absolute constant C such that for p ≥ 2,

(E Tr[Y_r^{2p}])^{1/(2p)} ≤ 2√r + C√p.   (2.5)

The proof of (2.5) is contained in the proof of Theorem 4.8 in [LvHY18], so we do not repeat it here. Its main steps are a norm bound for Wigner matrices with non-symmetrically distributed entries, followed by Talagrand's concentration inequality.
The spectral bound of random k-lifts of graphs follows as an immediate corollary.
Proof of Theorem 1.5. Denote by Perm(k) the collection of all k × k permutation matrices, and let G_k := {Π − (1/k) J_k : Π ∈ Perm(k)}, where J_k is the k × k all-ones matrix. It is easy to verify that ‖X‖ ≤ 1 for any X ∈ G_k, and that Unif(G_k) is centered. Moreover, the adjacency matrix A satisfies σ² ≤ ∆ and σ_* ≤ 1 by definition. Thus it follows from Theorem 1.3 that

E‖A^(k) − EA^(k)‖ = E‖A^(k,Unif(G_k))‖ ≤ 2(1 + ε) √∆ + (C/√(log(1 + ε))) √(log(kn)).
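The two facts used in this proof, namely that every element of G_k has spectral norm at most 1 and that Unif(G_k) is centered, can be verified exhaustively for small k; a quick check (ours):

```python
import numpy as np
from itertools import permutations

k = 4
J = np.ones((k, k))

# All elements of G_k = {Pi - J_k/k : Pi in Perm(k)}.
G = [np.eye(k)[list(p)] - J / k for p in permutations(range(k))]

max_norm = max(np.linalg.norm(X, ord=2) for X in G)  # should be <= 1
mean = sum(G) / len(G)                               # should vanish (centering)
```

The centering follows since the sum of all k! permutation matrices equals (k−1)! J_k, and the norm bound holds because Π − J_k/k annihilates the all-ones vector and acts orthogonally on its complement.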
Remark 2.3. In the above proof, we applied Theorem 1.3 with π = Unif(G_k), where G_k is the centered version of Perm(k). One may expect that, under the setting of Theorem 1.3 but without assuming that π is centered, there still holds

E‖A^(k,π) − EA^(k,π)‖ ≤ C (σ + σ_* √(log(kn))).   (2.6)

Though we do not have a counterexample to (2.6), we must point out that (2.6) only follows from Theorem 1.3 when ‖X − E_π X‖ ≤ 1 for every X ∈ supp(π).
We note that the proof of Theorem 1.3 does not exploit any potential structure of the "lifting matrices" Π_ij. This may explain why Theorem 1.5 is weaker than the result in [BC19] by an additive √(log(kn)) term in the large-k limit for d-regular base graphs. One may be able to obtain a stronger result, for instance E‖A^(k,π) − EA^(k,π)‖ ≤ 2√∆ + o_k(1), with a more careful analysis exploiting the fact that the Π_ij are permutation matrices.