Norms of Randomized Circulant Matrices

We investigate two-sided bounds for operator norms of random matrices with inhomogeneous independent entries. We formulate a lower bound for Rademacher matrices and conjecture that it may be reversed up to a universal constant. We show that our conjecture holds up to a $\log\log n$ factor for randomized $n\times n$ circulant matrices and that the double logarithm may be eliminated under some mild additional assumptions on the coefficients.


Introduction and main results
The study of random matrices is one of the central topics of probability theory and its applications. Classical random matrix theory, initially motivated by mathematical physics, is mostly concerned with the investigation of homogeneous matrix ensembles possessing a large degree of symmetry [1,12]. In many applications, however, one needs to consider highly inhomogeneous random matrices. In such situations one cannot expect results as precise as for the classical ensembles; nevertheless, significant progress has been made in this area in recent years and many important estimates have been derived, cf. [2,13,14] and the references therein.
The aim of this paper is to discuss bounds for the operator norm of inhomogeneous random matrices X = (X_ij)_{i,j≤n} with independent entries. It is easy to reduce to the case of mean zero random matrices, i.e. when the X_ij are independent centered r.v's. The Gaussian case was solved in [8], where it was shown that if X_ij ∼ N(0, σ²_ij) are independent Gaussian r.v's, then E‖X‖ is comparable to an explicit expression in the σ_ij. Here and throughout the paper, ‖·‖ denotes the operator norm, unless indicated otherwise. The last estimate above differs slightly from the one formulated in [8], but it is not hard to see that it is equivalent to it (see the proof of the second bound in Proposition 4.4 below). The most interesting remaining case is that of Rademacher matrices, i.e. random matrices with coefficients X_ij = a_ij ε_ij, where ε_ij, 1 ≤ i, j ≤ n, are independent symmetric ±1 r.v's. The main body of the paper consists of results proved in this setting. Much may, however, be done in greater generality, as we show in Section 4.
Our first result is a lower bound for the operator norm. For two nonnegative functions f and g we write f ≳ g (or g ≲ f) if there exists an absolute constant C such that Cf ≥ g. The notation f ∼ g means that f ≳ g and g ≳ f. We use C and c to denote universal constants whose values may differ at each occurrence. We write ‖S‖_p = (E|S|^p)^{1/p} for the L_p-norm of a random variable S. The same notation is used for the ℓ_p-norm of a vector:
Theorem 1.1. Let (a_ij)_{i,j≤n} be any real matrix and X_ij = a_ij ε_ij. Then (1) holds.
Remark. Since ‖N(0, σ²)‖_p ∼ √p σ for p ≥ ln 2, in the Gaussian case we have the analogous bound, where g_ij are i.i.d. N(0, 1) r.v's. Thus the main result of [8] states that (1) may be reversed if the X_ij are independent centered Gaussian r.v's. Theorem 1.1 and the remark above motivate Conjecture 1.2, which states that (1) may be reversed up to a universal constant.
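The following small numerical sketch (an illustration only; the matrix size and the distribution of the coefficients a_ij are arbitrary choices) estimates E‖(a_ij ε_ij)‖ by Monte Carlo and compares it with the maximal Euclidean lengths of the rows and columns of (a_ij), which are always trivial lower bounds for the operator norm.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
a = rng.uniform(0, 1, size=(n, n))          # an arbitrary coefficient matrix (a_ij)

# Monte Carlo estimate of E ||(a_ij eps_ij)|| for Rademacher signs eps_ij.
samples = []
for _ in range(50):
    eps = rng.choice([-1.0, 1.0], size=(n, n))
    samples.append(np.linalg.norm(a * eps, ord=2))   # operator (spectral) norm
mc_norm = np.mean(samples)

# Trivial lower bounds: maximal Euclidean length of a row / a column of (a_ij).
row_len = np.sqrt((a ** 2).sum(axis=1)).max()
col_len = np.sqrt((a ** 2).sum(axis=0)).max()

print(f"Monte Carlo E||X||  = {mc_norm:.2f}")
print(f"max row length      = {row_len:.2f}")
print(f"max column length   = {col_len:.2f}")
```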
Our main result, Theorem 1.3, states that if the coefficients (a_ij) form a circulant matrix then Conjecture 1.2 is satisfied up to a log log n factor.
Moreover, the log log factors may be eliminated when the generating sequence (b_i) of the circulant matrix takes only the values 0 and 1 (i.e. when (a_ij) is an adjacency matrix of a directed circulant graph).
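For concreteness, a randomized circulant matrix can be simulated as follows; this is an illustration only, and the indexing convention a_ij = b_{(j−i) mod n}, as well as all numerical choices, are assumptions made for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 128
b = rng.uniform(0, 1, size=n)                # generating sequence (b_i)

# Circulant coefficients: a_ij = b_{(j - i) mod n} (illustrative convention).
i, j = np.indices((n, n))
a = b[(j - i) % n]

# Randomize with independent signs and look at the operator norm.
eps = rng.choice([-1.0, 1.0], size=(n, n))
print("||(a_ij eps_ij)|| =", np.linalg.norm(a * eps, ord=2))

# A 0-1 generating sequence gives the adjacency matrix of a directed circulant graph.
b01 = (rng.uniform(size=n) < 0.1).astype(float)
adj = b01[(j - i) % n]
print("out-degree of each vertex:", int(adj[0].sum()))
```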
In order to apply such a result it would be useful to have a simple two-sided bound for the quantity ‖A‖_{ε,p} := sup_{‖s‖_2,‖t‖_2≤1} ‖Σ_{i,j≤n} a_ij s_i t_j ε_ij‖_p. (3) Two-sided bounds for L_p-norms of Rademacher sums were derived in [5] on the basis of the tail bounds of [10] (see also [6] for a discussion of various equivalent norms): for p ≥ 1, ‖Σ_k a_k ε_k‖_p ∼ Σ_{k≤p} a*_k + √p (Σ_{k>p} (a*_k)²)^{1/2}, where (a*_k)_{k=1}^n denotes the nonincreasing rearrangement of (|a_k|)_{k=1}^n. It is, however, not obvious how to apply the above bounds to get a simple two-sided estimate for ‖A‖_{ε,p}. We were able to derive such a bound when A is an adjacency matrix of a (directed) graph.
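The rearrangement-type bound above can be explored numerically. The sketch below (an illustration; the coefficient distribution, the value of p and the sample sizes are arbitrary) compares a Monte Carlo estimate of ‖Σ_k a_k ε_k‖_p with the quantity Σ_{k≤p} a*_k + √p (Σ_{k>p}(a*_k)²)^{1/2}.

```python
import numpy as np

rng = np.random.default_rng(2)
a = rng.exponential(size=300)                 # coefficients (a_k)
p = 8

# Monte Carlo estimate of || sum_k a_k eps_k ||_p for Rademacher signs eps_k.
eps = rng.choice([-1.0, 1.0], size=(20000, a.size))
S = eps @ a
lp_norm = (np.abs(S) ** p).mean() ** (1 / p)

# Rearrangement expression: sum of the p largest |a_k| plus sqrt(p) times the
# Euclidean length of the remaining coefficients (nonincreasing rearrangement a*).
a_star = np.sort(np.abs(a))[::-1]
rearr = a_star[:p].sum() + np.sqrt(p) * np.linalg.norm(a_star[p:])

print(f"Monte Carlo L_{p} norm = {lp_norm:.2f}")
print(f"rearrangement quantity = {rearr:.2f}")
```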
In the general case a similar result holds; however, the upper and lower bounds are then not of the same order.
Proposition 1.5. For any matrix A = (a_ij)_{i,j≤n} and p ≥ 1, the analogous two-sided estimate holds with an additional logarithmic factor in the upper bound. We do not know whether the logarithmic factor is necessary.
Organization of the paper. In Section 2 we prove the main results of the paper, i.e. Theorems 1.1 and 1.3. In Section 3 we study the norms ‖A‖_{ε,p} and establish Propositions 1.4 and 1.5. There we also provide estimates for ‖A‖_{ε,p} in the case when A is the adjacency matrix of the hypercube Z_2^d or, more generally, of the discrete torus Z_m^d. The adjacency matrix of Z_m^d is not circulant, but it is very close to having this property, and Corollary 3.5 shows that we may apply the previously derived estimates to get two-sided bounds on E‖(½_{i∼j} ε_ij)_{i,j∈Z_m^d}‖.
In the last section of the paper we extend Theorem 1.1 and Conjecture 1.2 to the case of random matrices with independent entries X_ij satisfying the mild regularity condition ‖X_ij‖_{2p} ≤ α‖X_ij‖_p for p ≥ 2. We show that the results of [8] imply that the more general Conjecture 4.3 holds when the X_ij are mixtures of Gaussian variables. We also show that to establish the formulated conjectures it is enough to prove a slightly weaker bound (28).

Proofs of main results
In the sequel we will frequently compare L_p-norms of real and vector-valued Rademacher sums S = Σ_{k=1}^n x_k ε_k, where the x_i are vectors from a normed space (F, ‖·‖) and the ε_i are independent symmetric ±1 r.v's. The classical Khintchine (in the case when the coefficients x_i are real) and Kahane-Khintchine inequalities, cf. [9, Section 4.3], state that for p > q > 0, ‖S‖_p ≤ C_{p,q}‖S‖_q, where ‖S‖_q = (E‖S‖^q)^{1/q} and C_{p,q} is a constant depending only on p and q. Moreover, for p > q > 1, C_{p,q} ≤ √((p−1)/(q−1)); therefore for p ≥ 2, C_{p,2} ≤ √p, and for p ≥ 1 + (e − 2)^{−1}, Markov's inequality yields P(‖S‖ ≥ e‖S‖_p) ≤ e^{−p}. Using the Paley-Zygmund inequality one may derive a reverse lower tail bound for ‖S‖ (similar estimates can be found in [3,7]; we present the details for the sake of completeness). For any p ≥ 1 + (e − 2)^{−1}, taking Z = ‖S‖^p we obtain
2.1 Proof of Theorem 1.1
We start with a simple observation.
The Khintchine inequality yields the first estimate. In a similar way we show the second one, and the remaining bound is trivial. Moreover, by the Khintchine-Kahane inequality, E‖(a_ij ε_ij)‖ ∼ (E‖(a_ij ε_ij)‖²)^{1/2}.
To establish the last term in the lower bound let us fix 1 ≤ k ≤ n. We need to show that γ := min Observe that by the Khintchine inequality γ ≤ C log(k + 1) sup so we may consider only large k; in particular we may assume that 2 log(k + 1) ≤ √k and the estimate follows from the trivial bound (6). In the opposite case, for any |I| ≤ k, by Lemma 2.1 (applied to the sum over i, j ∉ I instead of over all i, j) there exists a set J of cardinality at most 2 log(k + 1) ≤ √k, disjoint from I, and unit vectors t, s such that i,j∈J Thus we may inductively construct disjoint sets I_l and unit vectors s^(l), t^(l), 1 ≤ l ≤ √k, such that Let p = ½ log(k + 1). Then p ≥ 1 + (e − 2)^{−1} and by the Khintchine inequality ‖S‖_{2p} ≤ √e ‖S‖_p. Thus the lower tail bound for Rademacher sums (5) yields Hence we have where the consecutive steps follow by Chebyshev's inequality, the independence of the r.v's S_l, the tail bound (7) and the inequality 1 − y ≤ e^{−y}.
2.2 Proof of Theorem 1.3
Although G is an undirected graph, we treat it as a directed graph for notational simplicity. It is a regular graph of degree 2d or 2d − 1, when p_d ≠ n/2 or p_d = n/2 respectively. Therefore it has either |E| = 2dn or (2d − 1)n (directed) edges. The matrix (½_E(i, j))_{i,j≤n} is the adjacency matrix of G. For simplicity of notation we will denote it by ½_E. If I ⊂ V, then we will also write just I for the subgraph (I, (I × I) ∩ E). For a fixed k ∈ [n] we introduce the following two subsets of [n]: Note that the cardinalities of U_k and D_k do not depend on k and they are both equal to a common value m ≤ 2^d. Observe also that for any l ∈ D_k there are at least d distinct elements of D_k connected to l by an edge. There is a significant similarity between U_k, D_k (as subgraphs of G) and the hypercube Z_2^d. If Σ_{i∈I} p_i ≢ Σ_{j∈J} p_j (mod n) whenever I, J ⊆ [d] and I ≠ J, then the maps ½_I ↦ k ± Σ_{i∈I} p_i mod n are isomorphisms between Z_2^d and U_k or D_k, respectively. Otherwise the maps are not injective and two vertices of Z_2^d may be pasted into one vertex of U_k or D_k, which then inherits the neighbours of both. Nevertheless, the degree of a vertex in U_k or D_k never exceeds its degree in G, which is at most 2d. Due to this structural similarity, we will refer to U_k and D_k as 'the upper cube' and 'the lower cube' at k respectively.
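The following sketch builds the upper and lower cubes at a vertex k of a circulant graph on Z_n with generators p_1, . . . , p_d; the explicit formulas used for U_k and D_k are reconstructed from the isomorphism described above and the generators are arbitrary, so this is an illustration only.

```python
from itertools import combinations

n, gens = 101, [1, 3, 7]          # circulant graph on Z_n with generators p_1, ..., p_d
d = len(gens)

def subset_sums(p):
    """Sums of all subsets of the generator list p (including the empty set)."""
    return [sum(c) for r in range(len(p) + 1) for c in combinations(p, r)]

k = 5
U_k = {(k + s) % n for s in subset_sums(gens)}   # 'upper cube' at k
D_k = {(k - s) % n for s in subset_sums(gens)}   # 'lower cube' at k
print(len(U_k), len(D_k))                        # both equal to some m <= 2**d

# every vertex of D_k should have at least d neighbours inside D_k
inside_degrees = []
for l in D_k:
    nbrs = {(l + g) % n for g in gens} | {(l - g) % n for g in gens}
    inside_degrees.append(len(nbrs & D_k))
print(min(inside_degrees), ">=", d)
```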
For a fixed sequence k_1, . . . , k_s ∈ [n] we define the modified, disjoint version of the lower cubes: We are going to show that k_1, . . . , k_s can be chosen in such a way that |I_l|/|D_{k_l}| = |I_l|/m ≥ 7/8 for every l, while at least 1/32 of the edges from E connect vertices belonging to some I_l, 1 ≤ l ≤ s.
Proof. Recall that any i ∈ [n] belongs to exactly m of the sets D_k, k ∈ [n]. Therefore
Proof. Let s := ⌈n/(8m)⌉. We construct k_1, . . . , k_s inductively. For k_1 we choose an arbitrary element of [n]. Assume that k_1, . . . , k_r are chosen and r < n/(8m). Then the set J = ∪_{l≤r} D_{k_l} has at most rm < n/8 elements. Therefore, by Recalling that |D_{k_l}| = m and each vertex has degree at most 2d, we have Since I_l, 1 ≤ l ≤ s, are disjoint and s ≥ n/(8m), we obtain the result. It follows that the graph G contains a subgraph G′ = (V, E′) which consists of s mutually unconnected parts (I_l, (I_l × I_l) ∩ E), l = 1, . . . , s, having at most 2^d vertices each, and which contains at least 3% of the edges from E. In other words, the adjacency matrix of the graph G′ is a block diagonal matrix (possibly after a permutation of rows and columns) which cuts out at least 3% of the ones from ½_E. We are going to improve this result further.
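A possible implementation of this selection is sketched below; in particular the greedy rule of picking the k whose lower cube overlaps the already chosen cubes the least is an assumption filling in the averaging step, and the graph parameters are arbitrary.

```python
from itertools import combinations

def lower_cube(k, gens, n):
    """Lower cube D_k = {k - (sum of a subset of the generators) mod n}."""
    sums = {sum(c) for r in range(len(gens) + 1) for c in combinations(gens, r)}
    return {(k - s) % n for s in sums}

n, gens = 211, [1, 4, 9]
m = len(lower_cube(0, gens, n))
s = -(-n // (8 * m))                       # s = ceil(n / (8m))

covered, I_blocks = set(), []
for _ in range(s):
    # greedy rule (assumed): take the k minimizing the overlap with what is covered
    k = min(range(n), key=lambda k: len(lower_cube(k, gens, n) & covered))
    D_k = lower_cube(k, gens, n)
    I_blocks.append(D_k - covered)         # disjoint version I_l of the lower cube
    covered |= D_k

print("s =", s, " min |I_l|/m =", min(len(I) for I in I_blocks) / m)   # should be >= 7/8
```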
In what follows we write (a_ij)
Lemma 2.4. There are block diagonal matrices B_1, . . . , B_N, N < ∞, with blocks of size at most 2^d, such that B_k is the adjacency matrix of a graph (V, E_k), E_k ⊂ E, k = 1, . . . , N, and
Proof. We set N = n and as B_k, k = 1, . . . , n, we take the adjacency matrix of the subgraph ∪_{l≤s}(I_l, (I_l × I_l) ∩ E) ⊂ G, with coordinates shifted cyclically k times. Throughout the proof, whenever we write '+', we mean addition mod n. In particular, for X ⊂ ℕ and y ∈ ℤ we write X + y = {x + y mod n : x ∈ X}.
We start by deriving the estimate (8). To this aim, observe first that D_l + k = D_{l+k}. Hence we have In a similar way we show that A ≥ m, and (8) easily follows.
Now fix l ∈ [n] and I ⊂ D_l such that |D_l \ I| ≤ cm. Then hence (8) and (9) imply that Let us take k_1, . . . , k_s as in Lemma 2.3 and set Then, since |D_{k_l} \ I_l| ≤ m/8 and s ≥ n/(8m), estimate (10) implies that In the next lemma we prove a useful norm estimate for block diagonal matrices with blocks of fixed size. We are going to use the Hadamard product: By ε = (ε_ij) we denote a matrix with independent symmetric ±1 entries (its size may change from line to line). Recall the definition (3) of the norm ‖A‖_{ε,p}.
Proof. The proof of (i) is an easy exercise. For (ii) we set p = log(n + 1) and observe that A well known result for Bernoulli processes (cf. [3] or more general bound [7, Theorem and the second part of the assertion easily follows.
Estimation of the operator norm E‖ε · A_l‖ is basically as difficult as Theorem 1.3 itself. A comparison between Bernoulli and Gaussian processes gives an upper bound of the form E‖ε · A_l‖ ≤ √(π/2) E‖g · A_l‖, where the entries g_ij of g are independent standard Gaussian random variables. This bound is far from optimal in general. However, under some specific assumptions on the matrix A_l, it becomes a sufficient tool for our considerations. Recall that a two-sided bound for E‖(a_ij g_ij)‖ was obtained in [8]. In particular, it holds that if (a_ij) is an n × n matrix and |a_ij| ≤ 1, then
Corollary 2.6. Let A = (a_ij)_{i,j≤n} be a matrix of size n × n and d ≤ n/2 a natural number. Assume that (i) there are natural numbers
Proof. Assume that B_k = (b_ij(k)), k = 1, . . . , N, are the block diagonal matrices given by Lemma 2.4. Then we have where the first inequality follows by Lemma 2.4, since, by the contraction principle [9, Each of the matrices ε · A · B_k is block diagonal, with blocks of size at most 2^d. Let I_1(k), . . . , I_m(k) ⊂ [n] be such that the diagonal blocks are of the form (a_ij b_ij(k)ε_ij)_{i,j∈I_l(k)}. The blocks have coefficients of absolute value at most 1 and in any row at most 2d of them are nonzero. Hence, using (11) and (12), Therefore, by Lemma 2.5, Since B_k has 0-1 entries, it holds that ‖A · B_k‖_{ε,log(n+1)} ≤ ‖A‖_{ε,log(n+1)}, and we finish the proof.
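The Bernoulli-Gaussian comparison with constant √(π/2) is a standard fact and can be checked empirically; the sketch below (an illustration with an arbitrary coefficient matrix) estimates both sides by Monte Carlo.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100
a = rng.uniform(size=(n, n))               # a fixed coefficient matrix (a_ij)

def mc_norm(sample_noise, trials=100):
    """Monte Carlo estimate of E||a . noise|| (entrywise product, spectral norm)."""
    return np.mean([np.linalg.norm(a * sample_noise(), ord=2) for _ in range(trials)])

bernoulli = mc_norm(lambda: rng.choice([-1.0, 1.0], size=(n, n)))
gaussian = mc_norm(lambda: rng.standard_normal((n, n)))

print(f"E||eps . A||          = {bernoulli:.2f}")
print(f"sqrt(pi/2) E||g . A|| = {np.sqrt(np.pi / 2) * gaussian:.2f}")
```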
Remark. If we fix δ ∈ (0, 1), then the upper bound in Theorem 1.3, without log log terms and with a constant depending on δ, follows from Corollary 2.6 under the assumption that either a ij = 0 or δ ≤ |a ij | ≤ 1 for any i, j.
Corollary 2.6 does not only prove Theorem 1.3 in this special 0-1 (or slightly wider) case. In the general situation the proof relies on dividing the matrix (a_ij) into parts in which all nonzero coordinates differ by at most a constant factor. Then Corollary 2.6 again provides the crucial estimate.
Proof of Theorem 1.3. The lower bound can be deduced from Theorem 1.1. To this aim it is sufficient to show that since the term on the right-hand side is trivially bounded by the second term on the right-hand side of (1). We denote by s̄, t̄ unit vectors realizing the supremum in the Rademacher norm:
Such a pair (s̄, t̄) is not unique. Observe that circulant matrices have the following shift invariance property (equality is meant in law): Here and throughout the proof, addition inside indices is mod n.
Hence i, j ∈ I − k for at least n/2 distinct values of k. It follows that a_ij s̄_i t̄_j ≤ (2/n) Σ_{k=1}^n a_ij s̄_i t̄_j ½_{i,j∈I−k} and the following estimate holds: where the last inequality follows from (13) and the Khintchine inequality. This proves the lower bound. Now we will show the upper bound. Since the problem is homogeneous, we may assume Clearly the A^(k) are n × n circulant matrices, (a_ij) = Σ_k A^(k), and for any i, j there is at most one k such that a^(k)_ij ≠ 0. Applying (11) and (12) to the matrix e^{k_0} A^(0) we obtain For k ≥ 1 let d_k denote the degree of A^(k). Then Corollary 2.6 applied to the matrix It is not hard to see that ‖A^(k)‖_{ε,log(n+1)} ≤ ‖(a_ij)‖_{ε,log(n+1)}. Moreover, by the Cauchy-Schwarz inequality, for any i we have

Now the triangle inequality yields
Remark. The crucial observation in the proof of Theorem 1.3 was that for a (directed) circulant graph (V, E) of degree d there exist matrices B_1, . . . , B_N such that ½_E ≤ (1/N) Σ_{k=1}^N B_k and the B_k are adjacency matrices of graphs with components of cardinality at most exponential in d. We do not know how broad the class of graphs of degree d with this property is.

Estimates for Rademacher norms
In this section we will prove estimates for the quantity ‖(a_ij)‖_{ε,p} = sup_{‖s‖_2,‖t‖_2≤1} ‖Σ_{i,j} a_ij s_i t_j ε_ij‖_p and apply them in a few concrete situations. We begin with the proof of Proposition 1.4, which gives a two-sided bound for ‖A‖_{ε,p} in the case of 0-1 matrices.
Proof of Proposition 1.4. To get the lower estimate let us fix I ⊂ E of cardinality at most p. Then for s, t ∈ R^n we have Taking the supremum over s, t ∈ S^{n−1} we get ‖½_E‖_{ε,p} ≥ (1/2)‖½_I‖.
To establish the reverse estimate define Bounding the operator norm of a matrix by its Hilbert-Schmidt norm we get ‖½_I‖ ≤ √|I|, so that M ≤ √p. We will also assume that p ≥ 2 is an integer (since we may change p to max{2, ⌈p⌉} ≤ 2p; observe that the RHS of (4) is sublinear with respect to p).
To show the upper bound in (4) it is enough to prove that Indeed, suppose that (15) holds. Let us fix s, t with ‖s‖_2, ‖t‖_2 ≤ 1 and define I := {(i, j) ∈ E : |s_i t_j| ≥ M/p}. Then by (15) we have |I| ≤ C_1 p, so we may decompose I into sets I_1, I_2, . . . , I_{⌈C_1⌉} of cardinality at most p each and get On the other hand the Khintchine inequality yields Thus, by the triangle inequality, Let us now make two simple observations on the cardinality of the intersection of the set E with rectangles. To make the notation more concise set r := p²/M² ≥ p.
Since it is only a matter of permuting rows and columns of the matrix ½_E (recall that we do not assume any symmetry), to establish (15) we may assume without loss of generality that |s_1| ≥ |s_2| ≥ . . . ≥ |s_n| and |t_1| ≥ |t_2| ≥ . . . ≥ |t_n|. We also put t_k = s_k = 0 for k > n. Let D_k := {2^k, 2^k + 1, . . . , 2^{k+1} − 1}. Then by the monotonicity assumption and the above observations, Observe that by the monotonicity of (s_k) we have In the same way it follows that Σ_{k≥0} 2^k t²_{2^k} ≤ 2. Therefore Now we will estimate the second term in (16). To this aim fix x > 0 and define Note that the function k = k(l) is strictly decreasing on B(x). We have Finally observe that Estimates (16)-(18) imply (15), which completes the proof.
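The elementary inequality Σ_{k≥0} 2^k s²_{2^k} ≤ 2 used above is easy to test numerically; a small sketch (with an arbitrary nonincreasing normalized sequence):

```python
import numpy as np

rng = np.random.default_rng(5)
s = np.sort(np.abs(rng.standard_normal(4096)))[::-1]
s /= np.linalg.norm(s)                       # nonincreasing sequence with ||s||_2 = 1

# sum over dyadic indices of 2^k * s_{2^k}^2 (1-indexed as in the text, hence s[2**k - 1])
total = sum(2 ** k * s[2 ** k - 1] ** 2 for k in range(int(np.log2(s.size))))
print(total, "<= 2")
```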
The proof of the upper bound in Proposition 1.4 strongly relies on the assumption that a_ij ∈ {0, 1}. In the general case we can prove the same lower bound, but the upper estimate that we provide is significantly weaker. By the Khintchine inequality we have so in each row and column there are at most p nonzero elements of the matrix Ã, and the length of each row and column of Ã is at most M. Thus by Lemma 2.1 we have that To estimate the latter quantity take vectors s, t ∈ S^{n−1} with support of cardinality at most p and define Then |I_{k_0}| ≤ p and Taking Ĩ ⊂ I_{k_0+1} of cardinality ⌊p⌋ ≥ p/2 we see that By the Khintchine inequality we have be such that 2^{1−l} > |a_ij s_i t_j| ≥ 2^{−l} for some l > k_0. Then Obviously |I_k| ≤ |supp(s)||supp(t)| ≤ p², so using (19) we obtain Let us take k > k_0; then we may partition [n] into disjoint sets J_1, . . . , J_{n_k} such that Observe that p(n_k − 1) ≤ |I_k|, so n_k ≤ |I_k|/p + 1 ≤ 2|I_k|/p. We have Thus p2^{−2k}|I_k| ≤ 8M² and
Propositions 1.4 and 1.5 provide bounds for ‖A‖_{ε,p}; however, they involve suprema of operator norms that are quite hard to estimate. In the sequel we will discuss more concrete estimates, concentrating on the case when A is the adjacency matrix of the d-dimensional hypercube or of more general d-dimensional discrete tori.
In the case when A = ½_E is the adjacency matrix of a graph G = (V, E) we will denote ‖½_E‖_{ε,p} by N_{ε,p}(G), i.e. N_{ε,p}(G) := ‖½_E‖_{ε,p}.
The next simple lemma presents general bounds that work well for small values of p.
Lemma 3.1. Let G be a graph of maximum vertex degree d. Then N_{ε,p}(G) ≤ min{d, √p} for any p ≥ 1, and √p/8 ≤ N_{ε,p}(G) ≤ √p for 1 ≤ p ≤ d.
Proof. We have Moreover, by the Khintchine inequality, Hence the bound N ε,p (G) ≤ min{d, √ p} easily follows.
To Then The next lemma gives bounds on N ε,p (G) in terms of expansion and sparsity parameters of G.
Let us fix E_1 ⊂ E of cardinality at most p and vectors s, t with ‖s‖_2 ≤ 1 and ‖t‖_2 ≤ 1.
Observe that We also have We have Σ_{k,l=1}^∞ In a similar way we show that Σ_{k,l=1}^∞ and (22) easily follows.
The next proposition shows how to apply previous general bounds to the case of the Hamming hypercube.
Proof. If p ≥ |E| = d2^d then (20) applied with I = J = V shows that N_{ε,p}(G) ≳ |E|/|V| = d. Since N_{ε,p}(G) ≤ d by Lemma 3.1, we get the first part of the assertion. To see the lower estimate in (23) define, for 0 ≤ l ≤ d, V_l as the set of all vertices from V with exactly l coordinates equal to 1. Then a_l := |V_l| = \binom{d}{l}. There are exactly k·a_k edges in V_k × V_{k−1}, so for 1 ≤ k ≤ d/2 such that p ≥ k·a_k we have by (20) We have k·a_k = k\binom{d}{k} ≤ (2ed/k)^k, hence the condition p ≥ k·a_k holds (recall that d ≤ p ≤ d2^d) for k of order ln p/ln(ed/ln p), and the lower bound in (23) follows.
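The layer counts used here (|V_l| = \binom{d}{l} and k\binom{d}{k} directed edges from V_k to V_{k−1}) are easy to verify directly for a small hypercube; the following sketch is an illustration with an arbitrary small dimension.

```python
import numpy as np
from math import comb

d = 8
V = np.arange(2 ** d)
bits = lambda x: bin(int(x)).count("1")

# adjacency matrix of the hypercube Z_2^d: two vertices are joined iff they differ in one bit
A = np.array([[1 if bits(u ^ v) == 1 else 0 for v in V] for u in V])

levels = np.array([bits(v) for v in V])      # number of coordinates equal to 1
for k in range(1, d // 2 + 1):
    Vk = np.where(levels == k)[0]
    Vk1 = np.where(levels == k - 1)[0]
    edges = A[np.ix_(Vk, Vk1)].sum()         # directed edges from V_k to V_{k-1}
    print(k, comb(d, k), int(edges), k * comb(d, k))   # |V_k| = C(d,k), edges = k*C(d,k)
```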
To get the upper bound we will use the second part of Lemma 3.2. Harper's edge-isoperimetric inequality on the hypercube [4] states that for any set I ⊂ V of cardinality at most 2^k we have |E ∩ (I × I)| ≤ k2^k. So for p ≥ 2, We may easily extend the bounds from the previous proposition to the case of Z_m^d, thanks to the following simple lemma.
Proof. The lower bounds follow from the fact that Z_2^d is a subgraph of Z_m^d for any m ≥ 2. To show the upper bound, we first consider the case of Z_{2k}^d. There are two partitions of Z_{2k} = {1, . . . , 2k} into pairs: I_{1,l} := {2l − 1, 2l}, 1 ≤ l ≤ k, and I_{2,l} := {2l, 2l + 1}, 1 ≤ l ≤ k (we identify m + 1 with 1). For a fixed i = (i_1, i_2, . . . , i_d) ∈ {1, 2}^d we may treat I_{i,l} = I_{i_1,l_1} × I_{i_2,l_2} × · · · × I_{i_d,l_d}, l = (l_1, . . . , l_d) ∈ {1, . . . , k}^d, as disjoint subgraphs of Z_{2k}^d isomorphic to Z_2^d. Let E_i denote the set of edges of Z_{2k}^d joining vertices from I_{i,l} for some l ∈ {1, . . . , k}^d and let A_i be the adjacency matrix of ({1, . . . , m}^d, E_i). Then A_i is a block-diagonal matrix with k^d blocks such that each block A_{i,l} is isomorphic to the adjacency matrix of Z_2^d. Thus part (i) of Lemma 2.5 yields Observe that every edge of Z_{2k}^d belongs to exactly 2^{d−1} sets E_i (if the vertices of this edge differ at the coordinate j_0, there is only one way to choose i_{j_0} and all other i_j, j ≠ j_0, may be chosen in an arbitrary way). Hence we have In the case of Z_{2k+1}^d we proceed in a similar way, but we consider 3 types of disjoint families of pairs: I_1 = {{2l−1, 2l}, 1 ≤ l ≤ k}, I_2 = {{2l, 2l+1}, 1 ≤ l ≤ k} and I_3 = {{2k+1, 1}}.
In particular, for 2 ≤ m ≤ e^d we have , where the last equivalence holds by Lemma 3.4.
The last part of the assertion follows from (24) and Proposition 3.3.

Extensions
In this section we consider a more general class of random matrices, with entries satisfying the condition ‖X_ij‖_{2p} ≤ α‖X_ij‖_p for all i, j ≤ n and p ≥ 1, where α ≥ 1 is a fixed constant. We may generalize the bound from Theorem 1.1 to this case.
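Condition (25) is easy to probe numerically for a given entry distribution; a sketch with Monte Carlo moments and the standard normal taken as an example distribution:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.standard_normal(1_000_000)             # sample of an entry distribution

def lp(sample, p):
    """Monte Carlo estimate of ||X||_p = (E|X|^p)^(1/p)."""
    return (np.abs(sample) ** p).mean() ** (1 / p)

for p in [1, 2, 4, 8]:
    print(f"p = {p}:  ||X||_2p / ||X||_p = {lp(x, 2 * p) / lp(x, p):.2f}")
# the ratios stay bounded, so (25) holds with a moderate value of alpha for N(0,1)
```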
Theorem 4.1. Let (X_ij)_{i,j≤n} be independent, mean zero r.v's satisfying condition (25). Then In order to show this result we need the following generalization of Lemma 2.1.
Proof. We proceed in a similar way as in the proof of Lemma 2.1. We change p^{−1} to p^{−2β} in the definition of the sets I_k(z) and use the following extension of the Khintchine inequality (cf. Lemma 4.1 in [7])
Proof of Theorem 4.1. The proof of Theorem 1.1 works here with the following modifications:
• the constants depend on α;
• we use the generalized Khintchine-Kahane inequality E‖(X_ij)‖ ∼_α (E‖(X_ij)‖²)^{1/2} (cf. [7]);
• instead of the Khintchine inequality we apply (27);
• to get P(S_l ≥ c′(α)‖S_l‖_{log(k+1)}) ≥ c′(α)/√k we use the Paley-Zygmund inequality and (27), as in the proof of (4.6) in [7].
The operator norm is trivially bounded below by the maximal Euclidean length of the columns and rows. In [8] it was shown that if the X_ij are mixtures of Gaussian r.v's then this bound may be reversed in expectation: The proposition below implies in particular that Conjecture 4.3 holds for mixtures of Gaussian variables. and, applying (27) again, ≤ C(α) log^β(k + 1) sup
Remark 4.5. The decomposition/permutation trick of [8] shows that in order to show the upper part of Conjecture 4.3 it is enough to prove that

(28)
Sketch of the proof. The standard argument (cf. the proof of [8, Corollary 4.1]) shows that we may restrict ourselves to symmetric matrices X = (X_ij), i.e. to the case when X_ij = X_ji and (X_ij)_{i≥j} are independent mean zero r.v's satisfying condition (25). We assume that (28) holds for any square submatrix of X and we will show that E‖(X_ij)‖ ≲_α a + γ.