On the spectral norm of a random Toeplitz matrix

Suppose that $T_n$ is a Toeplitz matrix whose entries come from a sequence of independent but not necessarily identically distributed random variables with mean zero. Under some additional tail conditions, we show that the spectral norm of $T_n$ is of the order $\sqrt{n \log n}$. The same result holds for random Hankel matrices as well as other variants of random Toeplitz matrices which have been studied in the literature.


Introduction and results
Let $X_0, X_1, X_2, \ldots$ be a family of independent random variables. For $n \ge 2$, $T_n$ denotes the $n \times n$ random symmetric Toeplitz matrix

$$T_n = \big[X_{|j-k|}\big]_{1\le j,k\le n} = \begin{pmatrix} X_0 & X_1 & X_2 & \cdots & X_{n-1}\\ X_1 & X_0 & X_1 & \cdots & X_{n-2}\\ X_2 & X_1 & X_0 & \cdots & X_{n-3}\\ \vdots & \vdots & \vdots & \ddots & \vdots\\ X_{n-1} & X_{n-2} & X_{n-3} & \cdots & X_0 \end{pmatrix}.$$

In [1], Bai asked whether the spectral measure of $n^{-1/2}T_n$ approaches a deterministic limit measure $\mu$ as $n \to \infty$. Bryc, Dembo, and Jiang [5] and Hammond and Miller [8] independently proved that this is so when the $X_j$ are identically distributed with variance 1, and that under these assumptions $\mu$ does not depend on the distribution of the $X_j$. The measure $\mu$ does not appear to be a previously studied probability measure, and is described via rather complicated expressions for its moments. This limiting spectral measure has unbounded support, which raises the question of the asymptotic behavior of the spectral norm $\|T_n\|$, i.e., the maximum absolute value of an eigenvalue of $T_n$. (This problem is explicitly raised in [5, Remark 1.3].) This paper shows, under slightly different assumptions from [5,8], that $\|T_n\|$ is of the order $\sqrt{n\log n}$. Here the $X_j$ need not be identically distributed, but they satisfy stronger moment or tail conditions than in [5,8]. The spectral norm is also of the same order for other related random matrix ensembles, including random Hankel matrices. In the case of Hankel matrices, Theorems 1 and 3 below generalize in a different direction a special case of a result of Masri and Tonge [14] on multilinear Hankel forms with $\pm 1$ Bernoulli entries.
A random variable $X$ will be called subgaussian if

$$\mathbb{P}\big[|X| \ge t\big] \le 2e^{-at^2} \quad \text{for all } t > 0 \tag{1}$$

for some constant $a > 0$. A family of random variables is uniformly subgaussian if each satisfies (1) with the same constant $a$.
Theorem 1. Suppose $X_0, X_1, X_2, \ldots$ are independent, uniformly subgaussian random variables with $\mathbb{E}X_j = 0$ for all $j$. Then

$$\mathbb{E}\|T_n\| \le c_1\sqrt{n\log n},$$

where $c_1 > 0$ depends only on the constant $a$ in the subgaussian estimate (1) for the $X_j$.¹

¹ Simple scaling considerations show that one can take $c_1 = Ca^{-1/2}$ for some absolute constant $C > 0$. In principle an explicit value for $C$ can be extracted from the proof of Theorem 1. No attempt has been made to do so, since the techniques used in this paper are suited to determining rough orders of growth, not precise constants. Similar remarks apply to the constants which appear in the statements of Theorems 2 and 3 below.
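The scaling remark in the footnote can be spelled out as a short derivation. This sketch assumes that (1) depends on $a$ only through $at^2$, as in the form $\mathbb{P}[|X| \ge t] \le 2e^{-at^2}$; the notation $T_n(X)$ for the Toeplitz matrix built from the sequence $X = (X_j)$, the rescaled variables $\widetilde X_j$, and $c_1(1)$ for the constant of Theorem 1 when $a = 1$ are introduced here for illustration only.

```latex
\begin{aligned}
\widetilde X_j &:= \sqrt{a}\,X_j
  \;\Longrightarrow\;
  \mathbb{P}\big[|\widetilde X_j| \ge t\big]
  = \mathbb{P}\big[|X_j| \ge t/\sqrt{a}\,\big] \le 2e^{-t^2}
  \quad (\text{(1) holds for the } \widetilde X_j \text{ with constant } 1),\\
\mathbb{E}\|T_n(X)\| &= a^{-1/2}\,\mathbb{E}\|T_n(\widetilde X)\|
  \le a^{-1/2}\,c_1(1)\sqrt{n\log n}.
\end{aligned}
```

Under this assumption one may therefore take $C = c_1(1)$.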
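By strengthening the subgaussian assumption, the statement of Theorem 1 can be improved from a bound on expectations to an almost sure asymptotic bound. Recall that a real-valued random variable $X$ (or more properly, its distribution) is said to satisfy a logarithmic Sobolev inequality with constant $A$ if

$$\mathbb{E}\big[f^2(X)\log f^2(X)\big] \le 2A\,\mathbb{E}\big[f'(X)^2\big]$$

for every smooth $f: \mathbb{R} \to \mathbb{R}$ such that $\mathbb{E}f^2(X) = 1$. Standard normal random variables satisfy a logarithmic Sobolev inequality with constant 1. Furthermore, it is well known that independent random variables with bounded logarithmic Sobolev constants are uniformly subgaussian and possess the same concentration properties as independent normal random variables (see [11] or [12, Chapter 5]).

Theorem 2. Suppose $X_0, X_1, X_2, \ldots$ are independent random variables with $\mathbb{E}X_j = 0$ for all $j$, and that either (i) $|X_j| \le A$ almost surely for every $j$, or (ii) each $X_j$ satisfies a logarithmic Sobolev inequality with constant $A$. Then

$$\limsup_{n\to\infty}\frac{\|T_n\|}{\sqrt{n\log n}} \le c_2 \quad \text{almost surely},$$

where $c_2 > 0$ depends only on $A$.

We remark that according to the definition used here, $T_n$ is a submatrix of $T_{n+1}$, but this is only a matter of convenience in notation. Theorem 2 remains true regardless of the dependence among the random matrices $T_n$ for different values of $n$.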
It seems unlikely that the stronger hypotheses of Theorem 2 are necessary. In fact a weaker version can be proved under the hypotheses of Theorem 1 alone; see the remarks following the proof of Theorem 2 in Section 2.
When the $X_j$ have variance 1, the upper bound $\sqrt{n\log n}$ of Theorems 1 and 2 is of the correct order. In fact the matching lower bound holds under less restrictive tail assumptions, as the next result shows.
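Theorem 3. Suppose $X_0, X_1, X_2, \ldots$ are independent and, for some constant $B > 0$, each $X_j$ satisfies

$$\mathbb{E}X_j = 0, \qquad \mathbb{E}X_j^2 = 1, \qquad \mathbb{E}|X_j| \ge B.$$

Then

$$\mathbb{E}\|T_n\| \ge c_3\sqrt{n\log n},$$

where $c_3 > 0$ depends only on $B$.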
In the case that $\mathbb{E}X_j^2 = 1$ and $\mathbb{E}|X_j|^3 < \infty$, Hölder's inequality gives $1 = \mathbb{E}\big(|X_j|^{1/2}|X_j|^{3/2}\big) \le (\mathbb{E}|X_j|)^{1/2}(\mathbb{E}|X_j|^3)^{1/2}$, that is, $\mathbb{E}|X_j| \ge (\mathbb{E}|X_j|^3)^{-1}$. Thus the lower bound on first absolute moments assumed in Theorem 3 is weaker than an upper bound on absolute third moments; in particular it is satisfied by uniformly subgaussian random variables with variance 1, whose absolute third moments are uniformly bounded. Section 2 below contains the proofs of Theorems 1-3. As mentioned above, Theorems 1-3 also hold for other ensembles of random Toeplitz matrices, as well as for random Hankel matrices. Section 3 discusses these extensions of the theorems and makes some additional remarks.
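The Hölder step can be checked exactly on small discrete distributions. The sketch below uses Python's standard `fractions` module for exact arithmetic; the helper `moments` and the two example laws (a Rademacher law and a five-point law with mean 0 and variance 1) are hypothetical choices made only for illustration.

```python
from fractions import Fraction

def moments(dist):
    """Exact (E X, E X^2, E|X|, E|X|^3) for a finite law given as (value, prob) pairs."""
    mean = sum(p * x for x, p in dist)
    second = sum(p * x * x for x, p in dist)
    first_abs = sum(p * abs(x) for x, p in dist)
    third_abs = sum(p * abs(x) ** 3 for x, p in dist)
    return mean, second, first_abs, third_abs

F = Fraction
RADEMACHER = [(F(-1), F(1, 2)), (F(1), F(1, 2))]
# Hypothetical five-point law: P[X = +-1] = 1/4, P[X = +-2] = 1/16, P[X = 0] = 3/8.
FIVE_POINT = [(F(-2), F(1, 16)), (F(-1), F(1, 4)), (F(0), F(3, 8)),
              (F(1), F(1, 4)), (F(2), F(1, 16))]

for name, dist in [("Rademacher", RADEMACHER), ("five-point", FIVE_POINT)]:
    mean, second, m1, m3 = moments(dist)
    assert mean == 0 and second == 1  # the hypotheses of the remark
    # Hoelder: 1 = E X^2 <= (E|X|)^(1/2) (E|X|^3)^(1/2), i.e. E|X| >= 1 / E|X|^3.
    assert m1 >= 1 / m3
    print(name, m1, 1 / m3)
```

For the five-point law this gives $\mathbb{E}|X| = 3/4$ against the lower bound $2/3$, while the Rademacher law attains equality, as it must since $|X|$ is constant there.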
Acknowledgement. The author thanks A. Dembo for pointing out the problem considered in this paper.

Proofs
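The proof of Theorem 1 is based on Dudley's entropy bound [6] for the supremum of a subgaussian random process. Given a random process $\{Y_x : x \in M\}$, a pseudometric on $M$ may be defined by

$$d(x,y) = \big(\mathbb{E}|Y_x - Y_y|^2\big)^{1/2}.$$

The process is called subgaussian if

$$\mathbb{P}\big[|Y_x - Y_y| \ge t\big] \le 2\exp\Big(-\frac{b\,t^2}{d(x,y)^2}\Big) \quad \text{for all } x, y \in M,\ t > 0 \tag{2}$$

for some constant $b > 0$. For $\varepsilon > 0$, let $N(M, d, \varepsilon)$ denote the minimal number of $d$-balls of radius $\varepsilon$ needed to cover $M$. Dudley's entropy bound is the following (see [18, Proposition 2.1] for the version given here).

Proposition 4. Suppose $\{Y_x : x \in M\}$ is a subgaussian process with $\mathbb{E}Y_x = 0$ for every $x \in M$. Then

$$\mathbb{E}\sup_{x\in M} Y_x \le K\int_0^\infty \sqrt{\log N(M, d, \varepsilon)}\,d\varepsilon,$$

where $K > 0$ depends only on the constant $b$ in the subgaussian estimate (2) for the process.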
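To see how the entropy integral in Dudley's bound produces a factor $\sqrt{\log n}$, one can evaluate it numerically for $M = [0,1]$ under two assumptions: $d(x,y) \le L|x-y|$, and $M$ has $d$-diameter at most $D$. The helpers below are hypothetical, and the choices $L = 4\pi n^{3/2}$, $D = 4\sqrt{n}$ in the usage loop are illustrative orders of growth for cosine sums of length $n$, not constants taken from the proof.

```python
import math

def covering_bound(eps, lip, diam):
    """Upper bound on N([0,1], d, eps), assuming d(x, y) <= lip * |x - y|
    and that [0,1] has d-diameter at most diam."""
    if eps >= diam:
        return 1  # a single ball of radius >= the diameter covers [0,1]
    # a d-ball of radius eps contains an interval of length 2 * eps / lip
    return max(1, math.ceil(lip / (2 * eps)))

def entropy_integral(lip, diam, steps=20000):
    """Midpoint-rule approximation of int_0^diam sqrt(log N([0,1], d, eps)) d eps."""
    h = diam / steps
    total = 0.0
    for i in range(steps):
        eps = (i + 0.5) * h
        total += math.sqrt(math.log(covering_bound(eps, lip, diam)))
    return total * h

# Illustrative (hypothetical) growth orders: lip ~ n^{3/2}, diam ~ n^{1/2}.
for n in [10, 100, 1000, 10000]:
    value = entropy_integral(lip=4 * math.pi * n ** 1.5, diam=4 * math.sqrt(n))
    print(n, value / math.sqrt(n * math.log(n)))
```

Since $N$ is 1 beyond the diameter, the integral is roughly $D\sqrt{\log(L/D)}$; with $L/D$ polynomial in $n$ and $D \approx \sqrt{n}$, this is of order $\sqrt{n\log n}$.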
We will also need the following version of the classical Azuma-Hoeffding inequality. This can be proved by a standard Laplace transform argument; see e.g. [13, Fact 2.1].
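Proposition 5. Let $X_1, \ldots, X_n$ be independent, symmetric, uniformly subgaussian random variables. Then for any $a_1, \ldots, a_n \in \mathbb{R}$ and $t > 0$,

$$\mathbb{P}\Big[\Big|\sum_{j=1}^n a_jX_j\Big| \ge t\Big] \le 2\exp\Big(-\frac{b\,t^2}{\sum_{j=1}^n a_j^2}\Big),$$

where $b > 0$ depends only on the constant $a$ in the subgaussian estimate (1) for the $X_j$.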
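For Rademacher signs, the classical Hoeffding inequality gives Proposition 5 with $b = 1/2$, and for a short coefficient vector the exact tail can be computed by enumerating all sign patterns. The helpers and the coefficient vector below are hypothetical, chosen as integers so that the comparison with $t$ is exact.

```python
import itertools
import math

def exact_tail(a, t):
    """P[|sum_j a_j eps_j| >= t] for i.i.d. Rademacher signs, by full enumeration."""
    hits = sum(
        1
        for signs in itertools.product((-1, 1), repeat=len(a))
        if abs(sum(s * x for s, x in zip(signs, a))) >= t
    )
    return hits / 2 ** len(a)

def hoeffding_bound(a, t):
    """Hoeffding's bound 2 exp(-t^2 / (2 sum_j a_j^2)), i.e. b = 1/2 for Rademacher signs."""
    return 2 * math.exp(-t * t / (2 * sum(x * x for x in a)))

a = [1] * 6 + [2] * 6  # hypothetical integer coefficients, sum of squares = 30
for t in [2, 5, 8, 11]:
    print(t, exact_tail(a, t), hoeffding_bound(a, t))
```

Enumeration costs $2^n$ evaluations, so this is only a sanity check for small $n$; the bound itself is what the proof uses.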
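Proof of Theorem 1. We first reduce to the case in which each $X_j$ is symmetric. Let $T_n'$ be an independent copy of $T_n$. Since $\mathbb{E}T_n' = 0$, Jensen's inequality gives

$$\mathbb{E}\|T_n\| = \mathbb{E}\|T_n - \mathbb{E}T_n'\| \le \mathbb{E}\|T_n - T_n'\|.$$

The matrix $T_n - T_n'$ is again a random symmetric Toeplitz matrix, with entries $X_j - X_j'$, which are independent, symmetric, uniformly subgaussian random variables (with a possibly smaller constant $a$ in the subgaussian estimate). Thus we may assume without loss of generality that the $X_j$ are symmetric random variables.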
We next bound $\|T_n\|$ by the supremum of a subgaussian random process. A basic feature of the theory of Toeplitz matrices is their relationship to multiplication operators (cf. [4, Chapter 1]). Specifically, the finite Toeplitz matrix $T_n$ is an $n \times n$ submatrix of the infinite Laurent matrix

$$L_n = \big[X_{|j-k|}\,\mathbf{1}_{\{|j-k| \le n-1\}}\big]_{j,k\in\mathbb{Z}}.$$

Consider $L_n$ as an operator on $\ell^2(\mathbb{Z})$ in the canonical way, and let $\psi: \ell^2(\mathbb{Z}) \to L^2[0,1]$ denote the usual trigonometric isometry $\psi(e_j)(x) = e^{2\pi ijx}$. Then $\psi L_n\psi^{-1}: L^2 \to L^2$ is the multiplication operator corresponding to the $L^\infty$ function $x \mapsto Y_x$, where

$$Y_x = \sum_{j=-(n-1)}^{n-1} X_{|j|}\,e^{2\pi ijx} = X_0 + 2\sum_{j=1}^{n-1} X_j\cos(2\pi jx),$$

which implies that

$$\|T_n\| \le \|L_n\| = \sup_{0\le x\le 1}|Y_x|.$$

By (3), Proposition 4, and the substitution $\varepsilon = 4n^{3/2}e^{-t^2}$, … Integration by parts and the classical estimate … Combining the case $s = \sqrt{2\log 2n}$ of this estimate with (4) completes the proof.
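The comparison $\|T_n\| \le \sup_x |Y_x|$ between a Toeplitz matrix and the supremum of its symbol can be checked numerically. This sketch assumes NumPy; the helper names `toeplitz_sym` and `symbol_sup_bound` are hypothetical. Because a grid maximum can underestimate a supremum, the second helper adds a Lipschitz correction so that it returns a guaranteed upper bound.

```python
import numpy as np

def toeplitz_sym(x):
    """Symmetric Toeplitz matrix [x_{|j-k|}], 1 <= j, k <= n, from x = (x_0, ..., x_{n-1})."""
    n = len(x)
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.asarray(x, dtype=float)[idx]

def symbol_sup_bound(x, grid=16384):
    """Upper bound on sup_t |f(t)|, where f(t) = x_0 + 2 sum_{j>=1} x_j cos(2 pi j t).

    Evaluates f on a uniform grid of [0, 1) and adds max|f'| times half the spacing."""
    x = np.asarray(x, dtype=float)
    j = np.arange(1, len(x))
    t = np.arange(grid) / grid
    f = x[0] + 2 * np.cos(2 * np.pi * np.outer(t, j)) @ x[1:]
    lip = 4 * np.pi * np.sum(j * np.abs(x[1:]))  # |f'(t)| <= 2 * sum_j 2 pi j |x_j|
    return np.max(np.abs(f)) + lip / (2 * grid)

rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=64)  # hypothetical Rademacher entries
print(np.linalg.norm(toeplitz_sym(x), 2), symbol_sup_bound(x))
```

The inequality holds because $T_n$ is a compression of the multiplication operator by $Y_x$, whose operator norm equals $\sup_x |Y_x|$.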
The proof of Theorem 2 is based on rather classical measure concentration arguments commonly used in the study of probability in Banach spaces. Under assumption (i), up to the precise values of constants, the estimate

$$\mathbb{P}\big[\|T_n\| \ge \mathbb{E}\|T_n\| + t\big] \le e^{-t^2/(32A^2n)} \qquad \forall t > 0$$

follows from any of several standard approaches to concentration of measure (cf. Corollary 1.17, Corollary 4.5, or Theorem 7.3 of [12]; the precise statement can be proved from Corollary 1.17). Combining this with Theorem 1 (which applies because random variables bounded by $A$ are uniformly subgaussian, with a constant depending only on $A$) and taking $t = 8A\sqrt{n\log n}$ yields

$$\mathbb{P}\big[\|T_n\| \ge (c_1 + 8A)\sqrt{n\log n}\big] \le \frac{1}{n^2},$$

which completes the proof via the Borel-Cantelli lemma.
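For concreteness, the arithmetic behind the choice $t = 8A\sqrt{n\log n}$ and the Borel-Cantelli step can be written out as follows; it uses only the concentration estimate under assumption (i) and the bound $\mathbb{E}\|T_n\| \le c_1\sqrt{n\log n}$ from Theorem 1.

```latex
\begin{aligned}
t = 8A\sqrt{n\log n}
  &\;\Longrightarrow\; \frac{t^2}{32A^2n} = \frac{64A^2\,n\log n}{32A^2\,n} = 2\log n,\\
\mathbb{P}\big[\|T_n\| \ge (c_1+8A)\sqrt{n\log n}\,\big]
  &\le \mathbb{P}\big[\|T_n\| \ge \mathbb{E}\|T_n\| + t\big]
  \le e^{-2\log n} = n^{-2},\\
\textstyle\sum_{n\ge 2} n^{-2} < \infty
  &\;\Longrightarrow\; \limsup_{n\to\infty}\frac{\|T_n\|}{\sqrt{n\log n}} \le c_1 + 8A
  \quad\text{almost surely}.
\end{aligned}
```

Under assumption (i) one may thus take $c_2 = c_1 + 8A$.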
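The proof under assumption (ii) is similar. For $x = (x_0, \ldots, x_{n-1}) \in \mathbb{R}^n$ write $T_n(x) = [x_{|j-k|}]_{1\le j,k\le n}$. Since $T_n(z)$ is $z_0$ times the identity plus, for each $j \ge 1$, $z_j$ times the sum of the matrices with ones on the $j$th superdiagonal and on the $j$th subdiagonal (each of norm at most 1), the triangle inequality and the Cauchy-Schwarz inequality give

$$\big|\|T_n(x)\| - \|T_n(y)\|\big| \le \|T_n(x-y)\| \le |x_0 - y_0| + 2\sum_{j=1}^{n-1}|x_j - y_j| \le 2\sqrt{n}\,\|x - y\|_2,$$

so that the map $(X_0, \ldots, X_{n-1}) \mapsto \|T_n\|$ has Lipschitz constant bounded by $2\sqrt{n}$.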
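The Lipschitz estimate for the map $x \mapsto \|T_n(x)\|$, where $T_n(x) = [x_{|j-k|}]$, can be spot-checked on random pairs of vectors. This sketch assumes NumPy; `toeplitz_sym` and `lipschitz_ratio` are hypothetical helpers, and the ratio they report should never exceed 1.

```python
import numpy as np

def toeplitz_sym(x):
    """Symmetric Toeplitz matrix [x_{|j-k|}], 1 <= j, k <= n."""
    n = len(x)
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.asarray(x, dtype=float)[idx]

def lipschitz_ratio(x, y):
    """| ||T_n(x)|| - ||T_n(y)|| | divided by 2 sqrt(n) ||x - y||_2."""
    n = len(x)
    gap = abs(np.linalg.norm(toeplitz_sym(x), 2) - np.linalg.norm(toeplitz_sym(y), 2))
    return gap / (2 * np.sqrt(n) * np.linalg.norm(np.asarray(x) - np.asarray(y)))

rng = np.random.default_rng(1)
ratios = [lipschitz_ratio(rng.standard_normal(50), rng.standard_normal(50)) for _ in range(200)]
print(max(ratios))
```

Random pairs typically give ratios well below 1; the constant $2\sqrt{n}$ is a worst-case bound, which is all the concentration argument needs.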
The proof is completed in the same way as before (with a different dependence of c 2 on A).
As remarked above, a weaker version of Theorem 2 may be proved under the assumptions of Theorem 1 alone. From the proof of Proposition 4 in [18] one can extract the following tail inequality under the assumptions of Proposition 4: … The explicit statement here is adapted from lecture notes of Rudelson [16]. Using the estimates derived in the proof of Theorem 1 and applying the Borel-Cantelli lemma as above, one directly obtains

$$\limsup_{n\to\infty}\frac{\|T_n\|}{\sqrt{n\log n}} \le c_4 \quad \text{almost surely} \tag{6}$$

under the assumptions that the $X_j$ are symmetric and uniformly subgaussian. The general (nonsymmetric but mean zero) case can be deduced from the argument for the symmetric case. Let $T_n'$ be an independent copy of $T_n$. By independence, the triangle inequality, and the tail estimate which follows from (5), … for some constant $c$ which depends on the subgaussian estimate for the $X_j$. By Theorem 1 and Chebyshev's inequality, … Picking $s = 2c_1\sqrt{n\log n}$ and $t = \sqrt{(2n/c)\log n}$ yields $\mathbb{P}\big[\|T_n\| \ge c_4\sqrt{n\log n}\big] \le 4/n^2$ for some constant $c_4$, and (6) then follows from the Borel-Cantelli lemma.
The proof of Theorem 3 amounts to an adaptation of the proof of the lower bound in [14], with much of the proof abstracted into a general lower bound, due to Kashin and Tzafriri [9,10], for the suprema of certain random processes. The following is a special case of the result of [10].

Proposition 6. Let $\varphi_j: [0,1] \to \mathbb{R}$, $j = 0, \ldots, n-1$, be a family of functions which are orthonormal in $L^2[0,1]$ and satisfy $\|\varphi_j\|_{L^3[0,1]} \le A$ for every $j$, and let $X_0, \ldots, X_{n-1}$ be independent random variables such that for every $j$,

$$\mathbb{E}X_j = 0, \qquad \mathbb{E}X_j^2 = 1, \qquad \mathbb{E}|X_j| \ge B.$$

Then for any $a_0, \ldots, a_{n-1} \in \mathbb{R}$,

$$\mathbb{E}\sup_{0\le x\le 1}\Big|\sum_{j=0}^{n-1} a_jX_j\varphi_j(x)\Big| \ge K\,\|a\|_2\sqrt{\log\frac{\|a\|_2}{\|a\|_4}},$$

where $\|a\|_p = \big(\sum_{j=0}^{n-1}|a_j|^p\big)^{1/p}$ and $K > 0$ depends only on $A$ and $B$.

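Proof of Theorem 3. First make the estimate

$$\|T_n\| \ge \sup_{0\le x\le 1}\frac{|\langle T_nv_x, v_x\rangle|}{\|v_x\|^2} = \sup_{0\le x\le 1}\frac{1}{n}\Big|\sum_{j,k=1}^n X_{|j-k|}\,e^{2\pi i(k-j)x}\Big|,$$

where $v_x \in \mathbb{C}^n$ is defined by $(v_x)_j = e^{2\pi ijx}$ for $j = 1, \ldots, n$ and $\langle\cdot,\cdot\rangle$ is the standard inner product on $\mathbb{C}^n$. Therefore

$$\|T_n\| \ge \sup_{0\le x\le 1}\Big|X_0 + 2\sum_{j=1}^{n-1}\Big(1 - \frac{j}{n}\Big)X_j\cos(2\pi jx)\Big| = \sup_{0\le x\le 1}\Big|\sum_{j=0}^{n-1}a_jX_j\varphi_j(x)\Big|,$$

where we have defined $a_0 = 1$, $a_j = \sqrt{2}(1 - j/n)$ for $j \ge 1$, $\varphi_0 \equiv 1$, and $\varphi_j(x) = \sqrt{2}\cos(2\pi jx)$ for $j \ge 1$. These functions are orthonormal in $L^2[0,1]$ and satisfy $\|\varphi_j\|_{L^3[0,1]} \le \|\varphi_j\|_\infty \le \sqrt{2}$. It is easy to verify that $\|a\|_2 > \sqrt{n}/2$ and $\|a\|_4 < 2n^{1/4}$. The theorem now follows from Proposition 6.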
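Both ingredients of this proof, the identity $\langle T_nv_x, v_x\rangle/n = \sum_j a_jX_j\varphi_j(x)$ and the two norm estimates for $a$, can be checked numerically. This sketch assumes NumPy; the helpers are hypothetical. Note that `np.vdot` conjugates its first argument, so `np.vdot(v, T @ v)` equals $\langle Tv, v\rangle$ for the inner product $\langle u, w\rangle = \sum_i u_i\overline{w_i}$.

```python
import numpy as np

def toeplitz_sym(x):
    """Symmetric Toeplitz matrix [x_{|j-k|}], 1 <= j, k <= n."""
    n = len(x)
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.asarray(x, dtype=float)[idx]

def coeffs(n):
    """a_0 = 1, a_k = sqrt(2) (1 - k/n) for 1 <= k <= n - 1."""
    k = np.arange(n)
    return np.where(k == 0, 1.0, np.sqrt(2) * (1 - k / n))

def coefficient_norms(n):
    """(||a||_2, ||a||_4) for the coefficients above."""
    a = coeffs(n)
    return np.sum(a ** 2) ** 0.5, np.sum(a ** 4) ** 0.25

def quadratic_form_gap(x, t):
    """|<T_n v_t, v_t>/n - sum_k a_k x_k phi_k(t)|, with (v_t)_j = e^{2 pi i j t}, j = 1..n."""
    n = len(x)
    k = np.arange(n)
    v = np.exp(2j * np.pi * np.arange(1, n + 1) * t)
    quad = np.vdot(v, toeplitz_sym(x) @ v) / n
    phi = np.where(k == 0, 1.0, np.sqrt(2) * np.cos(2 * np.pi * k * t))
    return abs(quad - np.sum(coeffs(n) * np.asarray(x) * phi))

for n in [2, 10, 100, 1000]:
    a2, a4 = coefficient_norms(n)
    print(n, a2, np.sqrt(n) / 2, a4, 2 * n ** 0.25)
```

Asymptotically $\|a\|_2^2 \approx 2n/3$ and $\|a\|_4^4 \approx 4n/5$, which leaves comfortable room in both inequalities.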
We remark that by combining Theorem 3 with the proof of Theorem 2, one obtains a nontrivial bound on the left tail of $\|T_n\|$ under the assumptions of Theorem 2 and the additional assumption that $\mathbb{E}X_j^2 = 1$ for every $j$. Unfortunately, one cannot deduce an almost sure lower bound of the form

$$\liminf_{n\to\infty}\frac{\|T_n\|}{\sqrt{n\log n}} \ge c \quad \text{almost surely}$$

without more precise control over the constants in Proposition 6 and the concentration inequalities used in the proof of Theorem 2.

Extensions and additional remarks
3.1. Other random matrix ensembles. For simplicity Theorems 1-3 were stated and proved only for the case of real symmetric Toeplitz matrices. However, straightforward adaptations of the proofs show that the theorems hold for other related ensembles of random matrices. These include nonsymmetric real Toeplitz matrices $[X_{j-k}]_{1\le j,k\le n}$ for independent random variables $X_j$, $j \in \mathbb{Z}$, as well as complex Hermitian or general complex Toeplitz variants. In the complex cases one should consider matrix entries of the form $X_j = Y_j + iZ_j$, where $Y_j$ and $Z_j$ are independent and each satisfies the tail or moment conditions imposed on $X_j$ in the theorems as stated.
Closely related to the case of nonsymmetric random Toeplitz matrices are random Hankel matrices $H_n = [X_{j+k-1}]_{1\le j,k\le n}$, which are constant along skew diagonals. This ensemble was also mentioned by Bai [1], and was shown to have a universal limiting spectral distribution in [5]. Independently, Masri and Tonge [14] considered a random $r$-linear Hankel form in the case $\mathbb{P}[X_j = 1] = \mathbb{P}[X_j = -1] = 1/2$, and showed that the expected norm of this form is of the order $\sqrt{n^{r-1}\log n}$. As observed in [5, Remark 1.2], $H_n$ has the same singular values, and so in particular the same spectral norm, as the (nonsymmetric) Toeplitz matrix obtained by reversing the order of the rows of $H_n$. Therefore Theorems 1-3 apply to $H_n$ as well. As mentioned in the introduction, the versions of Theorems 1 and 3 for $H_n$ generalize the $r = 2$ case of the result of [14] to subgaussian matrix entries $X_j$.
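The observation that reversing the rows of $H_n$ yields a Toeplitz matrix with the same singular values is easy to confirm numerically. This sketch assumes NumPy; `hankel` is a hypothetical helper (not a NumPy function) that builds $H_n = [x_{j+k-1}]$ from a vector whose entry 0 is unused.

```python
import numpy as np

def hankel(x, n):
    """H_n = [x_{j+k-1}]_{1<=j,k<=n}; uses x_1, ..., x_{2n-1}, so len(x) >= 2n."""
    j = np.arange(1, n + 1)
    return np.asarray(x, dtype=float)[np.add.outer(j, j) - 1]

rng = np.random.default_rng(2)
n = 40
x = rng.standard_normal(2 * n)
H = hankel(x, n)
R = H[::-1, :]  # reverse the order of the rows: a permutation, so singular values are unchanged
sv_H = np.linalg.svd(H, compute_uv=False)
sv_R = np.linalg.svd(R, compute_uv=False)
print(np.max(np.abs(sv_H - sv_R)))
```

Row $j$ of the reversed matrix has entries $x_{n+k-j}$, which depend only on $k - j$; hence it is Toeplitz, and since row reversal is multiplication by a permutation matrix, the singular values agree.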
The methods of this paper can also be used to treat random Toeplitz matrices with additional restrictions. For example, the theorems apply to the ensemble of symmetric circulant matrices considered in [2, Remark 2], which are defined as $T_n$ here except for the restriction that $X_{n-j} = X_j$ for $j = 1, \ldots, n-1$, and to the closely related symmetric palindromic Toeplitz matrices considered in [15], in which $X_{n-j-1} = X_j$ for $j = 0, \ldots, n-1$. We remark that [2,15] show that each of these ensembles, properly scaled and under some additional assumptions, has a limiting spectral distribution which is normal.

3.2. Weaker hypotheses.
It is unclear how necessary the tail or moment conditions on the X j are to the conclusions of the theorems. It appears likely (cf. [19,3]) that versions of Theorems 1 and 2 remain true assuming only the existence of fourth moments, at least when the X j are identically distributed. In particular it is very likely that the assumptions of Theorem 2 can be relaxed considerably. Even within the present proof, the assumption of a logarithmic Sobolev inequality can be weakened slightly to that of a quadratic transportation cost inequality; cf. [12,Chapter 6].
If the $X_j$ have nonzero means then the behavior of $\|T_n\|$ may change. Suppose first that the $X_j$ are uniformly subgaussian and $\mathbb{E}X_j = m \ne 0$ for every $j$. If $J_n$ denotes the $n \times n$ matrix whose entries are all 1, then $T_n - mJ_n$ is the random Toeplitz matrix with the centered entries $X_j - m$, so (6) implies that

$$\limsup_{n\to\infty}\frac{\|T_n - mJ_n\|}{\sqrt{n\log n}} \le c \quad \text{almost surely}, \tag{7}$$

where $c$ depends on $m$ and the subgaussian estimate for the $X_j$. Since $\|J_n\| = n$, (7) and the triangle inequality imply a strong law of large numbers:

$$\lim_{n\to\infty}\frac{\|T_n\|}{n} = |m| \quad \text{almost surely}. \tag{8}$$
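The two deterministic facts used here, $\|J_n\| = n$ and the triangle inequality $\big|\|T_n\| - |m|n\big| \le \|T_n - mJ_n\|$, can be illustrated on one sample. This sketch assumes NumPy; the helper `toeplitz_sym`, the size $n = 400$, the mean $m = 0.5$, and the shifted Rademacher entries are hypothetical choices.

```python
import numpy as np

def toeplitz_sym(x):
    """Symmetric Toeplitz matrix [x_{|j-k|}], 1 <= j, k <= n."""
    n = len(x)
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.asarray(x, dtype=float)[idx]

rng = np.random.default_rng(4)
n, m = 400, 0.5  # hypothetical size and common mean m != 0
x = m + rng.choice([-1.0, 1.0], size=n)  # independent bounded entries with mean m
T = toeplitz_sym(x)
J = np.ones((n, n))
norm_T = np.linalg.norm(T, 2)
norm_J = np.linalg.norm(J, 2)  # J = n u u^T with u = ones / sqrt(n), so ||J|| = n
dev = np.linalg.norm(T - m * J, 2)  # T - mJ is the Toeplitz matrix of the centered entries
# Triangle inequality: | ||T|| - |m| n | <= ||T - m J||.
assert abs(norm_T - abs(m) * n) <= dev + 1e-9
print(norm_T / n, dev / np.sqrt(n * np.log(n)))
```

Dividing by $n$, the deviation term is of order $\sqrt{\log n / n}$, which is why $\|T_n\|/n$ converges to $|m|$.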
In [3], (8) was proved using estimates from [5] under the assumption that the X j are identically distributed and have finite variance. We emphasize again that while the methods of this paper require stronger tail conditions, we never assume the X j to be identically distributed.
More generally, the behavior of $\|T_n\|$ depends on the rate of growth of the spectral norms of the deterministic Toeplitz matrices $\mathbb{E}T_n$. The same argument as above shows that $\lim_{n\to\infty}\|T_n\|/\|\mathbb{E}T_n\| = 1$ almost surely if the random variables $X_j - \mathbb{E}X_j$ are uniformly subgaussian and $\lim_{n\to\infty}\sqrt{n\log n}/\|\mathbb{E}T_n\| = 0$. On the other hand, if $\|\mathbb{E}T_n\| = o(\sqrt{n\log n})$, then the conclusion of Theorem 1 holds.
3.3. Random trigonometric polynomials. The supremum of the random trigonometric polynomial $Z_x$ has been well studied in the special case $\mathbb{P}[X_j = 1] = \mathbb{P}[X_j = -1] = 1/2$, in work dating back to Salem and Zygmund [17]. Observe that $Z_x$ is essentially equivalent to the process $Y_x$ defined in the proof of Theorem 1, and is also closely related to the random process considered in the proof of Theorem 3. Halász [7] proved in particular that

$$\lim_{n\to\infty}\frac{\sup_{0\le x\le 1}|Z_x|}{\sqrt{n\log n}} = 1 \quad \text{almost surely}.$$
From this it follows that when $\mathbb{P}[X_j = 1] = \mathbb{P}[X_j = -1] = 1/2$ for every $j$, the conclusion of Theorem 2 holds with $c_2 = 2$. Numerical experiments suggest, however, that the optimal value of $c_2$ is 1 in this case, and more generally when the $X_j$ are i.i.d. with mean 0 and variance 1. Conversely, adaptations of the proofs in this paper yield less numerically precise bounds for the supremum of $Z_x$ under the weaker assumptions on the $X_j$ made in the statements of the theorems. We remark that the techniques used to prove the results of [9,10,14] cited above (and hence indirectly also Theorem 3) were adapted from the work of Salem and Zygmund in [17].
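A numerical experiment of the kind mentioned above might look like the following sketch; it is not the experiment behind the remark, and the sizes, number of trials, and seed are arbitrary. It assumes NumPy and a hypothetical helper `toeplitz_sym`, and reports the sample mean of $\|T_n\|/\sqrt{n\log n}$ for i.i.d. Rademacher entries. No particular output is claimed here, and finite-$n$ values need not be close to the limiting constant.

```python
import numpy as np

def toeplitz_sym(x):
    """Symmetric Toeplitz matrix [x_{|j-k|}], 1 <= j, k <= n."""
    n = len(x)
    idx = np.abs(np.subtract.outer(np.arange(n), np.arange(n)))
    return np.asarray(x, dtype=float)[idx]

def normalized_norm(n, rng):
    """||T_n|| / sqrt(n log n) for one sample with i.i.d. Rademacher entries."""
    x = rng.choice([-1.0, 1.0], size=n)
    # T_n is real symmetric, so its spectral norm is the largest |eigenvalue|.
    return np.max(np.abs(np.linalg.eigvalsh(toeplitz_sym(x)))) / np.sqrt(n * np.log(n))

rng = np.random.default_rng(5)
for n in [64, 128, 256, 512]:
    samples = [normalized_norm(n, rng) for _ in range(5)]
    print(n, np.mean(samples))
```

Using `eigvalsh` rather than a full SVD exploits the symmetry of $T_n$; because the $\sqrt{\log n}$ factor grows slowly, much larger $n$ would be needed before such ratios say much about the limiting constant.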