Small deviations for beta ensembles

We establish various small deviation inequalities for the extremal (soft edge) eigenvalues in the beta-Hermite and beta-Laguerre ensembles. In both settings, upper bounds on the variance of the largest eigenvalue of the anticipated order follow immediately.


Introduction
In the context of their original discovery, the Tracy-Widom laws describe the limiting fluctuations of the largest eigenvalues in the Gaussian Orthogonal, Unitary, and Symplectic Ensembles (G{O/U/S}E) [22,23]. These are random matrices of real, complex, or quaternion Gaussian entries, of mean zero and mean-square one, independent save for the condition that the matrix is symmetric (GOE), Hermitian (GUE), or appropriately self-dual (GSE). The corresponding Tracy-Widom distribution functions have shape
$$F_{TW}(t) \sim e^{-\frac{1}{24}\beta |t|^3} \ \text{ as } t \to -\infty, \qquad 1 - F_{TW}(t) \sim e^{-\frac{2}{3}\beta t^{3/2}} \ \text{ as } t \to +\infty, \tag{1.1}$$
where β = 1 in the case of GOE, β = 2 for GUE, and β = 4 for GSE.
Since that time, it has become understood that the three Tracy-Widom laws arise in a wide range of models. First, the assumption of Gaussian entries may be relaxed significantly, see [20], [21] for instance. Outside of random matrices, these laws also describe the fluctuations in the longest increasing subsequence of a random permutation [2], the path weight in last passage percolation [11], and the current in simple exclusion [11,24], among others.
It is natural to inquire as to the rate of concentration of these various objects about the limiting Tracy-Widom laws. Back in the random matrix setting, the limit theorem reads: with $\lambda_{\max}$ the largest eigenvalue in the $n \times n$ GOE, GUE or GSE, it is the normalized quantity $n^{1/6}(\lambda_{\max} - 2\sqrt{n})$ which converges to Tracy-Widom. Thus, one would optimally hope for estimates of the form
$$P\big(\lambda_{\max} - 2\sqrt{n} \le -\varepsilon\sqrt{n}\big) \le C e^{-n^2\varepsilon^3/C}, \qquad P\big(\lambda_{\max} - 2\sqrt{n} \ge \varepsilon\sqrt{n}\big) \le C e^{-n\varepsilon^{3/2}/C},$$
for all n ≥ 1, all ε ∈ (0, 1] say, and C a numerical constant. Such are "small deviation" inequalities, capturing exactly the finite n scaling and limit distribution shape (compare (1.1)). Taking ε beyond O(1) in the above yields more typical large deviation behavior and different (Gaussian) tails (see below).

As discussed in [14,15], the right-tail inequality for the GUE (as well as for the Laguerre Unitary Ensemble, again see below) may be shown to follow from results of Johansson [11] for a more general invariant model related to the geometric distribution that uses large deviation asymptotics and sub-additivity arguments. The left-tail inequality for the geometric model of Johansson (and thus by some suitable limiting procedure for the GUE and the Laguerre Unitary Ensemble) is established in [3] together with convergence of moments using delicate Riemann-Hilbert methods. We refer to [15] for a discussion and the relevant references, as well as for similar inequalities in the context of last passage percolation etc. By the superposition-decimation procedure of [10], the GUE bounds apply similarly to the GOE (see also [16]).
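To see why these exponents reflect the Tracy-Widom shape, one can substitute the fluctuation scale directly. The following is a short consistency check rather than part of the original argument; the variable $t$ is introduced here only for illustration.

```latex
% Put \varepsilon = t\,n^{-2/3} with 0 < t \le n^{2/3}, so that \varepsilon\sqrt{n} = t\,n^{-1/6}.
\[
  n\,\varepsilon^{3/2} = n\,(t\,n^{-2/3})^{3/2} = t^{3/2},
  \qquad
  n^{2}\varepsilon^{3} = n^{2}\,(t\,n^{-2/3})^{3} = t^{3}.
\]
% In terms of the normalized variable the two hoped-for estimates therefore read
\[
  P\big(n^{1/6}(\lambda_{\max}-2\sqrt{n}) \ge t\big) \le C e^{-t^{3/2}/C},
  \qquad
  P\big(n^{1/6}(\lambda_{\max}-2\sqrt{n}) \le -t\big) \le C e^{-t^{3}/C}.
\]
```

Up to the value of the constants, these are the $t^{3/2}$ and $|t|^3$ tail exponents of the Tracy-Widom laws, now holding uniformly in $n$.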
Our purpose here is to present unified proofs of these bounds which apply to all of the so-called beta ensembles. These are point processes on $\mathbb{R}$ defined by the n-level joint density: for any β > 0,
$$P(\lambda_1, \lambda_2, \dots, \lambda_n) = \frac{1}{Z_{n,\beta}} \prod_{j<k} |\lambda_j - \lambda_k|^{\beta} \, e^{-(\beta/4)\sum_{k=1}^n \lambda_k^2}. \tag{1.2}$$
At β = 1, 2, 4 this joint density is shared by the eigenvalues of G{O/U/S}E. Furthermore, these three values give rise to exactly solvable models. Specifically, all finite dimensional correlation functions may be described explicitly in terms of Hermite polynomials. For this reason, the measure (1.2) has come to be referred to as the β-Hermite ensemble; we will denote it by $H_\beta$. Importantly, off of β = 1, 2, 4, despite considerable efforts (see [9], Chapter 13 for a comprehensive review), there appears to be no characterization of the correlation functions amenable to asymptotics. Still, Ramírez-Rider-Virág [18] have shown the existence of a general β Tracy-Widom law, $TW_\beta$, via the corresponding limit theorem: with self-evident notation,
$$n^{1/6}\big(\lambda_{\max}(H_\beta) - 2\sqrt{n}\big) \Rightarrow TW_\beta. \tag{1.3}$$
This result makes essential use of a (tridiagonal) matrix model valid at all beta due to Dumitriu-Edelman [5], and proves the conjecture of Edelman-Sutton [6]. As to finite n bounds, we have the following.
The restriction to β ≥ 1 is somewhat artificial. On the other hand, bounds of this type cannot remain meaningful all the way down to β = 0. Our method in fact applies to all beta bounded below, though with the reported C a function of whatever specified minimal beta. Keeping β ≥ 1 covers the cases of classical interest while allowing for a clearer picture of the achieved beta dependence in our estimates (as well as cleaner proofs).
For completeness we also mention that for ε beyond O(1), the large-deviation right-tail inequality takes a Gaussian form, referred to below as (1.4). For β = 1 and 2 this follows from standard net arguments on the corresponding Gaussian matrices (see e.g. [15]). For other values of β, crude bounds on the tridiagonal models discussed below immediately yield the claim.

Continuing, those well versed in random matrix theory will know that small deviation questions of this style are better motivated in the context of "null" Wishart matrices, given their application in multivariate statistics. Also known as the Laguerre Orthogonal or Unitary Ensembles (L{O/U}E), these are ensembles of type $XX^*$ in which X is an n × κ matrix comprised of i.i.d. real or complex Gaussians.
By the obvious duality, we may assume here that κ ≥ n. When n → ∞ with κ/n converging to a finite constant (necessarily at least one), the appropriately centered and scaled largest eigenvalue was shown to converge to the natural Tracy-Widom distribution; first by Johansson [11] in the complex (β = 2) case, then by Johnstone [12] in the real (β = 1) case. Later, El Karoui [7] proved the same conclusion allowing κ/n → ∞.
For β = 2 and κ a fixed multiple of n, a small deviation upper bound at the right-tail (as well as the corresponding statement for the minimal eigenvalue in the "soft-edge" scaling) was known earlier (see [14,15]), extended recently to non-Gaussian matrices in [8].
Once again there is a general beta version. Consider a density of the form (1.2) in which the Gaussian weight $w(\lambda) = e^{-\beta\lambda^2/4}$ on $\mathbb{R}$ is replaced by $w(\lambda) = \lambda^{(\beta/2)(\kappa-n+1)-1} e^{-\beta\lambda/2}$, now restricted to $\mathbb{R}_+$. Here κ can be any real number strictly larger than n − 1. It is when κ is an integer and β = 1 or 2 that one recovers the eigenvalue law for the real or complex Wishart matrices just described. For general κ and β > 0 the resulting law on positive points $\lambda_1, \dots, \lambda_n$ is referred to as the β-Laguerre ensemble, here $L_\beta$ for short. Using a tridiagonal model for $L_\beta$ introduced in [5], it is proved in [18]: for κ + 1 > n → ∞ with κ/n → c ≥ 1,
$$\frac{(\kappa n)^{1/6}}{(\sqrt{\kappa} + \sqrt{n})^{4/3}} \Big(\lambda_{\max}(L_\beta) - (\sqrt{\kappa} + \sqrt{n})^2\Big) \Rightarrow TW_\beta. \tag{1.5}$$
This covers all previous results for real/complex null Wishart matrices. Comparing (1.3) and (1.5) one sees that $O(n^{2/3}\varepsilon)$ deviations in the Hermite case should correspond to deviations of order $(\kappa n)^{1/6}(\sqrt{\kappa} + \sqrt{n})^{2/3}\varepsilon = O(\kappa^{1/2} n^{1/6}\varepsilon)$ in the Laguerre case. That is, one might expect bounds exactly of the form found in Theorem 1 with appearances of n in each exponent replaced by $\kappa^{3/4} n^{1/4}$. What we have is the following.
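The replacement rule for the exponents can be checked in two lines. This is only a bookkeeping sketch under the standing assumption κ ≥ n; the symbol $N$ is introduced here for illustration and does not appear in the original.

```latex
% Since \sqrt{\kappa} \le \sqrt{\kappa}+\sqrt{n} \le 2\sqrt{\kappa} when \kappa \ge n,
\[
  \kappa^{1/2} n^{1/6}
  \;\le\; (\kappa n)^{1/6}(\sqrt{\kappa}+\sqrt{n})^{2/3}
  \;\le\; 2^{2/3}\,\kappa^{1/2} n^{1/6}.
\]
% The Hermite deviation scale is n^{2/3}\varepsilon. Writing N = \kappa^{3/4} n^{1/4},
\[
  N^{2/3} = \kappa^{1/2} n^{1/6},
  \qquad
  N\varepsilon^{3/2} = \kappa^{3/4} n^{1/4}\varepsilon^{3/2},
  \qquad
  N^{2}\varepsilon^{3} = \kappa^{3/2} n^{1/2}\varepsilon^{3}.
\]
```

So substituting $N$ for $n$ in the Hermite exponents $n\varepsilon^{3/2}$ and $n^2\varepsilon^3$ produces the Laguerre exponents one would anticipate for the right and left tails.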
The right-tail inequality is extended to non-Gaussian matrices in [8]. The rather cumbersome exponents in Theorem 2 do produce the anticipated decay, though only for $\varepsilon \le \sqrt{n/\kappa}$. For $\varepsilon \ge \sqrt{n/\kappa}$, the right and left tails become linear and quadratic in ε respectively. This is to say that the large deviation regime begins at the order $O(\sqrt{n/\kappa})$ rather than O(1) as in the β-Hermite case. To understand this, we recall that, normalized by 1/κ, the counting measure of the $L_\beta$ points is asymptotically supported on the interval with endpoints $(1 \pm \sqrt{n/\kappa})^2$. This statement is precise with convergent n/κ, and the limiting measure is that of Marčenko-Pastur. Either way, $\sqrt{n/\kappa}$ is identified as the spectral width, in contrast with the semi-circle law appearing in the β-Hermite case which is of width one (after similar normalization). Of course, in the more usual set-up when $c_1 n \le \kappa \le c_2 n$ ($c_1 \ge 1$ necessarily) all this is moot: the exponents above may then be replaced with $-\beta n \varepsilon^{3/2}/C$ and $-\beta n^2 \varepsilon^3/C$ for ε in an O(1) range with no loss of accuracy. And again, the large deviation tails were known in this setting for β = 1, 2.
An immediate consequence of the preceding is a finite n (and/or κ) bound on the variance of λ max in line with the known limit theorems. This simple fact had only previously been available for GUE and LUE (see the discussion in [15]). Corollary 3. Take β ≥ 1. Then, with now constant(s) C β dependent upon β.
The same computation behind Corollary 3 implies that for any p, and similarly for λ max (L β ). Hence, we also conclude that all moments of the (scaled) maximal H β and L β eigenvalues converge to those for the T W β laws (see [3] for β = 2). Finally, there is the matter of whether any of the above upper bounds are tight. We answer this in the affirmative in the Hermite setting.
Theorem 4. There is a numerical constant C so that The first inequality holds for all n ≥ 1, 0 < ε ≤ 1, and β ≥ 1. For the second inequality, the range of ε must be kept sufficiently small, 0 < ε ≤ 1/C say.
Our proof of the right-tail lower bound takes advantage of a certain independence in the β-Hermite tridiagonals not immediately shared by the Laguerre models, but the basic strategy also works in the Laguerre case. Contrariwise, our proof of the left-tail lower bound uses a fundamentally Gaussian argument that is not available in the Laguerre setting.
The next section introduces the tridiagonal matrix models and gives an indication of our approach. The upper bounds (Theorems 1 and 2, Corollary 3) are proved in Section 3; the H β lower bounds in Section 4. Section 5 considers the analog of the right-tail upper bound for the minimal eigenvalue in the β-Laguerre ensemble, this case holding the potential for some novelty granted the existence of a different class of limit theorems (hard edge) depending on the limiting ratio n/κ. While our method does produce a bound, the conditions on the various parameters are far from optimal. For this reason we relegate the statement, along with the proof and further discussion, to a separate section.

Tridiagonals
The results of [18] identify the general β > 0 Tracy-Widom law through a random variational principle: in which x → b(x) is a standard Brownian motion and L is the space of functions f which vanish at the origin and satisfy The equality here is in law, or you may view (2.1) as the definition of $TW_\beta$. This variational point of view also guides the proof of the convergence of the centered and scaled $\lambda_{\max}$ (of $H_\beta$ or $L_\beta$) to $TW_\beta$. In particular, given the random tridiagonals which we are about to introduce, one always has a characterization of $\lambda_{\max}$ through Rayleigh-Ritz. In [18], the point is to show that this "discrete" variational problem goes over to the continuum problem (2.1) in a suitable sense. Furthermore, an analysis of the continuum problem has been shown to give sharp estimates on the tails of the β Tracy-Widom law (again see [18]). Our idea here is therefore to retool those arguments for the finite n, or discrete, setting.
We start with the Hermite case. Let $g_1, g_2, \dots, g_n$ be independent Gaussians with mean 0 and variance 2. Let also $\chi_\beta, \chi_{2\beta}, \dots, \chi_{(n-1)\beta}$ be independent χ random variables of the indicated parameter. Then, re-using notation, [5] proves that the n eigenvalues of the random tridiagonal matrix The problem at hand (Theorem 1) then becomes that of estimating P sup where we have introduced the usual Euclidean norm $\|v\|_2^2 = \sum_{k=1}^n v_k^2$. To make the connection between H(v) and the continuum form (2.1) even more plain we have the following.
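For readers who want to experiment, here is a minimal simulation sketch of a tridiagonal β-Hermite model. It rests on assumptions that are not spelled out in the surviving text: the diagonal carries the $g_k$, the off-diagonal carries $\chi_{\beta(n-1)}, \dots, \chi_\beta$, and the whole matrix is scaled by $1/\sqrt{\beta}$, which is the usual Dumitriu-Edelman form. The function names are hypothetical, and the largest eigenvalue is found by a standard Sturm-count bisection rather than by anything taken from the paper.

```python
import math
import random

def sample_beta_hermite(n, beta, rng):
    """Return (diag, off) for an ASSUMED tridiagonal beta-Hermite matrix scaled by 1/sqrt(beta)."""
    scale = 1.0 / math.sqrt(beta)
    # Diagonal: independent Gaussians with mean 0 and variance 2.
    diag = [scale * rng.gauss(0.0, math.sqrt(2.0)) for _ in range(n)]
    # Off-diagonal: chi variables of parameter beta*(n-k), k = 1..n-1.
    # A chi_r variable is the square root of a Gamma(shape=r/2, scale=2) variable.
    off = [scale * math.sqrt(rng.gammavariate(beta * (n - k) / 2.0, 2.0))
           for k in range(1, n)]
    return diag, off

def count_eigs_below(diag, off, x):
    """Sturm count: number of eigenvalues of the symmetric tridiagonal matrix that are < x."""
    count = 0
    q = diag[0] - x
    if q < 0.0:
        count += 1
    for i in range(1, len(diag)):
        if q == 0.0:
            q = 1e-30  # nudge away from an exact zero pivot
        q = diag[i] - x - off[i - 1] ** 2 / q
        if q < 0.0:
            count += 1
    return count

def largest_eigenvalue(diag, off, tol=1e-10):
    """Bisection for the top eigenvalue, bracketed by Gershgorin bounds."""
    n = len(diag)
    pad = [0.0] + [abs(b) for b in off] + [0.0]
    lo = min(diag[i] - pad[i] - pad[i + 1] for i in range(n))
    hi = max(diag[i] + pad[i] + pad[i + 1] for i in range(n)) + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if count_eigs_below(diag, off, mid) == n:
            hi = mid  # every eigenvalue lies below mid
        else:
            lo = mid  # at least one eigenvalue is >= mid
    return 0.5 * (lo + hi)

rng = random.Random(0)
n, beta = 200, 2.0
lam = largest_eigenvalue(*sample_beta_hermite(n, beta, rng))
print(lam / math.sqrt(n))  # should sit near the edge value 2 for large n
```

Repeating the draw many times and histogramming $n^{1/6}(\lambda_{\max} - 2\sqrt{n})$ gives a rough empirical picture of the fluctuations that the small deviation inequalities control.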
We defer the proof until the end of the section, after a description of the allied L β setup. The point of Lemma 5 should be clear. For instance, for an upper bound on the first probability in (2.3) one may replace H by H b with any sufficiently small b > 0, and so on.
The model for $L_\beta$ is as follows. For κ > n − 1, introduce the random bidiagonal matrix with the same definition for the χ's and again all variables independent. (The use of χ is meant to emphasize this independence between the diagonals.) Now [5] shows that it is the eigenvalues of $L_\beta = B_\beta B_\beta^T$ which have the required joint density. Note that $L_\beta$ does not have independent entries. Similar to before, we define The added normalization by $\sqrt{\kappa}$ makes for better comparison with the Hermite case. With this, and since κ > n − 1, to prove Theorem 2 is to establish bounds on the following analogs of (2.3): Finally, we state the Laguerre version of Lemma 5. (We prove only the latter as they are much the same).
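A companion sketch for the Laguerre side follows. The entries of the bidiagonal matrix are an assumption here (diagonal $\chi_{\beta\kappa}, \dots, \chi_{\beta(\kappa-n+1)}$, subdiagonal $\chi_{\beta(n-1)}, \dots, \chi_\beta$, overall scaling $1/\sqrt{\beta}$, as in the usual Dumitriu-Edelman construction), since the display itself is not available above. All function names are hypothetical. The code forms the tridiagonal product $B B^T$ directly and locates its top eigenvalue by Sturm-count bisection.

```python
import math
import random

def chi(r, rng):
    """Sample a chi variable of parameter r > 0 (square root of Gamma(r/2, scale 2))."""
    return math.sqrt(rng.gammavariate(r / 2.0, 2.0))

def sample_beta_laguerre(n, kappa, beta, rng):
    """Return (diag, off) of B B^T for an ASSUMED lower bidiagonal B, scaled by 1/sqrt(beta)."""
    d = [chi(beta * (kappa - k), rng) for k in range(n)]   # diagonal of B
    s = [chi(beta * (n - k), rng) for k in range(1, n)]    # subdiagonal of B
    # (B B^T)[k][k] = d_k^2 + s_{k-1}^2 and (B B^T)[k+1][k] = s_k * d_k.
    diag = [(d[k] ** 2 + (s[k - 1] ** 2 if k > 0 else 0.0)) / beta for k in range(n)]
    off = [d[k] * s[k] / beta for k in range(n - 1)]
    return diag, off

def largest_eigenvalue(diag, off, tol=1e-9):
    """Top eigenvalue of a symmetric tridiagonal matrix via Sturm-count bisection."""
    n = len(diag)

    def all_below(x):
        # True when every eigenvalue is strictly less than x.
        q = diag[0] - x
        ok = q < 0.0
        for i in range(1, n):
            if q == 0.0:
                q = 1e-30
            q = diag[i] - x - off[i - 1] ** 2 / q
            ok = ok and q < 0.0
        return ok

    pad = [0.0] + [abs(b) for b in off] + [0.0]
    lo = min(diag[i] - pad[i] - pad[i + 1] for i in range(n))
    hi = max(diag[i] + pad[i] + pad[i + 1] for i in range(n)) + 1.0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if all_below(mid):
            hi = mid
        else:
            lo = mid
    return 0.5 * (lo + hi)

rng = random.Random(0)
n, kappa, beta = 100, 200.0, 2.0
lam = largest_eigenvalue(*sample_beta_laguerre(n, kappa, beta, rng))
edge = (math.sqrt(kappa) + math.sqrt(n)) ** 2
print(lam / edge)  # should be close to 1 for large n and kappa
```

Note that κ need not be an integer in this sketch; only κ > n − 1 is used, so that every χ parameter on the diagonal is positive.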
Proof of Lemma 5. Writing, shows it is enough to compare, for every v, For this, there is the formula and, if r ≥ 1, also the bounds (The upper bound is simply Jensen's inequality and holds for all r > 0; the lower bound requires more work.) This translates to

Upper Bounds
Theorems 1 and 2 are proved, first for the β-Hermite case with all details present; a second subsection explains the modifications required for the β-Laguerre case. The proof of Corollary 3 appears at the end.

Hermite ensembles
Right-tail. This is the more elaborate of the two. The following is a streamlined version of what is needed.
Proposition 7. Consider the model quadratic form, for fixed b > 0 and independent mean-zero random variables $\{z_k\}_{k=1,\dots,n}$ satisfying the uniform tail bound $E[e^{\lambda z_k}] \le e^{c\lambda^2}$ for all λ ∈ R and some c > 0. There is a C = C(b, c) so that P sup for all ε ∈ (0, 1] and n ≥ 1.

The proof of the above hinges on the following version of integration by parts (as in fact does the basic convergence result in [18]).

Lemma 8. Let $s_1, s_2, \dots, s_k, \dots$ be real numbers, and set $S_k = \sum_{\ell=1}^k s_\ell$, $S_0 = 0$. Let further $t_1, \dots, t_n$ be real numbers, $t_0 = t_{n+1} = 0$. Then, for every integer m ≥ 1,

Proof. For any $T_k$, k = 0, 1, . . . , n, write n k=1 Conclude by choosing

Proof of Proposition 7. Applying Lemma 8 with $s_k = z_k$ and $t_k = v_k^2$ (bearing in mind that $v_0 = v_{n+1} = 0$, and we are free to set $s_k = 0$ for k ≥ n + 1) yields Next, by the Cauchy-Schwarz inequality, for every λ > 0, Continuing requires a tail bound on $\Delta_{2m}(J)$ for integer J ≥ 0. By Doob's maximal inequality and our assumptions on $z_k$, for every λ > 0 and t > 0, Optimizing in λ, and then applying the same reasoning to the sequence $-S_\ell$ produces for all integers m ≥ 1 and J ≥ 0, and every t > 0. From (3.4) it follows that and similarly Combined, this reads P sup which we have recorded in full for later use. In any case, the choice $m = [\varepsilon^{-1/2}]$ will now produce the claim.
We may now dispense with the proof of Theorem 1 (Right-Tail). Before turning to the proof, we remark that if ε > 1, one may run through the above argument and simply choose m = 1 at the end to produce the classical form of the large deviation inequality (1.4) known previously for β = 1, 2.
We turn to the values 0 < ε ≤ 1. The form (2.4) is split into two pieces, Proposition 7 applying to each.
The first term on the right is precisely of the form (3.1) with each $z_k$ an independent mean-zero Gaussian of variance 2, which obviously satisfies the tail assumption with c = 1. The second term, $H_{b/2}(v, \chi)$, is a bit different, having noise present through the quantity $\sum_{k=1}^{n-1} (\chi_{\beta(n-k)} - E\chi_{\beta(n-k)}) v_k v_{k+1}$. But carrying out the integration by parts on $t_k = v_k v_{k+1}$ (and $s_k = \chi_{\beta(n-k)} - E\chi_{\beta(n-k)}$) will produce a bound identical to (3.4), with an additional factor of 2 before each appearance of $\Delta_{2m}$. Thus, we will be finished granted the following bound.
Lemma 9. For χ a χ random variable with parameter greater than or equal to one,
$$E[e^{\lambda\chi}] \le e^{\lambda E\chi + \lambda^2/2} \quad \text{for all } \lambda \in \mathbb{R}. \tag{3.8}$$
Proof of Lemma 9. This may be viewed as a consequence of the dimension free concentration inequalities for norms of Gaussian vectors [13], but here is an elementary derivation. A χ of parameter r has density function $f(x) = c_r x^{r-1} e^{-x^2/2}$ on $\mathbb{R}_+$, requiring us to show (3.8) with the expectations taken against the normalized weight $x^p e^{-x^2/2}\,dx$ on $\mathbb{R}_+$, for any p ≥ 0 (here p = r − 1). The case p = 0 can be done by hand, and we neglect it here. Also, we will consider only λ > 0, things being quite the same for λ < 0.
Taking logarithms and then differentiating in λ, we find that the above inequality (for As for this, let X and Y be positive random variables with density functions c q,λ x q e −(x−λ) 2 /2 and c q y q e −y 2 /2 respectively. Now q = p − 1 > −1, and we still have λ > 0. It is easy to convince oneself that EY ≤ EX, which is exactly the second line of (3.9).
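As a sanity check on Lemma 9, the inequality $E[e^{\lambda\chi}] \le e^{\lambda E\chi + \lambda^2/2}$ can be tested numerically. The sketch below is not part of the proof: it approximates the moment generating function by a plain midpoint rule against the weight $x^{r-1}e^{-x^2/2}$, uses the closed form $E\chi_r = \sqrt{2}\,\Gamma((r+1)/2)/\Gamma(r/2)$ for the mean, and the function names, grid size and truncation point are arbitrary choices made here.

```python
import math

def chi_mgf_and_mean(r, lam, upper=40.0, steps=40000):
    """Midpoint-rule value of E[exp(lam*chi_r)] together with the exact mean E[chi_r]."""
    h = upper / steps
    num = 0.0
    den = 0.0
    for i in range(steps):
        x = (i + 0.5) * h
        w = x ** (r - 1.0) * math.exp(-0.5 * x * x)  # unnormalized chi_r density
        num += w * math.exp(lam * x)
        den += w
    mean = math.sqrt(2.0) * math.gamma((r + 1.0) / 2.0) / math.gamma(r / 2.0)
    return num / den, mean

def lemma9_gap(r, lam):
    """lam*E[chi] + lam^2/2 - log E[exp(lam*chi)]; Lemma 9 asserts this is >= 0 for r >= 1."""
    mgf, mean = chi_mgf_and_mean(r, lam)
    return lam * mean + 0.5 * lam * lam - math.log(mgf)

for r in (1.0, 2.5, 7.0):
    for lam in (-2.0, -0.5, 0.5, 2.0):
        print(r, lam, lemma9_gap(r, lam))
```

The gap stays positive across the sampled parameters, consistent with the lemma; for r = 1 it can be cross-checked against the half-normal formula $E[e^{\lambda|g|}] = 2e^{\lambda^2/2}\Phi(\lambda)$ for a standard Gaussian g.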

Left-Tail.
This demonstrates yet another advantage of the variational picture afforded by the tridiagonal models. Namely, the bound may be achieved by a suitable choice of test vector since P sup for whatever $\{v_k\}_{k=1,\dots,n}$ on the right hand side. (We have thrown in a constant C for reasons that will be clear in a moment.) Simplifying, we write where in $H_a(v, g)$ we borrow the notation of Proposition 7 and Focus on the first term on the right of (3.10), and note that with g a single standard Gaussian.

Our choice of v is motivated as follows. The event in question asks for a large eigenvalue (think of $\sqrt{n}\varepsilon$ as large for a moment) of an operator which mimics negative Laplacian plus potential. The easiest way to accomplish this would be for the potential to remain large on a relatively long interval, with a flat eigenvector taking advantage. We choose
$$v_k = \frac{k}{n\varepsilon} \wedge \Big(1 - \frac{k}{n\varepsilon}\Big) \ \text{ for } k \le n\varepsilon, \ \text{ and zero otherwise,} \tag{3.12}$$
for which (Here a ∼ b indicates that the ratio a/b is bounded above and below by numerical constants.) Substitution into (3.11) produces, for choice of C = C(a) large enough inside the probability on the left,
$$P\big(H_a(v, g) \le -C\sqrt{n}\varepsilon \|v\|_2^2\big) \le e^{-\beta n^2\varepsilon^3/C} \quad \text{for } n\varepsilon^{3/2} \ge 1.$$
The restriction of the range of ε stems from the gradient-squared term; it also ensures that εn ≥ 1, which is required for our test vector to be sensible in the first place.
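The size of the test vector (3.12) is easy to probe numerically. The sketch below computes the four quantities that enter the later estimates, namely $\|v\|_2^2$, $\|v\|_4^4$, the squared discrete gradient (with the convention $v_0 = v_{n+1} = 0$) and $\sum_k k v_k^2$, each normalized by the power of $n\varepsilon$ suggested by comparison with the continuum tent function on $[0,1]$. The normalizations and the function names are choices made here for illustration, not quotations of the lost display (3.13).

```python
def tent_vector(n, eps):
    """Test vector (3.12): v_k = min(k/(n*eps), 1 - k/(n*eps)) for k <= n*eps, else 0."""
    width = n * eps
    return [min(k / width, 1.0 - k / width) if k <= width else 0.0
            for k in range(1, n + 1)]

def appraisals(n, eps):
    """Normalized norms of the tent vector; each ratio should be of order one."""
    v = tent_vector(n, eps)
    width = n * eps
    padded = [0.0] + v + [0.0]  # boundary convention v_0 = v_{n+1} = 0
    l2 = sum(x * x for x in v)
    l4 = sum(x ** 4 for x in v)
    grad = sum((padded[k + 1] - padded[k]) ** 2 for k in range(n + 1))
    weighted = sum((k + 1) * v[k] ** 2 for k in range(n))  # sum over k of k * v_k^2
    return {
        "l2/width": l2 / width,
        "l4/width": l4 / width,
        "grad*width": grad * width,
        "weighted/width^2": weighted / width ** 2,
    }

for n, eps in ((10000, 0.05), (4000, 0.2)):
    print(n, eps, appraisals(n, eps))
```

In the continuum limit the four ratios tend to 1/12, 1/80, 1 and 1/24 respectively, so that $\|v\|_2^2$ and $\|v\|_4^4$ are of order $n\varepsilon$, the gradient term is of order $(n\varepsilon)^{-1}$, and $\sum_k k v_k^2$ is of order $(n\varepsilon)^2$.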
Next, as a consequence of Lemma 9 (see (3.8)) we have the bound: for c > 0, (3.14) With $c = C\sqrt{n}\varepsilon$ and v as in (3.12), this may be further bounded by $e^{-\beta n^2\varepsilon^3/C}$. Here too we should assume that $n\varepsilon^{3/2} \ge 1$. Introducing a multiplicative constant of the advertised form $C^\beta$ extends the above bounds to the full range of ε in the most obvious way. Replacing ε with ε/C throughout completes the proof.

Laguerre ensembles
Right-Tail. We wish to apply the same ideas from the Hermite case to the Laguerre form $L_b(v)$ (for small b). Recall: Here, Z k , Z k and Y k are as defined in (2.6), and the appropriate versions of the tail conditions for these variables (in order to apply Proposition 7) are contained in the next two lemmas.

Now, since $x \ge -\frac{1}{2}$ implies $\log(1 + x) \ge x - x^2$, for any $\lambda \le \frac{1}{4}$ the right hand side of the above is less than $e^{r(\lambda + 2\lambda^2)}$ as claimed.
Lemma 11. Let χ and χ be independent χ random variables, each of parameter larger than one. Then, for every λ ∈ R such that |λ| < 1,

Proof. For |λ| < 1, using inequality (3.8)

To proceed, we split $L_b(v)$ into three pieces now, isolating each of the noise components, and focus on the bound for $\sup_{\|v\|_2 = 1} L_{b/3}(v, Z)$ (the notation indicating (3.15) with only the Z noise term present). One must take some care when arriving at the analog of (3.4). In obtaining an inequality of the form $P(\Delta_m(J, Z) > t) \le Ce^{-t^2/C}$ we must be able to apply . Setting m to be the nearest integer to $\varepsilon^{-1/2}(n/\kappa)^{1/4}$ puts both exponential factors on the same footing, namely on the order of $e^{-\beta\kappa^{3/4} n^{1/4}\varepsilon^{3/2}/C}$, and removes all ε, κ, and n dependence on the first prefactor. This is certainly the best decay possible, but it requires $\varepsilon \le \sqrt{n/\kappa}$. Otherwise, if $\varepsilon \ge \sqrt{n/\kappa}$, we simply choose m = 1, in which case the second term of (3.18) is the larger and produces decay $e^{-\beta\sqrt{n\kappa}\,\varepsilon/C}$. Happily, both estimates agree at the common value $\varepsilon = \sqrt{n/\kappa}$.
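The agreement of the two regimes at the crossover can be verified by direct substitution. The following lines are a short check of the arithmetic only, with the crossover taken at $\varepsilon$ equal to the square root of $n/\kappa$ (the value at which the choice of m above reduces to m = 1).

```latex
% Small deviation exponent at \varepsilon = (n/\kappa)^{1/2}:
\[
  \kappa^{3/4} n^{1/4}\,\varepsilon^{3/2}
  = \kappa^{3/4} n^{1/4}\,(n/\kappa)^{3/4} = n .
\]
% Large deviation (linear in \varepsilon) exponent at the same point:
\[
  \sqrt{n\kappa}\;\varepsilon = \sqrt{n\kappa}\,(n/\kappa)^{1/2} = n .
\]
% The choice of m is also consistent there:
\[
  \varepsilon^{-1/2}\,(n/\kappa)^{1/4} = (n/\kappa)^{-1/4}(n/\kappa)^{1/4} = 1 .
\]
```

Both exponents therefore equal $\beta n / C$ at the transition, which is the same order as the Hermite right-tail exponent $\beta n \varepsilon^{3/2}/C$ evaluated at ε = 1.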

Left-Tail.
It is enough to produce the bound for P(L a (v, Z) ≤ −C √ κε||v|| 2 2 ) for large a, given v ∈ R n and a C = C(a) as in the Hermite case. Indeed, (3.16) and (3.17) show that L a (v, Z) and L a (v, Y ) will follow suit.
We have the estimate Here we have introduced the shorthand (3.20) and have also used the fact that (3.16) applies just as well to $-Z_k$. In fact, the sign precludes any concern over the required choice of λ. For the Y-noise term, care must be taken on this point, but one may check that all is fine given our selection of v below.

For the small deviation regime, we use a slight modification of the Hermite test vector (3.12), and set for k ≤ nε/δ and $v_k = 0$ otherwise. This requires $\varepsilon \le \delta = (n/\kappa)^{1/2}$ in order to be sensible, and produces the same appraisals for $\|v\|_2^2$, $\|v\|_4^4$, $\|\nabla v\|_2^2$, and $\|\sqrt{k}v\|_2^2$ as in (3.13), with each appearance of ε replaced by ε/δ. Substitution into (3.19) yields For $\varepsilon > \sqrt{n/\kappa}$, notice that the particularly simple choice of a constant v gives Combined, these two bounds cover the claimed result, provided that $\kappa^{3/2} n^{1/2}\varepsilon^3$ is chosen larger than one in the former. Extending this to the full range of ε and all remaining considerations are the same as in the Hermite setting.

Variances
We provide details for $\lambda_{\max}(H_\beta)$; the Laguerre case is quite the same. (Neither is difficult.) Write and then split the integrand in two according to whether $\lambda_{\max} \le 2\sqrt{n}$ or $\lambda_{\max} > 2\sqrt{n}$.
First note that our upper bound on the probability that $\lambda_{\max}(H_\beta) - 2\sqrt{n} \le -\sqrt{n}\varepsilon$ applies to any ε = O(1). Further, from the tridiagonal model we see that $\lambda_{\max}$ stochastically dominates $(1/\sqrt{\beta}) \max_{1\le k\le n} g_k$. Hence, for δ > 0 we have the cheap estimate $P(\lambda_{\max}(H_\beta) \le -\delta\sqrt{n}) \le e^{-\beta n^2\delta^2}$, and thus for all ε > 0. This easily produces For the other range, recall that we mentioned at the end of the proof for the right-tail upper bound that the advertised estimate is easily extended to the large deviation regime (cf. (1.4)) to read

This results in
and completes the proof.
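The mechanism behind the variance bound can be summarized in one computation. What follows is a heuristic sketch under a simplifying assumption made here: that a tail bound of the form $P(|\lambda_{\max} - 2\sqrt{n}| > \varepsilon\sqrt{n}) \le C e^{-n\varepsilon^{3/2}/C}$ is available for all ε > 0 (the β-dependence is dropped and the faster decay in the large deviation and left-tail regimes is ignored, which can only help).

```latex
% Layer-cake formula with the substitution s = \varepsilon\sqrt{n}:
\[
  \mathrm{Var}(\lambda_{\max})
  \le E\big[(\lambda_{\max}-2\sqrt{n})^{2}\big]
  = \int_{0}^{\infty} 2s\, P\big(|\lambda_{\max}-2\sqrt{n}| > s\big)\, ds
  = n \int_{0}^{\infty} 2\varepsilon\, P\big(|\lambda_{\max}-2\sqrt{n}| > \varepsilon\sqrt{n}\big)\, d\varepsilon .
\]
% Insert the assumed tail bound and rescale u = n^{2/3}\varepsilon:
\[
  n \int_{0}^{\infty} 2\varepsilon\, C e^{-n\varepsilon^{3/2}/C}\, d\varepsilon
  = n \cdot n^{-4/3} \int_{0}^{\infty} 2u\, C e^{-u^{3/2}/C}\, du
  = C'\, n^{-1/3} .
\]
```

The resulting order $n^{-1/3}$ is what the limit theorem predicts, since $n^{1/6}(\lambda_{\max} - 2\sqrt{n})$ has fluctuations of order one. The same rescaling applied to $\int_0^\infty p\,s^{p-1} P(\cdot > s)\,ds$ indicates why higher moments of the scaled eigenvalue remain bounded.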

(Hermite) Lower Bounds
Right-Tail. This follows from another appropriate choice of test vector v. To get started, write P sup Here, as before, Our choice of v is arrived at by examining the first factor above: with, as in the left-tail upper bound, a standard Gaussian g, Now the intuition is that the eigenvalue (of a discretized −d 2 /dx 2 + potential) is being forced large positive, so the potential should localize with the eigenvector following suit.
where we will assume that n ≥ ε −3/2 ≥ ε −1/2 . With these choices we have (recall the notation from (3.20)) and thus the existence of a constant C = C(a) so that Similarly, returning to the second factor on the right hand side of (4.1) and invoking the estimate (3.14) we also have for the same choice of v. And granted nε 3/2 ≥ 1, it follows that P(χ(v) < √ nε||v|| 2 2 ) ≥ 1 − e −1/C throughout this regime. That is, When nε 3/2 ≤ 1, write P sup where ε 0 = n −2/3 ≤ 1 to produce the advertised form of the bound for all n and ε.
Left-Tail. This relies heavily on the right-tail upper bound. The first step is to reduce to a Gaussian setting via independence: for whatever b > 0, P sup Here we also use the notation of the proof of Theorem 1 (right-tail), from which we know that P sup (As β ≥ 1 we are simply dropping it from the exponent on the right at this stage.) Hence, if, as we regularly have, we start with an assumption like $n\varepsilon^{3/2} \ge C^2 \ge 1$, it follows that P sup

Turning to $H_b(v, g)$ we make yet another decomposition of the noise term. Let L be an integer (1 ≤ L ≤ n) to be specified. Set $S_L = \frac{1}{L}\sum_{k=1}^{L} g_k$, and Note that the family $\{\eta_k\}_{k=1,\dots,n}$ is independent of $S_L$. If the procedure of Proposition 7 could be applied to $H_b(v, \eta)$, we would have an event of probability larger than $1 - Ce^{-n\varepsilon^{3/2}/C}$ (again we simply drop the beta dependence at this intermediate stage) on which Since we are still working under the condition $n\varepsilon^{3/2} \ge C^2$, this is to say that there is an event of probability at least 1 − 1/e, depending only on the $\eta_k$'s, and on which for every $v \in \mathbb{R}^n$. If we now choose L + 1 ≥ 6nε/b, we have further on that same event. Note this choice requires ε ≤ b/6; it is here that the range of valid epsilon gets cut down in our final statement.

In any case, putting the last remarks together we have proved that P sup and so also P sup again under the constraints $n\varepsilon^{3/2} \ge C^2$ and ε ≤ b/6. The last inequality follows as $S_L$ is a mean-zero Gaussian with variance of order $(n\varepsilon)^{-1}$. The range $n\varepsilon^{3/2} \le C^2$ is handled as before, P sup where $\varepsilon_0 = (C^2/n)^{2/3}$. As $\varepsilon_0$ must lie under b/6, this last selection requires $n \ge (6/b)^{3/2} C^2$, but smaller values of n can now be covered by adjusting the constant.
It remains to go back and verify that P(sup ||v|| 2 =1 H b (v, η) ≥ √ nε) ≤ Ce −nε 3/2 /C . The only reason that Proposition 7 cannot be followed verbatim is that the η k 's are not independent, the first L of them being tied together through S L . We need the appropriate Gaussian tail inequality for the variables and, comparing with (3.4), shows that an estimate of type P(△ m (k, η) > t) ≤ Ce −t 2 /Cm suffices. But and so P △ m (k, η) > t ≤ P △ m (k, g) > t/2 + P(mS L > t/2).
The first term we have already seen to be of the required order, and the second is less than $e^{-Lt^2/8m^2}$. Since we only apply this bound in the present setting when $L = Cn\varepsilon \ge C\varepsilon^{-1/2}$ and $m = [\varepsilon^{-1/2}]$ (the choice made in Proposition 7), we have that $P(mS_L > t/2) \le e^{-t^2/Cm}$, and the proof is complete.
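For the record, the last step is a one-line comparison of exponents. The sketch below takes the bound $e^{-Lt^2/8m^2}$ as given and writes $C_0$ for the constant in $L = C_0 n\varepsilon$; the symbol $C_0$ is introduced here only to keep the constants apart.

```latex
% From n\varepsilon^{3/2} \ge 1 we get L = C_0 n\varepsilon \ge C_0\,\varepsilon^{-1/2},
% while m = [\varepsilon^{-1/2}] \le \varepsilon^{-1/2}. Hence L \ge C_0\, m and
\[
  \frac{L\,t^{2}}{8m^{2}} \;\ge\; \frac{C_0\, m\, t^{2}}{8m^{2}} \;=\; \frac{C_0\, t^{2}}{8m},
  \qquad\text{so}\qquad
  P\big(mS_L > t/2\big) \;\le\; e^{-Lt^{2}/8m^{2}} \;\le\; e^{-t^{2}/(Cm)}
  \quad\text{with } C = 8/C_0 .
\]
```

This is exactly the Gaussian tail of type $e^{-t^2/Cm}$ that the comparison with (3.4) calls for.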

Minimal Laguerre Eigenvalue
While not detailed there, the results of [18] will imply that whenever κ, n → ∞, κ/n → c > 1. This appraisal was long understood for the minimal eigenvalue of L{O/U}E, and has recently been extended to non-Gaussian versions of those ensembles in [8]. The condition κ/n → c > 1 keeps the limiting spectral density supported away from the origin, resulting in the same soft-edge behavior that one has for λ max . If instead κ − n remains fixed in the limit, one has a different scaling and different limit law(s) for λ min , the so-called hard-edge distributions. Granted the existence of the "hard-to-soft transition" for all β > 0 (see [4] and [17]) it is believed that (5.1) holds as long as κ−n → ∞, but (to the best of our knowledge) this has not been explicitly worked out in any setting. We only consider the analogue of the right-tail upper bound for λ min and have the following.
According to (5.1), the deviations are of the order of $(\sqrt{\kappa n})^{1/3}(\sqrt{\kappa} - \sqrt{n})^{2/3}\varepsilon$, which explains the exponent in (5.2). Our condition on ε is certainly not very satisfactory, although still sensible to the fluctuations in (5.1). One would hope for the range of ε to be understandable in terms of the soft/hard edge picture; what we have here arises from technicalities.
On the other hand, if we place an additional, "soft-edge" type, restriction on κ and n, we obtain a more natural looking estimate.
Corollary 13. Again take β ≥ 1, but now assume that κ > cn for c > 1. The right hand side of (5.2) may then be replaced by $Ce^{-\beta n\varepsilon^{3/2}/C}$ for a C = C(c), with the resulting bound valid for all 0 < ε ≤ 1.
The last statement should be compared with Corollary V.2.1(b) of [8], which applies to classes of non-Gaussian matrices.
As to the proof, we proceed in a by now familiar way. We first set Then, after a rescaling of ε, we will prove the equivalent Similar to the strategy employed above, a series of algebraic manipulations shows that we can work instead with the simplified quadratic form (The condition κ ≥ n + 1 in Theorem 12 is used in passing from L to L ′ .) We remark that under the added condition κ > cn for c > 1, α is bounded uniformly from below and 1 β √ κ E[χ β(κ−k+1) χ β(n−k) ] is bounded below by a constant multiple of √ n − k. Hence, the deterministic part of L ′ is bounded above by a small negative multiple of √ n n−1 k=1 (v k+1 + v k ) 2 + 1 √ n n k=1 kv 2 k . The proof of Corollary 13 is then identical to that of the right-tail upper bound for λ max (L β ).
Back to Theorem 12 and α's unbounded from below, we begin by rewriting the noise term in L ′ as 1 (with the convention that χ 0 = 0). The idea is the following. For moderate k, Var(U k ) = O(α 2 ), and thus it is this contribution to the noise which balances the drift term α 2 √ n n k=1 kv 2 k . Also, one may check that in the continuum limit the optimal v is such that |v k +v k+1 | = o(1), and so the Z and Y terms should "wash out".
We complete the argument in two steps. In step one, we simply drop the Z and Y terms and apply the method in Proposition 7 to the further simplified form Even here we lose a fair bit in our estimates (resulting in non-optimal conditions on ε) due to the variable coefficient in the energy term.
Step two shows that, under yet additional restrictions on ε, the Z and Y noise terms may be absorbed into L(v, U).
for any 1-Lipschitz function F .
Indeed, by the general theory (see Thm. 5.2 of [13] for example) the distribution of the pair (χ, χ) on R + × R + satisfies a logarithmic Sobolev inequality. The lemma then applies with F (x, y) = x − y. In our setting, we record this bound as Picking up the thread of Proposition 7, the variable coefficient in the energy term of L(v, U) is dealt with by applying the Cauchy-Schwarz argument with λ = λ k defined by for our choice of integer m. The ∆ m (·, U) notation stands in analogy to that used in Section 3. Note we have taken the liberty to drop various constants and shifts of indices in the above display (which are irrelevant to the upshot).
Here the dependence of $\sigma_k$ and $\lambda_k$ on the relationship between n and κ comes into play. While at the top of the form everything works as anticipated, these quantities behave unfavorably for k near n. For this reason we deal with the sum (5.6) by dividing the range into j ≤ n/2m and j > n/2m with the help of the appraisals:
$$\sigma_k^2 \le \begin{cases} C\alpha^2, & 1 \le k \le n/2, \\ C, & n/2 < k \le n, \end{cases} \qquad \lambda_k \ge \begin{cases} \sqrt{n}/C, & 1 \le k \le n/2, \\ \frac{\alpha}{C}\sqrt{n-k}, & n/2 < k < n. \end{cases} \tag{5.7}$$
Restricted to j ≤ n/2m (and hence substituting $\sigma_{jm}^2 = C\alpha^2$, $\lambda_{jm} = \sqrt{n}/C$), the sum (5.6) can be bounded by $Ce^{-\beta n\varepsilon^{3/2}/C}$ upon choosing $m = [\varepsilon^{-1/2}\alpha^{-2/3}]$. This holds for all values of ε so long as the choice of m is sensible, requiring that $\varepsilon \ge \alpha^{-4/3} n^{-2}$. But this is ensured if κ ≥ n + 1 and $\varepsilon^{3/2} n \ge 1$ (the former having been built into the hypotheses and the latter we may always assume).

On the range j ≥ n/2m the ε term on the right hand side within the probabilities is of no help, and we use, along with $\sigma_{jm}^2 \le C$ and $\lambda_{jm} \le \alpha\sqrt{n - jm}/C$, the crude estimates The choice of $m = [\varepsilon^{-1/2}\alpha^{-2/3}]$ being fixed, we can bound each of the above by the desired $Ce^{-\beta n\varepsilon^{3/2}/C}$ only by restricting ε to be sufficiently small. The first estimate requires $\varepsilon \le \alpha^{20/3}$, the second requires in addition that $\varepsilon \le \alpha^{8/3} n^{-2/5}$ (and again uses $n\varepsilon^{3/2} \ge 1$). In summary
$$P\Big(\sup_{\|v\|=1} L(v, U) \ge \alpha^{4/3}\sqrt{n}\varepsilon\Big) \le Ce^{-\beta n\varepsilon^{3/2}/C} \quad \text{if } 0 < \varepsilon \le \min\big(\alpha^{20/3}, \alpha^{8/3} n^{-2/5}\big). \tag{5.8}$$
It is perhaps worth mentioning here that the bounds on $\lambda_k$ and $\sigma_k^2$ for the range k ≥ n/2 introduced in (5.7) may be improved slightly, though not apparently with great effect on the final result.
Step 2. To absorb the Z, Y noise terms, we show that L ′ (v) ≤L(v, U) + E( Z, Y, v) with a new formL(v, U) comparable to L(v, U), and an "error" term E for which we have P(E ≥ α 4/3 √ nε) ≤ Ce −βnε 3/2 /C , at least for some range of ε > 0. What follows could almost certainly be improved upon.
(Recall the definition of λ k from (5.5).) Then, an application of the Cauchy-Schwarz inequality yields: for all v of length one, Obviously, the arguments of step 1 apply toL(v, U). Finally, with W k either Z k+1 or Y k , Lemmas 10 and 11 imply that P max 1≤k≤n−1 W 2 k βa k ≥ εα 4/3 √ n ≤ C n k=1 e −βεα 4/3 a k /C , provided say ε ≤ 1. Since it may be assumed that α < 1/2 (otherwise we are in the easy regime covered by Corollary 13), we have the bound a k = λ k ≤ √ n/C for k ≤ α 4 n ≤ n/2 and so also n k=1 e −εα 4/3 √ na k /C ≤ α 4 n e −βεα 4/3 n/C + C εα 10/3 e −βεα 22/3 n/C , by considering the sums over k ≤ α 4 n and k > α 4 n separately. If now ε ≤ α 44/3 (still keeping in mind that ε 3/2 n ≥ 1), the right hand side is less than Ce −βnε 3/2 /C . Adding this new constraint on ε to those stated in (5.8) completes the proof.