Universality of the least singular value for the sum of random matrices

We consider the least singular value of $M = R^* X T + U^* YV$, where $R,T,U,V$ are independent Haar-distributed unitary matrices and $X, Y$ are deterministic diagonal matrices. Under weak conditions on $X$ and $Y$, we show that the limiting distribution of the least singular value of $M$, suitably rescaled, is the same as the limiting distribution for the least singular value of a matrix of i.i.d. gaussian random variables. Our proof is based on the dynamical method used by Che and Landon to study the local spectral statistics of sums of Hermitian matrices.

A closely related question is to determine the limiting distribution of the least singular value, suitably rescaled, as the size of the matrix tends to infinity. For square matrices with independent entries, it is known that this distribution does not depend on the entry distributions and is equal to the one obtained from a matrix of i.i.d. gaussian random variables (which may be computed exactly). This phenomenon is known as universality of the least singular value and was proved for entry distributions with mean zero and variance one in [67] using ideas from the method of property testing in the study of algorithms. In [31], universality of the least singular value for square matrices was studied from a dynamical viewpoint and shown to hold for matrices whose entries may be sparse, weakly correlated, and have unequal variances. We also note that the case of genuinely rectangular matrices was taken up in [47].
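For the complex gaussian case this exact computation is classical: with i.i.d. standard complex gaussian entries, $N s_{\min}^2$ is exactly exponentially distributed with mean one (a fact going back to Edelman). The following seeded Monte Carlo sketch illustrates it; the matrix size and sample count are arbitrary illustrative choices, not parameters from this paper.

```python
import numpy as np

rng = np.random.default_rng(0)
N, reps = 100, 400

samples = []
for _ in range(reps):
    # iid standard complex gaussian entries: real and imaginary parts N(0, 1/2)
    A = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    s_min = np.linalg.svd(A, compute_uv=False)[-1]   # least singular value
    samples.append(N * s_min**2)

samples = np.array(samples)
# For complex gaussian matrices, N * s_min^2 is exactly Exp(1), so the
# empirical mean should be near 1 and the median near log 2.
print(np.mean(samples), np.median(samples))
```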
In this work we prove universality of the least singular value for the matrix $M$, which is the sum of two generic random matrices and exhibits strong correlations between its entries, unlike the matrices studied in [31,67]. This model was previously studied in [14], where its spectrum was controlled on scale $N^{-1+\varepsilon}$. The Hermitian version of this model, $H = V^* X V + U^* Y U$, has attracted significant interest. The weak convergence of the empirical distribution was obtained first in [74] and later shown in [23,34,59,65] using alternative techniques. Convergence was then investigated on scales decaying in $N$ in [51,52] and established on the optimal scale $N^{-1+\varepsilon}$ through the series of works [10,11,12,13]. The latter results were used to show universality of local spectral statistics in the bulk of the spectrum [30].
Our proof follows closely the method used in [30] to show universality for the Hermitian model. Two primary inputs in that work were a carefully chosen flow U (t) on the unitary group which leaves the eigenvalue distribution of H unchanged but produces a system of SDEs for the eigenvalue process closely resembling Dyson Brownian motion, and a weak local law throughout the spectrum (including the spectral edges) which was used as an a priori input to study the flow of the eigenvalues. We show how similar inputs may be obtained for the model M through a slightly more involved analysis, which proceeds by transforming the problem from one about the singular values of an $N \times N$ non-Hermitian matrix to one about the eigenvalues of a $2N \times 2N$ Hermitian matrix. The resulting eigenvalue flow is not a Dyson Brownian motion, but instead similar to a symmetrized version studied in [31], and the short-time relaxation result for the symmetrized flow in that work is a crucial input here.
Compared to [30], we derive the weak local law in a slightly different way, involving a general stability analysis of the system of equations that define the free convolution of two measures. While the essential technical content is unchanged, this somewhat streamlines the proofs. Further, we use [14] to prove a strong law at small energies, paralleling the use of [10][11][12][13] in [30] to establish a strong law in the bulk of the spectrum. We also comment on an interesting difference between the real and complex cases which does not arise in the Hermitian model.
Acknowledgment. The authors thank Benjamin Landon for comments on a preliminary draft of this paper.

Overview and main result
2.1. Overview. In this section we define the model under consideration and state our main result. The main technical input is Theorem 7.6, concerning short-time universality for the singular values of the model as it undergoes a time-dependent perturbation. Its proof occupies the bulk of this work. In Section 3 we define this perturbation and the associated stochastic differential equations governing the evolution of the singular values. In Sections 4 and 5 we prove various estimates necessary to study the short-time behavior of these SDEs. Their well-posedness and the fact that they represent the claimed singular value evolution are proved in Section 6. In Section 7 we compare the SDEs for the singular values to a symmetrized Dyson Brownian Motion flow. The short time behavior of this flow was studied in [31], and by combining our comparison with the main result of that work, we achieve a proof of Theorem 7.6. As a corollary we deduce our main result, Theorem 2.1. Appendix A contains a computation using Itô's formula that is required for Section 6.
For concreteness, we focus on a model where deterministic initial data is conjugated by unitary matrices. It is natural to also consider the analogous model with conjugation by orthogonal matrices. Surprisingly, the SDEs for the evolution of the singular values in the second case lack a certain influential repulsion term compared to the first, and as a result the least singular value displays qualitatively different behavior. Fortunately, our methods suffice to treat this case too. The difference between the behavior of the least singular value in the real and complex models and the necessary modifications to the proof are discussed in Appendix B.
Finally, Appendix C contains some preliminary estimates required for our analysis.
where $X = \mathrm{diag}(x_1, \dots, x_N)$ and $Y = \mathrm{diag}(y_1, \dots, y_N)$ are deterministic diagonal matrices and $R, T, U, V$ are independent and distributed according to the Haar measure on the unitary group $U(N)$. We suppose that for some constant $C_0$ independent of $N$. Denote the empirical measures of $X$ and $Y$ by For integers $1 \le i \le N$, we define We denote the symmetrized versions of the empirical measures of $X$ and $Y$ by For $z \in \mathbb{C}^+$, define the Stieltjes transforms We assume:
(2) There are compactly supported probability measures $\mu_1, \mu_2$ such that $\mu_X \to \mu_1$ and $\mu_Y \to \mu_2$ weakly, and at least one of $\mu_1, \mu_2$ has a bounded Stieltjes transform.
(3) Neither of $\mu_1^{\mathrm{sym}}, \mu_2^{\mathrm{sym}}$ is a single point mass, and at least one is supported at more than two points.
(4) The Stieltjes transform of the measure $\mu_X$ converges to that of $\mu_1$ with polynomial speed, in the sense that there exists a constant $c_X > 0$ such that for $\eta \ge N^{-c_X}$.
(5) The particle $y_k$ is close to the deterministic location $y_k^*$ in the sense that for any $c > 0$, where $y_k^*$ is the $k$-th $N$-quantile of $\mu_2$ defined by (2.10).
(6) The measure $\mu_2$ has a continuous density and there are constants $c, \delta_0 > 0$ such that for any $x \in \operatorname{supp} \mu_2$ and $0 \le h \le \delta_0$.
(7) The free convolution $\mu_1^{\mathrm{sym}} \boxplus \mu_2^{\mathrm{sym}}$ has a density $\rho(x)$ in a neighborhood of zero such that for some constants $C, c > 0$ and all $x \in [-c, c]$.
The first assumption is to prevent the $y_i$ from accumulating around any point $E \in \mathbb{R}$. This is illustrated by Proposition C.1. The second and third are required to apply [14, Theorem 4.4] to control the Green functions of (a modification of) $M$, as is done in Section 5. The remaining assumptions are required in Subsection 5.2 and Subsection 5.3. The condition (6) is technical and says, roughly, that the spectral edges of $\mu_2$ are not pathological.
Assumption (7) is difficult to check in general. For example, the case where $\mu_1$ is a point mass was considered in [14, Theorem 2.2], whose proof is quite technical. In Appendix C we prove two simple sufficient conditions for (7): either both $\mu_1^{\mathrm{sym}}$ and $\mu_2^{\mathrm{sym}}$ have positive density at zero, or $\mu_1^{\mathrm{sym}} = \mu_2^{\mathrm{sym}}$.

Main result.
The following is our main result. It is proved at the end of Section 7.
where c > 0 is an absolute constant uniform in r.

Definition of dynamics
3.1. Unitary Brownian motion. We use the following definitions. Recall that a standard complex gaussian random variable is one whose real and imaginary parts are independent mean-zero normal random variables, each with variance 1/2.
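As a quick numerical illustration of this normalization (a sketch; the sample size is an arbitrary choice), the real and imaginary parts each have variance 1/2, so $E|Z|^2 = 1$ and $E[Z^2] = 0$:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000
# standard complex gaussian: independent N(0, 1/2) real and imaginary parts
Z = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
print(np.var(Z.real), np.var(Z.imag), np.mean(np.abs(Z)**2), np.mean(Z**2))
```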
) are independent real standard Brownian motions.
are a collection of independent real standard Brownian motions, and B ij = B ji .
Given a parameter a ∈ (0, 1) we introduce the index set and let I c a be the set of pairs (i, j) not in I a . We let U (0) := 0 and V (0) := V evolve according to the following equations.
Here $dW_1$, $dW_2$, and $A$ are defined as follows. Let $W_1$ and $W_2$ be independent Hermitian Brownian motions in the sense of Definition 3.2. For $1 \le i, j \le N$, define the matrix processes $W_1$ and $W_2$ entrywise by The diagonal matrix $A$ in (3.2) is given by (3.5) Let us explain why these choices are made. With this definition of $W_1$ and $W_2$, we see that We therefore find by the Lévy criterion that $(\widetilde B_{ij})_{(i,j) \in I_a}$ is a family of independent standard complex Brownian motions. In particular, there is no longer a Hermitian symmetry. We write where $(d\widetilde B_{ij})_{1 \le i, j \le N}$ is a family of independent standard complex Brownian motions. We choose $A$ so that the solutions $U(t)$ and $V(t)$ stay on the unitary group. One can verify this by differentiating $UU^*$ using Itô's formula and the above definitions to see that $d(UU^*) = 0$, so that $UU^*$ is constant.
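The mechanism (an Itô correction term keeping the solution on the unitary group) can be illustrated numerically. The sketch below is not the paper's flow (3.2); it simulates the standard left-invariant Brownian motion on $U(N)$, $dU = i\,dW\,U - \tfrac12 U\,dt$, under one common normalization of the Hermitian increment, which is an assumption made here for illustration. An Euler scheme then stays approximately unitary:

```python
import numpy as np

rng = np.random.default_rng(2)
N, h, steps = 20, 1e-3, 500

def hermitian_increment(h):
    # GUE-type Brownian increment: Hermitian, entries of variance h/N
    G = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) * np.sqrt(h / (2 * N))
    return (G + G.conj().T) / np.sqrt(2)

U = np.eye(N, dtype=complex)
for _ in range(steps):
    dW = hermitian_increment(h)
    # Euler step for dU = i dW U - (1/2) U dt; the -U dt/2 Ito correction
    # is exactly what cancels the second-order term dW dW = I dt in d(U U*)
    U = U + 1j * dW @ U - 0.5 * h * U

# deviation from unitarity stays small over the whole time interval
err = np.linalg.norm(U @ U.conj().T - np.eye(N), 2)
print(err)
```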
Having defined $U(t)$ and $V(t)$, we can differentiate $M$ and use (3.7) to see where $A$ is a diagonal matrix whose entries are given by

3.2. Canceling mesoscopic drift. It is hard to use (3.8) as written because after time $T = N^{-1+b}$, the contribution from the second term will be of order $N^{-1+b}$, which is larger than the order of the microscopic statistic we are interested in. Therefore we introduce an auxiliary matrix The process $\widetilde M(t)$ has the property that $\widetilde M(T) = M(T)$ and Here $B$ is a matrix of standard complex Brownian motions. We show in Section 7 that the second term, when integrated from 0 to $T$, is $o(N^{-1})$. This is small enough not to disturb the microscopic scale $O(N^{-1})$. Formally applying Itô's formula (see Appendix A for details) suggests that the evolution of the eigenvalues of $\begin{pmatrix} 0 & M \\ M^* & 0 \end{pmatrix}$ is governed by the following system of SDEs: where (3.14) and for $i < 0$ we set $R_i = -R_{-i}$ and $\gamma_{ij} = -\gamma_{-i,j}$. Here $j_i$ and $k_i$ are the columns of the matrices $J$ and $K$ in the singular value decomposition $M = JSK^*$ with $S$ diagonal. In Section 6, we justify this formal calculation, proving that the SDE (3.13) is well-posed and its solution is the eigenvalue process for $\widetilde M(t)$.

Local law
In this section we prove a local law that is used in the next section to obtain global control on the quantities $w_\alpha(i)$, $z_\alpha(i)$, and $\gamma_{\alpha\beta}$. We fix constants $a, b$ such that and let $T = N^{-1+b}$ denote the short time we study. Any constant $C$ without further specification is a universal constant that may depend on $a$ but not on $N$. It may change from line to line, but only finitely many times, so that it remains finite. The norm $\|\cdot\|$ on matrices denotes the operator norm as an operator $\ell^2 \to \ell^2$.
We define and $G = (H - z)^{-1}$. Note that the eigenvalues of $H$ are exactly the singular values of $M$ and their negatives. We also define and $\widetilde G = (\widetilde H - z)^{-1}$.
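The Hermitization step is easy to check numerically: the eigenvalues of the block matrix $\begin{pmatrix} 0 & M \\ M^* & 0 \end{pmatrix}$ are exactly the singular values of $M$ together with their negatives. A sketch with arbitrary size and seed:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 6
M = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
# Hermitization of M: a 2N x 2N Hermitian matrix
H = np.block([[np.zeros((N, N)), M], [M.conj().T, np.zeros((N, N))]])

sv = np.linalg.svd(M, compute_uv=False)
ev = np.sort(np.linalg.eigvalsh(H))
expected = np.sort(np.concatenate([sv, -sv]))   # singular values and negatives
print(np.max(np.abs(ev - expected)))
```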

4.1.
Concentration of Green's functions. The main probabilistic tool in this section is the following concentration result for the Haar measure on the unitary group $U(N)$. We use the following notation for the Hilbert-Schmidt norm of matrices: We also recall the equivalent characterization of this norm in terms of the sum of the squares of the absolute values of the matrix entries: The next proposition follows from a theorem of Gromov; see [9, Corollary 4.4.28].
Let P be the normalized Haar measure on the unitary group U (N ) and E be the corresponding expectation. Then there is a constant c > 0 not depending on N such that The above proposition can be applied to the Green's function G, which is a smooth function of U , V , and z. In particular, the Lipschitz constant of G with respect to the variable U or V can be bounded using the imaginary part of z, as illustrated by the following propositions.
The same bounds hold for G.
Proof. Let $e_i \in \mathbb{C}^N$ be the unit vector with 1 in the $i$-th coordinate and 0 elsewhere. Then we have By the definition of $G$ we have $\|e_i\| \le \|H - z\| \, \|G e_i\|$, and note that $\|H - z\| \le C + |z|$. Then (4.11) This proves the first inequality. The second inequality follows from the spectral theorem.
Proof. Let $\widetilde G$ be the Green's function $G$ after replacing $U$ with $\widetilde U \in U(N)$. Then the resolvent identity yields Therefore, using the general inequality $\|AB\|_{HS} \le \|A\| \, \|B\|_{HS}$ and the symmetry of the matrices in question, By Proposition 4.2, $\|G\| \le \eta^{-1}$, and similarly the spectral theorem yields $\|\widetilde G\| \le \eta^{-1}$. By (4.14) and the conclusion follows.
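The spectral-theorem bound used repeatedly here, $\|G\| \le (\operatorname{Im} z)^{-1}$ for Hermitian $H$, can be checked directly on a random example (the size and test points are arbitrary choices):

```python
import numpy as np

rng = np.random.default_rng(3)
N = 50
A = rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))
H = (A + A.conj().T) / 2                      # an arbitrary Hermitian matrix

# By the spectral theorem ||(H - z)^{-1}|| = 1/dist(z, spec(H)) <= 1/Im z,
# so the ratio ||G|| * eta is always at most 1.
ratios = []
for eta in [1.0, 0.1, 0.01]:
    z = 0.3 + 1j * eta
    G = np.linalg.inv(H - z * np.eye(N))
    ratios.append(np.linalg.norm(G, 2) * eta)
print(ratios)
```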

Invariant identities.
Let $E_{ij}$ be the matrix whose $(i,j)$-th entry is 1 and all the other entries are 0. (4.15) The matrix $E_{ij}$ will be either $N \times N$ or $2N \times 2N$, depending on the context. For brevity we set $\widehat Y = Y + (T - t)A$, and we define We require the following lemma.
Proof. For any $\zeta \in \mathbb{C}$, define an $N \times N$ unitary matrix $Q(\zeta)$ by The derivatives of $Q(\zeta)$ and $Q^*(\zeta)$ with respect to $\zeta$ at $\zeta = 0$ are We differentiate $G(\zeta, z)$ and evaluate the derivative at $\zeta = 0$ to obtain Note that the distribution of $Q(\zeta)U$ is invariant with respect to $\zeta$, so $E[\partial_\zeta G(\zeta, z)] = 0$. Therefore, the above equality yields This can also be rearranged as This proves the case where $N + 1 \le i, j \le 2N$. The other case follows from a similar argument after multiplying by $Q(\zeta)$ on the right.
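The argument above rests on the left invariance of the Haar measure: multiplying $U$ by a fixed unitary does not change its distribution. A seeded numerical sketch of this invariance (the statistic $|\mathrm{tr}\,U|^2$, the fixed unitary $Q_0$, and the sampler via QR with phase correction are illustrative choices, not objects from this paper):

```python
import numpy as np

rng = np.random.default_rng(8)
N, reps = 20, 3000

def haar_unitary(N):
    # QR of a complex Ginibre matrix with the standard phase correction
    Z = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    Q, R = np.linalg.qr(Z)
    d = np.diag(R)
    return Q * (d / np.abs(d))

# Left invariance: for a fixed unitary Q0, Q0 @ U has the same distribution
# as U; we compare E|tr(.)|^2 under both (the common value is 1).
Q0 = np.diag(np.exp(0.7j * np.arange(N)))     # a fixed deterministic unitary
a = np.array([abs(np.trace(haar_unitary(N)))**2 for _ in range(reps)])
b = np.array([abs(np.trace(Q0 @ haar_unitary(N)))**2 for _ in range(reps)])
print(np.mean(a), np.mean(b))
```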

Asymptotic equations.
Let $m_N(z)$ be the Stieltjes transform of the empirical spectral measure of $H$. In this subsection we provide a system of equations that $m_N(z)$ satisfies asymptotically. We require the following notation for high-probability bounds.
Definition 4.5. Given two sequences of random variables ( In general, for some index set A (possibly N -dependent) and families of random variables (X(α, N )) and (Y (α, N )) with parameters α ∈ A and N ∈ N, we say that which are holomorphic functions on C + .
Lemma 4.6. For any $z = E + i\eta \in \mathbb{C}^+$ with $|z| \le \log N$ and $a > 0$, we have Proof. By Lemma 4.4 and Proposition 4.1, for $1 \le i, j \le N$ or $N + 1 \le i, j \le 2N$, and any $a > 0$, $0 \le k, l \le 2N$, we have, using the general inequality $\|AB\|_{HS} \le \|A\| \, \|B\|_{HS}$ and the symmetry of the matrices in question, Take $i, j \in [1, N]$ and let $l = j$, then sum over $j$: Note that $U^* R^* X T V$ has the same distribution as $(U^* R^* X T V)^*$, so $\begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix} G \begin{pmatrix} 0 & I \\ I & 0 \end{pmatrix}$ has the same probability distribution as $G$. It follows that $G_{ii}$ has the same distribution as $G_{N+i, N+i}$ for $1 \le i \le N$. Therefore, Then Proposition 4.1 and Proposition 4.3 imply that the above identity holds without expectation, up to a small error term: Recall the definition (4.25). Take the quotient of the above two equations and use Proposition 4.8. Now we go back to (4.28), plugging in the above equation to see This proves the conclusion for $1 \le i \le N$. Similarly, we can take $i \in [N + 1, 2N]$ to obtain the same identity in that case. This proves the conclusion for fixed $i$ and $k$.
The constants in the O notation are uniform in i and k.
Now we are ready to prove that the three holomorphic functions m N , w X , and w Y approximately satisfy a system of equations for z ∈ C + . Lemma 4.7. For any z ∈ C + with |z| ≤ log N and a > 0, we have

(4.34)
and for 1 ≤ k ≤ N , Proof. We start with the following identity, which is equivalent to the definition of G.
Taking the $(k,k)$-th entry of each of the four blocks, we have (4.37) or equivalently, We apply Lemma 4.6 to the matrix involving $GH_1$ to get where the $O$ notation is used entrywise. Using Proposition 4.8 and the fact that which can be written explicitly as (4.41) where we omit the off-diagonal terms and write them as $*$. We sum over the diagonal terms to get which follows from the definition of $G$.
The following proposition was used in the proof of Lemma 4.7. Then, Here $O(\cdot)$ is in the sense of the operator norm.
Proof. We immediately have $B = A^{-1}(I - AR)$. Hence By the assumption that $\|AR\| < 1/2$, we have On the other hand, $A(B + R) = I$ implies 4.4. Weak law. In this subsection we prove a weak law for the $G_{ii}$. We use the term weak because the result is only valid in the regime $\operatorname{Im} z \ge N^{-c}$ for some small constant $c$. Thus, it is only slightly stronger than the weak convergence of the corresponding measure $\mu_N$. Nevertheless, the weak law provides necessary bounds for the eigenvectors of $H$.
We deal with equation (4.34) in a general setting. Let m α , m β be the Stieltjes transforms of probability measures µ α , µ β . Consider the following deterministic equations for fixed z ∈ C + .
Observe that equation (4.34) is a special case of the above equations plus some error terms. The existence and uniqueness of a solution to this system is known. We call the measure $\mu_\alpha \boxplus \mu_\beta$ given by the following proposition the free convolution of $\mu_\alpha$ and $\mu_\beta$.
Proposition 4.9. Given two probability measures $\mu_\alpha$ and $\mu_\beta$ on $\mathbb{R}$, there exist unique analytic functions $w_\alpha, w_\beta, m : \mathbb{C}^+ \to \mathbb{C}^+$ satisfying (4.50), where $m$ is the Stieltjes transform of a probability measure we denote $\mu_\alpha \boxplus \mu_\beta$. If $\mu_\alpha, \mu_\beta$ are compactly supported and neither is a point mass, then $w_\alpha, w_\beta$ extend continuously to $\mathbb{R}$. If in addition $\mu_\alpha(\{a\}) + \mu_\beta(\{b\}) < 1$ for all $a, b \in \mathbb{R}$, then $\mu_\alpha \boxplus \mu_\beta$ has a continuous density. We need the stability of the solution under perturbation. To investigate this, it is convenient to write the equation in a more symmetric form. In fact, the above equation can be rephrased in terms of $w_\alpha$ and $w_\beta$ only. Define (4.51) Proposition 2.2 in [58] says that $\widehat m_\alpha$ and $\widehat m_\beta$ are Stieltjes transforms of Borel measures $\widehat\mu_\alpha$ and $\widehat\mu_\beta$ on $\mathbb{R}$, whose total masses are $\sigma_\alpha^2 := \int t^2 \, \mu_\alpha(dt)$ and $\sigma_\beta^2 := \int t^2 \, \mu_\beta(dt)$ respectively (which are $\le C$ by assumption (2.2)).
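As a concrete sanity check of such subordination systems, one can solve them numerically by fixed-point iteration. The sketch below takes both measures to be the semicircle law, for which the free convolution is known in closed form (a semicircle of doubled variance). Note it uses the Cauchy-transform convention $G(z) = \int (z-x)^{-1}\,d\mu$, which is $-m(z)$ in this paper's notation, and the iteration shown is one standard scheme, not the stability argument of this section:

```python
import cmath

def g_sc(w):
    # Cauchy transform G(w) = (w - sqrt(w^2 - 4)) / 2 of the semicircle law
    # of variance 1 on [-2, 2]; the principal branch of sqrt is correct on
    # the positive imaginary axis, which is all we use below.
    return (w - cmath.sqrt(w * w - 4)) / 2

def free_conv_sc_sc(z, iters=200):
    # subordination fixed point for mu_sc (+)boxplus mu_sc:
    # omega = z - G_sc(omega), then G_box(z) = G_sc(omega)
    omega = z
    for _ in range(iters):
        omega = z - g_sc(omega)
    return g_sc(omega)

z = 3j
g = free_conv_sc_sc(z)
# closed form: semicircle of variance 2 has G(z) = (z - sqrt(z^2 - 8)) / 4
target = (z - cmath.sqrt(z * z - 8)) / 4
print(g, target)
```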

(4.67)
There exists $c > 0$ such that if $\delta w \le c(\eta^3 \wedge \eta^7)$, then Proof. It is easy to see that for fixed $z$, By Taylor expansion at $(w_\alpha, w_\beta)$, we have Using the bound (4.56) we have (4.72) Using the condition that $\delta w \le c(\eta^3 \wedge \eta^7)$, the second term on the right side can be absorbed into the left side. Thus, Define the spectral domain and let $\widehat w_X$ and $\widehat w_Y$ solve $\widehat w_X = \widehat m_X(z + \widehat w_Y)$, $\widehat w_Y = \widehat m_Y(z + \widehat w_X)$.
Proof. We now restrict to $z \in \Sigma$. Multiplying the first and third equations of (4.34) gives Analogously, The claim now follows from choosing $a = 1/6$ and applying Proposition 4.11.
Corollary 4.13. There exists a constant $c(b) > 0$ such that with probability at least $1 - e^{-cN^c}$, where the $i$ in $Y_i$ is taken modulo $N$.

Deterministic estimates.
Let m 1 , m 2 be the Stieltjes transforms of µ sym 1 , µ sym 2 , and let w 1 , w 2 be the solution to the system The proof of the following proposition is the same as [30, Proposition 3.9].
Proposition 5.2. There exists p > 0 such that if Im z ≥ N −1/p , then for all z.
Given $p > 0$, define the spectral domain Proof. This follows from Proposition 4.11 with We indicate how to bound the first coordinate; the second is analogous. From Corollary 5.3, For any Stieltjes transform $m(z)$ of a measure $\mu$, $\operatorname{Im} m(z) \ge \operatorname{Im} z / (|z| + \sup_{x \in \operatorname{supp} \mu} |x|)^2$. (5.12) Taking $p$ large and using $|w_X| \vee |w_Y| \le c\eta^{-1}$, this shows In the last inequality we used the hypothesis that $\eta \ge N^{-1/p^2}$. Finally, the second bound follows from the first and Corollary 5.3.
Define $\widehat m(z)$ to be the Stieltjes transform of $\mu_X^{\mathrm{sym}} \boxplus \mu_Y^{\mathrm{sym}}$. Corollary 5.4. Under the same assumptions as Corollary 5.3, $|\widehat m(z)| \le C$. (5.14) Proof. As noted in Section 2.2, we assume that either $\mu_1$ or $\mu_2$ has a bounded Stieltjes transform. By Proposition 5.2, for $\eta \ge N^{-1/p}$, Using the definition of $\widehat m$ (recall (4.50)), this completes the proof.
The following corollary is essentially the same as [30, Theorem 3.14]. We include it for completeness.

Bulk eigenvector bounds.
Theorem 5.6. Let $I$ be the interval in assumption (7). Fix $\nu > 0$ and set

(5.25)
and it holds with overwhelming probability that where the indices in y i and A ii are taken modulo N .
Proof. By assumption, the empirical measure of $U^* R^* X T V$ converges to $\mu_1$ weakly. Because $T = o(1)$ and by (C.10), $(T - t)A$ is negligible and $Y + (T - t)A$ converges to $\mu_2$ weakly. Fix a small $\sigma > 0$. Then by Theorem 4.4 of [14], for any fixed $t$, with overwhelming probability for sufficiently large $N$ not depending on $t$. Further, by Lemma A.2 of [14], there exists $c > 0$ such that $\inf_{z \in D_I} \operatorname{Im} \widehat w_X \ge c$ for large enough $N$, independent of $t$. This implies the desired claim for $\widetilde G(z, t)$ at any fixed $t$. Observe that there is an implicit dependence of $\widehat w_X$ on $t$. This estimate may then be transferred to $G_{ii}$, using the resolvent identity, and made uniform in $t$ using a standard stochastic continuity argument, as in [30, Theorem 3.16]. This completes the proof.
Corollary 5.7. Let $I$ be the interval in assumption (7). Then for any $\nu > 0$, the following estimates hold with overwhelming probability: Proof. The proof is the same as [30, Corollary 3.17] after observing that the eigenvectors of $H$ are of the form $(w_\alpha, z_\alpha)$ and $(-w_\alpha, z_\alpha)$.
Theorem 5.8. Let $I$ and $D_I$ be as in Theorem 5.6. There exists a constant $q > 0$ such that Proof. Let $q > 0$ be a constant to be determined later. We define $\Sigma_1 \subset D_I$ by $\operatorname{Im} w_2 \ge c \operatorname{Im} w_1$, $\operatorname{Im} w_1 \ge c \operatorname{Im} w_2$, $\operatorname{Im} w_1 + \operatorname{Im} w_2 \ge c > 0$, (5.31) which implies $\operatorname{Im} w_1 \wedge \operatorname{Im} w_2 \ge c > 0$. These lower bounds permit the use of Proposition 5.2 to conclude that on $\Sigma_1$, We now claim that on $D_I$ the stability of the system of equations (5.6) is improved, so that the operator $\Phi$ from (4.53) satisfies $\|(D\Phi(w_\alpha, w_\beta))^{-1}\| \le C$. (5.33) To see this, one can reinspect the proof of Proposition 4.10 using the bound $\operatorname{Im} w_1 \wedge \operatorname{Im} w_2 \ge c > 0$, which implies $|\widehat p| \vee |\widehat q| \le C$, and the bound $\sup_{\operatorname{Im} z \ge 0} |\widehat p \widehat q| < 1$, (5.34) which holds because $\widehat\mu_2$ is not a point mass and $\operatorname{Im} w_1 \wedge \operatorname{Im} w_2 \ge c > 0$. Because $\widehat p, \widehat q$ are continuous, we find $|1 - \widehat p \widehat q| \ge c > 0$ on $I$. Repeating (4.61) and (4.62) with these improved bounds proves the claim. Similar reasoning gives $\|D\Phi\|_\infty \le C$, $\|D^2\Phi\|_\infty \le C$ on $\Sigma_1$. We can therefore repeat the reasoning of the proof of Proposition 4.11 to show that for any $z \in \Sigma_1$, there is a neighborhood of $z$ on which $|w_1 - \widehat w_X| \vee |w_2 - \widehat w_Y| \le N^{-1/q}$ when $q > p$. This shows that $\Sigma_1 = D_I$. Finally, on $D_I$ we have, using (5.6), the lower bound on $\operatorname{Im} w_2$, and Proposition 5.2, This completes the proof.
Corollary 5.9. Let $I$ and $D_I$ be as in Theorem 5.6. There exists a constant $p > 0$ such that, with overwhelming probability, Proof. The claim follows from combining Theorem 5.6, recalling $\widehat m(z) = m_Y(z + \widehat w_X)$ from (4.75), and Theorem 5.8.

Well-posedness of dynamics
To show the well-posedness of (3.13), it is important to ensure that the drift term, which depends on the inverses of the eigenvalue spacings, does not become too singular. We guarantee this by adding a small perturbation to $X$, which was defined in (2.1). Let where $Q$ is an $N \times N$ matrix of i.i.d. standard complex gaussians. We first note that because the perturbation is exponentially small, it does not affect our desired conclusion. The proof is trivial and hence omitted. For the rest of this work, we use the redefined version of $M$ with $\widehat X$ and may not explicitly indicate this. We now prove the desired eigenvalue repulsion estimates. The proof of the following lemma is similar to [30, Proposition 2.3]. For completeness we provide some details in the current context. Lemma 6.2. Let $P$ be an $N \times N$ matrix of complex numbers, and let $Q$ be an $N \times N$ matrix of i.i.d. standard complex gaussians. Define the $2N \times 2N$ matrix $\mathbf{P}$ by Let $\gamma_1 \le \dots \le \gamma_N$ be the singular values of $P$ and $\alpha_1 \le \dots \le \alpha_N$ be the positive eigenvalues of $\mathbf{P}$. Let $\alpha_{-i} = -\alpha_i$ denote the corresponding negative eigenvalues. Then the $\alpha_i$ are almost surely distinct, and we have the following estimates for every $\delta \in (0, 1)$: where $c_N$ is an $N$-dependent constant and Finally, we have Proof. Recall that $P$ has a singular value decomposition $P = USV^*$, where $S$ is diagonal and $U$ and $V$ are unitary. Therefore, after conjugating by a unitary block matrix, which leaves invariant the eigenvalues and the distribution of $Q$, we may suppose $P$ is real and diagonal. Define the index set corresponding to the off-diagonal blocks by Let $\mathcal{H}_N$ be the set of $2N \times 2N$ Hermitian matrices with zeros in the indices $J^c$ (the diagonal $N \times N$ blocks). We parameterize $\mathcal{H}_N$ by the coordinates $(w_{ij}) \in \mathbb{R}^{2N \times 2N}$, where $w_{ij} = 0$ if $(i,j) \in J^c$ and $h_{ij} = w_{ij} + i w_{ji}$ for $j > i$ otherwise. This space is naturally equipped with the Lebesgue measure on $\mathbb{R}^{2N^2}$.
Set $\sigma_N = e^{-N}$ and write the density for $\mathbf{P}$ by where we use that $P$ is real, so only the $w_{ij}$ representing the real parts of the diagonals of the off-diagonal blocks are shifted. Note the normalization constant $Z_N$ does not depend on $P$.
In the eigenvalue-eigenvector coordinates, we have from the singular value decomposition for the upper-right block of the $(w_{ij})$ matrix. Here $g(u, v)$ is an integrable function on the compact subdomain of $\mathbb{C}^{N(N-1)/2} \times \mathbb{C}^{N(N-1)/2}$ where the map $(u, v) \mapsto (U(u), V(v))$ taking the strictly upper triangular part of a matrix to the full Hermitian matrix is well-defined.
Using the trivial bound of 1 on the eigenvectors and the AM-GM inequality, we obtain This implies Then integrating out the $g(u, v)$ term and integrating again to compute $E \sum_{i \ne j} |\alpha_i - \alpha_j|^{-1}$, we obtain the first bound. The final inequality follows as in [30, Proposition 2.3].
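The role of the exponentially small gaussian perturbation can be seen directly in a small example: even when $P$ has maximally degenerate singular values, the perturbed Hermitization almost surely has distinct eigenvalues. A sketch with arbitrary size and seed:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 8
P = np.eye(N, dtype=complex)                 # all singular values equal to 1
sigma = np.exp(-N)                           # exponentially small, as in the text
Q = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
Pp = P + sigma * Q

# Hermitization: eigenvalues are the singular values of P + sigma*Q and
# their negatives; without the perturbation they would be +/-1, each with
# multiplicity N, while with it every spacing is strictly positive.
H = np.block([[np.zeros((N, N)), Pp], [Pp.conj().T, np.zeros((N, N))]])
evals = np.sort(np.linalg.eigvalsh(H))
gaps = np.diff(evals)
print(gaps.min())
```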
With this estimate, the following well-posedness theorem is proved nearly identically to [30,Theorem 5.2]. For any t > 0 we define the filtration where B s is the multi-dimensional Brownian motion driving (3.13). • λ(t) is adapted to the filtration (F t ) 0≤t≤t , and for almost all t ∈ [0, t]] = 1.

Analysis of SDEs
The system of SDEs for the singular value evolution is for $i \ge 1$. We recall that the $\lambda_i$ and $\lambda_{-i}$ are coupled as discussed above so that $\lambda_i = -\lambda_{-i}$ (and the remainder terms and the $\gamma_{ij}$ are coupled in the same way). We use the redefinition noted in Lemma 6.1, so that our well-posedness result Theorem 6.3 applies.
Our plan is to study this system for times 0 ≤ t ≤ T with T = N −1+b and compare it to the process defined by which we treat using the methods of [31]. We follow closely the strategy in [30], commenting on the minor differences in the current setting.
7.1. Interpolating process. For $0 \le \alpha \le 1$ we define the interpolating process $z_i(t, \alpha)$ by the SDE and let $m_t(z)$ be the free convolution of $m_0$ with the semicircle law at time $t$ (see [54] for details): for $i \in J$ with overwhelming probability. The function $m_t(z)$ is the Stieltjes transform of some probability density $\rho_t(E)$. Let the classical eigenvalue locations of the free convolution $\rho_t$ be $\{\gamma_i(t)\}_{|i|=1}^N$. Note, by the same reasoning given in [30, Section 4.4], that for any $\nu > 0$ and $i, j \in J$ with $|i - j| \ge N^\nu$, The following rigidity lemmas hold. They are straightforward adaptations of the proofs of Theorem 3.1 and Corollary 3.1 of [49], and the discussion in [30, Section 4.5]. The main difference is that our Brownian motions are coupled in pairs, $B_i = -B_{-i}$. However, this does not affect the bound on the Brownian motion terms in equation (3.33) of [49] in the proof of the deformed local law, so the same method applies here. Observe that our global eigenvector bounds from Theorem 5.1 are used to prove the second lemma.
Conclusion. The remaining stochastic analysis, including a short-range approximation and use of the energy method, is virtually identical to the argument given in [30], and we obtain the following coupling.
Proposition 7.3. Fix $\kappa > 0$. Suppose that $b < a/100$ and $a < c/10$. For every time $t$ such that $0 \le t \le T$, we have with overwhelming probability and every index $i \in J$ that The following is essentially [31, Theorem 3.2]. Compared to that reference, a certain repulsion term is present in the dynamics we study here (cf. Appendix B), but the proof is nearly identical (and in fact strictly easier) in our case.
We first recall the setup from that reference. Fix $\delta_1 > 0$ and let $g$ and $G$ be $N$-dependent parameters such that (7.14) Let $V$ be a deterministic matrix and let $B_t = \{B_{ij}(t)\}_{1 \le i, j \le N}$ be a matrix of i.i.d. standard complex Brownian motions. Define Let $\{s_i(t)\}_{i=-N}^N$ (omitting the zero index) be the eigenvalues of $H_t$. We set where again $i = 0$ is omitted in the sum.
Definition 7.4. With $g$ and $G$ as above, we say $V$ is $(g, G)$-regular if $c \le \operatorname{Im} m_V(E + i\eta) \le C$ (7.17) for $|E| \le G$ and $\eta \in [g, 10]$ for large enough $N$, and if there exists a constant $C_V$ such that Let $W$ be a random matrix whose entries are i.i.d. complex normal variables of variance $N^{-1}$, and let $\widetilde B_t = \{\widetilde B_{ij}(t)\}_{1 \le i, j \le N}$ be a matrix of i.i.d. standard complex Brownian motions.
Combining this result with Proposition 7.3, we obtain short time relaxation of the singular value dynamics.
Theorem 7.6. Fix $\sigma > 0$, $\kappa > 0$, suppose that $b < a/100$ and $a < c/10$, and retain the definitions of Proposition 7.3. Then there exists a coupling of the processes $\{\lambda_i(t)\}$ and $\{r_i(t)\}$ such that We are now positioned to prove our main theorem. The distribution of the least singular value of a gaussian matrix is known explicitly. For $W$ and any $r \ge 0$ [37], We now show the $\sqrt{1 + T}$ factor is negligible, so that we may compare $\lambda_1(M_N)$ directly to $\lambda_1(W)$. We compute, using $1 - e^{-x} \le x$, (7.26) By (7.22), $N\lambda_1(W)$ has a bounded density, so the $N^{-\delta}$ terms in the above may be removed with $O(N^{-c})$ error, and we conclude that as desired.
Appendix A. Derivation of dynamics

The following is a formal calculation that ignores the technical issue of possible eigenvalue collisions. It is used in Section 6, where this issue is dealt with rigorously.
A.1. Calculation. With M as above, we define the 2N × 2N block matrix Observe that the eigenvalues of X are the singular values of M and their negatives. Let M = JSK * be the singular value decomposition of M . Then a matrix of normalized eigenvectors for X is We follow the approach of [41,Chapter 12] to compute the dynamics of the eigenvalues of X.
Denote the eigenvalues of $X$ by $\lambda_\alpha$ with corresponding eigenvectors $u_\alpha$. We have and by the chain rule, Itô's formula gives The first term is, for $\alpha \le N$, We see that where $\{dB_\alpha\}_{\alpha=1}^N$ is a set of independent standard real Brownian motions. The independence follows from an explicit computation, noting that $(d\widetilde B_{ij})(d\widetilde B_{kl}) = \delta_{il}\delta_{jk}\,dt$. The remaining terms are The second term is The first contribution is This vanishes unless $i = l$, $j = k$, and exactly one of $i$ or $j$ is greater than $N$, due to the covariation factor. Summing over $i$ and $j$, we obtain the norm of the first or last half of each $u_\alpha$, that is $\|j_\alpha\|^2/2$ or $\|k_\alpha\|^2/2$, both of which are 1/2. We then recover the drift term The remaining contribution is We perform the sum on $i$ and $j$ first. We have Define the column vectors $w_\alpha = U j_\alpha$, $z_\alpha = V k_\alpha$, (A.18) and set $R = (1_{(i,j) \in I_a^c} \, d\widetilde B_{ij})$. Then since the quadratic variation of a standard complex Brownian motion is zero, and the elements of $R$ are independent, For $i < 0$ one can check that $R_i = -R_{-i}$ and $\gamma_{ij} = -\gamma_{-i,j}$.

Appendix B. Real case
We now consider the real analogue of the model of Section 2.2, where the initial data is conjugated by orthogonal matrices. Precisely, in this section we consider the matrix ensemble where $X = \mathrm{diag}(x_1, \dots, x_N)$ and $Y = \mathrm{diag}(y_1, \dots, y_N)$ are deterministic diagonal matrices and $R, T, U, V$ are independent and distributed according to the Haar measure on the orthogonal group $O(N)$. We retain the hypotheses from Section 2.2. The least singular value in the real case displays qualitatively different behavior than its counterpart in the complex case, as indicated by the accompanying simulation results. The density for $\lambda_1$ vanishes at zero in the complex model, but remains positive in the real model. The singular value distribution in the real case is said to have a hard edge at zero. This phenomenon may be understood dynamically. As discussed in Appendix A, the drift term in the complex case has the repulsion component $\frac{1}{2N} \sum_{1 \le |j| \le N, \, j \ne i} \frac{1}{\lambda_i - \lambda_j}$. The same computation in the real case yields the repulsion term $\frac{1}{2N} \sum_{1 \le |j| \le N, \, j \ne \pm i} \frac{1}{\lambda_i - \lambda_j}$, with the interaction between $\lambda_i$ and $\lambda_{-i}$ removed. For $\lambda_1$, this means there is no force from $\lambda_{-1}$ pushing it away from the origin, resulting in the hard edge. The model (B.1) can be handled by the same method used for (2.1). The definition of the matrix dynamics in Section 3 is the same except for obvious changes, such as the use of orthogonal matrices and real symmetric Brownian motions. This leads to virtually the same singular value dynamics as in Appendix A, with the important exception of the interaction term noted above. The estimates of Appendix C are also essentially unchanged. An inspection of the proofs referenced in Section 6 and Section 7 shows that they still apply to the dynamics in the real case. An important point is that the short time universality result Proposition 7.5 still holds without the regularizing force from $\lambda_{-1}$; this was the original form of the result stated in [31].
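The hard-edge dichotomy is easy to observe numerically for gaussian matrices: rescaled by $\sqrt N$, the least singular value of a real gaussian matrix is typically smaller than that of a complex one, consistent with a positive density at zero in the real case. A seeded sketch (the size and sample count are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(5)
N, reps = 100, 200

def smin(A):
    return np.linalg.svd(A, compute_uv=False)[-1]

real_vals, cplx_vals = [], []
for _ in range(reps):
    R = rng.standard_normal((N, N))                                   # real gaussian
    C = (rng.standard_normal((N, N)) + 1j * rng.standard_normal((N, N))) / np.sqrt(2)
    real_vals.append(np.sqrt(N) * smin(R))
    cplx_vals.append(np.sqrt(N) * smin(C))

# hard edge in the real case: sqrt(N) * s_min is typically smaller there
print(np.mean(real_vals), np.mean(cplx_vals))
```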
Finally, for the exact form of the distribution of the least singular value for the gaussian matrix, we use the form with quantitative error given in [67].
We obtain the following analogue of Theorem 2.1 for the real model. where the sums are taken over indices i such that 1 ≤ |i| ≤ N .
Proof. Let $\eta = N^{-1+a}$. Note that Divide both sides by $\eta$ to obtain This proves the first inequality in the proposition. For the second inequality, note that for $x > \eta$, we have $\frac{1}{x} \le \frac{2}{x + \eta}$, where $c_N > 0$ depends on $N$. We recall that equation (3.7) in this reference is derived by applying the formula
$$\frac{d}{dt} e^{\theta X(t)} = \int_0^\theta e^{\alpha X(t)} \frac{dX(t)}{dt} e^{(\theta - \alpha) X(t)} \, d\alpha,$$
which holds for any one-parameter matrix subgroup $X(t)$ [75], to compute the derivative of the matrix exponential with respect to each matrix entry, in conjunction with Itô's formula.

C.3. Sufficient conditions for positive density. The next lemma follows from the argument in [13, Lemma 3.2]. We provide the reasoning again here for completeness.
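The matrix-exponential derivative formula recalled above can be verified numerically: the sketch below compares a central-difference derivative of $e^{\theta(A + tB)}$ at $t = 0$ with a midpoint-rule discretization of the integral (the matrices, $\theta$, and grid size are arbitrary choices; the matrix exponential is computed by eigendecomposition, which is valid for the generic matrices used here):

```python
import numpy as np

rng = np.random.default_rng(6)
n = 4

def expm(M):
    # matrix exponential via eigendecomposition; generic (random) matrices
    # are diagonalizable almost surely
    vals, vecs = np.linalg.eig(M)
    return vecs @ np.diag(np.exp(vals)) @ np.linalg.inv(vecs)

A = rng.standard_normal((n, n))
B = rng.standard_normal((n, n))
theta, h, K = 1.0, 1e-6, 2000

# left side: d/dt exp(theta * (A + t B)) at t = 0, by central differences
lhs = (expm(theta * (A + h * B)) - expm(theta * (A - h * B))) / (2 * h)

# right side: integral_0^theta exp(a A) B exp((theta - a) A) da, midpoint rule
rhs = np.zeros((n, n), dtype=complex)
for k in range(K):
    a = (k + 0.5) * theta / K
    rhs += expm(a * A) @ B @ expm((theta - a) * A) * (theta / K)

print(np.max(np.abs(lhs - rhs)))
```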
Lemma C.4. Let $\mu_\alpha, \mu_\beta$ be probability measures with density functions $\rho_\alpha, \rho_\beta$ that are symmetric about zero and strictly positive on $[-r_0, r_0]$ for some $r_0 > 0$. Then $\mu_\alpha \boxplus \mu_\beta$ has a density, and that density is bounded above and away from zero in a neighborhood of zero.
Proof. According to [20, Corollary 8], $\mu_\alpha \boxplus \mu_\beta$ has a bounded density. It remains to show it is bounded away from zero. By Proposition 4.9, the corresponding subordination functions $w_\alpha, w_\beta$ extend continuously to 0 with values in $\mathbb{C}^+ \cup \mathbb{R} \cup \{\infty\}$. By the equations defining the free convolution, to show that the density of $\mu_\alpha \boxplus \mu_\beta$ is bounded below in a neighborhood of 0, it suffices to show these limits are not infinite.
We proceed by contradiction. Fix $r < r_0/2$ and define $E = \{z \in \mathbb{C}^+ \cup \mathbb{R} : |z| \le r\}$. (C.14) Let $L > r_0$ and $M > 10$ be large parameters to be fixed later. We first suppose that there exists $z \in E$ such that $|w_\alpha(z)| > LM$ and $|w_\beta(z)| > L$. The defining equations for the free convolution give where the $O$ notation is with respect to the limit $L \to \infty$. The above equation gives $w_\beta / w_\alpha = O(w_\alpha^{-1})$. (C.16) This contradicts $L/|w_\alpha| \le |w_\beta / w_\alpha|$ (which holds by our assumptions on $w_\alpha, w_\beta$) for $L$ sufficiently large. We next suppose $|w_\alpha(z)| > LM$ and $|w_\beta(z)| \le L$, and find from the definition of free convolution that for $z \in E$ and $M$ sufficiently large, $\frac{1}{|m_\alpha(w_\beta)|} = |w_\alpha + w_\beta - z| \ge \frac{ML}{2}$. (C.17) By the symmetry of $\mu_\alpha$ and $\mu_\beta$ we know that $w_\beta$ is imaginary for $z$ on the imaginary line $\{i\eta : \eta \in \mathbb{R}\}$. But $m_\alpha(z)$ has no zeros on the imaginary line, as $\rho_\alpha$ is positive near 0, so it is bounded away from zero for $z \in E$. For $M$ large we reach a contradiction. This completes the proof.
In the case µ α = µ β , only the first part of the previous argument is required.
Lemma C.5. Let $\mu_\alpha$ be a symmetric probability measure, not necessarily absolutely continuous, supported at more than two points. Then $\mu_\alpha \boxplus \mu_\alpha$ has a density, and that density is bounded above and away from zero in a neighborhood of zero.