ON THE LOCAL PAIRING BEHAVIOR OF CRITICAL POINTS AND ROOTS OF RANDOM POLYNOMIALS

We study the pairing between zeros and critical points of the polynomial $p_n(z) = \prod_{j=1}^{n}(z - X_j)$, whose roots $X_1, \ldots, X_n$ are complex-valued random variables. Under a regularity assumption, we show that if the roots are independent and identically distributed, the Wasserstein distance between the empirical distributions of roots and critical points of $p_n$ is on the order of $1/n$, up to logarithmic corrections. The proof relies on a careful construction of disjoint random Jordan curves in the complex plane, which allow us to naturally pair roots and nearby critical points. In addition, we establish asymptotic expansions to order $1/n$ for the locations of the critical points nearest several fixed roots $X_j$. This allows us to describe the joint limiting fluctuations of the critical points as $n$ tends to infinity, extending a recent result of Kabluchko and Seidel. Finally, we present a local law that describes the mesoscopic behavior of the critical points when the roots are neither independent nor identically distributed.


Introduction
This paper concerns the nature of the pairing between the critical points and roots of random polynomials in a single complex variable. In particular, we consider polynomials of the form
$$p_n(z) := \prod_{j=1}^{n} (z - X_j), \tag{1}$$
where $X_1, \ldots, X_n$ are complex-valued random variables (not necessarily independent or identically distributed). While much is known about the locations of the critical points of $p_n$ when the roots are deterministic (see for example Marden's book [21], which contains the Gauss–Lucas theorem and Walsh's two circle theorem among other results), Pemantle and Rivin [25], Hanin [12,13,14], and Kabluchko [17] first demonstrated that the random version of this problem admits greater precision, especially when the degree, $n$, is large. In particular, Pemantle and Rivin conjectured that when $X_1, \ldots, X_n$ are chosen to be independent and identically distributed (iid) with distribution $\mu$, the empirical distribution constructed from the critical points of $p_n$ converges weakly in probability to $\mu$. They proved their conjecture in [25] for measures satisfying some technical assumptions, and Subramanian [29] refined their work for $\{X_j\}_{j=1}^{n}$ on the unit circle. Kabluchko first proved the conjecture in full generality in [17], obtaining the following result.
S. O'Rourke has been supported in part by NSF grants ECCS-1610003 and DMS-1810500.

Theorem 1.1 (Kabluchko [17]). Let $X_1, X_2, \ldots$ be iid complex-valued random variables with distribution $\mu$. Then for any bounded and continuous function $\varphi : \mathbb{C} \to \mathbb{C}$,
$$\frac{1}{n-1} \sum_{j=1}^{n-1} \varphi\big(w_j^{(n)}\big) \longrightarrow \int_{\mathbb{C}} \varphi \, d\mu$$
in probability as $n \to \infty$, where $w_1^{(n)}, \ldots, w_{n-1}^{(n)}$ denote the critical points of $p_n$.

Inspired by such results, the first author established several versions of Theorem 1.1 for random polynomials with dependent roots that satisfy some technical conditions [22,24]. For example, the conclusion of Theorem 1.1 holds for characteristic polynomials of certain classes of matrices from the classical compact matrix groups. Additionally, in [23], the authors adapted Kabluchko's strategy to the situation where $p_n$ is perturbed to have $o(n)$ deterministic roots. Two other relevant works include Reddy's thesis [28] and the recent paper of Byun, Lee, and Reddy [3], who showed that under some mild assumptions, Kabluchko's result holds when $p_n$ has mostly deterministic roots and several (potentially dependent) random ones. Byun, Lee, and Reddy also prove several other results, including that the sequence of empirical measures constructed from the zeros of $p_n^{(k)}$ converges weakly in probability to the distribution $\mu$ for any fixed choice of $k$, as well as a version of Theorem 1.1 when the roots $X_1, \ldots, X_n$ are given by a 2D Coulomb gas density.

Theorem 1.1 and most of the cited works above focus on the macroscopic, or global, behavior of the critical points of $p_n$. For example, by combining Theorem 1.1 with the Law of Large Numbers, one obtains that, for any bounded and continuous function $\varphi : \mathbb{C} \to \mathbb{C}$,
$$\frac{1}{n-1} \sum_{j=1}^{n-1} \varphi\big(w_j^{(n)}\big) = \frac{1}{n} \sum_{j=1}^{n} \varphi(X_j) + o(1) \tag{2}$$
with high probability$^1$. If $\varphi$ is chosen to approximate an indicator function of a Borel set $B$, Theorem 1.1 implies that the number of roots of $p_n$ in $B$ is roughly the same as the number of critical points in $B$, up to $o(n)$ corrections. Since the error term, $o(n)$, is so large, this conclusion is typically only useful when the number of roots in $B$ is proportional to $n$.
Such a result does not allow one to study the behavior of a single critical point or even, say, log n critical points. We use the term macroscopic behavior to describe results, such as Theorem 1.1, where the focus is on a proportional-to-n number of critical points. In contrast to Theorem 1.1, this paper focuses on describing the microscopic and mesoscopic behavior of the critical points. We use "microscopic" to denote situations that concern a single critical point (or a fixed number of critical points) and "mesoscopic" to refer to cases that are not in either the microscopic or macroscopic regimes. For instance, the mesoscopic regime includes cases where the Borel set B from above, with high probability, contains a number of critical points that is on the order of log n.
1 See Section 1.1 for a complete description of the asymptotic notation used here and in the sequel.  One important aspect of critical-point behavior at the microscopic and mesoscopic scales is that the critical points and roots of p n appear to pair with one another. Theorem 1.1 and (2) describe this phenomenon at the macroscopic level by comparing the global behaviors of the critical points and roots. However, a glance at Figures 1 and 2 suggests, among other things, that a stronger pairing phenomenon exists. In particular, one sees that nearly every critical point is paired closely with a root of p n , an indication that the microscopic and mesoscopic behavior of the critical points should be extremely similar to the microscopic and mesoscopic behavior of the roots.

Hanin investigated the pairing phenomenon between roots and critical points for several classes of random functions [12,13,14], including random polynomials with independent roots. He proved that the distance between a fixed, deterministic root and its nearest critical point is roughly $1/n$ in the case where $\mu$ has a bounded density supported on the Riemann sphere [14]. The authors also explored root and critical point pairing for polynomials in [23], and Dennis and Hannay gave an electrostatic explanation of the phenomenon in [7]. Recently, Kabluchko and Seidel determined the asymptotic fluctuations of the critical point of $p_n$ that is nearest a given root [18]. Their results are similar to some of our conclusions below and appear to have been derived concurrently using different methods. We present a detailed comparison between [18] and our work in the next section.
In this paper, we refine the results mentioned above to obtain a more complete picture of the pairing that occurs between zeros and critical points of $p_n$. We begin by exhibiting a bound on the Wasserstein, or "transport," distance between the collections of roots and critical points of $p_n$. While this result explains the nearly one-to-one pairing between roots and critical points in Figures 1 and 2, it does not allow one to "zoom in" on any particular root. We accomplish this feat in the next section of the paper, where we discuss the microscopic locations and joint fluctuations for a fixed number of critical points of $p_n$. We conclude our analysis by establishing a local law that describes the mesoscopic behavior of the critical points of $p_n$. Many of our results focus on the case where the roots $X_1, \ldots, X_n$ of $p_n$ are iid, but in the most general setting, we do not even require that the roots be independent.
1.1. Notation. Throughout the paper, we use asymptotic notation (such as $O$, $o$, $\Omega$, $\ll$) under the assumption that $n \to \infty$. We write $X_n = O(Y_n)$, $Y_n = \Omega(X_n)$, $X_n \ll Y_n$, or $Y_n \gg X_n$ to denote the bound $|X_n| \le C Y_n$ for some constant $C > 0$ and all $n > C$. If the implicit constant depends on a parameter $k$, e.g., $C = C_k$, we denote this with subscripts, e.g., $X_n = O_k(Y_n)$ or $X_n \ll_k Y_n$. By $X_n = o_k(Y_n)$, we mean that for any $\varepsilon > 0$, there is a natural number $N_{\varepsilon,k}$, depending on $k$ and $\varepsilon$, for which $n \ge N_{\varepsilon,k}$ implies $|X_n| \le \varepsilon Y_n$. In general, $C, c, K$ are constants which may change from one occurrence to the next. We often use subscripts, such as $C_{P_1, P_2, \ldots}$, to denote that the constant depends on some parameters $P_1, P_2, \ldots$.
We use the following set-theoretic conventions. For $z_0 \in \mathbb{C}$ and $r \ge 0$, we define $B(z_0, r) := \{z \in \mathbb{C} : |z - z_0| < r\}$ to be the open ball of radius $r$ centered at $z_0$, and $\overline{B(z_0, r)}$ to be its closure. The notations $\#S$ and $|S|$ denote the cardinality of the finite set $S$. The natural numbers, $\mathbb{N}$, do not include zero. For a probability measure $\mu$, we use $X \sim \mu$ to mean that the random variable $X$ has distribution $\mu$ and $\mathrm{supp}(\mu)$ to denote its support. We say that a probability measure $\mu$ on $\mathbb{C}$ has density $f$ if $\mu$ is absolutely continuous with respect to Lebesgue measure on $\mathbb{C}$ and the Radon–Nikodym derivative of $\mu$ with respect to Lebesgue measure is $f$. The random variable $\mathbb{1}_E$ is the indicator of the event $E$, and we say an event $E$ (which depends on $n$) holds with overwhelming probability if for every $\alpha > 0$, $\mathbb{P}(E) \ge 1 - O_\alpha(n^{-\alpha})$.
Finally, we use $d^2 z$ to denote integration with respect to Lebesgue measure on $\mathbb{C}$ to avoid confusion with complex line integrals, where we integrate against $dz$. We use $\sqrt{-1}$ to denote the imaginary unit and reserve $i$ as an index.
Acknowledgements. The authors thank Boris Hanin for calling their attention to this line of research and for many useful conversations.

Main Results
We now present our main results that describe the microscopic and mesoscopic behavior of the critical points of $p_n$. We begin by introducing the Wasserstein metric in order to discuss the pairing between the roots and critical points of $p_n$ that one sees in Figures 1 and 2.

2.1. Wasserstein distance. For two probability measures $\mu$ and $\nu$ on $\mathbb{C}$, we let $W_1(\mu, \nu)$ be the $L^1$-Wasserstein distance between $\mu$ and $\nu$ defined by
$$W_1(\mu, \nu) := \inf_{\pi} \int_{\mathbb{C} \times \mathbb{C}} |z - w| \, d\pi(z, w),$$
where the infimum is over all probability measures $\pi$ on $\mathbb{C} \times \mathbb{C}$ with marginals $\mu$ and $\nu$ (see e.g. [36], Chapter 6). Theorem 2.3 below gives a bound on the Wasserstein distance between the empirical measures constructed from the roots and the critical points of the polynomial $p_n$ defined in (1). Before we state the theorem, we mention some notation and assumptions. For any probability measure $\mu$ on $\mathbb{C}$, let $m_\mu$ denote the Cauchy–Stieltjes transform of $\mu$, given by
$$m_\mu(z) := \int_{\mathbb{C}} \frac{d\mu(x)}{z - x},$$
and defined wherever the integral exists. To denote the empirical measure constructed from the roots of $p_n$, we use
$$\mu_n := \frac{1}{n} \sum_{j=1}^{n} \delta_{X_j}, \tag{4}$$
and similarly, our notation for the empirical measure constructed from the critical points $w_1^{(n)}, \ldots, w_{n-1}^{(n)}$ of $p_n$ is
$$\mu'_n := \frac{1}{n-1} \sum_{j=1}^{n-1} \delta_{w_j^{(n)}}. \tag{5}$$
The following assumptions describe some regularity conditions that $\mu$ must satisfy in the hypothesis of Theorem 2.3.
Assumption 2.2 (Alternative to Assumption 2.1 for radially symmetric distributions). Suppose $\mu$ has two finite absolute moments and a continuous density, $f$, that is radially symmetric about $z = z_0$ and that satisfies $f(z_0) > 0$.
We can now state the main result of this subsection.

Theorem 2.3. Let $X_1, \ldots, X_n$ be iid complex random variables whose distribution, $\mu$, has a bounded density and satisfies either Assumption 2.1 or Assumption 2.2. Then, there is a positive constant $C$, depending on $\mu$, so that with probability $1 - o(1)$,
$$W_1(\mu_n, \mu'_n) \le C (1 + \eta_n) \frac{(\ln n)^9}{n}, \tag{6}$$
where $\eta_n := \max_{1 \le j \le n} |X_j|$, and $\mu_n, \mu'_n$ (defined in (4) and (5)) are the empirical measures constructed from the roots and critical points of $p_n$. In the case where $\mu$ has sub-exponential tails, one can show that with probability tending to 1, $\eta_n = O(\ln n)$. Consequently, Theorem 2.3 immediately implies the following corollary.
Corollary 2.4. Let $X_1, \ldots, X_n$ be iid complex random variables whose distribution, $\mu$, satisfies Assumption 2.1 part (i) in addition to the following condition: there exist constants $C, c > 0$ so that $\mathbb{P}(|X_1| \ge t) \le C e^{-ct}$ for all $t > 0$. Then, there is a positive constant $C_\mu$, depending only on $\mu$, so that with probability $1 - o(1)$,
$$W_1(\mu_n, \mu'_n) \le C_\mu \frac{(\ln n)^{10}}{n},$$
where $\mu_n, \mu'_n$ (defined in (4) and (5)) are the empirical measures constructed from the roots and critical points of $p_n$. Theorem 2.3 and Corollary 2.4 show that the roots and critical points can be paired in such a way that the typical spacing between a critical point and its paired root is $O(n^{-1})$, up to logarithmic corrections. This precisely describes the phenomenon observed in Figures 1 and 2, and the authors believe that these bounds are optimal (up to logarithmic factors) based on the theorems of Section 2.3 below and the results in [18].
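The pairing described by these bounds is easy to observe numerically. The following sketch is our illustration, not the paper's construction: it samples iid uniform-disk roots (the degree values, the seed, and the greedy matching scheme are all illustrative choices), computes the critical points with numpy's companion-matrix root finder, and greedily matches each critical point to a distinct nearby root. The average matched distance upper-bounds the optimal matching cost and already exhibits the $O(1/n)$ scaling, roughly halving when $n$ doubles.

```python
import numpy as np

def avg_matching_cost(n, rng):
    """Average distance in a greedy critical-point-to-root matching."""
    # iid roots, uniform on the unit disk (Example 2.6 with z0 = 0, R = 1)
    roots = np.sqrt(rng.random(n)) * np.exp(2j * np.pi * rng.random(n))
    # critical points = zeros of p_n'(z)
    crit = np.roots(np.polyder(np.poly(roots)))
    unused = list(range(n))
    dists = []
    for w in crit:  # greedily pair each critical point with a distinct root
        k = min(unused, key=lambda j: abs(w - roots[j]))
        dists.append(abs(w - roots[k]))
        unused.remove(k)
    return float(np.mean(dists))

rng = np.random.default_rng(0)
cost40 = np.mean([avg_matching_cost(40, rng) for _ in range(5)])
cost80 = np.mean([avg_matching_cost(80, rng) for _ in range(5)])
print(cost40, cost80)  # both small; the second roughly half the first
```

A greedy matching is suboptimal, so these numbers only bound the true transport cost from above; they nevertheless decay at the advertised rate.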
A few remarks concerning Theorem 2.3 and its corollary are in order. Due to the heuristic that motivates our proof of Theorem 2.3 (see Figure 5), the authors conjecture that Assumption 2.1 part (i) can be weakened to require that for some fixed $\delta > 0$, $\mathbb{P}(|m_\mu(X_1)| < \varepsilon) \le C_1 \varepsilon^{1+\delta}$. At present, we require $\delta = 1$ to obtain some technical bounds in the proof. An examination of the proof reveals exactly where this condition is needed.
The second remark concerns the appearance of $\eta_n$ on the right-hand side of (6). The authors believe this term is at least partially necessary. Indeed, based on numerical experiments, the Wasserstein distance $W_1(\mu_n, \mu'_n)$ appears larger for distributions $\mu$ with extremely heavy tails. In this way, $\eta_n$ can be viewed as quantifying how heavy-tailed the distribution $\mu$ is. The following lemma will be useful for verifying Assumptions 2.1 and 2.2 in a variety of situations.
We note that Lemma 2.5 also appears as Proposition 3.1 in [18]; the proof of Lemma 2.5 is located in Appendix A.
Example 2.6 (µ is uniform on a disk). If $\mu$ has a uniform distribution on the disk of radius $R$ centered at $z_0$, then $\mu$ has density
$$f(z) = \frac{1}{\pi R^2} \mathbb{1}_{\{|z - z_0| \le R\}}$$
and Cauchy–Stieltjes transform
$$m_\mu(z) = \begin{cases} \dfrac{\overline{z - z_0}}{R^2}, & |z - z_0| \le R, \\[1ex] \dfrac{1}{z - z_0}, & |z - z_0| > R. \end{cases}$$
(Indeed, use Lemma 2.5 in the case $z_0 = 0$, $R = 1$, and apply a linear transformation.) It follows that if $X \sim \mu$, then for any $\varepsilon < 1/R$,
$$\mathbb{P}(|m_\mu(X)| < \varepsilon) = \mathbb{P}(|X - z_0| < \varepsilon R^2) = \varepsilon^2 R^2,$$
so $\mu$ satisfies Assumption 2.1, and by Theorem 2.3, with probability $1 - o(1)$, $W_1(\mu_n, \mu'_n) = O((\ln n)^9 / n)$. (Note that almost surely, $\eta_n \le |z_0| + R$.)

Example 2.7 (µ is supported on all of C). Assumption 2.2 is easy to verify for a large class of measures that do not necessarily have compact support. For example, suppose $\mu$ has a standard complex normal distribution with density
$$f(z) = \frac{1}{\pi} e^{-|z|^2}.$$
Clearly, $\mu$ is radially symmetric about the origin, and $f(z)$ is continuous with $f(0) = \pi^{-1} > 0$. Furthermore, $\mu$ has sub-exponential tails, so by Corollary 2.4, with probability tending to 1, $W_1(\mu_n, \mu'_n) = O((\ln n)^{10} / n)$. Figure 2 illustrates this example.
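The closed form in Example 2.6 is easy to check by simulation. The sketch below (our illustration, with the choices $z_0 = 0$, $R = 1$, the seed, and the sample size all arbitrary) compares a Monte Carlo estimate of $\mathbb{E}[1/(z - X)]$ against $\bar z$ inside the disk and $1/z$ outside; the interior tolerance is loose because the integrand is singular there.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 400_000
# X uniform on the unit disk: the sqrt trick makes |X|^2 uniform on [0, 1]
X = np.sqrt(rng.random(N)) * np.exp(2j * np.pi * rng.random(N))

# exterior point: m_mu(2) = 1/2
err_out = abs(np.mean(1.0 / (2.0 - X)) - 0.5)
# interior point: m_mu(0.5 + 0.25j) = conj(0.5 + 0.25j)
z = 0.5 + 0.25j
err_in = abs(np.mean(1.0 / (z - X)) - np.conj(z))
print(err_out, err_in)
```

Both errors should be small, with the interior estimate noisier since $\mathbb{E}|z - X|^{-2}$ diverges logarithmically there.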
Example 2.8 (µ is not radially symmetric). In this last example, we consider a situation where $\mu$ does not exhibit radial symmetry. Suppose $\mu$ is uniform on the two disks $B(-2, 1)$ and $B(2, 1)$ with density
$$f(z) = \frac{1}{2\pi} \big( \mathbb{1}_{\{|z+2| \le 1\}} + \mathbb{1}_{\{|z-2| \le 1\}} \big),$$
which is depicted in Figure 1. By separately considering the cases $|z + 2| < 1$, $|z - 2| < 1$, and $|z \pm 2| \ge 1$, we can use the calculations from Example 2.6 to obtain the Cauchy–Stieltjes transform
$$m_\mu(z) = \frac{1}{2} m_1(z) + \frac{1}{2} m_2(z),$$
where $m_1$ and $m_2$ are the transforms, computed in Example 2.6, of the uniform distributions on $B(-2, 1)$ and $B(2, 1)$, respectively. Since $\mu$ has compact support, Assumption 2.1 part (ii) holds trivially. In Appendix A, we establish part (i), so by Theorem 2.3, with probability at least $1 - o(1)$, $W_1(\mu_n, \mu'_n) = O((\ln n)^9 / n)$.

2.3. Microscopic behavior of the critical points. While Theorem 2.3 describes the typical distance between a root and its paired critical point, it does not allow one to study any particular root or critical point. Toward this end, we now fix several of the roots and treat them as deterministic: consider the polynomial
$$p_n(z) := \prod_{l=1}^{s} (z - \xi_l) \prod_{j=1}^{n+1-s} (z - X_j), \tag{8}$$
where $X_1, \ldots, X_{n+1-s}$ are iid complex-valued random variables with distribution $\mu$ and $\xi = (\xi_1, \ldots, \xi_s)$ is a deterministic vector in $\mathbb{C}^s$. Our goal is to simultaneously study the behavior of the critical points closest to $\xi_l$, $1 \le l \le s$.
Our first result, Theorem 2.9, covers the situation where $\xi_1, \ldots, \xi_s$ are inside the support of $\mu$. In particular, for each $1 \le l \le s$, equation (9) locates the critical point, $w_l^{(n)}$, that is near $\xi_l$ to within $O(n^{-2})$ (up to logarithmic corrections). This bound indicates that each $w_l^{(n)}$ is centered at $\xi_l - \frac{1}{n\, m_\mu(\xi_l)}$ (more precisely, at an empirical analogue of this quantity) rather than at $\xi_l$, and we use this to show that the vector $(w_1^{(n)}, \ldots, w_s^{(n)})$ fluctuates about this centering according to a law that converges in distribution to a multivariate normal distribution. See Figure 3.
Compare Theorem 2.9 to Theorem 2.2 of [18], which describes the same phenomenon when $s = 1$. Both theorems identify the same fluctuations of $w_1^{(n)}$ about $\xi_1$; however, the two results locate the critical point $w_1^{(n)}$ on different scales. While Theorem 2.2 from [18] centers $w_1^{(n)}$ at $\xi_1 - \frac{1}{n\, m_\mu(\xi_1)}$, Theorem 2.9 locates $w_1^{(n)}$ to within order $O(n^{-2})$ up to logarithmic corrections. In fact, since $\frac{1}{n} \sum_{j=1}^{n} \frac{1}{\xi_1 - X_j}$ converges almost surely to $m_\mu(\xi_1)$, the results of the two theorems can be combined to give a stronger picture of the local behavior of $w_1^{(n)}$. Note that in contrast to the method of proof used by Kabluchko and Seidel in [18], our argument is based on a deterministic result.
In order to state Theorem 2.9 we need the following definitions. Let
$$M_\mu := \{z \in \mathbb{C} : m_\mu(z) = 0\}$$
denote the set of zeros of $m_\mu$. We say that a measure $\mu$ has a density in a neighborhood of $z_0$ if there exists a $\rho > 0$ so that the restriction of $\mu$ to the open ball $B(z_0, \rho)$ is absolutely continuous with respect to the Lebesgue measure on $B(z_0, \rho)$. Throughout this subsection, $p_n$ denotes the polynomial with deterministic roots $\xi_1, \ldots, \xi_s$ (see (8)).
Theorem 2.9 (Locations and fluctuations of critical points when p_n has several deterministic roots). Let $X_1, X_2, \ldots$ be iid complex-valued random variables with distribution $\mu$, fix $s$ and the distinct, deterministic values $\xi_1, \ldots, \xi_s \notin M_\mu$, and suppose that in a neighborhood of each $\xi_l$, $1 \le l \le s$, $\mu$ has a bounded density, $f$. Then, with probability $1 - o(1)$, for each $1 \le l \le s$, the polynomial $p_n$ defined in (8) has a critical point $w_l^{(n)}$ that is the unique critical point of $p_n$ within a distance of $\frac{3}{|m_\mu(\xi_l)|\, n}$ of $\xi_l$, and $w_l^{(n)}$ satisfies the expansion (9). In addition, if $f$ is continuous at $\xi_1, \ldots, \xi_s$, then the rescaled fluctuations in (10) converge in distribution as $n \to \infty$ to $(N_1, \ldots, N_s)$, a vector of complex random variables whose real and imaginary components have a multivariate normal distribution with mean zero and covariance structure characterized by (11).

For values of $\xi_1, \ldots, \xi_s$ outside the support of $\mu$, (10) and (11) demonstrate that the scaling factor $n^{3/2}/\sqrt{\ln n}$ is too small to achieve a meaningful result. (Indeed, $f$ may be chosen to be identically zero outside $\mathrm{supp}(\mu)$, so the random vector $(N_1, \ldots, N_s)$ is almost surely the zero vector.) The following result refines the analysis in this situation and is depicted in Figure 4.

Theorem 2.10 (Locations and fluctuations of critical points when p_n has several roots outside supp(µ)). Let $X_1, X_2, \ldots$ be iid complex-valued random variables with common distribution $\mu$, fix $s \in \mathbb{N}$, and suppose $\xi_1, \ldots, \xi_s \notin \mathrm{supp}(\mu) \cup M_\mu$ are distinct, fixed deterministic values. Then, there exist constants $C, c_{\mu, \xi}, C_{\mu, \xi} > 0$, so that with probability at least $1 - C \exp(-c_{\mu, \xi}\, n)$, for each $1 \le l \le s$, the polynomial $p_n$ defined in (8) has a critical point $w_l^{(n)}$ that is the unique critical point of $p_n$ within a distance of $\frac{3}{|m_\mu(\xi_l)|\, n}$ of $\xi_l$, and $w_l^{(n)}$ satisfies the expansion (12). In addition, the rescaled fluctuations in (13) converge in distribution as $n \to \infty$ to $(N_1, \ldots, N_s)$, a vector of complex random variables whose components have a multivariate normal distribution with mean zero and covariance structure
$$\operatorname{Cov}\big(\operatorname{Re}(N_j), \operatorname{Im}(N_l)\big) = \operatorname{Cov}\left( \operatorname{Re} \frac{1}{\xi_j - X_1},\ \operatorname{Im} \frac{1}{\xi_l - X_1} \right),$$
together with the analogous identities for the remaining pairs of real and imaginary parts.
It turns out that in the case where $\mu$ has compact support, $p_n$ has no additional critical points outside an $\varepsilon$-neighborhood of $\mathrm{supp}(\mu) \cup M_\mu$. In order to state this result, define $N_\mu(\varepsilon) := \{z \in \mathbb{C} : \mathrm{dist}(z, \mathrm{supp}(\mu) \cup M_\mu) < \varepsilon\}$ to be the $\varepsilon$-neighborhood of $\mathrm{supp}(\mu) \cup M_\mu$. (Here, $\varepsilon$ is assumed to be positive, and $\mathrm{dist}(z, D) := \inf_{w \in D} |z - w|$ is the distance from $z \in \mathbb{C}$ to a set $D \subset \mathbb{C}$.) The following corollary immediately follows from Theorem 2.9 of [23] and the Borel–Cantelli lemma applied to Theorem 2.10.
The proofs of Theorems 2.9 and 2.10 are based on the following technical, deterministic result.
Theorem 2.12. Suppose $\xi$ is a complex number, $X = (X_1, X_2, \ldots, X_n)$ is a vector of complex numbers, and $C_1, C_2, k_{\mathrm{Lip}}$ are positive values for which the following three conditions hold:
(i) $C_1 \le \Big| \frac{1}{n} \sum_{j=1}^{n} \frac{1}{\xi - X_j} \Big| \le C_2$;
(ii) the map $z \mapsto \frac{1}{n} \sum_{j=1}^{n} \frac{1}{z - X_j}$ is Lipschitz with constant $k_{\mathrm{Lip}}\, n$ on $\overline{B\big(\xi, \frac{2}{C_1 n}\big)}$;
(iii) $\min_{1 \le j \le n} |\xi - X_j| > \frac{3}{C_1 n}$.
Then, if $C > 0$ is sufficiently large depending only on $C_1$ and $C_2$, and $n \in \mathbb{N}$ satisfies $n > \max\{4 C_2 C (k_{\mathrm{Lip}} + 1),\, 4 C_2 / C_1\}$, the polynomial
$$p_n(z) := (z - \xi) \prod_{j=1}^{n} (z - X_j)$$
has exactly one critical point, $w_\xi^{(n)}$, that is within a distance of $\frac{3}{2 C_1 n}$ of $\xi$, and
$$\Big| w_\xi^{(n)} - c_n \Big| \le \frac{C (k_{\mathrm{Lip}} + 1)}{n^2}, \qquad \text{where } c_n := \xi - \bigg( \sum_{j=1}^{n} \frac{1}{\xi - X_j} \bigg)^{-1}. \tag{16}$$
We remark that criteria (i) and (ii) appear relevant in view of the equality
$$\frac{p_n'(z)}{p_n(z)} = \frac{1}{z - \xi} + \sum_{j=1}^{n} \frac{1}{z - X_j},$$
which suggests that if $\frac{1}{n} \sum_{j=1}^{n} \frac{1}{z - X_j}$ is finite and bounded away from zero near $\xi$, then $p_n'(z) \approx 0$ for some $z$ satisfying $|z - \xi| = O(1/n)$. Assumption (iii) helps to guarantee that $p_n(z)$ has only one critical point that is within order $O(1/n)$ of $\xi$, but with respect to establishing equation (16), (iii) is likely an artificial constraint related to the use of Rouché's theorem in the proof. Theorem 2.12 is a deterministic result, and as such it applies to cases where $X_1, \ldots, X_n$ are random variables which are not independent or identically distributed. To illustrate this point, we conclude this subsection with a generalization of Theorem 2.10 and an application of Theorem 2.12 to the characteristic polynomials of random matrices.
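A small numerical sketch (ours, with illustrative parameters: $n = 60$, uniform-disk roots, $\xi = 2$, a fixed seed) shows how sharp the approximation by $c_n := \xi - \big(\sum_j \frac{1}{\xi - X_j}\big)^{-1}$ is: the critical point nearest $\xi$ sits within $O(1/n)$ of $\xi$ but within roughly $O(1/n^2)$ of $c_n$.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 60
xi = 2.0 + 0.0j  # outside the unit disk, where m_mu(xi) = 1/xi = 1/2
roots = np.sqrt(rng.random(n)) * np.exp(2j * np.pi * rng.random(n))

# p_n(z) = (z - xi) * prod_j (z - X_j); compute its critical points
crit = np.roots(np.polyder(np.poly(np.append(roots, xi))))
w = crit[np.argmin(np.abs(crit - xi))]  # critical point nearest xi

# prediction: c_n = xi - (sum_j 1/(xi - X_j))^{-1}
c_n = xi - 1.0 / np.sum(1.0 / (xi - roots))
print(abs(w - xi), abs(w - c_n))  # ~1/(n |m_mu(xi)|) vs. a much smaller error
```

The first printed distance is on the order of $1/(n |m_\mu(\xi)|)$, while the second is two orders of magnitude smaller, matching the $O(1/n^2)$ error in the theorem.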
Compare the following theorem to Theorem 2 in [14]. Both theorems discuss the pairing between $s_n$ roots and critical points of $p_n$, where $s_n = o(n)$ is allowed to depend on $n$. Theorem 2.13 describes the locations of the critical points with higher precision than Theorem 2 of [14]; however, our theorem requires that the deterministic roots $\xi_1, \ldots, \xi_{s_n}$ be outside the support of $\mu$, while Theorem 2 in [14] does not make this restriction.

Theorem 2.13 (Locations of critical points when p_n has many deterministic roots). Suppose $X_1, X_2, \ldots$ are iid complex-valued random variables with distribution $\mu$, let $\xi_1, \xi_2, \ldots$ be fixed deterministic values, let $s_n, l_n, a_n$ be positive integers less than $n$, and fix $\varepsilon, L > 0$, so that all of these together satisfy:
(i) $1 \le s_n \le l_n = o(n)$, $a_n l_n = o(n)$, $a_n = o(\sqrt{n})$;
(ii) $\min\{|m_\mu(\xi_l)| : 1 \le l \le s_n\} \ge \varepsilon$ and $\max\{|m_\mu(\xi_l)| : 1 \le l \le s_n\} \le L$;
(iii) $\min\big\{ |\xi_l - x| : 1 \le l \le s_n,\ x \in \mathrm{supp}(\mu) \cup \{\xi_j\}_{j=1, j \ne l}^{l_n} \big\} > \frac{6}{\varepsilon\, a_n}$.
Then, there exist constants $C, c_{\mu,\varepsilon,L}, C_{\mu,\varepsilon,L} > 0$ so that with probability at least $1 - C s_n \exp(-c_{\mu,\varepsilon,L}\, n / a_n^2)$, the polynomial $p_n$ has $s_n$ critical points, $w_1^{(n)}, \ldots, w_{s_n}^{(n)}$, each admitting an expansion analogous to (12), uniformly in $1 \le l \le s_n$.

We include the next result to demonstrate that the local pairing phenomenon between individual roots and critical points of $p_n$ can occur for some models where the roots are dependent. One such case is when $p_n$ is the characteristic polynomial of a random matrix.
Theorem 2.14. Fix $\varepsilon > 0$ and $\lambda \in \mathbb{C}$ with $|\lambda| \ge 1 + 3\varepsilon$. Let $M$ be an $n \times n$ random matrix whose entries are iid copies of a random variable with mean zero, unit variance, and finite fourth moment. Let $A$ be an $n \times n$ deterministic matrix with operator norm $O(1)$, rank $O(1)$, and whose only nonzero eigenvalue is $\lambda$. Then almost surely, for $n$ sufficiently large, the characteristic polynomial of $\frac{1}{\sqrt{n}} M + A$ satisfies the following properties: (i) The roots $X_1, \ldots, X_{n-1}$ lie inside the disk $B(0, 1 + 2\varepsilon)$.
(iii) $p_n$ contains a unique critical point, $w^{(n)}$, near $\lambda$, which satisfies the expansion (18), and hence $w^{(n)}$ converges to $\lambda$, as stated precisely in (19).

Remark 2.15. The conclusion in (19) can be deduced from properties (i) and (ii) and Walsh's two circle theorem (see, for example, [27, Theorem 4.1.1]). However, the conclusion in (18) cannot be deduced from Walsh's two circle theorem and instead follows from Theorem 2.12. We prove Theorem 2.14 in Appendix A.
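This random-matrix pairing can be seen in a simulation. The sketch below is our illustration, not the paper's proof: Gaussian entries, $\lambda = 2$, a rank-one diagonal $A$, $n = 150$, and all thresholds are ad hoc choices. It computes the eigenvalues of $\frac{1}{\sqrt{n}} M + A$, checks that the bulk stays near the unit disk while one outlier sits near $\lambda$, and verifies that the critical point of the characteristic polynomial nearest the outlier is well approximated by the Theorem 2.12 prediction $c_n$.

```python
import numpy as np

rng = np.random.default_rng(7)
n = 150
lam = 2.0
M = rng.standard_normal((n, n))      # iid mean 0, variance 1, finite 4th moment
A = np.zeros((n, n)); A[0, 0] = lam  # rank 1, operator norm O(1), eigenvalue lam

eigs = np.linalg.eigvals(M / np.sqrt(n) + A)
i_out = np.argmax(np.abs(eigs))      # the outlier eigenvalue, near lam
outlier, bulk = eigs[i_out], np.delete(eigs, i_out)

# critical points of the characteristic polynomial, built from its roots
crit = np.roots(np.polyder(np.poly(eigs)))
w = crit[np.argmin(np.abs(crit - outlier))]
# Theorem 2.12 prediction with xi = outlier and X = bulk eigenvalues
c_n = outlier - 1.0 / np.sum(1.0 / (outlier - bulk))
print(abs(outlier - lam), abs(w - c_n))
```

Even though the eigenvalues here are strongly dependent, the deterministic nature of Theorem 2.12 makes the prediction accurate.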

2.4. Mesoscopic behavior of the critical points. In this subsection, we consider the mesoscopic behavior of the critical points of the polynomial $p_n$ defined in (1). We begin with the case where $X_1, \ldots, X_n$ are arbitrary random variables (not assumed to be independent nor identically distributed) and then specialize our main result to several applications and examples.
Theorem 2.16 (Local law). Fix C > 0, and let X 1 , . . . , X n be complex-valued random variables (not necessarily independent nor identically distributed) which satisfy the following axioms.
(i) (Upper bound) With overwhelming probability, $\max_{1 \le j \le n} |X_j| \le n^C$.
(ii) (Anti-concentration) For every $a > 0$, there exists $b > 0$ so that $|p_n(Z)| \ge n^{-bn}$ with probability $1 - O_a(n^{-a})$, where $Z$ is uniformly distributed on $B(0, n^C)$, independent of $X_1, \ldots, X_n$.
Let $\varphi : \mathbb{C} \to \mathbb{R}$ be a twice continuously differentiable function (possibly depending on $n$) which is supported on $B(0, n^C)$ and which satisfies the pointwise bound (21) for all $z \in \mathbb{C}$. Then, for every fixed $c > 0$ and every $\alpha > 0$, with probability $1 - O(n^{-\alpha})$,
$$\left| \sum_{j=1}^{n-1} \varphi\big(w_j^{(n)}\big) - \frac{n-1}{n} \sum_{j=1}^{n} \varphi(X_j) \right| = O\big( n^{c} (\log n) (1 + \|\Delta\varphi\|_1) \big),$$
where $w_1^{(n)}, \ldots, w_{n-1}^{(n)}$ are the critical points of the polynomial $p_n$ defined in (1) and $\|\Delta\varphi\|_1$ is the $L^1$-norm of $\Delta\varphi$. Here, the implicit constants in our asymptotic notation depend on $C$, $c$, and $\alpha$.
Remark 2.17. Condition (ii) on the random variables $X_1, \ldots, X_n$ from Theorem 2.16 is implied by the following: (ii') for every $a > 0$, there exists $b > 0$ such that, for almost every $z \in B(0, n^C)$, the bound in condition (ii) holds with $Z$ replaced by $z$, with probability $1 - O_a(n^{-a})$. Indeed, the implication follows by simply conditioning on the random variable $Z$ (which avoids any given set of Lebesgue measure zero with probability 1).
The assumptions of Theorem 2.16 are fairly technical; in Section 2.5, we derive some simpler conditions that guarantee the hypotheses of Theorem 2.16 are met. We now specialize Theorem 2.16 to the case where $X_1, \ldots, X_n$ are independent random variables.
Theorem 2.18 (Local law for independent roots). Fix $C > 0$, and let $X_1, \ldots, X_n$ be independent complex-valued random variables which satisfy $\max_{1 \le j \le n} \mathbb{E}|X_j| \le n^C$. In addition, assume $X_1$ is absolutely continuous (with respect to Lebesgue measure on $\mathbb{C}$) and has density bounded by $n^C$. Let $\varphi : \mathbb{C} \to \mathbb{R}$ be a twice continuously differentiable function (possibly depending on $n$) which is supported on $B(0, n^C)$ and which satisfies the pointwise bound given in (21) for all $z \in \mathbb{C}$. Then, for every fixed $c > 0$ and every $\alpha > 0$, with probability $1 - O(n^{-\alpha})$,
$$\left| \sum_{j=1}^{n-1} \varphi\big(w_j^{(n)}\big) - \frac{n-1}{n} \sum_{j=1}^{n} \varphi(X_j) \right| = O\big( n^{c} (\log n) (1 + \|\Delta\varphi\|_1) \big),$$
where $w_1^{(n)}, \ldots, w_{n-1}^{(n)}$ are the critical points of the polynomial $p_n$ defined in (1) and $\|\Delta\varphi\|_1$ is the $L^1$-norm of $\Delta\varphi$. Here, the implicit constants in our asymptotic notation depend on $C$, $c$, and $\alpha$.
Theorem 2.18 can be viewed as a local version of Theorem 1.1 and (2). Indeed, since the functions in the theorem above can depend on $n$, one can approximate indicator functions of Borel sets which change with $n$. Of particular importance is the case where $\varphi$ is supported on disks which shrink with $n$; in this way, Theorem 2.18 can describe the mesoscopic behavior of the critical points. In addition, the error bound in Theorem 2.18 is significantly better than the error term from (2).
Interestingly, Theorem 2.18 only requires a single root ($X_1$) to actually be random; the rest may be deterministic. In particular, since the density of $X_1$ is bounded by $n^C$, $X_1$ can itself be quite close to deterministic. Obviously, though, the result fails for deterministic polynomials. For example, consider $q_n(z) := z^n - 1$. The conclusion of Theorem 2.18 fails for this polynomial since all of its critical points are located at the origin while its roots are the $n$-th roots of unity, located on the unit circle. However, Theorem 2.18 does apply to $p_n(z) := q_n(z)(z - X)$, where $X$ is uniformly distributed on $B(z_0, n^{-C/2})$ for any fixed $z_0 \in \mathbb{C}$. Theorem 2.18 strengthens Theorem 1.6 of [3] for the empirical distribution associated with the zeros of $p_n'$ by providing a rate of convergence. As a consequence of Theorem 2.18, we have the following central limit theorem (CLT).
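As an aside, the contrast in the $q_n(z) = z^n - 1$ example above is easy to verify directly. In the sketch below (our illustration; $n = 40$ and the law of the perturbing root are arbitrary choices), $q_n$ is built with exact coefficients, so its critical points land exactly at the origin, far from the unit circle, while adding a single random root $X$ near $1/2$ restores the pairing, with every critical point of $p_n$ close to a root.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 40
q = np.zeros(n + 1); q[0], q[-1] = 1.0, -1.0  # q_n(z) = z^n - 1, exactly
crit_q = np.roots(np.polyder(q))              # q_n'(z) = n z^{n-1}: zeros at 0

# perturb with one random root X concentrated near 1/2 (illustrative law)
X = 0.5 + 1e-3 * (rng.standard_normal() + 1j * rng.standard_normal())
p = np.polymul(q, [1.0, -X])                  # p_n(z) = q_n(z) (z - X)
roots_p = np.append(np.exp(2j * np.pi * np.arange(n) / n), X)
crit_p = np.roots(np.polyder(p))
# distance from each critical point of p_n to its nearest root
d_p = np.abs(crit_p[:, None] - roots_p[None, :]).min(axis=1)
print(np.max(np.abs(crit_q)), np.max(d_p))
```

The first number is (numerically) zero, a distance of 1 from the nearest root of $q_n$; the second stays small, reflecting the restored pairing.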
Theorem 2.19 (Central limit theorem for linear statistics). Let $X_1, X_2, \ldots$ be iid random variables which are absolutely continuous (with respect to Lebesgue measure on $\mathbb{C}$) and have a bounded density. In addition, assume $\mathbb{E}|X_1| < \infty$. Let $\varphi : \mathbb{C} \to \mathbb{R}$ be a twice continuously differentiable function with compact support which does not depend on $n$. Then,
$$\frac{1}{\sqrt{n}} \left( \sum_{j=1}^{n-1} \varphi\big(w_j^{(n)}\big) - (n-1)\, \mathbb{E}\varphi(X_1) \right) \longrightarrow N(0, v^2)$$
in distribution as $n \to \infty$, where $w_1^{(n)}, \ldots, w_{n-1}^{(n)}$ are the critical points of the polynomial $p_n$ defined in (1) and $v^2$ is the variance of $\varphi(X_1)$.
We now state a version of Theorem 2.18 that applies when the function ϕ is analytic. The advantage of this result is that it does not contain the extra factor of log n present in the error term from Theorem 2.18. The trade-off is that the function ϕ is now required to be analytic, a much more rigid assumption.
Theorem 2.20 (Local law for analytic test functions). Fix $C, c, \varepsilon > 0$. Let $\mu$ be a probability measure on $\mathbb{C}$ supported on $\overline{B(0, C)}$, and assume $|m_\mu(z)| \ge c$ for all $z \in \Gamma$, where $\Gamma$ is the boundary of $B(0, C + \varepsilon)$. Then for any function $\varphi$ (possibly depending on $n$), analytic in a neighborhood containing the closure of $B(0, C + \varepsilon)$,
$$\sum_{j=1}^{n-1} \varphi\big(w_j^{(n)}\big) - \frac{n-1}{n} \sum_{j=1}^{n} \varphi(X_j) = O\left( \frac{1}{n} \sup_{z \in \Gamma} |\varphi(z)| \right)$$
with probability $1 - o(1)$, where $w_1^{(n)}, \ldots, w_{n-1}^{(n)}$ are the critical points of the polynomial $p_n$ defined in (1) and $X_1, \ldots, X_n$ are iid random variables with distribution $\mu$. Here, the implicit constants in our asymptotic notation depend on $C$, $c$, and $\varepsilon$.

2.5. Guaranteeing the assumptions in the local law. In this section, we provide some criteria for assuring that the assumptions in Theorem 2.16 are met.
Lemma 2.21 (Simple criterion for an upper bound). Fix $C, \varepsilon > 0$, and suppose $X_1, \ldots, X_n$ are complex-valued random variables (not necessarily independent nor identically distributed). If $\max_{1 \le j \le n} \mathbb{E}|X_j|^m \le n^C$ for every fixed $m \in \mathbb{N}$, then $\max_{1 \le j \le n} |X_j| \le n^{C + \varepsilon}$ with overwhelming probability.
Proof. The claim follows from a simple application of Markov's inequality together with the union bound.

Lemma 2.22 (Criterion for anti-concentration).
Fix $C > 0$, and let $X_1, \ldots, X_n$ be complex-valued random variables such that $X_1$ is independent of $X_2, \ldots, X_n$. In addition, assume $X_1$ is absolutely continuous (with respect to Lebesgue measure on $\mathbb{C}$) with density bounded by $n^C$, and suppose that $\mathbb{E}|X_1| \le n^C$. Then the anti-concentration condition (ii) of Theorem 2.16 holds, where $Z$ is uniformly distributed on $B(0, n^C)$ and independent of $X_1, \ldots, X_n$.
We prove Lemma 2.22 in Appendix A.
2.6. Overview and outline. The remainder of the paper is devoted to proving our main results. In Section 3, we establish the microscopic results of Subsection 2.3, starting with Theorem 2.12, which we then use to prove Theorems 2.9, 2.10, and 2.13. Section 4 contains the proofs of the local laws from Subsection 2.4, including Theorems 2.16, 2.18, 2.19, and 2.20. We conclude the paper with a proof of Theorem 2.3 in Section 5.
There are two appendices that contain minor lemmata and supporting calculations. In Appendix A, we justify Lemma 2.5, Theorem 2.14, and Lemma 2.22. We also include calculations related to Example 2.8. Appendix B contains some classical arguments that establish a Lindeberg CLT that we use to prove part of Theorem 2.9.

Proof of results in Section 2.3
We begin this section with the proof of Theorem 2.12 from which several of the main results follow.
3.1. Proof of Theorem 2.12. Our strategy is to compare the logarithmic derivative of $p_n$,
$$L_n(z) := \frac{p_n'(z)}{p_n(z)} = \frac{1}{z - \xi} + \sum_{j=1}^{n} \frac{1}{z - X_j},$$
to the simpler function
$$\widetilde{L}_n(z) := \frac{1}{z - \xi} + \sum_{j=1}^{n} \frac{1}{\xi - X_j},$$
which are close to each other near $\xi$. In particular, we will use Rouché's theorem to show that $L_n$ and $\widetilde{L}_n$ both have exactly one zero in each of the nested open balls
$$D_n^{\mathrm{sm}} := B\left( c_n, \frac{C (k_{\mathrm{Lip}} + 1)}{n^2} \right) \subset D_n^{\mathrm{lg}} := B\left( \xi, \frac{3}{2 C_1 n} \right),$$
where $c_n := \xi - \big( \sum_{j=1}^{n} \frac{1}{\xi - X_j} \big)^{-1}$ can be easily verified to be a root of $\widetilde{L}_n$. By "clearing the denominators," we will conclude that $p_n$ has exactly one critical point in each of the two balls. The lemma below establishes a few key facts that we frequently reference throughout the proof.
Proof of Lemma 3.1. By the triangle inequality and the hypothesis that $n > 4 C_2 C (k_{\mathrm{Lip}} + 1)$, it follows that $|z - \xi| < \frac{5}{4 C_1 n}$; the matching lower bound also uses the assumption $n > 4 C_2 C (k_{\mathrm{Lip}} + 1)$. This establishes the first inequality. The second follows from nearly identical reasoning; we omit the details. To achieve the inequalities $\frac{1}{C_1 n} < |z - X_j|$, we use $|z - \xi| < \frac{5}{4 C_1 n}$, which we just proved, and the assumption that $\min_{1 \le j \le n} |\xi - X_j| > \frac{3}{C_1 n}$. Indeed, for $1 \le j \le n$, the triangle inequality yields
$$|z - X_j| \ge |\xi - X_j| - |z - \xi| > \frac{3}{C_1 n} - \frac{5}{4 C_1 n} = \frac{7}{4 C_1 n} > \frac{1}{C_1 n}.$$
This completes the proof of part (i). Part (ii) follows from nearly identical reasoning; note that the assumption $n > 4 C_2 / C_1$ is useful for achieving the lower bound on $|z - Y_n|$. We omit the remaining details.
The lower bounds in Lemma 3.1 imply that under the assumptions of Theorem 2.12, $L_n(z)$ and $\widetilde{L}_n(z)$ are holomorphic on the domain $D_n^{\mathrm{sm}}$ and that $(z - \xi) L_n(z)$ and $(z - \xi) \widetilde{L}_n(z)$ are holomorphic on the domain $D_n^{\mathrm{lg}}$. We will show that under the same assumptions, $|L_n(z) - \widetilde{L}_n(z)| < |\widetilde{L}_n(z)|$ for $z$ on the boundaries of $D_n^{\mathrm{sm}}$ and $D_n^{\mathrm{lg}}$, in order to justify Rouché's theorem. To that end, assume the hypotheses of Theorem 2.12 and let $z \in \partial D_n^{\mathrm{sm}} \cup \partial D_n^{\mathrm{lg}}$. Then, the triangle inequality implies (23), where we have used hypothesis (ii) of Theorem 2.12 to bound the first term on the left. By factoring $\sum_{j=1}^{n} \frac{1}{\xi - X_j}$ from both terms in the right summand, then combining the fractions, factoring out another $\sum_{j=1}^{n} \frac{1}{\xi - X_j}$, and applying hypothesis (i) of Theorem 2.12 twice, we obtain an upper bound on $|L_n(z) - \widetilde{L}_n(z)|$. Finally, we can use the reverse triangle inequality and hypothesis (i) of Theorem 2.12 to bound $|\widetilde{L}_n(z)|$ from below. At this point, we split the argument into two cases: $|z - c_n| = \frac{C (k_{\mathrm{Lip}} + 1)}{n^2}$ and $|z - \xi| = \frac{3}{2 n C_1}$. In the first case, Lemma 3.1 guarantees that $|z - \xi| < \frac{2}{n C_1}$, and the hypotheses of Theorem 2.12 require that $\frac{1}{2} > \frac{2 C_2}{n C_1}$, so we obtain (24). On the other hand, we have (25), where the last inequality follows from Lemma 3.1 and the lower bound on $C$ assumed in Theorem 2.12. Combining (24) and (25) yields $|L_n(z) - \widetilde{L}_n(z)| < |\widetilde{L}_n(z)|$ for $z$ on the boundary of $D_n^{\mathrm{sm}}$. In addition, recall (Lemma 3.1 part (ii)) that $L_n(z)$ and $\widetilde{L}_n(z)$ are holomorphic on the domain $D_n^{\mathrm{sm}}$, so Rouché's theorem guarantees that $L_n(z)$ and $\widetilde{L}_n(z)$ have the same number of zeros inside $D_n^{\mathrm{sm}}$. Since $c_n$ is the unique zero of $\widetilde{L}_n(z)$ in $D_n^{\mathrm{sm}}$, we conclude that $L_n(z)$ has exactly one zero, $w_\xi^{(n)}$, in $D_n^{\mathrm{sm}}$. Moreover, $p_n'(z) = p_n(z) L_n(z)$, and $p_n(z) \ne 0$ for $z \in D_n^{\mathrm{sm}}$ by (i) of Lemma 3.1, so the zeros of $L_n(z)$ in $D_n^{\mathrm{sm}}$ are the same as the critical points of $p_n(z)$ in $D_n^{\mathrm{sm}}$. We conclude that $p_n(z)$ has exactly one critical point in $D_n^{\mathrm{sm}}$.
Lemma 3.1 shows that $D_n^{\mathrm{sm}} \subset D_n^{\mathrm{lg}}$, so it remains to establish that $p_n(z)$ also has exactly one critical point in $D_n^{\mathrm{lg}}$, for then the critical point in both domains must be the same one. Continuing from (23), in the case $|z - \xi| = \frac{3}{2C_1 n}$, we again use the assumption $\frac{1}{2} \ge \frac{2C_2}{nC_1}$ to obtain an analogous estimate, and Lemma 3.1, part (ii), supplies the corresponding lower bound. From the assumptions on $n$ and $C$ in Theorem 2.12, combining (26) and (27) yields $|L_n(z) - \widetilde{L}_n(z)| < |\widetilde{L}_n(z)|$ for $z$ on the boundary of $D_n^{\mathrm{lg}}$, so $L_n(z)$ has exactly one zero in $D_n^{\mathrm{lg}}$, too. Hence, $p_n'(z)$ has exactly one root in $D_n^{\mathrm{lg}}$, and as we showed above, this root lies in $D_n^{\mathrm{sm}}$. The proof of Theorem 2.12 is complete.
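As a numerical illustration of this counting argument (a sketch, not part of the proof, with illustrative parameters chosen by us): take 99 iid roots uniform in the unit disk plus one deterministic root at $\xi = 2$. For $\mu$ uniform on the unit disk, $m_\mu(z) = 1/z$ for $|z| \ge 1$, so $|m_\mu(2)| = 1/2$, and the pairing predicts exactly one critical point of $p_n$ within $3/(|m_\mu(2)|\, n)$ of $\xi$.

```python
import numpy as np

# Sketch: one outlying root xi = 2 plus 99 iid roots uniform in the unit disk.
# We count the critical points of p_n inside the disk B(2, 3/(|m_mu(2)| n)).
rng = np.random.default_rng(0)
n = 100
disk = np.sqrt(rng.uniform(0, 1, n - 1)) * np.exp(1j * rng.uniform(0, 2 * np.pi, n - 1))
roots = np.concatenate([disk, [2.0 + 0j]])

crits = np.roots(np.polyder(np.poly(roots)))   # the n - 1 critical points of p_n
radius = 3.0 / (0.5 * n)                       # 3 / (|m_mu(2)| n) = 0.06
near = crits[np.abs(crits - 2.0) < radius]
print(len(near), float(np.abs(near - 2.0).min()))
```

In this run the unique nearby critical point sits at distance roughly $\frac{1}{n |m_\mu(2)|} = 0.02$ from $\xi$, consistent with the expansion discussed below.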
In the remainder of this section, we use Theorem 2.12 to prove Theorems 2.9 and 2.10. We also include a subsection in which we give a sketch of how the arguments can be modified to prove Theorem 2.13. When $\xi \in \operatorname{supp}(\mu)$, it is difficult to control $\frac{1}{n}\sum_{j=1}^n \frac{1}{\xi - X_j}$, so we start with the proof of Theorem 2.10, which is more straightforward than the justification of Theorem 2.9.
3.2. Proof of Theorem 2.10. We begin by establishing equation (12) via Theorem 2.12. To that end, we consider $\{\xi_l\}_{l=1}^s$ one at a time, letting each in turn play the role of $\xi$ in the statement of Theorem 2.12. Fix $\xi_l$, $1 \le l \le s$. We will show that for large $n$, on the complement of the "bad" event $E_n^l$, the hypotheses of Theorem 2.12 are satisfied with $\xi = \xi_l$, $X = (\xi_1, \ldots, \xi_{l-1}, \xi_{l+1}, \ldots, \xi_s, X_1, \ldots, X_{n+1-s})$, and appropriate positive constants. (Here, $\operatorname{dist}(z, D) := \inf_{w \in D} |z - w|$ denotes the distance from $z \in \mathbb{C}$ to a set $D \subset \mathbb{C}$.) For large $n$, on the complement of $E_n^l$, the estimates (29) and (30) hold, and condition (i) of Theorem 2.12 follows. If $n$ is chosen large enough that (31) holds, then condition (iii) of Theorem 2.12 is satisfied, and a Lipschitz estimate is available for $|z - \xi_l| \le \frac{2}{C_{1,l} n}$. In particular, this shows that for positive integers $n > 3(C_1 \varepsilon_l)^{-1}$ and complex numbers $z, w \in \{z : |z - \xi_l| \le \frac{2}{C_{1,l} n}\}$, condition (ii) of Theorem 2.12 holds.

Now, fix $C$ larger than the maximum over $1 \le l \le s$ of the constants above. If $n$ is a natural number large enough to guarantee inequalities (29) and (30) for $1 \le l \le s$ and to satisfy (31), then Theorem 2.12 guarantees that on the complement of $\cup_{l=1}^s E_n^l$, the polynomial $p_n$ has $s$ critical points $w_1^{(n)}, \ldots, w_s^{(n)}$, where $w_l^{(n)}$ is the unique critical point of $p_n$ within distance $\frac{3}{|m_\mu(\xi_l)|\, n}$ of $\xi_l$. We complete our justification of (12) from Theorem 2.10 by choosing $C_{\mu, \vec\xi}$ larger than $\max_l C(k_{\mathrm{Lip},l} + 1)$ and applying Hoeffding's inequality to the bounded random variables $(\xi_l - X_j)^{-1}$ to achieve the desired control over $\mathbb{P}(\cup_l E_n^l)$. More specifically, since $\xi_l \notin \operatorname{supp}(\mu)$ for $1 \le l \le s$, the random variables $Y_j^l := (\xi_l - X_j)^{-1}$ are almost surely uniformly bounded by $K_l := \operatorname{dist}(\xi_l, \operatorname{supp}(\mu))^{-1}$, and the following version of Hoeffding's inequality applies with $t_l$ proportional to $|m_\mu(\xi_l)|$.

Lemma 3.2 (Hoeffding's inequality for complex-valued random variables; Lemma 3.1 from [23]). Let $Y_1, \ldots, Y_n$ be iid complex-valued random variables which satisfy $|Y_j| \le K$ almost surely for some $K > 0$. Then there exist absolute constants $C, c > 0$ such that for every $t > 0$, $\mathbb{P}\big(\big|\frac{1}{n}\sum_{j=1}^n Y_j - \mathbb{E} Y_1\big| \ge t\big) \le C \exp\big(-\frac{c n t^2}{K^2}\big)$.
By Lemma 3.2, we can find constants $C, c_{\mu,\vec\xi} > 0$ such that the complement of $\cup_l E_n^l$ occurs with probability at least $1 - C\exp(-c_{\mu,\vec\xi}\, n)$, as desired.
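A small Monte Carlo sanity check of the concentration just used (an illustration with parameters we chose, not part of the proof): for $\mu$ uniform on the unit disk and $\xi = 2$ outside its support, $Y_j = (\xi - X_j)^{-1}$ is bounded by $K = \operatorname{dist}(\xi, \operatorname{supp}(\mu))^{-1} = 1$, and the empirical mean concentrates around $m_\mu(2) = 1/2$ at rate $O(n^{-1/2})$, in line with Lemma 3.2.

```python
import numpy as np

# Empirical mean of Y_j = 1/(2 - X_j) for X_j uniform on the unit disk.
# By the mean-value property, E Y_1 = m_mu(2) = 1/2, and |Y_j| <= 1 a.s.
rng = np.random.default_rng(1)
n = 20_000
X = np.sqrt(rng.uniform(0, 1, n)) * np.exp(1j * rng.uniform(0, 2 * np.pi, n))
Y = 1.0 / (2.0 - X)
assert np.all(np.abs(Y) <= 1.0 + 1e-12)   # boundedness K = 1
print(abs(Y.mean() - 0.5))
```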
We have established, with overwhelming probability, the existence of the critical points $w_1^{(n)}, \ldots, w_s^{(n)}$ characterized by (12). It remains to show that they satisfy the convergence in (13). To that end, apply the Borel-Cantelli lemma to the events $\cup_l E_n^l$ to see that almost surely, for large enough $n$, $w_l^{(n)}$ satisfies (12) for $1 \le l \le s$.
It follows that with probability 1, for sufficiently large $n$ and any $l$, $1 \le l \le s$, the expansion (33) holds. Now, we will use the Cramér-Wold device (see e.g. Theorem 29.4 in [2]) to show the convergence (13). To start, let $t_1, \ldots, t_s, r_1, \ldots, r_s$ be arbitrary real numbers and define the random variables $Y_{n,l}$ for $1 \le l \le s$. By (33), we have, with probability tending to 1, an asymptotic expansion for $Y_{n,l}$, where all of the implied constants depend on $\xi_1, \ldots, \xi_s$ and $\mu$, and we have made ample use of Slutsky's theorem (see e.g. Theorem 11.4 from [11]); to obtain the last line, we also used the classical CLT (see e.g. Theorem 29.5 from [2]) in conjunction with Slutsky's theorem. Taking linear combinations of the real and imaginary parts of the $Y_{n,l}$, we obtain, with probability at least $1 - o(1)$, a quantity which converges in distribution, by the classical CLT and Slutsky's theorem, to a normally distributed random variable with mean 0 and the required variance. This limiting distribution coincides with that of the corresponding linear combination of the random variables with covariance structure given by (14), so by the Cramér-Wold device, the proof of Theorem 2.10 is complete. The next subsection illustrates how to modify the argument above to prove Theorem 2.13.
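The fluctuation scale can be probed numerically (a hedged sketch with parameters of our choosing, for $\mu$ uniform on the unit disk and a single root pinned at $\xi = 2$): the nearest critical point $w$ satisfies $n(w - \xi) \to -1/m_\mu(\xi) = -2$, and the statistic $\sqrt{n}\,(n(w - \xi) + 2)$ should have a spread that is essentially independent of $n$, matching the $n^{-1/2}$ Gaussian fluctuations in (13). We locate $w$ by Newton's method on $S(z) = \sum_j (z - X_j)^{-1}$, whose zeros are the critical points.

```python
import numpy as np

rng = np.random.default_rng(2)

def nearest_crit(n):
    # n - 1 iid roots uniform in the unit disk, plus the deterministic root 2.
    X = np.concatenate([np.sqrt(rng.uniform(0, 1, n - 1))
                        * np.exp(1j * rng.uniform(0, 2 * np.pi, n - 1)),
                        [2.0 + 0j]])
    z = 2.0 - 1.0 / np.sum(1.0 / (2.0 - X[:-1]))   # first-order prediction
    for _ in range(30):                             # Newton iteration on S(z) = 0
        S = np.sum(1.0 / (z - X))
        dS = -np.sum(1.0 / (z - X) ** 2)
        z = z - S / dS
    return z

def spread(n, trials=300):
    T = np.array([np.sqrt(n) * (n * (nearest_crit(n) - 2.0) + 2.0)
                  for _ in range(trials)])
    return T.std()

s100, s400 = spread(100), spread(400)
print(s100, s400)
```

The two sample spreads agree to within sampling error, consistent with a Gaussian limit on the $n^{-1/2}$ scale.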
3.3. Justification of Theorem 2.13. Theorem 2.13 follows from an argument quite similar to the one provided in the previous subsection. We outline the main differences in the following proof sketch.
Argue as in Subsection 3.2 for each $l$, $1 \le l \le s_n$, separately, but in place of the definitions in equation (28), choose modified constants, and modify the events $E_n^l$ accordingly. Notice that condition (i) from Theorem 2.12 now holds for $n$ sufficiently large (depending on the rate of convergence of $a_n l_n/n \to 0$) on the complement of $E_n^l$, because the relevant averages converge, and this convergence is uniform with respect to $1 \le l \le s_n$. The requirements (31) on $n$ now hold uniformly for $1 \le l \le s_n$ by assumption (i) in the statement of Theorem 2.13. By Hoeffding's inequality (Lemma 3.2), with $Y_j^l := \frac{1}{\xi_l - X_j}$ and $K_l := \frac{\varepsilon a_n}{6}$, there are constants $C, c_{\mu,\varepsilon} > 0$, independent of $l$, $\xi_l$, and $s_n$, so that the required bound holds for large $n$. Taking a union over $l$, $1 \le l \le s_n$, establishes the desired result.
3.4. Proof of Theorem 2.9. We now proceed to prove Theorem 2.9. In order to control the behavior of $\frac{1}{n}\sum_{j=1}^n \frac{1}{\xi - X_j}$, we will rely on the Law of Large Numbers. Lemma 3.3 below justifies this approach by establishing some regularity properties for $\mathbb{E}(\xi - X_1)^{-1} = m_\mu(\xi)$ that we will continue to use throughout the remainder of the paper. We note that Lemma 3.3 is similar to Lemma 5.7 in [18].

Lemma 3.3. Suppose that on $B(\xi, \rho) \subset \mathbb{C}$, $\mu$ has a density with respect to the Lebesgue measure that is bounded by $C_{\mu,\xi,\rho}$. Then, (i) for any $z \in B(\xi, \rho/2)$, $\mathbb{E}|z - X_1|^{-1}$ is bounded by a constant depending only on $\mu$, $\xi$, and $\rho$; (ii) if $\rho = \infty$, so that $\mu$ has a density bounded by $C_\mu$ on all of $\mathbb{C}$, then there exist constants $\kappa_\mu, \varepsilon_\mu > 0$, depending on $\mu$, so that the following holds: if $x, y \in \mathbb{C}$ with $|x - y| < \varepsilon_\mu$, then $|m_\mu(x) - m_\mu(y)|$ is controlled by a modulus of continuity that tends to zero as $|x - y| \to 0$.

Proof. To prove the first inequality, observe that for any $z \in B(\xi, \rho/2)$, the claim follows by splitting the integral near and away from $z$ and using polar coordinates near the singularity.
To prove (ii), let $Z \sim \mu$ and fix $x, y \in \mathbb{C}$ with $|x - y| \le 1$. We will compute the difference $m_\mu(x) - m_\mu(y)$ by splitting on a collection of events whose union has probability 1. By the triangle inequality, the difference splits into several terms, which we bound separately: one term is controlled via the Cauchy-Schwarz inequality, and the remaining terms are controlled by direct estimates. Combining the last few inequalities completes the proof of Lemma 3.3.
We proceed to prove Theorem 2.9, starting with a justification of (9) in the case s = 1 and ξ 1 = ξ. Choose ρ ξ > 0 so that in the disk B(ξ, 3ρ ξ ), µ has a density f that is bounded by C f . Our plan of attack will be to show that the hypotheses of Theorem 2.12 are satisfied on the complement of a "bad" event whose probability tends to 0 as n grows. To optimize our control over this event, we allow it to depend on the parameter ε n = o(1) that we will choose appropriately to achieve the asymptotic bound in (9).
To that end, suppose $\varepsilon_n \in (0,1)$, let $d_n := \lceil \ln(\sqrt{n}) \rceil$, and for each $n \ge 1$ define the annuli $A_n^k$ and the binomial random variables $N_n^k$. Consider the "bad" events $E_n$, $F_n^k$, and $G_n$. We will demonstrate that for $\varepsilon_n := (\ln n)^{-2/3}$ and $C_{\mu,\xi}$ defined in Lemma 3.4 below, the conditions in Theorem 2.12 hold on the complement of $E_n \cup G_n \cup \bigcup_k F_n^k$ for large enough $n$. Furthermore, we will show that the union of these events occurs with probability tending to 0. Notice that the events $E_n$, $F_n^k$, and $G_n$ are related to conditions (i), (ii), and (iii) of Theorem 2.12, respectively.
It is clear that condition (i) holds on the complement of $E_n$ because $m_\mu(\xi) \neq 0$. For $n > \frac{9}{C_1^2 \varepsilon_n}$, condition (iii) holds on the complement of $G_n$, because in this case $\sqrt{\varepsilon_n/n} > \frac{3}{C_1 n}$. The following lemma establishes condition (ii).
Lemma 3.4. There exists a constant $C_{\mu,\xi} > 0$, depending only on $\mu$ and $\xi$, so that if $\varepsilon_n \in (0,1)$ and $n$ is sufficiently large, then, on the complement of $\bigcup_{k=0}^{d_n} F_n^k \cup G_n$, any complex numbers $z, w \in B(\xi, \frac{2}{C_1 n})$ satisfy the Lipschitz-type bound in condition (ii) of Theorem 2.12.

Proof. Fix $z, w \in B(\xi, \frac{2}{C_1 n})$ and $1 \le j \le n$. By applying the triangle inequality several times, we obtain a pointwise bound, and consequently a bound on the complement of $\bigcup_{k=0}^{d_n} F_n^k \cup G_n$.
We have split the sum over $1 \le j \le n$ into $d_n + 2$ pieces. Notice that for $n > \frac{8\rho_\xi}{C_1 \varepsilon_n}$, the intermediate pieces are controlled. Additionally, if $n > \frac{8}{C_1 \rho_\xi}$ and $|X_j - \xi| \ge \rho_\xi$, then the corresponding terms are bounded as well. The claimed estimate follows, which completes the proof.
It remains to find an upper bound on the probability of $E_n \cup \bigcup_{k=0}^{d_n} F_n^k \cup G_n$, which we accomplish in the next lemma.
Proof. To control $\mathbb{P}(E_n)$, apply the Weak Law of Large Numbers to the random variables $\frac{1}{\xi - X_j}$, which have finite expectation by Lemma 3.3. Next, a direct computation for large $n$ establishes $\mathbb{P}(G_n) = O_{\mu,\xi}(\varepsilon_n)$. We now turn our attention to the events $F_n^k$. For $0 \le k \le d_n$ and $1 \le j \le n$, define the random variables $\chi_{j,k} := \mathbb{1}_{\{X_j \in A_n^k\}}$, which, for a fixed $k$, are independent and identically distributed according to a Bernoulli distribution with parameter $p_k \le \pi C_f \rho_\xi^2 e^{2k}/n$. Since $N_n^k = \sum_{j=1}^n \chi_{j,k}$ has expectation at most $\pi C_f \rho_\xi^2 e^{2k}$, Markov's inequality yields a tail bound in terms of the fourth central moment of $N_n^k$. In order to control that moment, recall that for two independent, real-valued random variables $X$ and $Y$, the fourth central moment of the sum satisfies $\mu_4(X + Y) = \mu_4(X) + 6\operatorname{Var}(X)\operatorname{Var}(Y) + \mu_4(Y)$. Since the $\chi_{j,k}$ are iid, it follows by inductively applying this identity that the fourth central moment of $N_n^k$ is suitably bounded. Consequently, (37) becomes an effective bound, and the union bound completes the proof of Lemma 3.5.
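The fourth-moment identity invoked above can be verified exactly in a small discrete case (an illustrative check, with a Bernoulli parameter of our choosing): for independent real random variables, $\mu_4(X + Y) = \mu_4(X) + 6\operatorname{Var}(X)\operatorname{Var}(Y) + \mu_4(Y)$, since the odd cross terms of the centered expansion vanish.

```python
import itertools

# Exact enumeration check of mu4(X + Y) = mu4(X) + 6 var(X) var(Y) + mu4(Y)
# for X, Y independent Bernoulli(p).
p = 0.3

def moments(dist):
    # dist: list of (value, probability); returns mean, variance, 4th central moment
    mean = sum(v * pr for v, pr in dist)
    var = sum((v - mean) ** 2 * pr for v, pr in dist)
    mu4 = sum((v - mean) ** 4 * pr for v, pr in dist)
    return mean, var, mu4

bern = [(0, 1 - p), (1, p)]
_, var1, mu4_1 = moments(bern)
conv = [(x + y, px * py) for (x, px), (y, py) in itertools.product(bern, bern)]
_, var2, mu4_2 = moments(conv)
print(abs(mu4_2 - (2 * mu4_1 + 6 * var1 * var1)))
```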
We have established that $C_1$, $C_2$, and $k_{\mathrm{Lip}}$ defined in (36) satisfy conditions (i), (ii), and (iii) of Theorem 2.12 for large $n$, on the complement of $E_n \cup \bigcup_{k=0}^{d_n} F_n^k \cup G_n$, a "bad" event whose probability tends to zero. Consequently, the conclusion of Theorem 2.12 guarantees that with probability at least $1 - o_{\mu,\xi}(1)$, the polynomial $p_n$ has a unique critical point $w_\xi^{(n)}$ that fulfills (9). We now consider the case $s > 1$. The argument in this more general situation is much the same as the one just presented for $s = 1$, so we sketch the proof and point out the major differences. Consider each of the roots $\xi_l$, $1 \le l \le s$, separately and modify the argument above in the obvious ways. In particular, we replace the annuli $A_n^k$ with $A_{l,n}^0 := \{z \in \mathbb{C} : |z - \xi_l| < \frac{\delta}{\sqrt n}\}$, $1 \le l \le s$, and the analogous annuli $A_{l,n}^k$, where $\delta > 0$ is any real number such that $f$ is a density for $\mu$ in the balls $B(\xi_l, \delta)$ and such that $2\delta < \min_{1 \le j < l \le s} |\xi_j - \xi_l|$. Define the random variables $N_{l,n}^k$ accordingly, in addition to the modified "bad" events and the modified constants. (Note that the constants $C_{\mu,\vec\xi}^l$, $1 \le l \le s$, will be defined via lemmata similar to Lemma 3.4.) On the complement of the union of the modified "bad" events, for each $l$, $1 \le l \le s$, conditions (i), (ii), and (iii) of Theorem 2.12 hold for reasons similar to those given in the argument for $s = 1$ above, and computations similar to (29) and (30) apply for each $l$, $1 \le l \le s$. It remains to establish the joint fluctuations (10) of the critical points $w_l^{(n)}$, $1 \le l \le s$. This is considerably more difficult than our consideration of (13) because, in the current situation, the $(\xi_l - X_j)^{-1}$ are heavy-tailed random variables. In Appendix B, we appeal to the Lindeberg exchange method with an appropriate truncation to establish Theorem B.1, a CLT that we use to prove (10) in a similar manner to our justification of (13).
To start, consider that with probability $1 - o(1)$, the $w_l^{(n)}$, $1 \le l \le s$, satisfy (9), so with inspiration from (33) and (34), we obtain with probability at least $1 - o(1)$ that for $1 \le l \le s$, an analogous expansion holds, where all of the implied constants depend on $\xi_1, \ldots, \xi_s$ and $\mu$, and we have used Slutsky's theorem several times. (We also used the heavy-tailed CLT, Theorem B.1, once.) For arbitrary constants $t_1, \ldots, t_s \in \mathbb{C}$, the corresponding linear combination converges in distribution, by Slutsky's theorem and Theorem B.1, to a normal distribution with mean zero and the required variance. This is exactly the distribution of the sum $\operatorname{Re}(\sum_{l=1}^s t_l N_l)$, where the $N_l$ are defined as in (10) with covariance structure (11). Recall that studying the real parts of linear combinations over $\mathbb{C}$ of an $s$-complex-dimensional random vector is the same as analyzing the linear combinations over $\mathbb{R}$ of a $2s$-real-dimensional random vector.

Lemma 4.1 (see [34]). Let $(X, \mu)$ be a probability space, and let $F : X \to \mathbb{C}$ be a square-integrable function. Let $m \ge 1$, let $x_1, \ldots, x_m$ be drawn independently at random from $X$ with distribution $\mu$, and let $S$ be the empirical average $S := \frac{1}{m}\sum_{i=1}^m F(x_i)$. Then $S$ has mean $\int_X F \, d\mu$ and variance $\frac{1}{m}\int_X |F - \int_X F \, d\mu|^2 \, d\mu$. In particular, by Chebyshev's inequality, one has, for any $t > 0$, $\mathbb{P}(|S - \int_X F \, d\mu| \ge t) \le \frac{1}{m t^2}\int_X |F - \int_X F \, d\mu|^2 \, d\mu$; equivalently, for any $\delta > 0$, one has, with probability at least $1 - \delta$, $|S - \int_X F\,d\mu| \le \frac{1}{\sqrt{\delta m}}\big(\int_X |F - \int_X F \, d\mu|^2 \, d\mu\big)^{1/2}$.

Lemma 4.2. Fix $C > 0$, and let $X_1, \ldots, X_n$ be complex-valued random variables (not necessarily independent nor identically distributed) such that, with overwhelming probability, $\max_{1 \le i \le n} |X_i| \le e^{n^C}$. Let $\varphi : \mathbb{C} \to \mathbb{R}$ be a twice continuously differentiable function (possibly depending on $n$) which satisfies the pointwise bound in (21) for all $z \in \mathbb{C}$. Then, with overwhelming probability, the bounds (39), (40), and (41) hold.

Proof. The bound in (41) follows immediately from the pointwise bound in (21). In order to prove (39), it suffices, by the pointwise bound in (21), to prove the corresponding square-integral bound with overwhelming probability. By supposition, we now work on the event where $X_1, \ldots, X_n \in B(0, e^{n^C})$.
Here $B := B(0, n^C)$, with Lebesgue measure $|B| = O(n^{2C})$. Since $X_1, \ldots, X_n \in B(0, e^{n^C})$, on $B \setminus B(X_i, 1)$ we have $1 \le |z - X_i| \le 2e^{n^C}$, so the quantity $\max_{1 \le i \le n} \int_{B \setminus B(X_i, 1)} (\log|z - X_i|)^2 \, dA(z)$ is polynomially bounded in $n$. Near each root, we have $\max_{1 \le i \le n} \int_{B \cap B(X_i, 1)} (\log|z - X_i|)^2 \, dA(z) < \infty$, since $\log|\cdot|$ is locally square-integrable. This completes the proof of (39). For (40), we observe that on the event where (38) holds, the Gauss-Lucas theorem implies that the critical points $w_1^{(n)}, \ldots, w_{n-1}^{(n)}$ of $p_n$ also lie in $B(0, e^{n^C})$. Working on this event, the proof follows from the same procedure as we used to prove (39); we omit the details.
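The local square-integrability of $\log|\cdot|$ can be checked by a quick quadrature (an illustration, not part of the proof): in polar coordinates, $\int_{B(0,1)} (\log|z|)^2 \, dA = 2\pi \int_0^1 r (\log r)^2 \, dr = \pi/2$, which is finite despite the singularity of $\log$ at the origin.

```python
import numpy as np

# Trapezoidal quadrature of 2*pi * int_0^1 r (log r)^2 dr; exact value is pi/2.
r = np.linspace(0.0, 1.0, 200_001)[1:]          # drop r = 0 to avoid log(0)
integrand = r * np.log(r) ** 2
dr = r[1] - r[0]
val = 2 * np.pi * dr * (integrand.sum() - 0.5 * (integrand[0] + integrand[-1]))
print(val, np.pi / 2)
```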

Lemma 4.3 (Crude upper bound).
Fix $C > 0$, and let $X_1, \ldots, X_n$ be complex-valued random variables (not necessarily independent nor identically distributed). Assume $Z$ is uniformly distributed on $B(0, n^C)$, independent of $X_1, \ldots, X_n$. Then for every $a > 0$, there exists $b > 0$ such that the stated bound holds.

Proof. Conditioning on $X_1, \ldots, X_n$, we find that $\mathbb{P}(\min_{1 \le i \le n} |Z - X_i| \le \varepsilon) = O(n \varepsilon^2 / n^{2C})$ for all $\varepsilon > 0$. In addition, on the event where $\min_{1 \le i \le n} |Z - X_i| > \varepsilon$, we have a deterministic lower bound. In order to prove the claim, it suffices to assume $a > 2C$. In this case, by taking $\varepsilon := \frac{n^{2C}}{n^{a+1}}$, the result follows from the estimates above.
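The conditional estimate in this proof is just an area comparison, which a quick Monte Carlo run illustrates (with radii and point counts we chose for the sketch): if $Z$ is uniform on $B(0, R)$, then for any fixed points $X_1, \ldots, X_n$, a union bound gives $\mathbb{P}(\min_i |Z - X_i| \le \varepsilon) \le n\varepsilon^2/R^2$.

```python
import numpy as np

# Empirical frequency of {min_i |Z - X_i| <= eps} versus the bound n eps^2 / R^2.
rng = np.random.default_rng(6)
R, n, eps, m = 10.0, 50, 0.5, 50_000
# fixed points inside B(0, R) (a square of half-diagonal R)
X = (rng.uniform(-R, R, n) + 1j * rng.uniform(-R, R, n)) / np.sqrt(2)
Z = np.sqrt(rng.uniform(0, 1, m)) * R * np.exp(1j * rng.uniform(0, 2 * np.pi, m))
hit = np.min(np.abs(Z[:, None] - X[None, :]), axis=1) <= eps
print(hit.mean(), n * eps**2 / R**2)
```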
We now prove Theorem 2.16.
Proof of Theorem 2.16. Let B := B(0, n C ), and let |B| denote its Lebesgue measure. Fix α > 0, and let β ∈ N be a large constant (depending on C, c, α) to be chosen later.
Using the log-transform of the empirical measures constructed from the roots and critical points of $p_n$, we obtain two integral identities. (These identities can also be found in a more general form in [16, Section 2.4.1].) Instead of working with the integrals on the right-hand sides, we will work with large empirical averages by applying Lemma 4.1. Indeed, let $m := n^\beta$, and let $Z_1, \ldots, Z_m$ be iid random variables uniformly distributed on $B$, independent of $X_1, \ldots, X_n$. Taking $\beta$ sufficiently large and applying Lemmas 4.1 and 4.2, we conclude that the desired approximation holds with probability $1 - O(n^{-\alpha})$. In addition, by (20), Lemma 4.3, and the union bound, it follows that there exists $b > 0$ such that the complementary estimate holds with probability $1 - O(n^{-\alpha})$. The proof of the theorem is complete.
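The empirical-average step above can be illustrated in the simplest case (a sketch with parameters we chose, not part of the proof): for $B = B(0,1)$ and a single root at $x = 0$, the exact logarithmic average is $\frac{1}{|B|}\int_B \log|z| \, dA = -\frac{1}{2}$, and by Lemma 4.1-type reasoning, $m$ iid uniform samples recover it to $O(m^{-1/2})$.

```python
import numpy as np

# Monte Carlo estimate of the average of log|z| over the unit disk (exact: -1/2).
rng = np.random.default_rng(3)
m = 200_000
Z = np.sqrt(rng.uniform(0, 1, m)) * np.exp(1j * rng.uniform(0, 2 * np.pi, m))
est = np.mean(np.log(np.abs(Z)))
print(est)
```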
Therefore, we conclude that the stated bound holds with probability $1 - O(n^{-100})$. By the classical CLT, convergence holds in distribution as $n \to \infty$, where $v^2$ is the variance of $\varphi(X_1)$, and the claim follows.

4.3.
Proof of Theorem 2.20. We will need the following companion matrix result, which describes a matrix whose eigenvalues are the critical points of a given polynomial. This result appears to have originally been developed in [19] (see [19, Lemma 5.7]). However, the same result was later rediscovered and significantly generalized by Cheung and Ng [5, 6]. In the statement, $D := \operatorname{diag}(X_1, \ldots, X_n)$, $I$ is the $n \times n$ identity matrix, and $J$ is the $n \times n$ all-ones matrix; the eigenvalues of $D(I - \frac{1}{n}J)$ encode the critical points of $p_n$.
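A numerical check of this companion-matrix structure (a sketch under our reading of the result): with $D = \operatorname{diag}(X_1, \ldots, X_n)$, the eigenvalues of $D(I - \frac{1}{n}J)$ consist of the $n - 1$ critical points of $p_n(z) = \prod_j (z - X_j)$ together with a single zero eigenvalue; note that $(I - \frac{1}{n}J)$ annihilates the all-ones vector, so $0$ is always in the spectrum.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 6
X = rng.normal(size=n) + 1j * rng.normal(size=n)
D = np.diag(X)
M = D @ (np.eye(n) - np.ones((n, n)) / n)
eigs = np.linalg.eigvals(M)
crits = np.roots(np.polyder(np.poly(X)))      # the n - 1 critical points of p_n
# greedily match each critical point to its nearest remaining eigenvalue
remaining = list(eigs)
for w in crits:
    i = int(np.argmin([abs(w - e) for e in remaining]))
    assert abs(w - remaining.pop(i)) < 1e-8
assert abs(remaining[0]) < 1e-8               # the leftover eigenvalue is 0
print("companion matrix check passed")
```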
We will also need the Sherman-Morrison formula for computing the inverse of a rank-one update to a matrix (Lemma 4.5): if $A$ is an invertible $n \times n$ matrix and $u, v \in \mathbb{C}^n$ satisfy $1 + v^T A^{-1} u \neq 0$, then $(A + uv^T)^{-1} = A^{-1} - \frac{A^{-1} u v^T A^{-1}}{1 + v^T A^{-1} u}$. Finally, we will use a lower bound along the contour $\Gamma$ (Lemma 4.6), whose short proof we give now. Proof. Clearly $|z| = C + \varepsilon$ for all $z \in \Gamma$. Thus, it suffices to prove the claimed lower bound with overwhelming probability. The claim now follows from the uniform bound in [23, Lemma 4.1] and the assumption on $m_\mu$ given in (22).
With Lemma 4.6 in hand, we are now prepared to present the proof of Theorem 2.20.
Proof of Theorem 2.20. Let $D$ be the diagonal matrix $D := \operatorname{diag}(X_1, \ldots, X_n)$. Using the notation from Theorem 4.4, we observe that $zI - D$ is invertible for all $z \in \Gamma$, since $X_1, \ldots, X_n \in B(0, C)$ by supposition. In addition, by the Gauss-Lucas theorem and Theorem 4.4, the eigenvalues of $D(I - \frac{1}{n}J)$ are also contained in $B(0, C)$. This implies that $zI - D(I - \frac{1}{n}J)$ is also invertible for every $z \in \Gamma$. In view of these observations, we define the corresponding resolvents, and by Cauchy's integral formula we obtain contour-integral representations of the two linear statistics. We now take the difference of these two equalities. Since $|\varphi(0)| \ll \int_\Gamma |\varphi(z)|\,|dz|$, it suffices, by the triangle inequality, to show (47) with overwhelming probability. Since $J = \mathbf{1}\mathbf{1}^T$, where $\mathbf{1}$ is the all-ones vector, the Sherman-Morrison formula (Lemma 4.5) implies a rank-one resolvent identity, provided $1 + \frac{1}{n}\mathbf{1}^T G(z) D \mathbf{1} \neq 0$. In view of Lemma 4.6, there exists a constant $c' > 0$ (depending only on $C$, $c$, and $\varepsilon$) such that this denominator is bounded below by $c'$ in absolute value with overwhelming probability. Here, we have exploited the fact that $D$ and $G(z)$ are diagonal matrices, which implies that $\frac{1}{n}\mathbf{1}^T G(z) D \mathbf{1} = \frac{1}{n}\sum_{j=1}^n \frac{X_j}{z - X_j}$. Using (48) and (49), we conclude that with overwhelming probability the difference is controlled by one remaining term. To bound this last remaining term, we again exploit the fact that $J = \mathbf{1}\mathbf{1}^T$: from the cyclic property of the trace, we have a deterministic bound for all $z \in \Gamma$. Combining the bounds above, we obtain (47), and the proof is complete.

Proof of Theorem 2.3
This section is devoted to proving Theorem 2.3. Our first lemma shows that Assumption 2.2 implies Assumption 2.1. Proof. Without loss of generality, suppose $\mu$ is radially symmetric about $z = 0$, and let $X \sim \mu$. By Lemma 2.5, we can write $|m_\mu(z)| = \frac{\mathbb{P}(|X| < |z|)}{|z|}$, so the hypotheses guarantee that $|m_\mu(z)|$ is continuous on $\mathbb{C} \setminus \{0\}$. (Indeed, $\mathbb{P}(|X| < r)$ is the cumulative distribution function associated with the radial part of $\mu$, which has a continuous density.) Since $f(0) > 0$, there are $\delta, c > 0$ so that $|z| \le \delta$ implies $|f(z)| \ge c > 0$; in particular, a lower bound on $|m_\mu(z)|$ holds for $|z| \le \delta$. Let $r_{1/2}$ be any value for which $\mathbb{P}(|X| < r_{1/2}) = 1/2$. By the extreme value theorem, $|m_\mu(z)|$ achieves its minimum, $m_{\min}$, on the closed, bounded annulus $\{\delta \le |z| \le r_{1/2}\}$. We know that $m_{\min}$ is non-zero by (50) and the fact that $\mathbb{P}(|X| < r)$ is nondecreasing in $r$. This second fact additionally implies a lower bound for $|z| \ge r_{1/2}$. We conclude that for any $\varepsilon \in (0, m_{\min})$, the desired tail estimate holds for some $C > 0$. (We have used the fact that $\mu$ has two finite absolute moments to bound the last probability.) It follows that $\mu$ satisfies Assumption 2.1, part (i). To see that $\mu$ satisfies Assumption 2.1, part (ii), let $X_1, \ldots, X_n$ be iid complex-valued random variables with distribution $\mu$; Markov's inequality then completes the argument. Figure 5 depicts the roots (red dots) and critical points (blue crosses) of $p_n(z)$ when the roots $X_1, \ldots, X_{150}$ are chosen independently and uniformly in the unit disk centered at the origin. The observer will notice two things: 1) since the $X_j$ are chosen uniformly at random, they tend to "clump together," and 2) the roots farther from the origin tend to "pair" more closely with nearby critical points than the roots near the origin. The first of these makes it difficult to use our strategy from Theorems 2.9, 2.10, and 2.12, where it was a simple matter to "zoom in" on a fixed root and ensure that no other roots were nearby.
We address this concern by grouping the critical points that lie near each "clump" of roots and simultaneously considering all of the critical points that lie in the same group. We will show that each "clump" of roots (and its corresponding group of critical points) is far away from other "clumps," for large n.

5.1. Introduction to and motivation for the proof of Theorem 2.3. The following proof of Theorem 2.3 is motivated by the illustration in Figure 5 above.
The second observation can be explained by Theorem 2.9, which suggests that the closest critical point, $w_j^{(n)}$, to a given root $X_j$ lies at a distance of roughly $\frac{1}{n|m_\mu(X_j)|}$ from $X_j$. For example, in the case where $\mu$ is uniform on the unit disk, $|m_\mu(z)| = |z|$ for $|z| \le 1$, so near the origin, it makes sense that the "pairing" phenomenon degrades. We tackle this problem by counting the "clumps" of roots and critical points in exponentially widening, nested regions that avoid the zeros of $m_\mu$. (In Figure 5, these are the annuli delimited by concentric dashed circles.) Using this method, we can take advantage of the fact that the number of "clumps" at a given distance from the zero set of $m_\mu$ is roughly proportional to the strength of the "pairing" within those "clumps." The "pairing" phenomenon is quite unreliable near the zeros of $m_\mu$, so for any "clumps" that are sufficiently close to the zeros of $m_\mu$, we bound the distances between the roots and critical points using the Gauss-Lucas theorem. (In fact, this is where we expect to find the "extra," un-paired root that results because $p_n$ has a higher degree than $p_n'$.) In order to synthesize these two ideas, we will form random, disjoint, simple closed curves that encircle each "clump" of roots and critical points. We will build the curves from arcs of circles centered at the roots of $p_n$, using smaller circles for roots that are farther away from the zeros of $m_\mu$. See, for example, the boundaries of the gray domains depicted in Figure 5. We will conclude with an argument involving Rouché's theorem to count the number of critical points interior to each curve by comparing $p_n'$ to a simpler polynomial whose critical points can be located with Walsh's two circle theorem. Near the zeros of $m_\mu$, our method breaks down, and we instead use the Gauss-Lucas theorem to bound the distances between the critical points and roots of $p_n$.
Luckily, there are few critical points near the zeros of m µ , a fact which follows in part from Assumptions 2.1 and 2.2.

5.2.
Definitions. In view of Lemma 5.1, we prove Theorem 2.3 under Assumption 2.1. Let $C_\mu > 0$ be larger than each of the constants in Assumption 2.1 and larger than the constant bounding the density associated with $\mu$. For each $n \in \mathbb{N}$, define sets which partition $\mathbb{C}$ into regions based on the size of $|m_\mu(z)|$, together with the associated counting random variables, and let $\mathcal{N}_n$ be an $n^{-1/2}$-net of the closed disk $B(0, n^{C_\mu})$ that satisfies: (i) $B(0, n^{C_\mu}) \subseteq \bigcup_{x \in \mathcal{N}_n} B(x, n^{-1/2})$; (ii) if $x, y \in \mathcal{N}_n$ and $x \neq y$, then $|x - y| \ge \frac{1}{2\sqrt n}$; (iii) $\#\mathcal{N}_n = O_\mu(n^{1 + 2C_\mu})$. Such a collection of points exists by, e.g., Lemma 3.3 in [23]. Let $\delta > 0$ be a fixed real parameter to be chosen later. We will show that the conclusion of Theorem 2.3 holds on the complement of the union of the following "bad" events: $E_n^k := \{N_n^k \ge 2C_\mu e^{2k} \ln(\ln n)\}$ for $\lfloor 4\ln(\ln n)\rfloor \le k \le \ln\sqrt n$, together with the further "bad" events introduced below. For convenience, we use $E_n^{\mathrm{bad}}$ to denote the union of all of the "bad" events.

5.3.
The "bad" events are unlikely. In this subsection, we establish (52). By assumption, $\mathbb{P}(H_n) = o(1)$, so it remains to bound the probabilities of the remaining events.
Proof. Observe that for fixed $n$ and $k$, $\lfloor 4\ln(\ln n)\rfloor \le k \le \lfloor \ln(\sqrt n)\rfloor$, $N_n^k$ is a binomial random variable with parameters $n$ and $p_k \le C_\mu e^{2k}/n$. By Markov's inequality, we obtain a bound on $\mathbb{P}(E_n^k)$; taking the union over $k$ implies the desired result.
Proof. We will use the method of moments to control the probability of each $F_n^i$, $1 \le i \le n$. Since $F_n^i \subset \{|m_\mu(X_i)| \ge n^{-1/2}\}$, we will often assume that $|m_\mu(X_i)| \ge n^{-1/2}$ in our calculations. Recall from Lemma 3.3, part (i), that $|m_\mu(X_i)|$ is almost surely bounded above by a constant that depends only on $\mu$.
First, we argue that for complex-valued random variables $X, Y$, where $Y$ has a finite fourth absolute moment, a conditional fourth-moment bound holds; indeed, expanding the fourth power yields the desired result. Now, apply this with $X = X_i$ and $Y = \zeta$, and similarly for the related quantities. Consequently, via (53), there are positive constants $C'_\mu, K_\mu$ that depend only on $\mu$ so that if $n \ge K_\mu$, the bound holds on the event $\{|m_\mu(X_i)| \ge n^{-1/2}\}$. Next, we show that there are constants $C''_\mu, K'_\mu > 0$ that depend only on $\mu$ so that (55) holds for $n \ge K'_\mu$ and any fixed $i$, $1 \le i \le n$. Observe that if we distribute the factors inside the expectation, the independence of $\{X_j\}_{j=1}^n$ implies that the only terms which contribute a nonzero expectation are bounded by expectations of the form appearing above, where $1 \le j, k \le n$ and $j, k \neq i$. By a routine counting argument, with $l \neq i$ any fixed index, we can use (54) and the bounds on $\mathbb{E}[|Y|^2 \mid X_i]$ and $\mathbb{E}[|Y| \mid X_i]$ above to find $C''_\mu, K'_\mu > 0$ large enough so that $n \ge K'_\mu$ implies (55). (For the asymptotics, we are using that $n^{-1/2} \le |m_\mu(X_i)| = O_\mu(1)$, where the implied constant depends only on $\mu$.) Via Markov's inequality, it follows that for $n \ge K'_\mu$ and fixed $i$, $1 \le i \le n$, on the event $\{|m_\mu(X_i)| \ge n^{-1/2}\}$, the bound (56) holds.

We conclude the proof by demonstrating the final bound, in which we used (56) to bound $\mathbb{P}(F_n^1 \mid X_1)$. Assumption 2.1 guarantees the needed tail estimate, and we also have $e^{2\lfloor \ln(\sqrt n)\rfloor} \ge e^{2\ln(\sqrt n) - 2} = ne^{-2}$.
Hence, for large $n$, our calculation from above yields the desired bound.

Lemma 5.4. For a fixed $\delta \in (0, \frac{1}{2\pi C_\mu})$, the stated bound holds.

Proof. This is a straightforward application of the Chernoff bound for binomial random variables. In particular, for each $x \in \mathcal{N}_n$, define the random variable $N_x$, which has a binomial distribution with parameters $n$ and $p \le \pi C_\mu/n$. The moment generating function of $N_x$ satisfies $\mathbb{E}[e^{tN_x}] = (1 + p(e^t - 1))^n \le e^{np(e^t - 1)} \le e^{\pi C_\mu (e^t - 1)}$.
Choosing $t = \ln(1 + \frac{1}{\pi C_\mu}\ln n)$ and applying Markov's inequality, we obtain a tail bound that is independent of $x$; the argument can easily be modified (by conditioning on $X_i$) to give the analogous bound for a fixed $i$, $1 \le i \le n$. Hence, we can apply the union bound over all $x \in \mathcal{N}_n$ and $X_1, \ldots, X_n$ to obtain the desired result.
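The Chernoff step can be checked against the exact binomial tail in a small instance (illustrative parameters of our choosing, with $np$ a small constant as in the proof): for $N \sim \mathrm{Binomial}(n, p)$ and any $t > 0$, $\mathbb{P}(N \ge a) \le e^{-ta}\,\mathbb{E}[e^{tN}] \le e^{-ta + np(e^t - 1)}$.

```python
import math

# Exact tail of Binomial(n, p) versus the Poisson-type Chernoff bound.
n, p, a = 1000, 3.0 / 1000, 12
exact_tail = sum(math.comb(n, k) * p**k * (1 - p)**(n - k) for k in range(a, n + 1))
t = math.log(a / (n * p))          # optimal t for the bound exp(-t a + n p (e^t - 1))
chernoff = math.exp(-t * a + n * p * (math.exp(t) - 1))
print(exact_tail, chernoff)
```

As expected, the bound dominates the exact tail while remaining small, which is all the union bound over the net requires.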
Combining Lemmas 5.2, 5.3, and 5.4 from this subsection establishes (52), so for the remainder of the proof, we work on the complements of the "bad" events.

5.4.
Constructing disjoint domains that partition the roots. We will create disjoint domains which contain clusters of roots of p n (z) that are close to one another and show that inside each domain, the numbers of roots and critical points of p n (z) are the same. The domains will be disjoint to ensure that no roots or critical points are counted more than once (see Figure 5 for reference). For technical reasons involving Rouché's theorem, we will require that the boundaries of the regions be simple, closed curves.
Our strategy will be to place an open ball around each $X_i$, $1 \le i \le n$, and to consider the path-connected components of the union of these balls. Some of the resulting regions may not be simply connected, so we need to "fill in the holes." To start, define the random collection $\mathcal{C}_n$ of open balls, and define on $\{1, 2, \ldots, n\}$ the equivalence relation given by the following rule: $i \sim j$ if and only if there is a collection $B_0, B_1, \ldots, B_l \in \mathcal{C}_n$, with $X_i \in B_0$ and $X_j \in B_l$, such that $B_k \cap B_{k+1} \neq \emptyset$ for $0 \le k \le l - 1$. Let $\mathcal{P}_n$ be the set of equivalence classes induced by $\sim$. The idea is that for a fixed $P \in \mathcal{P}_n$, the union $U_{n,P}$ of the balls centered at the roots indexed by $P$ forms a connected component of $\bigcup_{B \in \mathcal{C}_n} B$. Each light gray region in Figure 5 is one connected component $U_{n,P}$ for some $P \in \mathcal{P}_n$; a "zoomed-in" version is presented in Figure 7. Notice that some of the $U_{n,P}$, $P \in \mathcal{P}_n$, may not have simple, closed boundaries, and some could be "nested" inside "holes" formed by others. We address these concerns in the following discussion, where we demonstrate how to select a simple, closed component of the boundary of each $U_{n,P}$, $P \in \mathcal{P}_n$, whose interior contains $U_{n,P}$. More specifically, for each equivalence class $P \in \mathcal{P}_n$, we will create a simple closed curve $\gamma_{n,P} \subset \partial U_{n,P}$ such that each $X_j$, $j \in P$, is contained in the bounded component of $\mathbb{C} \setminus \gamma_{n,P}$. Furthermore, we will show that the interiors of the bounded regions defined by the curves $\{\gamma_{n,P}\}_{P \in \mathcal{P}_n}$ are partially ordered with respect to set inclusion. This will allow us to combine "nested" regions.
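The equivalence relation above is just the "chained overlapping balls" relation, which can be computed with a union-find structure (a small sketch with equal radii and illustrative parameters of our choosing; the paper's construction uses root-dependent radii):

```python
import numpy as np

# Group n random points into "clumps": i ~ j when B(X_i, r) and B(X_j, r) can
# be chained through overlapping balls.  Transitivity is handled by union-find.
rng = np.random.default_rng(5)
n, r = 40, 0.08
X = rng.uniform(0, 1, n) + 1j * rng.uniform(0, 1, n)

parent = list(range(n))
def find(i):
    while parent[i] != i:
        parent[i] = parent[parent[i]]   # path compression
        i = parent[i]
    return i

for i in range(n):
    for j in range(i + 1, n):
        if abs(X[i] - X[j]) < 2 * r:    # B(X_i, r) and B(X_j, r) overlap
            parent[find(i)] = find(j)

classes = {}
for i in range(n):
    classes.setdefault(find(i), []).append(i)
print(len(classes), sorted(len(c) for c in classes.values()))
```

Each key of `classes` plays the role of one $P \in \mathcal{P}_n$, and the associated points determine one connected component $U_{n,P}$.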
To that end, fix an equivalence class $P \in \mathcal{P}_n$, and recall the definition of the open set $U_{n,P}$ from above. For simplicity, write $U_{n,P} = B_1 \cup \cdots \cup B_l$, where $B_1, \ldots, B_l$ are distinct open balls (in the definition of $U_{n,P}$, some of the open balls could coincide if, for example, $X_i = X_j$ for $i, j \in P$, $i \neq j$). We use $V_{n,P}$ to denote the unique unbounded, path-connected component of the complement of $\overline{U_{n,P}}$. (The complement of $\overline{U_{n,P}}$ has a unique unbounded, path-connected component because $\overline{U_{n,P}}$, a union of finitely many closed disks, is compact.) By construction, the boundaries $\partial U_{n,P} \supseteq \partial V_{n,P}$ consist of arcs of the finitely many circles $\partial B_1, \ldots, \partial B_l$.
Lemma 5.5. The curve γ n,P := ∂V n,P is a simple, closed curve (i.e. a Jordan curve), and U n,P is contained in the bounded component of C \ γ n,P .
Proof. There are several ways that one could proceed. One method is to construct a simple path starting on the boundary ∂V n,P that follows circle arcs until it returns to the start. A second approach is to consider the genus of the region U n,P , find generators for its fundamental group, and "close-off" any "holes." We present, in detail, a third method that relies on the following converse of the Jordan curve theorem due to Schönflies (see [8,35], and the discussion on pp. 13 and 67 of [37]). The theorem statement requires two definitions.
A region of the closed set F ⊂ C is defined as a path-connected component of C \ F . A point x in F is accessible from a region R if there is a point y ∈ R and a simple path from y to x, whose intersection with F is {x}.
Theorem 5.6 (Theorem 1 in [35]; see also Theorem II 5.38 on p. 67 of [37]). If F is a compact set in C with precisely two regions such that every point of F is accessible from each of those regions, then F is a simple closed curve.
Our goal is to show that the compact set $\gamma_{n,P} = \partial V_{n,P}$ has precisely two regions, from each of which $\gamma_{n,P}$ is accessible at every point. Define $U'_{n,P} := \mathbb{C} \setminus \overline{V_{n,P}}$. (Figure 6 depicts the geometry near a point $y \in \gamma_{n,P}$: in case (A), $y$ lies on precisely one circle among $\{\partial B_i\}_{i=1}^l$; in case (B), $y$ lies on more than one of these circles.) Observe that
$\mathbb{C} \setminus \gamma_{n,P} = V_{n,P} \cup U'_{n,P}$, where the union is disjoint. It is clear that $V_{n,P}$ is a region of $\gamma_{n,P}$; next, we argue that $U'_{n,P}$ is also a region of $\gamma_{n,P}$. Since $U'_{n,P} \subset \mathbb{C}$ is open, it suffices to show that $U'_{n,P}$ is connected. Suppose, for a contradiction, that this is not the case. Then, there are disjoint, non-empty open sets $S, T \subset \mathbb{C}$ such that $S \cup T = U'_{n,P}$. By construction, the open set $U_{n,P} \subset U'_{n,P}$ is path-connected, and hence connected, so $U_{n,P}$ must be completely contained in either $S$ or $T$. Suppose, without loss of generality, that $U_{n,P} \subset S$. Since $T$ is non-empty, there is some $x \in T$. We will demonstrate that a path whose image is contained entirely in $U'_{n,P}$ connects $x$ to a point of $U_{n,P} \subset S$, which results in a contradiction. We may assume that $x \notin \partial U_{n,P}$, because otherwise $x$ lies on one of the circles $\partial B_i$, $1 \le i \le l$, and there is a path in $U'_{n,P}$ between $x$ and a point of $U_{n,P} \subset S$.
Since the (finitely many) circles $\partial B_1, \ldots, \partial B_l$ are distinct, there are only finitely many points of $\mathbb{C}$ that are contained in more than one circle. Consequently, we can choose a point $v \in V_{n,P}$ such that the line segment $\overline{xv}$ does not contain any point lying in the intersection of two or more distinct circles $\partial B_i$, $1 \le i \le l$. (Indeed, choose a circle $C_x \subset V_{n,P}$, centered at $x$, whose interior contains the compact set $\overline{U'_{n,P}}$; the collection $\{\overline{xz} : z \in C_x\}$ of line segments connecting $x$ to points of $C_x$ is infinite, while only finitely many such segments can meet a point lying on two or more circles. Also, $x \notin \partial U_{n,P}$ by assumption.) Define the path $\ell : [0,1] \to \mathbb{C}$ via $t \mapsto (1-t)x + tv$, whose image is the line segment $\overline{xv}$. Since $\overline{xv}$ is connected and meets both $U'_{n,P}$ and $V_{n,P}$, it cannot be contained in $\mathbb{C} \setminus \gamma_{n,P}$ (indeed, $U'_{n,P} \cup V_{n,P} = \mathbb{C} \setminus \gamma_{n,P}$ is a disjoint union of non-empty open sets). Consequently, $\overline{xv}$ contains a point of $\gamma_{n,P}$. Let $t^* := \min\{t : \ell(t) \in \gamma_{n,P}\}$ and set $y := \ell(t^*)$. Note that $t^* > 0$ since $x \notin \gamma_{n,P}$. By construction, $y$ lies on precisely one of the circles $\{\partial B_i\}_{i=1}^l$; suppose, without loss of generality, that $y \in \partial B_1$. Hence, we can choose an open ball $B_y \ni y$ small enough that $B_y \setminus \partial B_1$ consists of exactly two disjoint, path-connected open regions (see Figure 6A). One of these regions must be a subset of $B_1 \subset U_{n,P}$, and the other must be a subset of $V_{n,P}$. (The second region is connected and open, contains no points of $\partial V_{n,P}$, and must contain a point of $V_{n,P}$ because $y \in \partial V_{n,P}$.) Choose $\eta > 0$ small enough that $t^* - \eta > 0$ and $\ell(t^* - \eta) \in B_y$. It follows that the segment $L := \{\ell(t) : 0 \le t \le t^* - \eta\}$ is connected and disjoint from $\gamma_{n,P}$. We conclude that $L$ is contained entirely in $T$, for it contains $x \in T$. This means $L$ does not contain any points of $V_{n,P}$, so $\ell(t^* - \eta) \in B_y \cap B_1 \subset U_{n,P} \subset S$. We have reached a contradiction, since $S$ and $T$ are disjoint, so $U'_{n,P}$ must be connected. We have shown that $\gamma_{n,P}$ has precisely two regions, $V_{n,P}$ and $U'_{n,P}$.
It remains to show that every point of γ n,P is accessible from both of these regions. Suppose y ∈ γ n,P . There are two cases: either y is contained in precisely one of the circles ∂B i , 1 ≤ i ≤ l, or y is contained in more than one of them. (See Figures 6A and 6B, respectively.) In the first case, say y ∈ ∂B 1 ; just as we did above, we can choose an open ball B y ∋ y small enough that B y \ ∂B 1 consists of the two disjoint, path-connected open regions B y ∩ U n,P and B y ∩ V n,P . It is now clear that y is accessible from both V n,P and U ′ n,P ⊃ U n,P . On the other hand, suppose, without loss of generality, that y is contained in the circles ∂B 1 , ∂B 2 , . . . , ∂B j . Then, we can choose an open ball B y ∋ y small enough that B y \ ∪ j i=1 ∂B i consists of 2j disjoint, path-connected, open regions that do not contain points of γ n,P (see Figure 6B). Consequently, each of these regions must be entirely contained in one of the disjoint open sets U ′ n,P or V n,P . Since y ∈ ∂U ′ n,P = ∂V n,P , at least one of the 2j regions must be contained in U ′ n,P and at least one must be contained in V n,P . It follows that y is accessible from both V n,P and U ′ n,P . We conclude via Theorem 5.6 that γ n,P is a simple closed curve whose interior contains U n,P , because U ′ n,P is the bounded component of C \ γ n,P and U n,P ⊂ U ′ n,P .
We have shown that there are simple, closed curves {γ n,P } P ∈Pn so that for each P ∈ P n , γ n,P ⊆ ∂U n,P and U n,P is contained in the interior of the bounded region defined by γ n,P . Furthermore, the path-connected, open regions {U n,P } P ∈Pn are disjoint by the definition of the equivalence relation ∼. This means that no curve γ n,P can pass through the interior of any region U n,P , and as a result, we can identify "maximal" curves which we will use in the remainder of the proof.
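The equivalence classes P ∈ P n arise from chaining overlapping balls, and the grouping step is easy to mimic computationally. The following sketch clusters points whose disks overlap using a union-find structure; the specific points and radii are illustrative placeholders, not the radii from the collection C n used in the proof.

```python
def cluster_roots(roots, radii):
    """Group roots whose disks B(root, radius) chain together through
    overlaps, mimicking the equivalence relation ~ that defines P_n."""
    n = len(roots)
    parent = list(range(n))

    def find(i):
        # find the representative, with path compression
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for i in range(n):
        for j in range(i + 1, n):
            # two open disks overlap iff the distance between their
            # centers is less than the sum of their radii
            if abs(roots[i] - roots[j]) < radii[i] + radii[j]:
                union(i, j)

    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())

roots = [0.0 + 0.0j, 0.05 + 0.0j, 0.09 + 0.02j, 1.0 + 1.0j]
radii = [0.04, 0.04, 0.04, 0.04]
print(cluster_roots(roots, radii))  # → [[0, 1, 2], [3]]
```

The first three points chain together (the first and third disks do not overlap directly, but both overlap the second), which is exactly the transitivity built into the relation ∼.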
Definition 5.7. We say that a simple, closed curve γ n,P * among {γ n,P } P ∈Pn is maximal if whenever U n,P * is in the bounded component of C \ γ n,P for some P ∈ P n , we have P = P * . We use M n to denote the collection of maximal curves. For each Γ ∈ M n , let O Γ denote the bounded component of C\Γ, so that ∂O Γ = Γ.
Notice that the domains O Γ , Γ ∈ M n , are disjoint by construction and that each X j , 1 ≤ j ≤ n, is contained in precisely one O Γ . We conclude this subsection with two important lemmas that restrict the sizes of the equivalence classes P , P ∈ P n , and domains O Γ , Γ ∈ M n . Lemma 5.8. Suppose 0 < δ < 1/3. There exists C δ > 0 so that for n ≥ C δ , the following holds on the complement of G δ n : for each P ∈ P n , |P | ≤ δ ln n + 2, and if x, y ∈ U n,P , then |x − y| ≤ 3δ n −1/2 . Proof. Assume, for a contradiction, that there is a P ∈ P n for which |P | > δ ln n + 2, and suppose, without loss of generality, that 1 ∈ P . By the definition of P n , for each i ∈ P \ {1} there is a chain of open balls B i 0 , B i 1 , . . . , B i li ∈ C n with B i k ∩ B i k+1 ≠ ∅ for 0 ≤ k ≤ l i − 1, and these are balls with radius at most (ln n) −1 n −1/2 . Notice that the distance between X 1 and any X i , i ∈ P \ {1}, is bounded by 2 + 2(l i − 1) times this maximum radius (recall that X 1 and X i , i ∈ P \ {1}, are the centers of B i 0 and B i li , respectively). We consider two cases: (i) for every i ∈ P \ {1}, l i < δ ln n + 2; (ii) there is an i * ∈ P \ {1} for which l i * ≥ δ ln n + 2.
If case (i) is true, then, for n large enough to guarantee δ ln n ≥ 3, every X i , i ∈ P , satisfies |X i − X 1 | ≤ 2(δ ln n + 2)(ln n) −1 n −1/2 = (2δ + 4(ln n) −1 ) n −1/2 < n −1/2 (since δ < 1/3), so every X i , i ∈ P , is in the ball of radius n −1/2 centered at X 1 , which is impossible on the complement of G δ n . On the other hand, if case (ii) is true, then, for large n, B i * 0 , B i * 1 , . . . , B i * ⌈δ ln n+2⌉ are overlapping balls with radius at most (ln n) −1 n −1/2 , so if n is large enough that δ ln n ≥ 7 and y ∈ ∪ ⌈δ ln n+2⌉ k=0 B i * k , then |y − X 1 | < n −1/2 . This is impossible on the complement of G δ n because it would imply too many roots among {X j } n j=1 in the ball of radius n −1/2 centered at X 1 . Now, suppose x, y ∈ U n,P and n is large enough to guarantee that, on the complement of G δ n , |P | ≤ δ ln n + 2 and δ ln n > 4. Since the path-connected set U n,P consists of |P | overlapping closed disks of radius at most (ln n) −1 n −1/2 , we have |x − y| ≤ 2|P |(ln n) −1 n −1/2 ≤ 3δ n −1/2 . Corollary 5.9. Suppose 0 < δ < 1/3. There exists C δ > 0 such that for n ≥ C δ , on the complement of G δ n , each Γ ∈ M n satisfies the following. There exist x * , y * ∈ Γ so that if x, y ∈ O Γ , then |x − y| ≤ |x * − y * | ≤ 3δ n −1/2 . Proof. In view of Lemma 5.8, it suffices to show that there exist x * , y * ∈ Γ so that sup x,y∈O Γ |x − y| = |x * − y * |. (57) (Recall that there exists P * ∈ P n so that Γ ⊂ ∂U n,P * .) Since the closure of O Γ is compact and (x, y) → |x − y| is continuous, the extreme value theorem guarantees the existence of x * , y * in the closure of O Γ so that the supremum in (57) is achieved when x = x * and y = y * . Suppose, for a contradiction, that x * /∈ Γ. Then, x * is in the open set O Γ , and there is a ρ > 0 so that x * ∈ B(x * , ρ) ⊂ O Γ . Consequently, the line segment x * y * can be extended along the line connecting x * and y * by length ρ/2 without leaving O Γ . This contradicts the assumption that the supremum in (57) is achieved for x = x * , y = y * . We conclude that x * ∈ Γ. A similar argument shows that y * ∈ Γ, too.
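The diameter bound in Lemma 5.8 comes from summing diameters along a chain of overlapping disks; under the constraints |P | ≤ δ ln n + 2 and δ ln n > 4 used above, the arithmetic can be spelled out as follows (a sketch consistent with the quantities used later in the proof of Lemma 5.11):

```latex
\sup_{x,y \in U_{n,P}} |x - y|
  \;\le\; 2\,|P|\cdot\frac{1}{(\ln n)\sqrt{n}}
  \;\le\; \frac{2(\delta \ln n + 2)}{(\ln n)\sqrt{n}}
  \;=\; \Bigl(2\delta + \frac{4}{\ln n}\Bigr)\frac{1}{\sqrt{n}}
  \;\le\; \frac{3\delta}{\sqrt{n}},
```

where the final inequality uses 4/ln n < δ, which is exactly the condition δ ln n > 4.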

5.5. Pairing of roots and critical points inside each domain. We now show that on the complement of the "bad" events, the roots and critical points within most of the domains O Γ , Γ ∈ M n , are "paired." The only domains for which this does not occur are those that contain roots of p n (z) that are "too close" to the zeros of m µ . (See Figure 5 for reference; recall that m µ (z) = 0 precisely when z = 0 in the case where µ is the uniform measure on the unit disk.) To make "too close" rigorous, we define the random collection of roots R pair n := { X j : 1 ≤ j ≤ n and X j ∈ C \ ( A ⌊4 ln(ln n)⌋ n ∪ A ⌊4 ln(ln n)⌋+1 n ) } ⊆ { X j : 1 ≤ j ≤ n and |m µ (X j )| > (ln n) 4 / √ n }.
The following lemma is the main result of this subsection.
Lemma 5.10. For a fixed δ > 0 chosen sufficiently small, there is a constant C δ > 0 so that for n ≥ C δ , on the complement of ∪ n i=1 F i n ∪ G δ n ∪ H n , the following conclusion holds. For each O Γ , Γ ∈ M n , such that O Γ ∩ R pair n ≠ ∅, the number of critical points of p n (z) that lie inside O Γ is equal to the number of roots of p n (z) that lie inside O Γ (where both counts include multiplicity). Furthermore, if X ∈ O Γ ∩ R pair n and w ∈ O Γ is a critical point of p n (z), then, Proof. The proof of this lemma is similar in flavor to the proofs of Theorems 2.9 and 2.12, although the argument presented here is much more technical. Fix n ∈ N, suppose O Γ , Γ ∈ M n , is such that O Γ ∩ R pair n ≠ ∅, and choose an X ∈ O Γ ∩ R pair n to be a distinguished root that will serve as a reference point in our calculations. We classify the roots {X j } n j=1 into three groups based on their proximity to X (see Figure 7). To that end, define the index sets R near and R med according to this proximity, and let q X (z) := ∏ j /∈R near (z − X j ) and r X (z) := ∏ j∈R near (z − X j ), so that p n (z) = q X (z)r X (z). Note that |R med | and |R near | are of size at most δ ln n + 2 on the complement of G δ n . We will compare the zeros of p ′ n (z) inside O Γ to the zeros of the function f X (z). The idea is that the logarithmic derivative of p X (z) is similar to the logarithmic derivative of p n (z) for z near X. Furthermore, the number of roots of the equation f X (z) = 0 that are inside O Γ will be easy to calculate since these are the same as the critical points of p X (z) := r X (z) · (z − Y X ) n−|R near | that lie inside O Γ (we will show that Y X /∈ O Γ ), and these can be located with Walsh's two circle theorem.
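The pairing phenomenon described by Lemma 5.10 is easy to observe numerically. The sketch below draws iid roots uniformly from the unit disk (the running example, where m µ vanishes only at the origin), computes the critical points with numpy's companion-matrix root finder, and measures the distance from each critical point to its nearest root; this is an illustration of the phenomenon, not part of the proof.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100

# iid roots, uniform on the unit disk, sampled by rejection
pts = []
while len(pts) < n:
    x, y = rng.uniform(-1, 1, size=2)
    if x * x + y * y <= 1:
        pts.append(complex(x, y))
roots = np.array(pts)

coeffs = np.poly(roots)              # monic p_n built from its roots
crit = np.roots(np.polyder(coeffs))  # the n - 1 critical points

# distance from each critical point to the nearest root
dists = np.array([np.abs(roots - w).min() for w in crit])
print(len(crit), float(np.median(dists)))
```

Already for moderate n, the median root-to-critical-point distance is an order of magnitude smaller than the typical n^{-1/2} spacing between the roots themselves, consistent with the 1/n pairing scale; the few unpaired critical points sit near the origin, where m µ vanishes.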
The following lemma contains a few facts that we will frequently reference for the remainder of the proof of Lemma 5.10.
Lemma 5.11. Suppose δ < 1/3. There is a constant K µ,δ ∈ N, depending only on µ and δ (and not on X, P , Γ, etc.), so that n ≥ K µ,δ implies the following on the complement of ∪ n i=1 F i n ∪ G δ n . Proof. Much of this proof relies on the fact that m µ (·) is nearly Lipschitz (see Lemma 3.3 part (ii)). To establish (i), we first observe that for large n, on the complement of G δ n , Indeed, via Corollary 5.9, |ξ − X| < 3δ/ √ n < 1/ √ n for large n, on the complement of G δ n , so as long as we also have 1/ √ n < min{ε µ , e −1 }, Lemma 3.3 guarantees that
(We have used the fact that on the interval [0, e −1 ], the function −x ln x is increasing.) It follows that for n ≥ 5 and larger than some constant depending on µ and δ, on the complement of G δ n , which implies equation (58). (The last inequality follows since X ∈ R pair n .) We will use this inequality to compute |z − X|, for z ∈ O Γ , in a way that references the balls that we started with when we constructed Γ.
Let n be large enough to establish (58) and the conclusion of Corollary 5.9 on the complement of G δ n . Since z, X ∈ O Γ , Corollary 5.9 guarantees the existence of w 1 , w 2 ∈ Γ for which |z − X| ≤ |w 1 − w 2 |. Recall that Γ ⊆ ∂U n,P * for some P * ∈ P n , so there are i 1 , i 2 ∈ P * ⊂ O Γ , for which Furthermore, since i 1 and i 2 are related by the equivalence that defines P n , there are open balls B 0 , B 1 , . . . , B l ∈ C n , of the form and B k ∩ B k+1 ≠ ∅ for 0 ≤ k ≤ l − 1. Notice that on the complement of G δ n , equation (58) guarantees that the radii of these balls are bounded by 2(ln n) 3 /(n|m µ (X)|) (recall that X ∈ R pair n ), and if n is large enough to guarantee the conclusion of Lemma 5.8, the number of balls, l, is less than |P * | ≤ δ ln n + 2. It follows that for n larger than a constant depending on δ, on the complement of G δ n , We have established the first half of (i). To see the second inequality, simply recall that Γ does not pass through U n,P for any P ∈ P n , so if z ∈ Γ, then √ n for any root X j , 1 ≤ j ≤ n. In particular, this is true for X ∈ R pair n , which satisfies |m µ (X)| ≥ (ln n) 4 / √ n , so we obtain the second part of (i). Inequality (ii) holds for large n on the complement of where l is any index different from i. Since the X j are iid, we have so equation (59) implies that for any i, 1 ≤ i ≤ n, Now, X = X iX for some i X , 1 ≤ i X ≤ n, and X ∈ R pair n , so on the complement of On the complement of G δ n , |R near | is at most δ ln n + 2, so for large n, on the complement of ∪ n i=1 F i n ∪ G δ n , inequality (60) establishes the upper bound in (ii). (We have used that X ∈ R pair n to bound 2πC µ (ln n) 2 / √ n above by, say, |m µ (X)|/4 for large n.) The lower bound in (ii) is achieved similarly by using the reverse triangle inequality to obtain in place of (59).
We conclude by establishing (iii) as a consequence of (i) and (ii). Indeed, via the triangle inequality, we have for large n, on the complement of ∪ n i=1 F i n ∪ G δ n , that where the rightmost inequality holds for large n. The lower bound in (iii) follows for similar reasons, and f X is analytic because |m µ (X)| is almost surely bounded above by a constant that depends only on µ (apply Lemma 3.3, part (i) with ξ = 0 and ρ = +∞).
The next lemma justifies our choice of f X (z) as an intermediate comparison between p n (z) and p ′ n (z): it establishes that, under the right conditions, f X (z) and p n (z) have the same number of roots in the domain O Γ . Figure 7 provides a visual aid to the argument.
Lemma 5.12. For large n, on the complement of ∪ n i=1 F i n ∪ G δ n , the polynomial p X (z) has |R near | critical points inside B(X, 5(ln n) 2 /(n|m µ (X)|)) ⊂ O Γ , and none of these is Y X , which satisfies Y X /∈ O Γ . In particular, under these conditions, f X (z) has the same number of roots inside O Γ as p n (z) does.
Proof. This follows from Walsh's two circle theorem (see e.g. Theorem 4.1.1 in [27]). First, we'll show that r X (z) and p ′ X (z) have the same number of roots, |R near |, inside O Γ by using Walsh's two circle theorem, and then, we'll use this fact to compare the roots of p n (z) and f X (z) inside O Γ .
Figure 7. A diagram to illustrate Lemma 5.12 and its proof. The red dots and blue crosses represent roots and critical points, respectively, of p n that lie in a region near X, which is denoted by a green star. The radius of the large dashed circle is intended to be on the order of n −1/2 . Note that indices 1 ≤ j ≤ n in R near correspond to roots X j that lie interior to C 1 . This figure is neither to scale nor the result of a simulation.
To that end, choose n large enough so that the statements in Lemma 5.11 hold on the complement of ∪ n i=1 F i n ∪ G δ n , and define the circular domains C 1 := B(X, (ln n) 2 /(n|m µ (X)|)) and C 2 := B(Y X , (ln n) 2 /(n|m µ (X)|)).
Note that C 1 and C 2 are disjoint for large n on the complement of ∪ n i=1 F i n ∪ G δ n by inequality (iii) of Lemma 5.11. In fact, for n large enough, so on the complement of ∪ n i=1 F i n ∪ G δ n , Lemma 5.11 part (i) guarantees that C 2 is disjoint from O Γ .
Next, observe that all of the roots of p X (z) lie in C 1 ∪ C 2 , so by Walsh's two circle theorem, the critical points of p X lie in C 1 ∪ C 2 ∪ C, where C is the open ball By Lemma 5.11, for large n, on the complement of where the last inequality holds for large n. It follows that for large n, on the (recall X ∈ R pair n ), so in particular, C ∪ C 1 is contained in O Γ , and this union is disjoint from C 2 . Consequently, by the Supplement to Theorem 4.1.1 in [27], for large n, on the complement of ∪ n i=1 F i n ∪ G δ n , p ′ X (z) has |R near | roots inside O Γ , just like r X (z) does. Under these conditions, f X (z) has the same roots as q X (z) p ′ X (z) inside O Γ because Y X /∈ O Γ , so it follows that f X (z) and p n (z) = q X (z)r X (z) have the same number of roots inside O Γ .
We conclude this subsection with two lemmas and an application of Rouché's theorem to establish that f X (z) and p ′ n (z) have the same number of zeros in O Γ . This will imply via Lemma 5.12 that p n (z) and p ′ n (z) have the same number of zeros in O Γ . Lemma 5.13. Suppose δ < 1/8. There exist positive constants C µ , dependent only on µ, and C µ,δ , dependent only on µ and δ (and not on X, Γ, etc.), so that for n ≥ C µ,δ , on the complement of (here, C µ is independent of δ).
Consequently, for n large and z ∈ Γ, on the complement of The last lemma in this subsection establishes a lower bound on |f X (z)| that will combine with (61) to fulfill the hypotheses of Rouché's theorem on the boundary Γ of the domain O Γ .
Lemma 5.14. For fixed δ > 0, there is a constant Č µ,δ depending only on µ and δ so that when n ≥ Č µ,δ , on the complement of ∪ n i=1 F i n ∪ G δ n , if z ∈ Γ, |f X (z)| ≥ |p n (z)| · n|m µ (X)| · e −9 . (66) Proof. We have By Lemma 5.12, for large n, on the complement of ∪ n i=1 F i n ∪ G δ n , the polynomial expression has degree |R near |, leading coefficient n, and |R near | roots in B(X, 5(ln n) 2 /(n|m µ (X)|)) ⊂ O Γ . It follows that under these conditions, where the critical points of p X (z) that index the product are considered with multiplicity. If, additionally, δ < 1, and n is large enough to guarantee the bounds on |z − X| in Lemma 5.11, we have that on the complement of ∪ n i=1 F i n ∪ G δ n and for z ∈ Γ, (ln n) 2 /(n|m µ (X)|) ≤ |z − X|/(δ ln n).
Hence, if n is large enough, on the complement of ∪ n i=1 F i n ∪ G δ n , for z ∈ Γ, We have used Lemma 5.11 to bound |z − Y X |, and the last inequality holds for large n and comes from the fact that (Note that the rate of convergence possibly depends on δ.) We have achieved (66) as was desired.
We have now established both (61) and (66), where the inequalities are independent of X, Γ, and z ∈ Γ. Since C µ is independent of δ, we can choose δ ∈ (0, 1/8) small enough that C µ δ 2 < e −9 . For such a δ, by Lemmas 5.13 and 5.14, for large n, on the complement of ∪ n i=1 F i n ∪ G δ n ∪ H n , any z ∈ Γ satisfies |p ′ n (z) − f X (z)| < |f X (z)| .
It follows by Rouché's theorem that for large n, on the complement of ∪ n i=1 F i n ∪G δ n ∪ H n , p ′ n (z) and f X (z) have the same number of zeros inside O Γ , and by Lemma 5.12, we conclude that p ′ n (z) and p n (z) have the same number of zeros inside O Γ . The inequality in the conclusion of Lemma 5.10 follows directly from this and Lemma 5.11 part (i) (note δ ≤ 1/4).
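The counting step above can be illustrated in a toy setting: Rouché's theorem says that if |g| < |f | on a circle, then f and f + g have the same number of zeros inside, and the argument principle lets us read that count off as the winding number of the image curve about 0. The specific functions below are arbitrary stand-ins, not the f X and p ′ n of the proof.

```python
import cmath

def winding_number(f, center, radius, steps=4000):
    """Count zeros of an analytic f inside a circle via the argument
    principle: the winding number of f along the circle about 0."""
    total = 0.0
    prev = cmath.phase(f(center + radius))
    for k in range(1, steps + 1):
        z = center + radius * cmath.exp(2j * cmath.pi * k / steps)
        cur = cmath.phase(f(z))
        d = cur - prev
        # unwrap the phase jump across the branch cut of phase()
        while d > cmath.pi:
            d -= 2 * cmath.pi
        while d < -cmath.pi:
            d += 2 * cmath.pi
        total += d
        prev = cur
    return round(total / (2 * cmath.pi))

f = lambda z: (z - 0.1) * (z - 0.2 - 0.1j) * (z + 0.15)  # 3 zeros inside |z| = 1
g = lambda z: 0.05 * z + 0.02                            # |g| < |f| on |z| = 1
print(winding_number(f, 0, 1.0))                         # → 3
print(winding_number(lambda z: f(z) + g(z), 0, 1.0))     # → 3, by Rouché
```

Here min |f | on the unit circle is about 0.59 while |g| ≤ 0.07, so the Rouché hypothesis |g| < |f | holds and the perturbed function keeps all three zeros inside the circle.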
In the argument above, the particular curve Γ ∈ M n and the root X ∈ O Γ ∩ R pair n were arbitrary, and all of the constants involved were independent of Γ, so we have proved Lemma 5.10. Let w (n) 1 , . . . , w (n) n−1 denote the (not necessarily distinct) critical points of p n (z), and recall the definitions of the empirical measures µ n and µ ′ n (see (4) and (5)). Since the numbers of roots and critical points of a polynomial differ by one, we first compare the measure µ ′ n to the intermediate measure µ̃ ′ n . The following lemma justifies our choice of µ̃ ′ n .
Lemma 5.15. Let µ ′ n , µ̃ ′ n , and η n := max 1≤j≤n |X j | be defined as above. Then, with probability 1, Proof. Let π be the measure on C × C given by a combination of point masses δ (w (n) j , w (n) j ) and δ (w (n) j , X) , whose marginal distributions are easily seen to be µ ′ n and µ̃ ′ n . It follows from the definition of the L 1 -Wasserstein metric that, almost surely, where the last inequality follows from the Gauss-Lucas theorem.
The next result is an L 1 -Wasserstein comparison between µ n and µ̃ ′ n that we will use, in conjunction with Lemma 5.15 and the triangle inequality, to prove Theorem 2.3.
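A useful way to see why exhibiting one good pairing suffices: for two uniform empirical measures with the same number of atoms, the L 1 -Wasserstein distance is the minimum average matching cost over bijections of the atoms (by Birkhoff-von Neumann, the optimal coupling is a permutation), so any explicit pairing such as σ n gives an upper bound. A minimal brute-force sketch on a toy example:

```python
from itertools import permutations

def w1_empirical(xs, ys):
    """Exact L1-Wasserstein distance between two uniform empirical
    measures with equally many atoms: minimize the average matching
    cost over all bijections (brute force, tiny inputs only)."""
    n = len(xs)
    return min(
        sum(abs(x - y) for x, y in zip(xs, perm)) / n
        for perm in permutations(ys)
    )

xs = [0j, 1 + 0j, 2 + 0j]
ys = [0.9 + 0j, 0.1 + 0j, 2.05 + 0j]

# any explicit pairing upper-bounds the distance; the identity pairing
# here is suboptimal, and the optimal pairing swaps the first two atoms
identity_cost = sum(abs(x - y) for x, y in zip(xs, ys)) / len(xs)
print(w1_empirical(xs, ys), identity_cost)
```

The exact distance is (0.1 + 0.1 + 0.05)/3, achieved by the swapped pairing, while the identity pairing only certifies the weaker bound (0.9 + 0.9 + 0.05)/3.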
Lemma 5.16. Let X 1 , . . . , X n be iid complex random variables with common distribution µ that has a bounded density and satisfies Assumption 2.1. Then, there is a constant C, depending only on µ, so that with probability 1 − o(1), where µ n , µ̃ ′ n , and η n are defined as above.
We will now make a judicious choice of σ n in order to take advantage of the "clumping" behavior of the roots and critical points of p n (z) established in the conclusion of Lemma 5.10.
To start, define the index sets S Γ , Γ ∈ M n , by For large n, on the complement of E bad n , Lemma 5.10 guarantees that each O Γ , Γ ∈ M n , satisfying O Γ ∩ R pair n ≠ ∅ contains the same numbers of critical points and roots of p n (z). Consequently, we can choose σ n so that for each Γ ∈ M n satisfying O Γ ∩ R pair n ≠ ∅, we have (recall that O Γ , Γ ∈ M n , are pairwise disjoint). For the remaining indices whose images under σ n we haven't specified, arbitrarily assign them from among the remaining choices. (There is at least one index 1 ≤ i ≤ n for which σ n (i) is still undefined because the number of roots and critical points of p n (z) differs by 1. Recall that we have added w (n) n = X to account for this fact.) Based on our construction of σ n , Lemma 5.10 also implies that for large n, on the complement of E bad n , for each j, 1 ≤ j ≤ n, such that X j ∈ R pair n . (Indeed, X j ∈ R pair n implies that X j ∈ O Γ for some Γ ∈ M n .) By the Gauss-Lucas theorem, each critical point (and each root) of p n (z) is in the convex hull of the set {X j } n j=1 of roots of p n (z). Consequently, for any X j /∈ R pair n , we have the trivial bound |X j − w (n) σ n (j) | ≤ 2η n .
To complete the proof of Lemma 5.16, recall that P(E bad n ) = o(1), and observe that with probability at least 1 − o(1), η n ln n ≥ 1.
We conclude this subsection by remarking that Theorem 2.3 follows from Lemmas 5.1, 5.15, and 5.16 and the triangle inequality for the L 1 -Wasserstein metric.
A.3. Proof of Theorem 2.14.
Proof. Conclusions (i) and (ii) follow from [31, Theorem 1.7]. We now use Theorem 2.12 to establish (18). In particular, we will verify that the three conditions of Theorem 2.12 hold for some constants C 1 , C 2 , k Lip > 0 which depend only on ε and λ. In view of parts (i) and (ii), it suffices to work on the event where max 1≤i≤n−1 |X i | ≤ 1 + 2ε, min 1≤i≤n−1 |ξ − X i | ≥ ε 2 , 1 + (11/4)ε ≤ |ξ| ≤ |λ| + 1. (67) In fact, this event automatically guarantees the third condition of Theorem 2.12 for all sufficiently large n. The second condition also follows for large n since, for z, w ∈ C with |z|, |w| > 1 + (5/2)ε, we have on the same event. The upper bound in the first condition of Theorem 2.12 follows from a similar argument. The lower bound, however, is slightly more involved. Indeed, for any θ ∈ R, we have 1 n Choose θ ∈ R so that ξe √ −1 θ is real-valued and positive. This gives 1 n Thus, on the event (67), we conclude that 1 n which completes the proof of the lower bound. Hence, the three conditions of Theorem 2.12 are satisfied. Applying Theorem 2.12, we obtain (18). Lastly, (19) follows from (18) after applying conclusion (ii) and (68).
Proof. Fix a > 0, and let b > 0 be a large constant (depending on C and a) to be chosen later. Since Z is independent of X 1 , . . . , X n , it follows that, with probability 1, Z /∈ {X 1 , . . . , X n }. Hence the sum Σ n i=1 1/(Z − X i ) is well-defined and finite. By conditioning on the values of X 2 , . . . , X n and Z, it suffices to prove that sup w∈C sup z∈B(0,n C ) The claim now follows from Lemma A.1 below by taking ε := n −b and choosing b sufficiently large in terms of C and a.
Lemma A.1. Fix C > 0, and let X be a complex-valued random variable that is absolutely continuous (with respect to Lebesgue measure on C) and which has density bounded by n C . If E|X| ≤ n C , then for every a > 0 and 0 < ε < 1, sup w∈C sup z∈B(0,n C ) Proof. Fix w ∈ C and z ∈ B(0, n C ). We consider two cases. If |w| ≤ √ ε, then by Markov's inequality.
We now consider the case where |w| > √ ε. Define the event E := {|X| ≤ n C+a }.
By Markov's inequality, it follows that Thus, we obtain Combining the bounds above yields for any w ∈ C and z ∈ B(0, n C ). The proof of the lemma is complete.

Appendix B. A heavy-tailed CLT
In this appendix, we prove Theorem B.1, a CLT for "heavy-tailed" random variables that have the same distribution as Y := 1/(ξ − X), where X ∼ µ and µ has a continuous density f in a neighborhood of ξ. Notice that E|Y | p < ∞ for p ∈ [0, 2), but E|Y | 2 = ∞. Many results demonstrate that Y is in the domain of attraction of a normal random variable (see e.g. Section XVII.5 in [9], Theorem 11 in Section 6.4 of [10], and Theorem 3.10 in [26]); however, our implementation of Theorem B.1 requires specific information about the parameters of the limiting normal distribution, so we include an explicit statement and proof for clarity.
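The moment claims for Y = 1/(ξ − X) follow from a polar-coordinate computation near ξ. Assuming the density satisfies m ≤ f ≤ M on a ball B(ξ, r 0 ) (an assumption for this sketch), we have:

```latex
\mathbb{E}\bigl[|Y|^p \,\mathbf{1}_{|X-\xi|\le r_0}\bigr]
  = \int_{B(\xi,r_0)} \frac{f(x)}{|\xi - x|^{p}}\,dA(x)
  \;\le\; M \int_0^{2\pi}\!\!\int_0^{r_0} r^{1-p}\,dr\,d\theta
  \;<\; \infty \qquad (0 \le p < 2),
```

while for p = 2 the lower bound f ≥ m > 0 gives E[|Y | 2 1 |X−ξ|≤r 0 ] ≥ 2πm ∫ 0 r 0 r −1 dr = ∞, so E|Y | 2 = ∞; the contribution to either moment from the event |X − ξ| > r 0 is bounded by r 0 −p .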
Theorem B.1. Let X 1 , X 2 , . . . be iid, complex-valued random variables with common distribution µ, fix s ∈ N, and suppose ξ 1 , . . . , ξ s , t 1 , . . . , t s ∈ C are deterministic values with ξ 1 , . . . , ξ s distinct. In addition, assume that µ has a bounded density f in a neighborhood of each ξ l , 1 ≤ l ≤ s, that is continuous at these points. Then, 1/ √ (n ln n) · Σ n j=1 Σ s k=1 t k ( 1/(ξ k − X j ) − m µ (ξ k ) ) −→ N in distribution as n → ∞, where N is a complex random variable with mean zero whose real and imaginary parts have a joint Gaussian distribution with covariance matrix (Here, I denotes the 2 × 2 identity matrix.) Proof. We proceed by Lindeberg's exchange method [20]. (See also [4]. Similar methods have been applied to problems in random matrix theory; see e.g. [32], [33].) To that end, let N, N 1 , N 2 , . . . be a sequence of iid complex random variables independent of {X j }, whose components have a joint Gaussian distribution with mean zero and covariance matrix Σ, defined in (69), and let g : C → R be a smooth test function with compact support. We will show that as n → ∞, which implies convergence of the corresponding measures in the vague topology. Convergence in distribution follows because for each n, n −1/2 Σ n j=1 N j has the same distribution as the random variable N . (See e.g. Exercise 1.1.25 of [30], pages 23-33.) Since the random variables Σ s k=1 t k /(ξ k − X j ) are heavy-tailed, we initially need to truncate them. Let ε ∈ (0, 1) be fixed, and define (Be aware that this notation suppresses the dependence of ζ j and ζ̃ j on ε and n.) We will first establish Lemma B.2. There is a constant C µ,s,t > 0, depending only on µ, s, and t 1 , . . . , t s , and there is a natural number K µ,g,ε so that n ≥ K µ,g,ε implies
Proof. By Taylor's theorem applied to the Taylor series for g centered at ζ j , the remainder is at most (8 · C g /2!) · |ζ 1 | 3 /(n ln n) 3/2 , where C g is any constant that is an upper bound for the mixed partial derivatives of g up to and including order three (which are compactly supported and thus bounded). Taking the expectation of both sides yields (by independence and the fact that the ζ j are centered) The difference between these two equations is bounded by If we continue, for 2 ≤ k ≤ n, the process of computing the second order Taylor polynomials of g centered at the corresponding sums as above, we obtain
$$\le C_{s,t,\delta} + \sum_{k=1}^{s} (f(\xi_k)+\eta)\,|t_k|^2 \int_0^{2\pi}\!\!\int_{1/(|t_k|\varepsilon\sqrt{n\ln n})}^{\delta/|t_k|} \frac{r^2\cos^2\theta}{r^4}\, r\,dr\,d\theta \;\le\; C_{s,t,\delta} + \sum_{k=1}^{s} \pi\,(f(\xi_k)+\eta)\,|t_k|^2 \ln\!\bigl(\delta\varepsilon\sqrt{n\ln n}\bigr).$$