More on the Structure of Extreme Level Sets in Branching Brownian Motion

This work is a continuation of the manuscript "The structure of extreme level sets in branching Brownian motion", in which the same authors studied the fine structure of the extreme level sets of branching Brownian motion, namely the sets of particles whose heights are within a finite distance of the global maximum. It is well known that at large times such particles congregate in clusters of order-one genealogical diameter around local maxima, which form a Cox process in the limit. Our main finding here is that most of the particles in an extreme level set come from only a small fraction of the clusters, namely those which are atypically large.


Setup and state of the art
This work is a continuation of [5], in which the fine structure of the extreme values of branching Brownian motion (BBM) was studied. Let us first recall the definition of BBM and some of the state of the art concerning its extreme value statistics. Let L_t be the set of particles alive at time t ≥ 0 in a continuous-time Galton-Watson process with binary branching at rate 1. The entire genealogy can be recorded via the metric space (T, d), consisting of the elements T := ∪_{t≥0} L_t and equipped with the genealogical distance

d(x, x′) := inf { ((t−s) + (t′−s))/2 : s ≥ 0, x and x′ share a common ancestor in L_s } ,  x ∈ L_t , x′ ∈ L_{t′} ,

for any t, t′ ≥ 0. Conditional on (T, d), let h = (h(x) : x ∈ T) be a mean-zero Gaussian process with covariance function given by Eh(x)h(x′) = (t + t′)/2 − d(x, x′) for x ∈ L_t, x′ ∈ L_{t′} and t, t′ ≥ 0. Equivalently, Eh(x)h(x′) is equal to the largest s ≥ 0 such that x and x′ share a common ancestor at time s. The triplet (h, T, d) (or just h for short) then forms a standard BBM, and h(x) for x ∈ L_t is interpreted as the height of particle x at time t. The restriction of T to all particles born up to time t will be denoted by T_t := ∪_{s≤t} L_s, with d_t and h_t the corresponding restrictions of d and h, respectively. The natural filtration of the process (F_t : t ≥ 0) can then be defined via F_t = σ(h_t, T_t, d_t) for all t ≥ 0.
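As an illustrative aside (ours, not part of the original text), the branching mechanism just described is easy to simulate: every particle diffuses as an independent Brownian motion and splits in two at rate 1. The following Python sketch uses a crude Euler discretization; the function name and the step size dt are our choices.

```python
import math, random

def simulate_bbm(t, dt=0.01, rng=random):
    """Simulate binary branching Brownian motion up to time t.
    Returns the list of particle heights (h_t(x) : x in L_t).
    Branching at rate 1 is approximated by probability dt per step."""
    heights = [0.0]
    for _ in range(int(round(t / dt))):
        nxt = []
        for h in heights:
            h += rng.gauss(0.0, math.sqrt(dt))  # Brownian increment
            nxt.append(h)
            if rng.random() < dt:               # binary split at rate 1
                nxt.append(h)
        heights = nxt
    return heights

random.seed(0)
pop = simulate_bbm(3.0)
print(len(pop), max(pop))
```

Since E|L_t| = e^t, a run up to t = 3 typically contains around twenty particles; the population never decreases because there is no death.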
The study of extreme values of h dates back to works of Ikeda et al. [8,9,10], McKean [12], Bramson [3,4] and Lalley and Sellke [11], who derived asymptotics for the law of the maximal height h*_t = max_{x∈L_t} h_t(x). Introducing the centering function

m_t := √2 t − (3/(2√2)) log⁺ t , where log⁺ t := log(t ∨ 1) ,   (1.1)

and writing ĥ_t for the centered process h_t − m_t and ĥ*_t := h*_t − m_t for its maximum, these works show that ĥ*_t converges in law to G + (1/√2) log Z as t → ∞, where G is a Gumbel-distributed random variable and Z, which is independent of G, is the almost-sure limit as t → ∞ of (a multiple of) the so-called derivative martingale:

Z := C lim_{t→∞} Z_t ,  Z_t := Σ_{x∈L_t} (√2 t − h_t(x)) e^{−√2 (√2 t − h_t(x))} ,   (1.2)

for some properly chosen C > 0.
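For concreteness, the centering function (1.1) and the derivative martingale can be evaluated numerically; the sketch below is our illustration (the argument `heights` would come from any simulated BBM population; the function names are ours).

```python
import math

SQRT2 = math.sqrt(2.0)

def m(t):
    """Centering function m_t = sqrt(2) t - (3 / (2 sqrt(2))) log^+ t."""
    return SQRT2 * t - (3.0 / (2.0 * SQRT2)) * math.log(max(t, 1.0))

def derivative_martingale(t, heights):
    """Z_t = sum over particles of (sqrt(2) t - h) exp(-sqrt(2) (sqrt(2) t - h))."""
    return sum((SQRT2 * t - h) * math.exp(-SQRT2 * (SQRT2 * t - h))
               for h in heights)

print(m(1.0))  # -> 1.4142135623730951, since log^+ 1 = 0
```

A single particle at height 0 at time 1 contributes √2 e^{−2} to Z_1, which the helper reproduces exactly.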
Other extreme values of h can be studied simultaneously by considering the extremal process associated with it. To describe the latter, given t ≥ 0, x ∈ L_t and r > 0, we let C_{t,r}(x) denote the cluster of relative heights of the particles in L_t which are at genealogical distance at most r from x. This is defined formally as the point measure

C_{t,r}(x) := Σ_{y ∈ B_r(x) ∩ L_t} δ_{h_t(y) − h_t(x)} .   (1.3)

Fixing any positive function t ↦ r_t such that both r_t and t − r_t tend to ∞ as t → ∞ ,   (1.4)

and letting L*_t := {x ∈ L_t : h_t(x) ≥ h_t(y) , ∀y ∈ B_{r_t}(x)}, the structured extremal process is then given as

Ê_t := Σ_{x ∈ L*_t} δ_{(ĥ_t(x), C_{t,r_t}(x))} ,

where ĥ_t := h_t − m_t. That is, Ê_t is a point process on R × M, where M denotes the space of all point measures on (−∞, 0], which records the centered height u of, and the cluster C around, all r_t-local maxima of h. We shall sometimes refer to the pair (u, C) as a cluster-pair. It was then shown in [1,2] that

Ê_t ⟹ Ê as t → ∞, where, conditional on Z, Ê is a Poisson point process with intensity measure Z e^{−√2 u} du ⊗ ν(dC).   (1.5)

Above, Z_t and Z are as before and ν is a deterministic distribution on M, which we will call the cluster distribution. As two consequences, one gets the convergence of the standard extremal process of h,

E_t := Σ_{x ∈ L_t} δ_{ĥ_t(x)} ⟹ E := Σ_{(u,C) ∈ Ê} Σ_{c ∈ C} δ_{u+c} ,   (1.6)

as well as the convergence of the extremal process of local maxima,

E*_t := Σ_{x ∈ L*_t} δ_{ĥ_t(x)} ⟹ E* ,   (1.7)

with E*, conditional on Z, a Poisson point process with intensity Z e^{−√2 u} du. Henceforth, we shall use the unified notation E^{(t)} (also E*^{(t)}, Ê^{(t)}) to mean E or E_t (respectively E* or E*_t, Ê or Ê_t). The asymptotic growth of the number of extreme values which are also r_t-local maxima can then be read off directly from (1.7). Indeed, a simple application of the weak law of large numbers combined with the convergence statement in (1.7) yields

e^{−√2 v} E*^{(t)}([−v, ∞)) ⟶ Z/√2 in probability as v → ∞ (after t → ∞ in the case of E*_t).   (1.8)

The asymptotic growth of the number of all extreme values, which is arguably the more interesting quantity, is however not a straightforward consequence of (1.6). This is because the limiting process E is a superposition of i.i.d. clusters C, and the law ν of the latter determines the number of points falling inside any given set in the overall process. To address this question, a study of the cluster law ν was carried out in [5] and then used to show (Proposition 1.5 there) that, with C ∼ ν and for some C > 0,

E C([−v, 0]) ~ C e^{√2 v} as v → ∞.   (1.9)
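To make the notions of genealogical distance, r_t-local maxima and clusters concrete, the following toy sketch (ours; the lineage representation, the discretization and the fixed radius r standing in for r_t are all our assumptions) simulates a small BBM while recording each particle's branch events, then extracts the local maxima together with their clusters of relative heights.

```python
import math, random

def simulate_bbm_tree(t, dt=0.01, rng=random):
    """Binary BBM keeping each particle's lineage of (branch_time, child_label)
    pairs, so that genealogical distances can be computed afterwards."""
    particles = [(0.0, ())]                  # (height, lineage)
    for k in range(int(round(t / dt))):
        s, nxt = k * dt, []
        for h, lin in particles:
            h += rng.gauss(0.0, math.sqrt(dt))
            if rng.random() < dt:            # branch into children 0 and 1
                nxt.append((h, lin + ((s, 0),)))
                nxt.append((h, lin + ((s, 1),)))
            else:
                nxt.append((h, lin))
        particles = nxt
    return particles

def gen_distance(x, y, t):
    """d(x, y) = t - (time of the most recent common ancestor): two distinct
    particles alive at time t diverge at their first differing branch event."""
    for a, b in zip(x[1], y[1]):
        if a != b:
            return t - a[0]
    return 0.0                               # identical lineage: same particle

def local_maxima_with_clusters(particles, t, r):
    """Return pairs (height of an r-local maximum x, cluster C_{t,r}(x) given
    as the sorted relative heights of particles within distance r of x)."""
    out = []
    for x in particles:
        ball = [y for y in particles if gen_distance(x, y, t) <= r]
        if x[0] >= max(y[0] for y in ball):
            out.append((x[0], sorted(y[0] - x[0] for y in ball)))
    return out

random.seed(4)
t = 2.0
parts = simulate_bbm_tree(t)
lm = local_maxima_with_clusters(parts, t, r=0.5)
print(len(parts), len(lm))
```

By construction every cluster is a point measure on (−∞, 0]: the local maximum itself sits at relative height 0 and everything else in its genealogical ball lies below it, matching the description of the space M above.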
This was then combined with (1.5) and (1.6) to derive (Theorem 1.1 in [5])

E^{(t)}([−v, ∞)) / (v e^{√2 v}) ⟶ C Z in probability as v → ∞ (after t → ∞ in the case of E_t).   (1.10)

The above shows that the points coming from the clusters around extreme local maxima account for an additional multiplicative linear prefactor in the overall growth rate of extreme values. Next, it is natural to ask what are the typical height u ∈ R and the typical cluster configuration C of those cluster-pairs (u, C) ∈ Ê^{(t)} which carry the level set E^{(t)}|_{[−v,∞)}. The question of the typical height was addressed in [5]. To give a precise statement, for a Borel set B ⊆ R × M let us define (compare with (1.6)):

E^{(t)}(· ; B) := Σ_{(u,C) ∈ Ê^{(t)} : (u,C) ∈ B} Σ_{c ∈ C} δ_{u+c} .   (1.11)

These processes record all extreme values coming from cluster-pairs (u, C) in B. Theorem 1.2 from [5] then showed that for any α ∈ (0, 1], as v → ∞,

E^{(t)}([−v, ∞) ; [−αv, 0] × M) / E^{(t)}([−v, ∞)) ⟶ α in probability.   (1.12)

In other words, the typical height u of those extreme local maxima whose clusters contribute to E^{(t)}|_{[−v,∞)} is asymptotically uniform on [−v, 0]. This left open the question of the typical cluster configurations carrying E^{(t)}|_{[−v,∞)}, and this is precisely the focus of this manuscript.

New results
Our first observation is that C([−v, 0]) is not concentrated around its mean. This can readily be concluded from the results in [5], by comparing the first and second moments of this quantity (see also Lemma 2.7 below). The next proposition provides a quantitative version of this assertion.

Proposition 1.1. Let C ∼ ν. For all ε > 0 there exists δ > 0 such that for all v large enough,

E [ C([−v, 0]) ; C([−v, 0]) ≤ δ v e^{√2 v} ] ≤ ε e^{√2 v} .   (1.13)

Moreover, there exists C > 0 such that for all δ > 0 and v > 0,

P ( C([−v, 0]) > δ v e^{√2 v} ) ≤ C (δ v)^{−1} .   (1.14)

The above proposition shows that the asymptotic mean on the right-hand side of (1.9) is the result of an unlikely event of probability O(v^{−1}), on which the number of cluster points above −v is of the unusually high order v e^{√2 v}. Let us call a cluster satisfying the event in (1.14) a δ-fat cluster for the height −v, or a (−v, δ)-fat cluster, for short.
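A schematic caricature of this dichotomy (ours; a toy model, not a simulation of the actual cluster law ν): a nonnegative random variable which, in units of e^{√2 v}, takes an order-v value with probability of order 1/v and an order-one value otherwise has its mean dominated by the rare "fat" event, in the spirit of (1.13) and (1.14).

```python
import random

def fat_sample(v, rng):
    """Toy analogue of C([-v, 0]) measured in units of e^{sqrt(2) v}:
    value of order v with probability ~ 1/v, order-one value otherwise."""
    return v if rng.random() < 1.0 / v else 1.0

random.seed(1)
v = 100.0
xs = [fat_sample(v, random) for _ in range(200000)]
mean = sum(xs) / len(xs)
frac_fat = sum(x == v for x in xs) / len(xs)
print(mean, frac_fat)  # mean close to 2, although only ~1% of samples are fat
```

Here the fat event contributes v · (1/v) = 1 to the mean, as much as all typical samples combined, so the empirical mean is far from the median; this is the non-concentration phenomenon the proposition quantifies.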
Since E^{(t)} is a superposition of many i.i.d. clusters C, the above proposition should translate, by virtue of the law of large numbers, into the assertion that most extreme values come from only a small fraction of the clusters, namely the fat ones. To phrase this assertion in formal terms, if v ≥ 0 and δ > 0, we set

F_δ(−v) := { (u, C) ∈ R × M : C is (−(v + u), δ)-fat } .   (1.15)

The cluster C in a pair (u, C) ∈ F_δ(−v) is thus fat precisely for the height that is relevant for its contribution to E|_{[−v,∞)}. Given a Borel set B ⊆ R × M, we also define, in analogy to (1.11),

E*^{(t)}(· ; B) := Σ_{(u,C) ∈ Ê^{(t)} : (u,C) ∈ B} δ_u .   (1.16)

Then,

Theorem 1.2. For all ε > 0 there exists δ > 0 such that for all α ∈ (0, 1), with probability tending to 1 as v → ∞,   (1.17)

Moreover, there exists C > 0 such that for all δ > 0 and α ∈ (0, 1), with probability tending to 1 as v → ∞,   (1.18)

Both statements hold in the limit t → ∞ followed by v → ∞ if we replace E and E* by E_t and E*_t, respectively. As a corollary we get,

Corollary 1.3. Let ε > 0 be arbitrarily small. Then for all v large enough, we may find a collection of cluster-pairs which comprises only a small (ε-dependent) fraction of all cluster-pairs contributing to E|_{[−v,∞)}, yet carries at least a (1 − ε)-fraction of its points. The same holds for all t large enough (depending on v and ε), if we replace E and E* by E_t and E*_t, respectively.

Figure 1: The cluster C_{t,r}(X_t) around the spine X_t, conditioned to be the maximum and at height m_t. The process W_s is a Brownian bridge from (0, m_t) to (t, 0) and σ_1, σ_2, … are the branching times.

Figure 2: The contribution to the cluster level set.

Proof idea and heuristic picture
The proof of Theorem 1.2 is a rather standard application of Chebyshev's inequality, using Proposition 1.1 (along with a second moment bound on C([−v, 0]) from [5]) and the explicit (conditional) Poissonian structure of E. We therefore omit further explanations of this proof and of the straightforward one for Corollary 1.3, focusing instead on the argument for Proposition 1.1. As in [5], the key ingredient is a handle on the cluster distribution ν. This was carried out in Section 3 of [5] and is presented again in Section 2 of this manuscript. Let us therefore first briefly describe this derivation and recall how it was used in [5] to show (1.9). The reader is referred to [5] for more details. Thanks to (1.5) and the product structure of the intensity measure in the definition of Ê, the law ν can be obtained as the limiting distribution of the cluster around a uniformly chosen particle X_t in L_t, conditioned to be the global maximum at time t and to have height, say, m_t. Tracing the trajectory of this distinguished particle backwards in time and accounting, via the spinal decomposition (Many-to-One Lemma, see Subsection 2.1), for the random genealogical structure, one sees a particle performing a standard Brownian motion W = (W_s)_{s≥0} from m_t at time 0 to 0 at time t. This so-called spine particle gives birth at random Poissonian times (at an accelerated rate 2, see Subsection 2.1) to independent standard branching Brownian motions, which then evolve back to time 0 and are conditioned to have their particles stay below m_t at this time. The cluster distribution at genealogical distance r around X_t is therefore determined by the relative heights of the particles of those branching Brownian motions which branched off before time r (see Figure 1).
Formally, denoting by 0 ≤ σ_1 < σ_2 < … the points of a Poisson point process N on R_+ with rate 2, and letting H = (h^s_t(x) : t ≥ 0, x ∈ L^s_t)_{s≥0} be a collection of independent branching Brownian motions (with W, N and H independent), the cluster distribution ν can then be written as a weak limit (Lemma 2.2), with r_t as in (1.4) and h*_t as above (1.1). Shifting by s ↦ −m_t(1 − s/t), one arrives at the representation (1.21), where Ê^s_t is the extremal process associated with h^s_t and ĥ^s_t = h^s_t − m_t. In this paper we refer to the triplet (W̃, N, H) as a decorated random-walk-like (DRW) process (see Subsection 2.2). Now let C ∼ ν and pick v > 0. Provided that we can exchange limit and integration, it follows from (1.21) and Palm calculus that (1.22) holds, where the inner integral is over z = O(1) and results from the total probability formula, after conditioning on {W̃_{t,s} = z}.
The left-most term in the integrand is the first moment of the size of the (global) extreme level set of h^s_s, subject to a truncation event restricting the height of its global maximum. This was estimated in [5] as (1.23). The remaining terms are of the form (1.24). It was shown in [5] (Subsection 2.1; see also Lemma 2.5 here) that similar estimates hold for (1.24) as well. (This is because the drift function γ_{t,s} is bounded by 1 + log⁺(s ∧ (t − s)), the random decorations (ĥ^{s,*}_s : s ≥ 0) are (at least) exponentially tight, and the random sampling times (σ_k : k ≥ 1) arrive at a Poissonian rate.) We can therefore estimate the probability in the denominator in (1.22) by Ct^{−1} and the probability in the numerator by (1.26). Using also (1.23) and performing the integration over z = O(1) in (1.22), we obtain (1.27). But when such events occur, it follows from (1.23) with s ∈ [ηv², η⁻¹v²] (and an additional concentration argument) that the number of cluster points above −v is of the unusually high order v e^{√2 v}. Reversing the time direction, such a cluster is realized at a large time t when its local maximum X_t has ascended atypically slowly (see Figure 2).

Remainder of the paper
Section 2 makes rigorous the reduction of the study of ν to that of a DRW process conditioned to stay negative, as just described. It also provides the necessary probability estimates for the latter, as well as the needed moment bounds for C([−v, 0]). Section 3 then uses these preliminaries to prove the main results of the manuscript. Constants are denoted by C, C′, etc. They are positive, finite and may change from line to line.

A handle on the cluster distribution
As mentioned in the introduction, understanding the cluster distribution ν is key to proving the statements in the manuscript. As in [5], a handle on this distribution is obtained by first identifying ν with the law of the cluster around a distinguished spine particle, conditioned to be the global maximum of the process. Then, by tracing the trajectory of the spine particle backwards in time, the events involved can be recast in terms of a decorated random-walk-like process conditioned to stay negative. The two reduction steps are summarized in the next two subsections. Estimates for such random walks are given in the succeeding subsection. Finally, some upper bounds from [5], derived using the statements in the first three subsections, are stated in the last subsection. For all proofs see [5].

Reduction to the cluster of the spine, conditioned to be the maximum
We begin by recalling the useful technique of spinal decomposition (cf. [7]). The (one-)spine branching Brownian motion (SBBM) is defined as the original process (h, T, d), except that at any given time one of the particles is designated as the spine particle. Particles which are not the spine branch and diffuse exactly as before. The spine particle also diffuses as before, but branches (into two) at rate 2 instead of 1. When the spine branches, one of its children, chosen uniformly at random, is designated the new spine. We shall use the same notation (h, T, d) for the SBBM process and distinguish this process from the original one by renaming the underlying probability measure to P̂ (with Ê the corresponding expectation). The identity of the spine at time t ≥ 0 will be recorded via the random variable X_t ∈ L_t. The genealogical line of descent of the spine particle, namely the function t ↦ X_t, will be referred to as the spine of the process.
The following is known as the Many-To-One Lemma. To avoid integrability issues, we state it for bounded functions. Recall that F t is the sigma-algebra generated by h t , T t and d t (but not X t ).

Lemma 2.1 (Many-To-One Lemma). Let F = (F(x) : x ∈ L_t) be a bounded F_t-measurable real-valued random function on L_t. Then,

E Σ_{x∈L_t} F(x) = e^t Ê F(X_t) .   (2.1)

Recalling (1.3), let now C*_{t,r} := C_{t,r}(X_t) denote the cluster around the spine. Thanks to Lemma 2.1 and the convergence in (1.5), it is then not difficult to show,

Lemma 2.2 (Lemma 5.1 in [5]). Let C ∼ ν be distributed according to the cluster law. Then for any ν-continuity set B ⊆ M,

P ( C ∈ B ) = lim_{t→∞} P̂ ( C*_{t,r_t} ∈ B | h_t(X_t) = h*_t = m_t ) .   (2.2)
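As a crude numerical sanity check of Lemma 2.1 (ours; a Monte Carlo experiment with an Euler discretization, so only approximate), take the bounded function F ≡ 1: the left-hand side of (2.1) is then E|L_t|, which should be close to e^t.

```python
import math, random

def bbm_heights(t, dt=0.02, rng=random):
    """Crude Euler simulation of binary BBM (branch rate 1); returns heights."""
    hs = [0.0]
    for _ in range(int(round(t / dt))):
        nxt = []
        for h in hs:
            h += rng.gauss(0.0, math.sqrt(dt))
            nxt.append(h)
            if rng.random() < dt:
                nxt.append(h)
        hs = nxt
    return hs

# Many-to-one check with F = 1: E sum_{x in L_t} 1 = e^t * E 1 = e^t.
random.seed(2)
t, runs = 1.5, 4000
avg = sum(len(bbm_heights(t)) for _ in range(runs)) / runs
print(avg, math.exp(t))  # the two numbers should be close
```

The agreement is up to Monte Carlo noise and a small discretization bias; the point is only that the exponential e^t in (2.1) is exactly the mean population size.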

Reduction to a decorated random-walk conditioned to stay negative
Next, we introduce the decorated random-walk-like process, by means of which probabilities such as those on the right-hand side of (2.2) can be handled. Let W = (W_s : s ≥ 0) be a standard Brownian motion, whose initial position we leave free, to be determined according to the conditional statements we make. For 0 ≤ s ≤ t, we fix the drift function γ_{t,s} and the walk W̃_{t,s} as in [5]. Let us also define the collection H = (h^s : s ≥ 0) of independent copies of h, which we assume to be independent of W as well. Finally, let N be a Poisson point process with intensity 2dx on R_+, independent of H and W, and denote by σ_1 < σ_2 < … its ordered atoms. The triplet (W̃, N, H) forms what we shall call a decorated random-walk-like process (DRW). The underlying probability measure will still be denoted by P and the corresponding expectation by E.
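As a rough illustration (ours), a discretized stand-in for the DRW triplet (W̃, N, H) can be sampled as follows. The decorations are replaced by placeholder seeds, since simulating a full BBM for each branch time is beside the point here, and no drift γ_{t,s} is applied; all names and parameters are our choices.

```python
import math, random

def drw(t, dt=0.01, rng=random):
    """Sample a discretized stand-in for the DRW triplet: a Brownian path W
    on [0, t], branch times sigma_k of a rate-2 Poisson process, and one
    independent decoration seed per branch time (standing in for the
    independent copies h^s in the collection H)."""
    W = [0.0]
    for _ in range(int(round(t / dt))):
        W.append(W[-1] + rng.gauss(0.0, math.sqrt(dt)))
    sigmas, s = [], 0.0
    while True:
        s += rng.expovariate(2.0)          # rate-2 interarrival times
        if s > t:
            break
        sigmas.append(s)
    decorations = [rng.random() for _ in sigmas]
    return W, sigmas, decorations

random.seed(5)
W, sigmas, decorations = drw(3.0)
print(len(W), len(sigmas))
```

On average there are 2t branch times on [0, t], matching the accelerated branching rate of the spine.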
The following reduction statements appear in Subsection 3.1 of [5]. We refer the reader to that reference for their straightforward proofs. Recall that B_r(x) is the ball of radius r around x in the genealogical distance d, and that we write ĥ_t = h_t − m_t and ĥ*_t = max_{x∈L_t} ĥ_t(x). For A ⊆ L_t we also write ĥ*_t(A) for max_{x∈A} ĥ_t(x). The first lemma concerns the event that the particles in a genealogical neighborhood of the spine (including the spine itself) stay below a given height.
In particular, this yields a corresponding bound for all t ≥ 0 and v, w ∈ R. In a similar way, we can express the distribution of the cluster around the spine particle, given that it reaches height m_t. For what follows, E^s_t denotes the extremal process of h^s_t, defined as E_t in (1.6), only with respect to h^s_t in place of h_t. Then,

Estimates for a decorated random-walk conditioned to stay negative
Having reduced the analysis of the cluster distribution to the study of the DRW conditioned to stay negative, we need estimates for probabilities involving the latter. Such probabilities are treated in general in [6], which is the supplementary material to [5]. Specialized to our setup here, they take the form of the two lemmas below. The first lemma was already included in [5] (see the third part of Lemma 3.4 there).
Next we have the following sharp entropic-repulsion result. This was not needed in [5] and was hence left out of that work. Nevertheless, it is an immediate consequence of Proposition 1.5 from the supplementary material [6] to [5].
Proof. Recalling the definition of W̃_{t,u}, the desired statement is precisely that of Proposition 1.5 in [6] with x = y = 0. We just need to verify that the assumptions in that proposition hold. Assumptions (1.1), (1.2) and (1.3) from [6] (with λ = 2 and sufficiently small δ > 0) were already verified in the proof of Lemma 2.5 in [5]. It therefore remains to check that (2.7) holds for all 0 < u < r < u′ < t and some δ > 0. To this end, a simple computation gives (2.8). Then the upper bound of Lemma 3.3 in [5], applied once with (u, r, 0) and once with (t, u′ − r, r) in place of (t, s, r), gives (2.7) with any δ ≤ 2√(log 2)/3.

Moment bounds for the number of cluster points
Next we state several moment bounds for the number of cluster points. These were derived in [5] using the statements in the last three subsections and some non-negligible work. We start with a second moment bound for C([−v, 0]). It is part of Proposition 1.5 in [5].
Lemma 2.7. Let C ∼ ν. Then there exists C > 0 such that for all v ≥ 0,

E [ C([−v, 0])² ] ≤ C v e^{2√2 v} .

We also abbreviate J_{t,v}(s) ≡ J^{≥0}_{t,v}(s) and j_{t,v}(s) ≡ j^{≥0}_{t,v}(s). The following is part of Lemma 5.5 in [5].

Proof of Proposition 1.1
The proof of Proposition 1.1 will be based on the following two lemmas.

Lemma 3.2.
For all η > 0 and M > 0, there exists δ > 0 such that, for all v large enough and then t large enough.
Let us first prove Proposition 1.1.
Proof of Proposition 1.1. The second statement of the proposition follows trivially from (1.9) and Markov's inequality. It therefore remains to show (1.13). Given t ≥ 0, v ≥ 0, M > 0 and η > 0, define the event: Thanks to Lemma 3.1 and Lemma 3.2, for any ε > 0 there exist η > 0, M > 0 and δ₀ > 0 such that for all v and then t large enough, Next, let δ := δ₀ and define also the event A: Now take t → ∞ in the last display. Then, by Lemma 2.2 and the bounded convergence theorem, (1.13) follows for all large enough v such that, ν-almost-surely, the point −v is not charged by C. Removing this stochastic continuity restriction requires a standard argument, of the kind used in many of the proofs in [5] (e.g., the proof of Proposition 1.5). We therefore omit further details.
Proof of Lemma 3.1. Using Lemma 2.3 and Lemma 2.4, and abbreviating as above, we can write the product inside the expectation in the statement of the lemma in terms of J^{≥M}_{t,v}(s), which is as in (2.10). Then, by the Palm-Campbell theorem, the right-hand side above is equal to an integral, which, using Lemma 2.8, we can upper bound by (3.7). Proof of Lemma 3.2. Thanks to Lemma 2.4 (ignoring the distribution of C_{t,r}(X_t)), the conditional probability in the statement of the lemma is equal to: Invoking Lemma 2.6 then gives the desired upper bound with some δ > 0 depending on η and M, for all v and then t large enough.

Proofs of Theorem 1.2 and Corollary 1.3
The proofs of Theorem 1.2 will be based on the following lemma.
Lemma 3.3. For any ε > 0, there exists δ > 0 such that for all α ∈ (0, 1),   (3.8)

Moreover, there exists C > 0 such that for any δ > 0 and α ∈ (0, 1),   (3.9)

Proof. Let α ∈ (0, 1). Given −∞ < −v < w < z ≤ ∞ and δ > 0, define the relevant events (in terms of w and z). Therefore, setting z := √(log v) and w := −αv, we can bound the numerator in (3.8) accordingly. Since, conditional on Z, the intensity measure governing the law of Ê is almost surely finite on [0, ∞) × M, we must have Ê([z, ∞) × M) = 0 for all large enough z. This shows that: At the same time, for any ε > 0, thanks to the first part of Proposition 1.1, we may find δ > 0 such that (3.12) holds. In addition, thanks to Lemma 2.7, it follows by Chebyshev's inequality that the corresponding limit holds almost surely. By the bounded convergence theorem, the above limit holds also for the unconditional probability. Together with (3.12) and the union bound, this shows that, as v → ∞: Since the quantity on the left-hand side in the probability above dominates the one in the numerator of (3.8), this gives (3.8) with 8ε/C in place of ε. Turning to the second statement of the lemma, by the definition of Ê, for any δ > 0 and α ∈ (0, 1), the law of the first numerator in (3.9), conditional on Z, is Poisson with parameter given by (3.15). Therefore, by the second part of Proposition 1.1, this parameter is suitably bounded for some C′ > 0. Then, by Chebyshev's inequality, conditional on Z, the probability of the complement of the event in (3.9) with C := 2C′ is at most a quantity which goes to 0 as v → ∞, for P-almost every Z. The same then also holds for the unconditional probability, thanks again to the bounded convergence theorem.
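The concentration step used above can be illustrated numerically (our toy example, not the actual quantities of the proof): by Chebyshev's inequality, a Poisson random variable N with large parameter λ satisfies P(|N − λ| ≥ δλ) ≤ 1/(δ²λ), so its relative fluctuations vanish as λ → ∞.

```python
import math, random

def poisson_sample(lam, rng):
    """Sample Poisson(lam) via Knuth's product-of-uniforms method."""
    L, k, p = math.exp(-lam), 0, 1.0
    while True:
        p *= rng.random()
        if p <= L:
            return k
        k += 1

# Chebyshev: P(|N - lam| >= delta * lam) <= 1 / (delta^2 * lam).
random.seed(3)
lam, n, delta = 50.0, 20000, 0.5
samples = [poisson_sample(lam, random) for _ in range(n)]
frac_far = sum(abs(x - lam) >= delta * lam for x in samples) / n
print(frac_far, 1.0 / (delta ** 2 * lam))  # empirical tail vs Chebyshev bound
```

In the proof the conditional Poisson parameter grows like a constant multiple of e^{√2 αv}, so the same mechanism forces the counts in (3.9) to concentrate around their conditional means.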
We can now prove Theorem 1.2.
Proof of Theorem 1.2. Given ε > 0, we use Lemma 3.3 to find δ > 0 such that for any α ∈ (0, 1), (3.18) holds with probability tending to 1 as v → ∞. Dividing both the numerator and the denominator of (1.17) by C Z αv e^{√2 v}, we now observe that whenever the event in (3.8) and the first event in (3.18) hold, we must also have (1.17) with 4ε/3 in place of ε. At the same time, observing that E*([−αv, ∞); F_δ(−v)) = Ê(([−αv, ∞) × M) ∩ F_δ(−v)), which follows by definition, we now divide both the numerator and the denominator of (1.18) by Z e^{√2 αv}/√2. We then see that whenever the event in (3.9) and the second event in (3.18) hold, we must also have (1.18) with 4C/3 in place of C. Renaming ε and C, and using the union bound, we complete the proof of the theorem for E and E*.
To obtain the finite-t analogs (with t → ∞), it is sufficient to argue that all random quantities on the left-hand sides of (1.17) and (1.18) are the joint weak limits of their respective finite-time analogs as t → ∞. This in turn follows from (1.5) and standard arguments, of the kind used in many of the proofs in [5] (e.g., the proof of Theorem 1.1). We therefore omit further details.
Proof of Corollary 1.3. We shall only show the statement for E and E*, as the argument for the case involving E_t and E*_t is almost identical. Given ε > 0, we let δ be given by Theorem 1.2, so that the first part of the theorem holds with probability at least 1 − ε/2 whenever v is large enough. At the same time, the second part of Theorem 1.2 shows that the corresponding bound holds with probability at least 1 − ε/2, again whenever v is large enough. A final application of the union bound then completes the proof.