Collision local time of transient random walks and intermediate phases in interacting stochastic systems

In a companion paper, a quenched large deviation principle (LDP) has been established for the empirical process of words obtained by cutting an i.i.d. sequence of letters into words according to a renewal process.


Introduction and main results
In this paper, we derive variational representations for the radius of convergence of the moment generating functions of the collision local time of two independent copies of a symmetric and transient random walk, both starting at the origin and running in discrete or in continuous time, when the average is taken w.r.t. one, respectively, two random walks. These variational representations are subsequently used to establish the existence of an intermediate phase for the long-time behaviour of a class of interacting stochastic systems.
There are symmetric transient random walks for which (1.1) holds with α = 1. Examples are any transient random walk on Z in the domain of attraction of the symmetric stable law of index 1 on R, or any transient random walk on Z^2 in the domain of (non-normal) attraction of the normal law on R^2. In this situation, the two threshold values in (1.3-1.4) agree.
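To make the central object concrete, the following Monte Carlo sketch (not taken from the paper; simple random walk on Z^3 serves purely as an example of a symmetric transient kernel) estimates the expected collision local time V = Σ_{n≥1} 1{S_n = S̃_n} of two independent copies, truncated at a finite horizon. Transience makes V a.s. finite.

```python
import random

def collision_local_time(steps, dim=3):
    """Simulate V = sum_{n=1..steps} 1{S_n = S'_n} for two independent
    simple random walks on Z^dim, both started at the origin.
    A finite-horizon Monte Carlo sketch, not the paper's exact V."""
    s1 = [0] * dim
    s2 = [0] * dim
    v = 0
    for _ in range(steps):
        for s in (s1, s2):
            axis = random.randrange(dim)      # pick a coordinate direction
            s[axis] += random.choice((-1, 1)) # step +1 or -1 along it
        if s1 == s2:
            v += 1
    return v

random.seed(0)
estimates = [collision_local_time(2000) for _ in range(200)]
mean_v = sum(estimates) / len(estimates)
print(mean_v)  # small and finite: the walks are transient in d = 3
```

Increasing the horizon beyond 2000 steps changes the estimate very little, which is the numerical face of transience.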

Continuous time
Next, we turn the discrete-time random walks S and S̃ into continuous-time random walks S = (S_t)_{t≥0} and S̃ = (S̃_t)_{t≥0} by allowing them to make steps at rate 1, keeping the same p(·, ·). Then the collision local time becomes V = ∫_0^∞ 1{S_t = S̃_t} dt. For the analogous quantities z_1 and z_2, we have the following.
Theorem 1.6. Assume (1.1). If p(·, ·) is strongly transient, then 1 < z_2 < z_1 ≤ ∞.
Remark 1.7. An upper bound similar to (1.16) holds for z_1 as well. It is straightforward to show that z_1 < ∞, in both the discrete-time and the continuous-time setting, as soon as p(·) has finite entropy.

Discussion
Our proofs of Theorems 1.3-1.6 are based on the variational representations in Theorems 1.1-1.2. Additional technical difficulties arise in the situation where the maximiser in (1.7) has infinite mean word length, which happens precisely when p(·, ·) is transient but not strongly transient. Random walks with zero mean and finite variance are transient for d ≥ 3 and strongly transient for d ≥ 5 (Spitzer [26], Section 1).
Conjecture 1.8. The gaps in Theorems 1.3 and 1.6 are present also when p(·, ·) is transient but not strongly transient.
In a 2008 preprint by the authors (arXiv:0807.2611v1), the results in [6] and the present paper were announced, including Conjecture 1.8. Since then, partial progress has been made towards settling this conjecture. In Birkner and Sun [7], the gap in Theorem 1.3 is proved for simple random walk on Z^d, d ≥ 4, and it is argued that the proof is in principle extendable to a symmetric random walk with finite variance. In Birkner and Sun [8], the gap in Theorem 1.6 is proved for a symmetric random walk on Z^3 with finite variance, while in Berger and Toninelli [1] the gap in Theorem 1.3 is proved for a symmetric random walk on Z^3 whose tails are bounded by a Gaussian.
The role of the variational representation for r 2 is not to identify its value, which is achieved in (1.15), but rather to allow for a comparison with r 1 , for which no explicit expression is available.
It is an open problem to prove (1.11-1.12) under mild regularity conditions on S. Note that the gaps in Theorems 1.3-1.6 do not require (1.10-1.12).

The gaps settle three conjectures
In this section we use Theorems 1.3 and 1.6 to prove the existence of an intermediate phase for three classes of interacting particle systems where the interaction is controlled by a symmetric and transient random walk transition kernel.

Coupled branching processes
A. Theorem 1.6 proves a conjecture put forward in Greven [17], [18]. Consider a spatial population model, defined as the Markov process (η_t)_{t≥0} with η_t = {η_x(t) : x ∈ Z^d}, where η_x(t) is the number of individuals at site x at time t, evolving as follows: (1) Each individual migrates at rate 1 according to a(·, ·).
(2) Each individual gives birth to a new individual at the same site at rate b.
(3) Each individual dies at rate (1 − p)b.
(4) All individuals at the same site die simultaneously at rate pb.
Here, a(·, ·) is an irreducible random walk transition kernel on Z^d × Z^d, b ∈ (0, ∞) is a birth-death rate, p ∈ [0, 1] is a coupling parameter, while (1)-(4) occur independently at every x ∈ Z^d. The case p = 0 corresponds to a critical branching random walk, for which the average number of individuals per site is preserved. The case p > 0 is challenging because the individuals descending from different ancestors are no longer independent.
A critical branching random walk satisfies the following dichotomy (where for simplicity we restrict to the case where a(·, ·) is symmetric): if the initial configuration η_0 is drawn from a shift-invariant and shift-ergodic probability distribution with a positive and finite mean, then η_t as t → ∞ locally dies out ("extinction") when a(·, ·) is recurrent, but converges to a non-trivial equilibrium ("survival") when a(·, ·) is transient, both irrespective of the value of b. In the latter case, the equilibrium has the same mean as the initial distribution and has all moments finite.
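As a toy illustration of the criticality at p = 0 (a discrete-time caricature of the model, not the paper's continuous-time dynamics; all parameter choices below are ours): each individual migrates and then leaves 0 or 2 offspring with equal probability, or survives unchanged, so the mean offspring number is 1 and the expected total population is preserved while the actual total fluctuates.

```python
import random

def critical_branching_step(counts, b_prob, size):
    """One discrete-time sketch step: each individual migrates to a
    uniform neighbour on a 1-d torus of `size` sites, then branches
    into 2 offspring with probability b_prob, dies with probability
    b_prob, and otherwise survives.  Mean offspring number is 1, so
    the expected total population is preserved."""
    new = [0] * size
    for x, n in enumerate(counts):
        for _ in range(n):
            y = (x + random.choice((-1, 1))) % size
            u = random.random()
            if u < b_prob:        # birth event: two offspring
                new[y] += 2
            elif u < 2 * b_prob:  # death event: no offspring
                pass
            else:                 # no branching event
                new[y] += 1
    return new

random.seed(1)
size = 50
counts = [2] * size            # initial mean of 2 individuals per site
totals = []
for _ in range(20):
    counts = critical_branching_step(counts, 0.3, size)
    totals.append(sum(counts))
print(totals)  # random, but with expectation 100 at every step
```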
For the coupled branching process with p > 0 there is a dichotomy too, but it is controlled by a subtle interplay of a(·, ·), b and p: extinction holds when a(·, ·) is recurrent, but also when a(·, ·) is transient and p is sufficiently large. Indeed, it is shown in Greven [18] that if a(·, ·) is transient, then there is a unique p* ∈ (0, 1] such that survival holds for p < p* and extinction holds for p > p*. Recall the critical values z_1, z_2 introduced in Section 1.
B. Theorem 1.3 corrects an error in Birkner [3], Theorem 6. Here, a system of individuals living on Z^d is considered, subject to migration and branching. Each individual independently migrates at rate 1 according to a transient random walk transition kernel a(·, ·), and branches at a rate that depends on the number of individuals present at the same location. It is argued that this system has an intermediate phase in which the numbers of individuals at different sites tend to an equilibrium with a finite first moment but an infinite second moment. The proof was, however, based on a wrong rate function. The rate function claimed in Birkner [3], Theorem 6, must be replaced by that in [6], Corollary 1.5, after which the intermediate phase persists, at least in the case where a(·, ·) satisfies (1.1) and is strongly transient. This also affects [3], Theorem 5, which uses [3], Theorem 6, to compute z_1 from Section 1.1 and finds an incorrect formula. Theorem 1.4 shows that this formula is actually an upper bound for z_1.

Interacting diffusions
Theorem 1.6 proves a conjecture put forward in Greven and den Hollander [19]. Consider the system (X(t))_{t≥0}, with X(t) = {X_x(t) : x ∈ Z^d}, of interacting diffusions taking values in [0, ∞) defined by the collection of coupled stochastic differential equations in (1.21). Here, a(·, ·) is an irreducible random walk transition kernel on Z^d × Z^d, b ∈ (0, ∞) is a diffusion parameter, and the driving processes are independent standard Brownian motions on R. The initial condition is chosen such that X(0) is a shift-invariant and shift-ergodic random field with a positive and finite mean (the evolution preserves the mean). It was shown in [19], Theorems 1.4-1.6, that if a(·, ·) is symmetric and transient, then there exist 0 < b_2 ≤ b* such that the system in (1.21) locally dies out when b > b*, but converges to an equilibrium when 0 < b < b*, and this equilibrium has a finite second moment when 0 < b < b_2 and an infinite second moment when b_2 ≤ b < b*. It was conjectured in [19], Conjecture 1.8, that b* > b_2. As explained in [19], Section 4.2, the gap in Theorem 1.6 settles this conjecture, at least when a(·, ·) satisfies (1.1) and is strongly transient, with the identification in (1.22).

Directed polymers in random environments
Theorem 1.3 disproves a conjecture put forward in Monthus and Garel [25]. Let a(·, ·) be a symmetric and irreducible random walk transition kernel on Z^d × Z^d, let S = (S_k)_{k=0}^∞ be the corresponding random walk, and let ξ = {ξ(x, n) : x ∈ Z^d, n ∈ N} be i.i.d. R-valued non-degenerate random variables satisfying
λ(β) := log E[exp(β ξ(0, 1))] < ∞ for all β ∈ [0, ∞). (1.23)
For β ∈ [0, ∞), let Z_n(ξ) := E[e_n(ξ, S)] with e_n(ξ, S) := exp[Σ_{k=1}^n {β ξ(S_k, k) − λ(β)}], i.e., Z_n(ξ) is the normalising constant in the probability distribution of the random walk S whose paths are reweighted by e_n(ξ, S), which is referred to as the "polymer measure". The ξ(x, n)'s describe a random space-time medium with which S is interacting, with β playing the role of the interaction strength or inverse temperature. It is well known that Z = (Z_n)_{n∈N} is a non-negative martingale with respect to the family of sigma-algebras F_n := σ(ξ(x, k), x ∈ Z^d, 1 ≤ k ≤ n), n ∈ N, so that Z_∞ := lim_{n→∞} Z_n exists ξ-a.s., with the event {Z_∞ = 0} being ξ-trivial. One speaks of weak disorder if Z_∞ > 0 ξ-a.s. and of strong disorder otherwise. As shown in Comets and Yoshida [12], there is a unique critical value β* ∈ [0, ∞] such that weak disorder holds for 0 ≤ β < β* and strong disorder holds for β > β*. Moreover, in the weak disorder region the paths have a Gaussian scaling limit under the polymer measure, while this is not the case in the strong disorder region, where the paths are confined to a narrow space-time tube.
Recall the critical values z_1, z_2 defined in Section 1.1. Bolthausen [9] observed that E[Z_n(ξ)^2] = E[exp{(λ(2β) − 2λ(β)) V_n}] with V_n := Σ_{k=1}^n 1{S_k = S̃_k}, where S and S̃ are two independent random walks with transition kernel p(·, ·), and concluded that Z is L^2-bounded if and only if β < β_2, with β_2 ∈ (0, ∞] the unique solution of
λ(2β) − 2λ(β) = log z_2. (1.28)
Since an L^2-bounded martingale converges in L^2, so that E[Z_∞] = Z_0 = 1, it follows that β < β_2 implies weak disorder, i.e., β* ≥ β_2. By a stochastic representation of the size-biased law of Z_n, it was shown in Birkner [4], Proposition 1, that in fact weak disorder holds if β < β_1, with β_1 ∈ (0, ∞] the unique solution of
λ(2β) − 2λ(β) = log z_1, (1.29)
i.e., β* ≥ β_1. Since β → λ(2β) − 2λ(β) is strictly increasing for any non-trivial law of the disorder satisfying (1.23), it follows from (1.28-1.29) and Theorem 1.3 that β_1 > β_2 when a(·, ·) satisfies (1.1) and is strongly transient and when ξ is such that β_2 < ∞. In that case the weak disorder region contains a subregion for which Z is not L^2-bounded. This disproves a conjecture of Monthus and Garel [25], who argued that β_2 = β*. Camanes and Carmona [10] consider the same problem for simple random walk and specific choices of disorder. With the help of fractional moment estimates of Evans and Derrida [16], combined with numerical computation, they show that β* > β_2 for Gaussian disorder in d ≥ 5, for Binomial disorder with small mean in d ≥ 4, and for Poisson disorder with small mean in d ≥ 3.
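The martingale normalisation behind these comparisons can be checked numerically. The sketch below is an illustration under assumed choices (standard Gaussian disorder, for which λ(β) = β²/2, and simple random walk on Z^3; the helper name Z_n is ours): it estimates the normalised partition function Z_n = E_S[exp(Σ_{k=1}^n {β ξ(S_k, k) − λ(β)})] for one lazily sampled disorder realisation, which has mean 1 over the disorder.

```python
import math
import random

def Z_n(beta, n, walks=2000, dim=3, seed=0):
    """Monte Carlo sketch of the normalised partition function for
    simple random walk on Z^dim in i.i.d. standard Gaussian disorder,
    lambda(beta) = beta^2/2.  The environment xi is sampled lazily and
    cached, so all sampled paths see the same (quenched) disorder."""
    rng = random.Random(seed)
    xi = {}                       # quenched space-time disorder field
    total = 0.0
    for _ in range(walks):
        pos, h = (0,) * dim, 0.0
        for k in range(1, n + 1):
            axis = rng.randrange(dim)
            step = rng.choice((-1, 1))
            pos = tuple(p + (step if i == axis else 0)
                        for i, p in enumerate(pos))
            key = (pos, k)
            if key not in xi:
                xi[key] = rng.gauss(0.0, 1.0)
            h += beta * xi[key] - beta * beta / 2.0
        total += math.exp(h)
    return total / walks

z = Z_n(beta=0.2, n=10)
print(z)  # near 1 for small beta (weak disorder regime)
```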
Outline. Theorems 1.1, 1.3 and 1.6 are proved in Section 3; these proofs need only assumption (1.1). Theorem 1.2 is proved in Section 4, and Theorems 1.4 and 1.5 in Section 5; these proofs need both assumptions (1.1) and (1.10-1.12). In Section 2 we recall the LDPs from [6], which are needed for the proofs of Theorems 1.1-1.2 and their counterparts for continuous-time random walk. That section recalls the minimum from [6] that is needed for the present paper; only in Section 4 will we need some of the techniques that were used in [6].

Word sequences and annealed and quenched LDP
Notation. We recall the problem setting in [6]. Let E be a finite or countable set of letters, and let Ẽ := ∪_{n∈N} E^n be the set of finite words drawn from E. Both E and Ẽ are Polish spaces under the discrete topology. Let P(E^N) and P(Ẽ^N) denote the sets of probability measures on sequences drawn from E, respectively, Ẽ, equipped with the topology of weak convergence. Write θ and θ̃ for the left-shift acting on E^N, respectively, Ẽ^N. Write P^inv(E^N), P^erg(E^N) and P^inv(Ẽ^N), P^erg(Ẽ^N) for the sets of probability measures that are invariant and ergodic under θ, respectively, θ̃.
Let X = (X_k)_{k∈N} be an i.i.d. sequence of letters with marginal law ν on E, and let τ = (τ_i)_{i∈N} be an i.i.d. sequence of word lengths with law ρ on N having infinite support and satisfying the algebraic tail property (2.1). (No regularity assumption is imposed on supp(ρ).) Assume that X and τ are independent and write P to denote their joint law. Cut words out of X according to τ, i.e., put T_0 := 0, T_i := T_{i−1} + τ_i and Y^{(i)} := (X_{T_{i−1}+1}, . . . , X_{T_i}), i ∈ N (2.2-2.3) (see Fig. 1). Then, under the law P, Y = (Y^{(i)})_{i∈N} is an i.i.d. sequence of words with marginal law q_{ρ,ν} on Ẽ given by (2.4). Periodically extend the first N words to an element of Ẽ^N, and define R_N, the empirical process of N-tuples of words, as in (2.5).
Figure 1: Cutting words from a letter sequence according to a renewal process.
The following large deviation principle (LDP) is standard (see e.g. Dembo and Zeitouni [14], Corollaries 6.5.15 and 6.5.17). Let F_N be the sigma-algebra generated by the first N words, let Q|_{F_N} be the restriction of Q to F_N, and let h( · | · ) denote relative entropy.
Theorem 2.1. [Annealed LDP] The family of probability distributions P(R_N ∈ · ), N ∈ N, satisfies the LDP on P^inv(Ẽ^N) with rate N and with rate function I^ann given by
I^ann(Q) := H(Q | q_{ρ,ν}^{⊗N}) = lim_{N→∞} (1/N) h(Q|_{F_N} | q_{ρ,ν}^{⊗N}|_{F_N}). (2.7)
The rate function I^ann is lower semi-continuous, has compact level sets, has a unique zero at Q = q_{ρ,ν}^{⊗N}, and is affine.
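The cutting construction in (2.2)-(2.3) is straightforward to make concrete. The sketch below uses illustrative choices (a two-letter alphabet with uniform ν, and word lengths uniform on {1, . . . , 4}, which does not satisfy the algebraic tail property; the helper name cut_words is ours):

```python
import random

def cut_words(letters, lengths, n_words):
    """Cut the first n_words words out of the letter sequence
    `letters` according to the word lengths `lengths`, i.e.
    Y^(i) = (X_{T_{i-1}+1}, ..., X_{T_i}) with T_i = tau_1 + ... + tau_i."""
    words, pos = [], 0
    for i in range(n_words):
        words.append(tuple(letters[pos:pos + lengths[i]]))
        pos += lengths[i]
    return words

random.seed(2)
E = "ab"                                          # letter alphabet
X = [random.choice(E) for _ in range(100)]        # i.i.d. letters, law nu
tau = [random.randint(1, 4) for _ in range(20)]   # i.i.d. word lengths, law rho
Y = cut_words(X, tau, 5)
print(Y)  # five words whose lengths are tau[0..4]
```

The empirical process R_N of (2.5) is then the empirical distribution of shifted periodised word tuples built from such a list.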
Quenched LDP. To formulate the quenched analogue of Theorem 2.1, we need some further notation. Let κ : Ẽ^N → E^N denote the concatenation map that glues a sequence of words into a sequence of letters.
Think of Ψ_Q as the shift-invariant version of the concatenation of Y under the law Q, obtained after randomising the location of the origin. For tr ∈ N, let [·]_tr : Ẽ → [Ẽ]_tr := ∪_{n=1}^{tr} E^n denote the word length truncation map that cuts each word down to its first tr letters, and extend it to a map from Ẽ^N to [Ẽ]_tr^N and from P^inv(Ẽ^N) to P^inv([Ẽ]_tr^N), so that [Q]_tr is an element of the latter set.
Theorem 2.2. [Quenched LDP] For ν^{⊗N}-a.s. all X, the family of (regular) conditional probability distributions P(R_N ∈ · | X), N ∈ N, satisfies the LDP on P^inv(Ẽ^N) with rate N and with deterministic rate function I^que : P^inv(Ẽ^N) → [0, ∞] given by (2.13-2.14). The rate function I^que is lower semi-continuous, has compact level sets, has a unique zero at Q = q_{ρ,ν}^{⊗N}, and is affine. Moreover, it is equal to the lower semi-continuous extension, from P^{inv,fin}(Ẽ^N) to P^inv(Ẽ^N), of
I^fin(Q) := I^ann(Q) + (α − 1) m_Q H(Ψ_Q | ν^{⊗N}).
If α = 1, then for ν^{⊗N}-a.s. all X, the family P(R_N ∈ · | X) satisfies the LDP with rate function I^ann given by (2.7).
Note that the quenched rate function (2.14) equals the annealed rate function (2.7) plus an additional term that quantifies the deviation of Ψ Q from the reference law ν ⊗N on the letter sequence. This term is explicit when m Q < ∞, but requires a truncation approximation when m Q = ∞.
We close this section with the following observation. Let R be the set, defined in (2.15-2.16), of Q's for which Ψ_Q = ν^{⊗N}, i.e., for which the concatenation of words has the same statistical properties as the letter sequence X. Then, for Q ∈ P^{inv,fin}(Ẽ^N), we have (see [6], Equation (1.22)) that I^que(Q) = I^ann(Q) if and only if Q ∈ R.

Proof of Theorem 1.1
The idea is to put the problem into the framework of (2.1-2.5) and then apply Theorem 2.2. To that end, we pick E, ν and ρ as in (3.1-3.3), the latter being normalised by the Green function of S − S̃ at the origin. Recalling (1.2), and writing z_1, z_2 as in (3.4-3.6), we have (3.7), where X = (X_k)_{k∈N} denotes the sequence of increments of S. (The upper indices 1 and 2 indicate the number of random walks being averaged over.) The notation in (3.1-3.2) allows us to rewrite the first formula in (3.7) as (3.8). Note that, since Z^d carries the discrete topology, f is trivially continuous. Let R_N ∈ P^inv(Ẽ^N) (with E = Z^d) be the empirical process of words defined in (2.5), and π_1 R_N ∈ P(Ẽ) the projection of R_N onto the first coordinate. Then we have (3.10), where P is the joint law of X and τ (recall (2.2-2.3)). The second formula in (3.7) is obtained by averaging (3.10) over X, which gives (3.11).
Next we note that f in (3.9) is bounded from above. Indeed, the Fourier representation of p_n(x, y) reads
p_n(x, y) = (2π)^{−d} ∫_{[−π,π)^d} dk e^{ik·(y−x)} p̂(k)^n, with p̂(k) := Σ_{x∈Z^d} e^{ik·x} p(0, x) ∈ [−1, 1] by symmetry, (3.13)
and it follows that
max_{x,y∈Z^d} p_{2n}(x, y) = p_{2n}(0, 0). (3.14)
Consequently, f((x_1, . . . , x_n)) ≤ 2Ḡ(0) − 1 is bounded from above. Therefore, by applying the annealed LDP in Theorem 2.1 to (3.11), in combination with Varadhan's lemma (see Dembo and Zeitouni [14], Lemma 4.3.6), we get z_2 = 1 + exp[−r_2] with r_2 given by the variational formula (3.15) (recall (3.6)). The second equality in (3.15) stems from the fact that, on the set of Q's with a given marginal π_1 Q = q, the function Q → I^ann(Q) = H(Q | q_{ρ,ν}^{⊗N}) has the unique minimiser Q = q^{⊗N} (due to convexity of relative entropy). We will see in a moment that the inequality in (3.15) actually is an equality.
In order to carry out the second supremum in (3.15), we use the following.
Proof. This follows from a straightforward computation.
Inserting (3.16) into (3.15), we see that the suprema are uniquely attained at q = q* and Q = Q* = (q*)^{⊗N}, and that r_2 ≤ log Z. From (3.9) and (3.12) we have (3.17), where we use that Σ_{v∈Z^d} p_m(u + v) p(v) = p_{m+1}(u), u ∈ Z^d, m ∈ N, and recall that Ḡ(0) is the Green function at the origin associated with p_2(·, ·). Hence q* is given by (3.18). Applying the quenched LDP in Theorem 2.2 instead, we get z_1 = 1 + exp[−r_1] with r_1 bounded above as in (3.19), where I^que(Q) is given by (2.13-2.14). Without further assumptions, we are not able to reverse the inequality in (3.19). This point will be addressed in Section 4 and will require assumptions (1.10-1.12).
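Green functions such as the one normalising q* can be computed numerically from a Fourier representation. As a sketch (for the ordinary Green function G(0) = Σ_n p_n(0) of simple random walk on Z^3, not the Ḡ(0) associated with p_2(·, ·) above; the helper name is ours), a midpoint Riemann sum of (2π)^{−3} ∫ dk [1 − φ(k)]^{−1} with φ(k) = (cos k_1 + cos k_2 + cos k_3)/3 approximates the known value of about 1.516:

```python
import math

def srw3_green_origin(grid=60):
    """Midpoint Riemann sum over [-pi, pi)^3 for
    G(0) = (2 pi)^-3 * integral of 1/(1 - phi(k)) dk,
    phi(k) = (cos k1 + cos k2 + cos k3)/3, the Green function of
    simple random walk on Z^3 at the origin.  The singularity at
    k = 0 is integrable and is avoided by the midpoints."""
    h = 2.0 * math.pi / grid
    pts = [-math.pi + (j + 0.5) * h for j in range(grid)]
    total = 0.0
    for k1 in pts:
        c1 = math.cos(k1)
        for k2 in pts:
            c12 = c1 + math.cos(k2)
            for k3 in pts:
                phi = (c12 + math.cos(k3)) / 3.0
                total += 1.0 / (1.0 - phi)
    return total * h ** 3 / (2.0 * math.pi) ** 3

G = srw3_green_origin()
print(G)  # roughly 1.5 (the exact value is about 1.5164)
```

G(0) > 1 quantifies transience: 1/G(0) is the escape probability of the walk.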

Proof of Theorem 1.3
To compare (3.19) with (3.15), we need the following lemma, the proof of which is deferred.
With the help of Lemma 3.2 we complete the proof of the existence of the gap as follows. Since log f is bounded from above, the function in (3.20) is upper semicontinuous. Therefore, by compactness of the level sets of I^que(Q), the function in (3.20) achieves its maximum at some Q** that satisfies (3.21). If r_1 = r_2, then Q** = Q*, because the corresponding annealed variational functional has Q* as its unique maximiser. But I^que(Q*) > I^ann(Q*) by Lemma 3.2, and so we have a contradiction in (3.21), thus arriving at r_1 < r_2.
In the remainder of this section we prove Lemma 3.2.
Proof. Note the two identities for Q* that follow from (3.18); the latter shows that m_{Q*} < ∞ if and only if p(·, ·) is strongly transient. We will show that Q* does not lie in the set defined in (2.15). This implies Ψ_{Q*} ≠ ν^{⊗N} (recall (2.16)), and hence H(Ψ_{Q*} | ν^{⊗N}) > 0, implying the claim because α ∈ (1, ∞) (recall (2.14)). In order to verify (3.26), we compute the first two marginals of Ψ_{Q*}, using the symmetry of p(·, ·). There are many p(·, ·)'s for which (3.28) fails, and for these (3.26) holds. However, for simple random walk (3.28) does not fail, because a → p_{2n−1}(a) is constant on the 2d neighbours of the origin, and so we have to look at the two-dimensional marginal.

Proof of Theorem 1.6
The proof follows the line of argument in Section 3.2. The analogues of (3.4-3.7) are given in (3.36-3.37), where the conditioning in the first expression in (3.36) is on the full continuous-time path S̃ = (S̃_t)_{t≥0}. Our task is to compute the corresponding exponential rates r_1 and r_2 and show that r_1 < r_2.
In order to do so, we write S_t = X_{J_t}, where X is the discrete-time random walk with transition kernel p(·, ·) and (J_t)_{t≥0} is the rate-1 Poisson process on [0, ∞), and then average over the jump times of (J_t)_{t≥0} while keeping the jumps of X fixed. In this way we reduce the problem to the one for the discrete-time random walk treated in the proof of Theorem 1.3. For the first expression in (3.37) this partial annealing gives an upper bound, while for the second expression it is simply part of the averaging over S.
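The subordination S_t = X_{J_t} is easy to simulate directly (a sketch; simple random walk stands in for the jump chain X, whose kernel p(·, ·) is arbitrary in the paper): exponential(1) holding times drive the rate-1 Poisson clock, and each ring triggers one step of the discrete-time walk.

```python
import random

def continuous_time_walk(t_max, dim, rng):
    """Sketch of S_t = X_{J_t}: a rate-1 Poisson clock rings after
    i.i.d. exponential(1) holding times, and at each ring the
    discrete-time walk X (here simple random walk on Z^dim) makes
    one step.  Returns the list of (jump time, position) pairs."""
    t, pos, path = 0.0, [0] * dim, []
    while True:
        t += rng.expovariate(1.0)   # exponential(1) holding time
        if t > t_max:
            break
        axis = rng.randrange(dim)
        pos[axis] += rng.choice((-1, 1))
        path.append((t, tuple(pos)))
    return path

rng = random.Random(4)
path = continuous_time_walk(100.0, 3, rng)
n_jumps = len(path)
print(n_jumps)  # Poisson(100) many jumps, so around 100
```

Averaging over the clock while freezing the jumps of X is exactly the "partial annealing" used in the proof.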
Define the quantities in (3.40-3.41), which can be viewed as the result of this "partial annealing", so that it suffices to show that r_1 < r_2 for these quantities.
To this end, write out (3.42). Integrating over 0 ≤ t_1 ≤ · · · ≤ t_N < ∞, we obtain (3.43), which we may rewrite in a form similar to the first line of (3.8), except that the order of the j_i's is not strict. However, defining F (with the natural convention for empty index sets) and recalling (3.40), we obtain a relation showing that it suffices to compute r_1. Write out (3.50-3.51); these equations replace (3.8-3.9). We can now repeat the same argument as in (3.15-3.21), with the sole difference that f in (3.9) is replaced by f̄ in (3.51), and this, combined with Lemma 3.3 below, yields the gap r_1 < r_2.
We first check that f̄ is bounded from above, which is necessary for the application of Varadhan's lemma. To that end, we insert the Fourier representation (3.13) into (3.44), from which we see that Θ_n(u) ≤ Θ_n(0), u ∈ Z^d, and hence (3.54) holds. From (1.1), (3.44) and (3.54) it follows that Θ_n(0)/p_{2⌈n/2⌉}(0) ≤ C < ∞ for all n ∈ N, so that f̄ indeed is bounded from above. Note that X is the discrete-time random walk with transition kernel p(·, ·). The key ingredient behind r_1 < r_2 is the analogue of Lemma 3.2, this time with Q* = (q*)^{⊗N} and q* given by the analogue of (3.18); the proof is deferred to the end of the section.

Proof of Theorem 1.2
This section uses techniques from [6]. The proof of Theorem 1.2 is based on two approximation lemmas, which are stated in Section 4.1. The proof of these lemmas is given in Sections 4.2-4.3.

Two approximation lemmas
Return to the setting in Section 2. For Q ∈ P^inv(Ẽ^N), let H(Q) denote the specific entropy of Q.
Write h(· | ·) and h(·) to denote relative entropy, respectively, entropy.
Lemma 4.1. Let g satisfy (4.2), and let Q ∈ P^{erg,fin}(Ẽ^N) be such that H(Q) < ∞ and G(Q) := ∫_Ẽ (π_1 Q)(dy) g(y) ∈ R. Then (4.3) holds.
Lemma 4.2. Let g satisfy (4.4), and let Q ∈ P^erg(Ẽ^N) be such that I^que(Q) < ∞ and G(Q) ∈ R. Then there exists a sequence (Q_n)_{n∈N} in P^{erg,fin}(Ẽ^N) such that (4.5) holds. Moreover, if E is countable and ν satisfies (4.6), then (Q_n)_{n∈N} can be chosen such that H(Q_n) < ∞ for all n ∈ N.
Lemma 4.2 yields the following (Corollary 4.3).
Proof. Return to the setting in Section 3.1. In Lemma 4.1, pick g = log f with f as defined in (3.9). Then (1.11) is the same as (4.2), and so Lemma 4.1 applies, where the condition that the first term under the supremum be finite is redundant because g = log f is bounded from above. Recalling (3.10) and (3.19), we thus obtain (4.9). The right-hand side of (4.9) is the same as that of (1.13), except for the restriction that H(Q) < ∞.
To remove this restriction, we use Corollary 4.3. First note that, by (1.12), condition (4.4) in Lemma 4.2 is fulfilled for g = log f. Next note that, by (1.10) and Remark 4.4 below, condition (4.6) in Lemma 4.2 is fulfilled for ν = p. Therefore Corollary 4.3 implies that r_1 equals the right-hand side of (1.13), and that the suprema in (1.13) and (1.6) agree.

Proof of Lemma 4.1
Proof. The idea is to make the first word so long that it ends in front of the first region in X that looks like the concatenation of N words drawn from Q, and after that cut N "Q-typical" words from this region. Condition (4.2) ensures that the contribution of the first word to the left-hand side of (4.3) is negligible on the exponential scale.
To formalize this idea, we borrow some techniques from [6], Section 3.1. Let H(Ψ_Q) denote the specific entropy of Ψ_Q (defined in (2.8)), and H_{τ|κ}(Q) the "conditional specific entropy of word lengths under the law Q given the concatenation" (defined in [6], Lemma 1.7). We need the relation between these quantities from [6]. First, we note that H(Q) < ∞ and m_Q < ∞ imply that H(Ψ_Q) < ∞ and H_{τ|κ}(Q) < ∞ (see [6], Lemma 1.7). Next, we fix ε > 0. Following the arguments in [6], Section 3.1, we see that for all N large enough we can find a finite set A = A(Q, ε, N) ⊂ Ẽ^N of "Q-typical sentences" such that the required typicality estimates hold for all z = (y^{(1)}, . . . , y^{(N)}) ∈ A, with suitable abbreviations. Indeed, for each N, coarse-grain X into blocks of length L_N := ⌈N(m_Q + ε)⌉. For i ∈ N ∪ {0}, let A_{N,i} be the event that θ^{iL_N} X begins with an element of B. Then, for any δ > 0, the resulting probability bound is summable in N. Thus, lim sup_{N→∞} (1/N) log τ_N ≤ χ(Q) + δ by the first Borel-Cantelli lemma. Now let δ ↓ 0, to get (4.17).

Proof of Lemma 4.2
Proof. Without loss of generality we may assume that m_Q = ∞, for otherwise Q_n ≡ Q satisfies (4.5). The idea is to use a variation on the truncation construction in [6], Section 3. For a given truncation level tr ∈ N, let Q^ν_tr be the law obtained from Q by replacing all words of length ≥ tr by words of length tr whose letters are drawn independently from ν. Formally, if Y = (Y^{(i)})_{i∈N} has law Q and Ỹ = (Ỹ^{(i)})_{i∈N} has law (ν^{⊗tr})^{⊗N} and is independent of Y, then Ȳ = (Ȳ^{(i)})_{i∈N}, defined by replacing Y^{(i)} by Ỹ^{(i)} whenever Y^{(i)} has length ≥ tr, has law Q^ν_tr.
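The construction of Q^ν_tr acts wordwise and can be sketched as follows (illustrative alphabet, ν uniform on it; the helper name resample_truncate is ours): words of length ≥ tr are replaced by fresh words of exactly tr i.i.d. ν-letters, while shorter words are kept.

```python
import random

def resample_truncate(words, tr, nu_letters, rng):
    """Wordwise sketch of the construction of Q^nu_tr: every word of
    length >= tr is replaced by a word of exactly tr letters drawn
    i.i.d. from nu (here uniform on nu_letters); shorter words are
    left untouched."""
    out = []
    for y in words:
        if len(y) >= tr:
            out.append(tuple(rng.choice(nu_letters) for _ in range(tr)))
        else:
            out.append(y)
    return out

rng = random.Random(3)
sentence = [("a",), ("b", "a", "b", "a"), ("a", "b")]
bar = resample_truncate(sentence, tr=2, nu_letters="ab", rng=rng)
print(bar)
```

Contrast this with the plain truncation [·]_tr, which keeps the first tr letters of a long word instead of resampling them; the difference is exactly what Lemma 4.5 quantifies.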
Lemma 4.5. For every Q ∈ P^{inv,erg}(Ẽ^N) such that I^que(Q) < ∞ and every tr ∈ N, (4.24) holds.
Proof. The intuition is that under Q^ν_tr all words of length tr have the same content as under q_{ρ,ν}^{⊗N}, while under [Q]_tr they do not. The proof is straightforward but lengthy, and is deferred to Appendix A.
Using (4.24), and noting that m_{Q^ν_tr} = m_{[Q]_tr} < ∞, we obtain the lim sup bound (4.25) (recall (2.13-2.14)). On the other hand, we have (4.26), where we use dominated convergence for the first summand and condition (4.4) for the second summand. Combining (4.25-4.26), we see that we can choose tr = tr(n) such that (4.5) holds for Q_n = Q^ν_{tr(n)}. It remains to verify that, under condition (4.6), H(Q^ν_tr) < ∞ for all tr ∈ N. Since H(Q^ν_tr) ≤ h(π_1 Q^ν_tr), it suffices to verify that h(π_1 Q^ν_tr) < ∞ for all tr ∈ N. To prove the latter, note that (writing L_{Q^ν_tr}(τ_1) to denote the law of τ_1 under Q^ν_tr, etc.) h(π_1 Q^ν_tr) is bounded by the entropy of the word length, plus the conditional entropies of the words of each length k < tr, plus tr h(ν), each of which is finite.
Proof of Theorem 1.5. The claim follows from the representations (1.13-1.14) in Theorem 1.2, and the fact that I que = I ann when α = 1.
Examples of random walks satisfying assumptions (1.10-1.12)
In this section we exhibit two classes of random walks for which (1.10-1.12) hold.
1. Let S be an irreducible random walk on Z^d with E[|S_1|^3] < ∞. Then standard cumulant expansion techniques taken from Bhattacharya and Ranga Rao [2] can be used to show that for every C_1 ∈ (0, ∞) there is a C_2 ∈ (0, ∞) such that (6.1) holds, where Σ is the covariance matrix of S_1 (which is assumed to be non-degenerate), and c is a constant that depends on p(·). The restriction p_n(x) > 0 is necessary: e.g. for simple random walk, x and n in (6.1) must have the same parity. The Hartman-Wintner law of the iterated logarithm (see e.g. Kallenberg [24], Corollary 14.8), which only requires S_1 to have mean zero and finite variance, says that
lim sup_{n→∞} |(S_n)_i| / √(2 Σ_ii n log log n) = 1 a.s., i = 1, . . . , d, (6.2)
where (S_n)_i is the i-th component of S_n. Using |S_n| ≤ √d max_{1≤i≤d} |(S_n)_i|, we obtain that there is a C_3 ∈ (0, ∞) such that
lim sup_{n→∞} |S_n| / √(n log log n) ≤ C_3 S-a.s. (6.3)
Combining (6.1) and (6.3), we find that there is a C_4 ∈ (0, ∞) such that
log[ p_n(S_n) / p_{2⌈n/2⌉}(0) ] ≥ −C_4 |S_n|^2 / n for all n ∈ N, S-a.s. (6.4)
Combining (6.3) and (6.4), we get (1.11).
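The on-diagonal case of the local expansion (6.1) can be sanity-checked for simple random walk on Z, where p_{2n}(0) = C(2n, n)/4^n and the local CLT gives p_{2n}(0) ~ 1/√(πn). The sketch below evaluates the binomial coefficient in log space to avoid floating-point overflow.

```python
import math

def p2n_at_origin(n):
    """p_{2n}(0) = C(2n, n) / 4^n for simple random walk on Z,
    evaluated via log-gamma to stay within floating-point range."""
    log_p = (math.lgamma(2 * n + 1) - 2 * math.lgamma(n + 1)
             - 2 * n * math.log(2.0))
    return math.exp(log_p)

# local CLT check: p_{2n}(0) * sqrt(pi * n) -> 1 as n grows
for n in (10, 100, 1000):
    print(n, p2n_at_origin(n) * math.sqrt(math.pi * n))
```

The parity restriction mentioned above is visible here: p_m(0) vanishes for odd m, which is why only even times enter the check.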
For the proof of the second inequality in (4.24), we need some further notation. Let tr ∈ N be a given truncation level, and let * be a new symbol with * ∉ E, where *^tr = * · · · * denotes the word consisting of tr copies of *. Since Q(K^{(N,tr)}) = Q(K^{(N,tr,*)}, K^{(N,tr,∼)}) = Q(K^{(N,tr,*)}) Q(K^{(N,tr,∼)} | K^{(N,tr,*)}), we see from (A.13-A.14) that (A.15) holds. The assumption H([Q]_tr) < ∞ guarantees that all the quantities appearing in (A.13-A.15) are proper. Note that H_{tr,∼|*}(Q) can be interpreted as the conditional specific relative entropy of the letters in the "long" words of [Y]_tr given the letters in the "short" words (see Lemma A.2 below). Note also that H_{tr,∼|*}(Q) in (A.15) is defined as a "per word" quantity. Since the fraction of long words in [Y]_tr is Q(τ_1 ≥ tr) and each of these words contains tr letters, the corresponding conditional specific relative entropy "per letter" is H_{tr,∼|*}(Q)/[Q(τ_1 ≥ tr) tr], as it appears in (A.22) below.
Step 1. We will first assume that |E| < ∞. Then H([Q]_tr) < ∞ is automatic. Since ν^{⊗N} is a product measure, we have, for any Ψ ∈ P^inv(E^N), a decomposition of h(Ψ | ν^{⊗N}) in terms of the specific entropy H(Ψ). The last term in the resulting identity is the "specific relative entropy of the law of the letters in the concatenation of long words given the concatenation of short words in [Q]_tr with respect to ν^{⊗N}", which is ≥ 0 (see Lemma A.2 below).
Step 2. We extend (A.9) to a general letter space E by using the coarse-graining construction from [6], Section 8. Let A_c = {A_{c,1}, . . . , A_{c,n_c}}, c ∈ N, be a sequence of nested finite partitions of E, and let ⟨·⟩_c : E → E_c be the coarse-graining map as defined in [6], Section 8. Since E_c is finite and the word length truncation [·]_tr and the letter coarse-graining ⟨·⟩_c commute, we have