Large deviation principles for words drawn from correlated letter sequences

When an i.i.d.\ sequence of letters is cut into words according to i.i.d.\ renewal times, an i.i.d.\ sequence of words is obtained. In the \emph{annealed} LDP (large deviation principle) for the empirical process of words, the rate function is the specific relative entropy of the observed law of words w.r.t.\ the reference law of words. In Birkner, Greven and den Hollander \cite{BGdH10} the \emph{quenched} LDP (= conditional on a typical letter sequence) was derived for the case where the renewal times have an \emph{algebraic} tail. The rate function turned out to be a sum of two terms, one being the annealed rate function, the other being proportional to the specific relative entropy of the observed law of letters w.r.t.\ the reference law of letters, obtained by concatenating the words and randomising the location of the origin. The proportionality constant equals the tail exponent of the renewal process. The purpose of the present paper is to extend both LDP's to letter sequences that are not i.i.d. It is shown that both LDP's carry over when the letter sequence satisfies a mixing condition called \emph{summable variation}. The rate functions are again given by specific relative entropies w.r.t.\ the reference law of words, respectively, letters. But since neither of these reference laws is i.i.d., several approximation arguments are needed to obtain the extension.


1. Introduction and main results
1.1. Notation. Let E be a finite set of letters and let Ẽ = ∪_{ℓ∈N} E^ℓ be the set of finite words drawn from E. Both E and Ẽ are Polish spaces under the discrete topology. Write E^Z and Ẽ^Z for the sets of two-sided sequences of letters and words, endowed with the product topology, and let θ and θ̃ denote the left-shifts acting on these sets, respectively. The sets of probability laws on E^Z and Ẽ^Z that are shift-invariant, respectively, shift-invariant and ergodic w.r.t. θ and θ̃ are denoted by P^inv(E^Z) and P^inv(Ẽ^Z), respectively, P^inv,erg(E^Z) and P^inv,erg(Ẽ^Z), and are endowed with the topology of weak convergence.
Let X = (X_k)_{k∈Z} be a two-sided random sequence of letters sampled according to a shift-invariant probability distribution ν on E^Z. Let τ = (τ_i)_{i∈Z} be a two-sided i.i.d. sequence of renewal times drawn from a common probability law ̺ on N, independent of X. The latter form a renewal process T = (T_i)_{i∈Z} given by
\[
T_i - T_{i-1} = \tau_i, \quad i \in \mathbb{Z}, \qquad T_0 = 0.
\]
Let Y = (Y_i)_{i∈Z} be the two-sided random sequence of words cut out from X according to τ, i.e.,
\[
Y_i = X_{(T_{i-1},T_i]} = \bigl(X_{T_{i-1}+1},\ldots,X_{T_i}\bigr), \quad i \in \mathbb{Z}.
\]
The joint law of X and τ is denoted by P. Write |Y_i| = τ_i to denote the length of word i.
The reverse of cutting is glueing. The concatenation operator κ: Ẽ^Z → E^Z glues a word sequence into a letter sequence. In particular, κ(Y) = X.
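On finite stretches of letters, the cutting and glueing maps can be sketched as follows (a minimal illustration; the function names `cut_words` and `glue` are ours, not the paper's):

```python
# Sketch of the cutting/glueing maps on a finite stretch of letters.
# `cut_words` cuts a letter sequence into words according to renewal
# increments tau_1, tau_2, ...; `glue` plays the role of the
# concatenation operator kappa.

def cut_words(letters, taus):
    """Cut `letters` into words of lengths tau_1, tau_2, ..."""
    words, pos = [], 0
    for tau in taus:
        words.append(tuple(letters[pos:pos + tau]))
        pos += tau
    return words

def glue(words):
    """Concatenation operator kappa: glue a word sequence back into letters."""
    return [letter for word in words for letter in word]

X = list("abracadabra")
taus = [3, 1, 4, 3]           # renewal increments, summing to len(X)
Y = cut_words(X, taus)        # [('a','b','r'), ('a',), ('c','a','d','a'), ('b','r','a')]
assert glue(Y) == X           # kappa(Y) = X
```

The identity `glue(cut_words(X, taus)) == X` is the finite-window analogue of κ(Y) = X above.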
For Q ∈ P^inv(Ẽ^Z) with finite mean word length m_Q = E_Q[|Y_1|] < ∞, define Ψ_Q ∈ P^inv(E^Z) by
\[
\Psi_Q(\cdot) = \frac{1}{m_Q}\,E_Q\Biggl[\sum_{k=0}^{|Y_1|-1} \delta_{\theta^k \kappa(Y)}(\cdot)\Biggr],
\]
i.e., the law of κ(Y) when Y is drawn from Q, turned into a stationary law by randomizing the location of the origin.
For n ∈ N, let (Y_{(0,n]})^per ∈ Ẽ^Z denote the n-periodized version of Y, and let
\[
R_n = \frac{1}{n} \sum_{i=0}^{n-1} \delta_{\widetilde\theta^i (Y_{(0,n]})^{\mathrm{per}}}
\]
be the associated empirical process of words. We are interested in the distribution of R_n both under P (= annealed law) and under P(· | X) for ν-a.a. X (= quenched law).
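R_n is a process-level object; as a toy sketch, the following computes the empirical distribution of word windows in the n-periodized word string, i.e. the finite-dimensional marginals of R_n (the function name and the `window` parameter are ours):

```python
from collections import Counter
from fractions import Fraction

def empirical_word_distribution(words, window=1):
    """Empirical distribution of `window`-blocks of words in the
    n-periodized word string (Y_(0,n])^per, i.e. indices wrap around."""
    n = len(words)
    counts = Counter(
        tuple(words[(i + j) % n] for j in range(window))
        for i in range(n)
    )
    return {block: Fraction(c, n) for block, c in counts.items()}

Y = [("a", "b"), ("a",), ("a", "b"), ("b",)]
R = empirical_word_distribution(Y, window=1)
# each shift contributes weight 1/4; the word ("a","b") occurs twice
assert R[(("a", "b"),)] == Fraction(1, 2)
```

Passing larger `window` values gives the higher-order marginals; periodization is what makes the resulting measure shift-invariant.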
1.2. Large deviation principles. If ν is i.i.d., then P is i.i.d. and the annealed LDP is standard, with the rate function given by the specific relative entropy of the observed law of words w.r.t. P. The quenched LDP, however, is not standard. It was obtained in Birkner [2] for the case where ̺ has an exponentially bounded tail, and in Birkner, Greven and den Hollander [3] for the case where ̺ has a polynomially decaying tail:
\[
\lim_{n\to\infty} \frac{\log \varrho(n)}{\log n} = -\alpha \quad \text{for some } \alpha \in [1,\infty).
\]
(No condition on the support of ̺ is needed other than that it is infinite.) In the latter case, the quenched rate function turns out to be a sum of two terms, one being the annealed rate function, the other being proportional to the specific relative entropy of the observed law of letters w.r.t. ν, obtained by concatenating the words and randomising the location of the origin. The proportionality constant equals α − 1 times the average word length.
The goal of the present paper is to extend both LDP's to the situation where ν is no longer i.i.d., but satisfies a mixing condition called summable variation, which will be defined in Section 2. In what follows, H(· | ·) denotes specific relative entropy (see Dembo and Zeitouni [4], Section 6.5, for the definition and key properties).
Theorem 1.1 (Annealed LDP). If ν has summable variation, then the family of probability laws P(R_n ∈ ·), n ∈ N, satisfies the LDP on P^inv(Ẽ^Z) with rate n and with rate function given by the specific relative entropy
\[
I^{\mathrm{ann}}(Q) = H(Q \mid P). \tag{1.6}
\]
I^ann is lower semi-continuous, has compact level sets, is affine, and has a unique zero at Q = P.
Theorem 1.2 (Quenched LDP). If ν has summable variation, then for ν-a.a. X the family of conditional probability laws P(R_n ∈ · | X), n ∈ N, satisfies the LDP on P^inv(Ẽ^Z) with rate n and with rate function given by the sum of specific relative entropies
\[
I^{\mathrm{que}}(Q) = I^{\mathrm{ann}}(Q) + (\alpha - 1)\, m_Q\, H(\Psi_Q \mid \nu). \tag{1.7}
\]
I^que is lower semi-continuous, has compact level sets, is affine, and has a unique zero at Q = P.
Theorem 1.3. Both LDPs remain valid when E is a Polish space.
Remark: If m_Q = ∞, then the second term in (1.7) is defined to be α − 1 times the truncation limit lim_{tr→∞} m_{[Q]_tr} H(Ψ_{[Q]_tr} | ν), where [·]_tr is the operator that truncates all the words to length ≤ tr. See Birkner, Greven and den Hollander [3] for details.
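The word-truncation operator [·]_tr acts word by word; a minimal sketch (the function name `truncate_words` is ours):

```python
def truncate_words(words, tr):
    """The operator [.]_tr: truncate every word to length at most tr."""
    return [tuple(w[:tr]) for w in words]

Y = [("a", "b", "r", "a"), ("c",), ("a", "d", "a")]
assert truncate_words(Y, 2) == [("a", "b"), ("c",), ("a", "d")]
assert max(len(w) for w in truncate_words(Y, 2)) <= 2
```

Truncation caps the mean word length, which is what makes the truncation limit above well-defined even when m_Q = ∞.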
Remark: Both rate functions are the same as in the i.i.d. case, even though the reference laws P and ν are no longer i.i.d. This lack of independence will require us to go through several approximation arguments. Both LDP's can be applied to the problem of pinning of a polymer chain at an interface carrying correlated disorder. This application, which is our main motivation for extending the LDP's, will be discussed in a future paper.
1.3. Outline. In Section 2 we collect some basic facts, introduce the relevant mixing coefficients, and define summable variation. We give examples where this mixing condition holds, respectively, fails. In Section 3 we prove the annealed LDP by applying a result from Orey and Pelikan [14]. In Section 4 we prove the quenched LDP by going over the proof in Birkner, Greven and den Hollander [3] for i.i.d. letter sequences and checking which parts have to be adapted. In Section 5 we extend the LDP's from finite E to Polish E by using the Dawson-Gärtner projective limit LDP.
2. Basic facts, mixing coefficients and summable variation

2.1. Basic facts. Throughout the paper we abbreviate, for j < k, x_{(j,k]} = (x_{j+1}, …, x_k). The associated sigma-algebras are written F_{(j,k]} = σ(X_i : j < i ≤ k). Since X is no longer i.i.d., the distribution of a word in Y depends on the outcome of all the previous words. However, since the word lengths are still i.i.d., when we condition on the past of the word sequence only the past of the letter sequence is relevant. This allows us to obtain a regular version of the conditional probabilities of P as follows.
Lemma 2.1. The collection (P_{y−}(·), y− ∈ Ẽ^{−N_0}) of probability laws on Ẽ^N defined by (2.4) constitutes a regular version of the conditional probability of the word sequence given its past.

Proof. For every y− ∈ Ẽ^{−N_0}, P_{y−}(·) defined in (2.4) is a probability measure. By the monotone class theorem, it is enough to prove the claim for finite cylinder sets. Fix r ∈ N, (y_i)_{1≤i≤r} ∈ Ẽ^r and pick A = ∩_{1≤i≤r} {Y_i = y_i}. Then one computes, with κ(A) denoting the concatenation of A,
which proves the claim.
2.2. Mixing coefficients. We need the mixing coefficients for letters and words defined in (2.8) and (2.9). The restrictions ν_{x−}(A) > 0 and P_{ŷ−}(A) > 0 are put in to avoid ∞ − ∞. Nonetheless, (2.8) and (2.9) may be infinite. Note that if Λ_1 = ∅, then the supremum in Definition 2.2(a) is taken over all x−, x̃− ∈ E^{−N_0} without any restriction ((x−)_Λ denotes the restriction of x− to Λ). We will use the following abbreviations:

Proof. Using Definition 2.2(a), we have (2.12).

Proof. We show that, for all m ∈ N_0 and k, ℓ ∈ N, (2.14) holds, which yields the claim via iteration. To prove (2.14), pick x_{(0,k+ℓ]} and x−, x̃−, and consider the events in which x̃− x_{(0,k]} is the concatenation of x̃− and x_{(0,k]}. Insert this estimate into (2.2) and take the supremum over x_{(0,k+ℓ]} and x−, x̃− to get (2.14).
Note that k ↦ ϕ(k) is non-increasing on N_0.

2.3. Summable variation. The key mixing condition in our LDP's is summable variation:

(SV)  Σ_{n∈N_0} ϕ(n) < ∞.

The term summable variation is borrowed from the theory of Gibbs measures, where logarithms of probabilities play the role of interaction potentials, and coefficients similar to our ϕ(n)'s are used to measure the absolute summability of these interaction potentials.
(I) Random processes (with finite alphabet) that satisfy (SV) include i.i.d. processes (ϕ(n) = 0 for all n ∈ N_0), Markov chains of order m (ϕ(0) < ∞ and ϕ(n) = 0 for all n ≥ m), and chains with complete connections whose one-letter forward conditional probabilities have summable variation. Ledrappier [12, Example 2, Proposition 4] shows that such chains have a unique invariant measure and are Weak Bernoulli under (SV). Berbee [1, Theorem 1.1] shows that they have a unique invariant measure and are Bernoulli when Σ_{n∈N} exp[−Σ_{m=1}^{n} ϕ(m)] = ∞, a condition slightly weaker than (SV). (Uniqueness of the invariant measure has been proved more recently by Johansson and Öberg [10] and by Johansson, Öberg and Pollicott [11] under the even weaker condition Σ_{n∈N} ϕ(n)² < ∞.) Yet other examples satisfying (SV) include Ising spins labeled by Z with a ferromagnetic pair potential that has a sufficiently thin tail (see Berbee [1]).
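For a first-order Markov chain the claim ϕ(n) = 0 for n ≥ 1 can be checked directly: the conditional law of the future given the past depends on the last letter only, so two pasts agreeing in coordinate 0 yield identical conditionals. A small numerical sketch (assuming a log-ratio form of the mixing coefficient in Definition 2.2(a); all names and the particular chain are ours):

```python
import itertools
import math

# A 2-letter Markov chain on E = {0, 1} with stationary law pi.
P = [[0.9, 0.1],
     [0.4, 0.6]]
pi = [0.8, 0.2]   # solves pi P = pi: 0.8*0.9 + 0.2*0.4 = 0.8

def path_prob(path):
    """Stationary probability of a finite path (x_1, ..., x_m)."""
    p, prev = pi[path[0]], path[0]
    for x in path[1:]:
        p *= P[prev][x]
        prev = x
    return p

def cond_prob(past, future):
    """P(future | past), computed from the joint law with no Markov shortcut."""
    return path_prob(past + future) / path_prob(past)

futures = list(itertools.product([0, 1], repeat=3))

# Two pasts that agree in their last coordinate give identical conditionals,
# so the supremum defining phi(n) vanishes for n >= 1 (Markov chain of order 1).
for f in futures:
    assert abs(cond_prob((0, 1, 0), f) - cond_prob((1, 1, 0), f)) < 1e-12

# phi(0): pasts may disagree in the last coordinate; the log-ratio stays bounded.
phi0 = max(abs(math.log(cond_prob((0, 0), f) / cond_prob((1, 1), f)))
           for f in futures)
assert 0.0 < phi0 < math.inf
```

The same computation with an order-m chain gives ϕ(n) = 0 only for n ≥ m, matching the example in the text.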
(IIa) A class of random processes that fail to satisfy (SV) is the following. Let E = {0, 1}, and let p be any probability law on N such that p(ℓ) ∼ Cℓ^{−γ} for some γ > 2. Since Σ_{ℓ∈N} ℓ p(ℓ) < ∞, there exists a stationary renewal process (A_k)_{k∈Z} on N_0 with the transition probabilities in (2.18). The process (X_k)_{k∈Z} defined by X_k = 1_{{A_k = 0}} fails to satisfy (SV). Indeed, pick n ∈ N and let x, x[n] ∈ E^{−N_0} be such that

3. Annealed LDP
The annealed LDP in Theorem 1.1 is a process-level LDP. Such LDP's were proven by Donsker and Varadhan [6,7] for reference processes that are Markov or Gaussian. Orey [13] and Orey and Pelikan [14] gave a proof for ratio-mixing processes (see below), using the observation that any random process can be viewed as a Markov process by keeping track of its past.

(CD) For all bounded continuous functions f, the map y− ↦ ∫ f dP_{y−} is continuous.
Then the family of probability laws P(R_n ∈ ·), n ∈ N, satisfies the LDP on P^inv(Ẽ^Z) with rate n and with rate function given by the specific relative entropy in (3.2), where Q_{y−}|_1 and P_{y−}|_1 are the one-word marginals of Q_{y−} and P_{y−} (i.e., of Q and P conditional on y−).
The specific relative entropy H(Q | P) is defined to be infinite when Q_{y−}|_1 ≪ P_{y−}|_1 fails on a set of y−'s with a strictly positive Q-measure. An alternative form of (3.2) is available in terms of the relative entropy h(· | ·). The latter can be viewed as the specific relative entropy of the laws of two Markov processes, namely, the laws of the past processes of Y under Q and under P, respectively. We are now ready to prove Theorem 1.1.
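For the reader's convenience, a reminder of the standard entropies involved (these are the textbook definitions, cf. Dembo and Zeitouni [4, Section 6.5]; the precise form of (3.2) is as stated above):

```latex
% Relative entropy of probability measures \mu \ll \nu on a finite set F:
h(\mu \mid \nu) = \sum_{x \in F} \mu(x)\,\log\frac{\mu(x)}{\nu(x)},
% with the variational characterization over bounded functions f on F:
h(\mu \mid \nu) = \sup_{f} \Bigl\{ \int f \, d\mu - \log \int e^{f} \, d\nu \Bigr\}.
% The specific relative entropy of a stationary Q w.r.t. P is the limit
H(Q \mid P) = \lim_{n \to \infty} \tfrac{1}{n}\, h\bigl(Q_{|n} \,\big|\, P_{|n}\bigr),
% where Q_{|n}, P_{|n} denote the marginal laws on n consecutive coordinates.
```

The variational characterization is the form used later, in the proof of Lemma 5.3.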

4. Quenched LDP
In Sections 4.1-4.3 we prove several lemmas that are needed in Section 4.4 to give the proof of Theorem 1.2. This proof is an extension of the proof in [3] for i.i.d. ν. We focus on those ingredients where the lack of independence of ν requires modifications.

4.1. Decoupling inequalities.

Proof. To prove (4.2), pick k ∈ N and A ∈ F_{(0,k]}. If ν_{x̃−}(A) = 0, then ν_{x−}(A) = 0 as well because ϕ(k) < ∞ and there is nothing to prove, so we can assume ν_{x̃−}(A) > 0. Then the claim follows from the definition of ϕ(k) and Lemma 2.4. To prove (4.3), write out the left-hand side; the inequality uses (4.2). The reverse inequality is obtained in a similar manner.

Lemma 4.2. Let m ∈ N, and let (i_1, . . . , i_m), (j_1, . . . , j_m) be two collections of integers satisfying the appropriate ordering conditions.

Proof. We give the proof for m = 2. The general case can be handled by induction. The relevant ratio is at most C(ϕ), where the inequality uses (4.3) in Lemma 4.1. Averaging x− w.r.t. ν, we get (4.9).

If ν satisfies condition (SV), then ν-a.s. the convergence in (4.10) holds.

Proof. The strategy of proof consists in writing the sum in (4.10) as an additive functional of an ergodic process and using Birkhoff's ergodic theorem. First note that the sequence of times (σ_n)_{n∈Z} cuts a sequence of blocks B = (B_n)_{n∈Z} out of the letter sequence X, given by B_n = X_{(σ_{n−1},σ_n]}. Each of these blocks belongs to a fixed subset of words, and we define the process B⋆ = (B⋆_n)_{n∈Z} accordingly.
This process is Markovian, and its transition kernel is given in terms of the concatenation xy of x and y. For the collection (P⋆_A(· | x), x ∈ E^{−N_0}) to be a proper transition kernel, σ_1 must be ν_x-a.s. finite for all x ∈ E^{−N_0}. Since ν(A) > 0, we know from the Recurrence Theorem in Halmos [8] that σ_1 is ν-a.s. finite. But since ν and (ν_x)_{x∈E^{−N_0}} are equivalent under condition (SV) (note that C(ϕ)^{−1} ν(·) ≤ ν_x(·) ≤ C(ϕ) ν(·) as a consequence of (4.2) in Lemma 4.1), σ_1 indeed is ν_x-a.s. finite for all x ∈ E^{−N_0}. Since (with a slight abuse of notation) the B⋆_n's are also in this set, we can rewrite the sum in terms of the projection π.

We next apply Birkhoff's ergodic theorem to the sum in the right-hand side, i.e., to the process B⋆. This process has a stationary distribution, which we denote by P⋆_A. It is easy to check that P⋆_A is the law of X_{(−∞,σ_0]} conditional on the event ∩_{ℓ∈−N_0} {σ_ℓ > −∞}, which has probability one according to the Recurrence Theorem. Again using (4.2) in Lemma 4.1, we see that the mixing estimate holds for all sets A and B that are measurable w.r.t. σ(B_{(−∞,0]}) and σ(B_{(0,∞)}), respectively, where P_A is the law of B induced by P⋆_A. Therefore P_A is Weak Bernoulli (Ledrappier [12]), and hence is ergodic. Thus, we obtain the desired convergence. Moreover, the same holds for all x̃− ∈ E^{−N_0}, which completes the proof.
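The successive occurrence times σ_n of a pattern A, and the fact that their average spacing is governed by ν(A) (Kac's formula, via the ergodic theorem), can be illustrated on an i.i.d. example (all names are ours; the run is seeded for reproducibility):

```python
import random

def occurrence_times(x, pattern):
    """Indices k at which `pattern` occurs in x, i.e. x[k:k+m] == pattern."""
    m = len(pattern)
    return [k for k in range(len(x) - m + 1) if tuple(x[k:k + m]) == pattern]

rng = random.Random(0)
x = [rng.randint(0, 1) for _ in range(200_000)]   # i.i.d. fair bits
sigma = occurrence_times(x, (1, 1))

# Kac's formula: the mean gap between successive occurrences of the
# pattern A = (1, 1) equals 1/nu(A) = 4 under fair coin flips.
gaps = [t - s for s, t in zip(sigma, sigma[1:])]
mean_gap = sum(gaps) / len(gaps)
assert abs(mean_gap - 4.0) < 0.15
```

This is the same mechanism as in the proof above: cutting X at the σ_n's produces a stationary ergodic block process, and Birkhoff's theorem turns sums over blocks into means against its stationary law.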
4.3. Decomposition of relative entropy. Write H(Q) to denote the specific entropy of Q.
Lemma 4.4. Suppose that ϕ(0) < ∞. Then, for all Q ∈ P^{inv,fin}(Ẽ^Z),

Proof. To get the first relation, write

where we use the abbreviation

The second relation follows in a similar manner.
Lemma 4.5. If ν satisfies condition (SV), then for all Q ∈ P^{inv,erg,fin}(Ẽ^N),

Proof. First observe that (4.3) in Lemma 4.1 gives

Use (4.24) and the ergodicity of Q to obtain, for Q-a.s. Y,

4.4. Proof of Theorem 1.2.

Proof. The proof is an extension of the proof in [3] for i.i.d. ν. Since the latter is rather long, it is not possible to repeat all the ingredients here. Below we restrict ourselves to indicating the necessary modifications, which are based on the results in Sections 4.1-4.3. We leave it to the reader to go over the full proof in [3] and check that, indeed, these are the only modifications needed.
Decomposition of relative entropies. Replace [3,] by the relations in Lemma 4.4. These relations allow us to decompose I^que as a sum of three terms that appear in the proofs of the lower bound and the upper bound of the LDP as given in [3, Section 1.3].
Upper bound. The upper bound in [3, Proposition 3.1] is proved by first restricting to Q ∈ P^{inv,erg,fin}(Ẽ^Z). The event in [3, Eq. (3.4)] is used to define a suitable neighbourhood of Q. In that equation only the fourth line has to be replaced.

Lower bound. The lower bound in [3, Proposition 4.1] is proved by bounding from below the probability that R_n lies in a neighbourhood of some Q ∈ P^{inv,fin}(Ẽ^Z). When Q is ergodic we can use the same strategy as in [3] (namely, by jumping to Q-typical substrings of letters), but a modification is needed to go from [3, Eq. (4.

5. Extension to Polish spaces
In this section we prove Theorem 1.3, i.e., we extend the LDP's in Theorems 1.1-1.2 from a finite letter space to a Polish letter space. We first prove the LDP's for a sequence of coarse-grained finite letter spaces associated with a sequence of nested finite partitions of the Polish letter space. After that we apply the Dawson-Gärtner projective limit LDP (see Dembo and Zeitouni [4], Lemma 4.6.1). A somewhat delicate point is that (SV) for the full process does not necessarily imply (SV) for the coarse-grained process. Indeed, the first supremum in (2.8) decreases under coarse-graining while the second supremum increases. The way out is to use (SV) for the full process to prove the decoupling inequalities in Section 4.1 for the coarse-grained process.
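The coarse-graining by nested finite partitions can be sketched for the toy choice E = [0, 1) with dyadic partitions (our own illustration, not the paper's construction): a letter is replaced by the index of the partition element containing it, and refinement is consistent, so the coarse-grained processes form a projective family.

```python
def coarse_grain(x, c):
    """Label of the dyadic interval [k 2^-c, (k+1) 2^-c) containing x in [0,1)."""
    return int(x * 2**c)

xs = [0.12, 0.5, 0.734, 0.999]

# Nesting: the level-c label is determined by the level-(c+1) label,
# which is exactly the consistency needed for the Dawson-Gartner
# projective limit argument.
for x in xs:
    for c in range(1, 8):
        assert coarse_grain(x, c) == coarse_grain(x, c + 1) // 2
```

The LDP is proved at each finite resolution c, and the projective limit then lifts it to the Polish space.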
Let X = (X_k)_{k∈Z} be a stationary process on a Polish space (E, d), with regular conditional probabilities (ν_{x−}, x− ∈ E^{−N_0}) defined provided that the events on which we condition have positive probability.

Proof. Note that (5.10) follows by applying (5.9) twice, while (5.11) follows by integrating x− w.r.t. ν in (5.9). Therefore it suffices to prove (5.9). To that end write (5.12). The integral in the numerator equals

The following lemma is another consequence of (SV); in it, the supremum is also a limit. The same result holds when (P, Q) is replaced by (P^{(c)}, Q^{(c)}) or (ν, Ψ_Q) by (ν^{(c)}, Ψ_{Q^{(c)}}).
Proof. We prove the result for (P, Q). The other cases are similar. For n ∈ N, let B(Ẽ^n) be the set of bounded measurable functions on Ẽ^n. The claim follows from the variational characterization of relative entropy (see Dembo and Zeitouni [4, Lemma 6.

In what follows we need the notion of conditional local absolute continuity (which is weaker than absolute continuity).

Definition 5.4. Let F be a finite space equipped with the discrete topology and discrete σ-algebra, and let λ, µ be two stationary probability measures on F^Z with respective regular conditional probabilities (λ_{x−}, x− ∈ F^{−N_0}) and (µ_{x−}, x− ∈ F^{−N_0}). The law λ is said to be conditionally locally absolutely continuous w.r.t. the law µ (written as λ ≪_cond µ) when, for λ-a.a. x− and all n ∈ N, λ_{x−}|_n is absolutely continuous w.r.t. µ_{x−}|_n (written as λ_{x−}|_n ≪ µ_{x−}|_n), where λ_{x−}|_n and µ_{x−}|_n are the marginal laws on the first n coordinates.

Recall that, when Y is distributed according to Q, respectively, P, the regular conditional probability laws (P_{y−}(Y_1 ∈ ·), y− ∈ Ẽ^{−N_0}) play the role of transition probabilities for the past process Y⋆, and regularity translates into the Feller property.
Since this lower bound holds for all n ∈ N, we conclude by letting n → ∞ that ϕ(1) = ∞.

(IIb) Another class of random processes that fail to satisfy (SV) is random walk in random scenery. Let S = (S_n)_{n∈Z} be a simple random walk on Z^d, d ≥ 1, i.e., S_0 = 0 and S_n − S_{n−1} = X_n with (X_n)_{n∈Z} i.i.d. random variables uniformly distributed on {e ∈ Z^d : ‖e‖ = 1}. Let ξ = (ξ(x))_{x∈Z^d} be i.i.d. random variables taking the values 0 and 1 with probability 1/2 each, and define Z_n = (X_n, ξ(S_n)). Then Z = (Z_n)_{n∈Z} is stationary and ergodic, but not i.i.d. In den Hollander and Steif [9, Theorems 2.4 and 2.5] it is shown that Z is Weak Bernoulli if and only if d ≥ 5. Since (SV) implies Weak Bernoulli (Ledrappier [12, Proposition 4]), Z does not satisfy (SV) when 1 ≤ d ≤ 4.
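The random-walk-in-random-scenery process Z can be simulated directly; a minimal sketch in d = 1 (seeded for reproducibility; construction only, no claim about (SV) is checked here):

```python
import random

rng = random.Random(1)
steps = [-1, 1]        # simple random walk on Z (d = 1)
n = 1000

xi = {}                # i.i.d. Bernoulli(1/2) scenery on Z, sampled lazily
def scenery(site):
    if site not in xi:
        xi[site] = rng.randint(0, 1)
    return xi[site]

S, Z = 0, []
for _ in range(n):
    step = rng.choice(steps)
    S += step
    Z.append((step, scenery(S)))    # Z_n = (X_n, xi(S_n))

assert len(Z) == n
assert all(s in (-1, 1) and colour in (0, 1) for s, colour in Z)
```

Sampling the scenery lazily at first visit is equivalent in law to fixing an i.i.d. scenery in advance; the long-range dependence of Z comes from the walk revisiting sites.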
4.2. Successive occurrences of patterns.

Lemma 4.3. Fix m ∈ N and let A ∈ F_{(0,m]} be such that ν(A) > 0. Let (σ_n)_{n∈Z} be defined by [3,

Here the indicator is the one defined in [3, Eqs. (3.36-3.37)], and a_k ∈ {0, 1} labels whether or not at some specific location of the letter sequence X there is a string of letters arising from the concatenation of Q-typical words (see [3, Eqs. (3.5-3.6)]). The inequality in (4.27) is proved via Lemma 4.2 and allows us to use [3, Lemma 2.1], which controls the occurrence of certain patterns in X. We are then able to complete the argument in [3, Section 3.4]. A further step consists in removing the ergodicity assumption on Q. The argument in [3, Section 3.5] is long and technical, but carries over essentially verbatim because Lemmas 4.1-4.2 allow us, for arbitrary cylinder events, to replace ν by the product of its one-letter marginals at the expense of a finite factor.