Large deviations for random walks on free products of finitely generated groups

We prove existence of the large deviation principle, with a proper convex rate function, for the distribution of the renormalized distance from the origin of a random walk on a free product of finitely generated groups. As a consequence, we derive the same principle for nearest-neighbour random walks on regular trees.


Introduction and main result
The study of random walks on algebraic and geometric structures, most notably graphs and groups, has attracted considerable attention over the last four decades. Initiated by Pólya's celebrated results on recurrence and transience of symmetric simple random walks on integer lattices ([29]), the subject rose to prominence in the sixties, starting with Kesten's foundational work in the context of groups ([18]). It later regained momentum, mainly owing to pioneering contributions by Kaimanovich, R. Lyons, Varopoulos and Vershik, to name but a few; several directions of investigation gradually emerged, alongside new connections with various branches of pure and applied mathematics. For further details, we refer the reader to Woess' monograph [36] and the extensive bibliography therein.
In this article, we confine ourselves to the study of random walks on a class of finitely generated groups, and specifically to the investigation of the asymptotic properties of the distribution of the renormalized distance from the origin. Prior to stating our main result, we provide a brief overview of the context within which it can be inscribed.
Let $G$ be a finitely generated group, endowed with the discrete topology, and $\mu$ a probability measure on $G$. The measure $\mu$ defines a right random walk $(Y_n)_{n\in\mathbb{N}}$ started at $Y_0 = e$, the identity element of $G$, given by $Y_n = X_1\cdots X_n$ for every $n\ge 1$, where the $X_n$'s are independent $G$-valued random variables identically distributed according to $\mu$ (see Section 2 for precise definitions). Select a subset $S\subset G$ generating the group $G$. It determines a length function $\ell$ on $G$, measuring the size of its elements with respect to $S$; more precisely, for every $g\in G$, $\ell(g)$ is the minimal number of elements from the set $S\cup S^{-1}$ which are needed to obtain $g$ by multiplying them together. This corresponds to the path distance from the identity in the Cayley graph of $G$ with respect to the generating set $S$. To simplify the discussion, and in accordance with the cases of greatest interest, we shall always assume that $S$ is finite, though this is not necessary for the validity of Theorem 1.4, which represents the main contribution of the article.
The following well-known result provides an analogue, in a possibly non-commutative setting, of the strong law of large numbers for sums of independent real random variables.
Theorem 1.1. Assume that $\mu$ has finite first moment with respect to the length function $\ell$, that is $\int_G \ell(g)\,d\mu(g) < \infty$. Then there exists a non-negative real number $\lambda$ such that
$$\lim_{n\to\infty}\frac{1}{n}\ell(Y_n) = \lambda \quad \mathbb{P}\text{-almost surely.}$$
Theorem 1.1 is a consequence of Kingman's subadditive ergodic theorem ( [19]); for a proof, we refer to the original article of Guivarc'h [16].
The constant λ appearing in Theorem 1.1 is called the escape rate (or speed ) of the random walk; it clearly depends on µ and on the length function ℓ.
Once almost-sure convergence of the sequence $\left(\frac1n\ell(Y_n)\right)_{n\ge1}$ is established, it is natural to enquire about the asymptotic behaviour of the deviations from the mean $\ell(Y_n) - n\lambda$. In this spirit, a central limit theorem was first established in [31] for the case of free groups; a second, more geometric proof of the same result was later provided by Ledrappier in [23]. Subsequently, Björklund ([5]) transposed Ledrappier's argument to the setting of Gromov-hyperbolic groups (cf. [15,13]), proving a central limit theorem for the Green metric on the group $G$. The rationale behind the introduction of such a metric is geometric: with respect to the Green metric, the horofunction boundary of $G$ is $G$-equivariantly homeomorphic to the Gromov boundary, a fact which is instrumental in Björklund's approach. Thereafter, Benoist and Quint ([4]) extended the result to distance functions defined by word lengths, by adapting the method introduced in [3].
Theorem 1.2 ([4]). Let $G$ be a Gromov-hyperbolic group, and suppose that $\mu$ is a non-elementary and non-arithmetic probability measure on $G$ with finite second moment, that is $\int_G \ell(g)^2\,d\mu(g) < \infty$. Then the sequence of renormalized random variables $\frac{1}{\sqrt n}\left(\ell(Y_n) - n\lambda\right)$, $n\ge1$, converges in distribution to a non-degenerate Gaussian law.
For an explanation of the assumptions on the measure µ appearing in Theorem 1.2, we refer the reader to [4]. It is worth noticing that all earlier works on the central limit theorem in this context rely on the stronger assumption of finiteness of some exponential moment for µ. A recent paper by Mathieu and Sisto ([26]), in which Theorem 1.2 is established for the yet broader class of acylindrically hyperbolic groups, also deserves mention.
In light of Theorem 1.1, it is clear that
$$\mathbb{P}\left(\left|\frac1n\ell(Y_n) - \lambda\right| > \delta\right) \xrightarrow[n\to\infty]{} 0 \quad\text{for any } \delta>0. \tag{1.1}$$
We are interested in the decay rate of the probability of such rare events. More precisely, we ask whether the sequence of random variables $\left(\frac1n\ell(Y_n)\right)_{n\ge1}$ satisfies the large deviation principle (see Section 3); loosely speaking, this amounts to asking whether there is a well-defined exponential decay rate for the probability of events of the type appearing in (1.1).
It is natural to expect the large deviation principle to hold for a large class of finitely generated groups, in particular for Gromov-hyperbolic groups; we expand slightly on possible extensions of our approach¹ in this direction in Section 6. The applicability of the same strategy to such extensions, as well as to analogous questions for random matrix products, is already mentioned in [34].
Our main result establishes the existence of the large deviation principle, with a proper convex rate function, for the collection of non-trivial free products of finitely generated groups, under a non-degeneracy assumption on the semigroup $\Gamma$ generated by the support of the driving measure $\mu$. Specifically, we require that $\Gamma$ is pattern-avoiding: there exists a positive integer $D$ such that, for any reduced word $\omega = y_1\cdots y_D$ of type size $D$ in the free product, there is an element $g\in\Gamma\setminus\{e\}$ which neither starts with $\omega$ nor ends with $\omega^{-1}$. For a precise definition, we refer to Section 2.2, while the relevance of this condition to the purposes of the proof is explained in Section 1.1. For the sake of illustration, we hasten to observe that the pattern-avoidance condition is fulfilled, for instance, if $\Gamma$ intersects two distinct factors of the free product non-trivially (see Example 2.4).
Expanding upon the latter observation, we precede the statement of the main result, Theorem 1.4, with a simpler and more concise version which already singles out a broad class of admissible driving measures.
Proposition 1.3. Let $r\ge2$ be an integer, $G_1,\dots,G_r$ non-trivial finitely generated groups, $G = G_1 * \cdots * G_r$ their free product, $S_i$ a finite generating set of $G_i$ for $i=1,\dots,r$, $S = \bigcup_{i=1}^r S_i$, and $\ell$ the length function on $G$ determined by $S$. Let $\mu$ be a probability measure on $G$, and assume its support generates a semigroup $\Gamma$ with the property that, for any $i\in\{1,\dots,r\}$, there is an element $g\in\Gamma$ which neither starts nor ends in the factor $G_i$. If $(Y_n)_{n\ge0}$ is a right random walk on $G$ with increments distributed according to $\mu$, then the sequence of random variables $\left(\frac1n\ell(Y_n)\right)_{n\ge1}$ satisfies the weak large deviation principle with a convex rate function.
Observe that any semigroup $\Gamma$ fulfilling the assumptions of Proposition 1.3 avoids patterns of type size $D=1$ (the converse clearly fails, as shown in Example 2.4). In order to deal with more general pattern-avoiding semigroups, our method compels us to impose an additional constraint on the size of the factors $G_1,\dots,G_r$.
The complete formulation of our results reads as follows:
Theorem 1.4. Let $r\ge2$ be an integer, $G_1,\dots,G_r$ non-trivial finitely generated groups of subexponential growth, $G = G_1*\cdots*G_r$ their free product, $S_i$ a finite generating set of $G_i$ for $i=1,\dots,r$, $S=\bigcup_{i=1}^r S_i$, and $\ell$ the length function on $G$ determined by $S$. Suppose that $\mu$ is a probability measure on $G$ whose support generates a pattern-avoiding semigroup, and let $(Y_n)_{n\ge0}$ be a right random walk on $G$ with increments distributed according to $\mu$.
(1) The sequence of random variables $\left(\frac1n\ell(Y_n)\right)_{n\ge1}$ satisfies the weak large deviation principle with a convex rate function $I:\mathbb{R}\to[0,\infty]$.
(2) If $\mu$ has a finite exponential moment, then $I$ is a proper function and the sequence $\left(\frac1n\ell(Y_n)\right)_{n\ge1}$ satisfies the full large deviation principle with rate function $I$.
(3) If $\mu$ has finite moment-generating function, then $I$ is the Fenchel-Legendre transform of the limiting logarithmic moment generating function of the sequence $\left(\frac1n\ell(Y_n)\right)_{n\ge1}$.
A close inspection of the proof of Lemma 4.2 reveals that the whole argument leading to Theorem 1.4 can be readily adapted to establish Proposition 1.3. In particular, the last two assertions of Theorem 1.4 remain equally valid in the setting of Proposition 1.3.
For a precise definition of all the terms involved in the statement of Theorem 1.4, we refer the reader to Sections 2 and 3. Let us just recall here that a probability measure $\mu$ on $G$ is said [...]
¹ After the first version of this paper appeared, Boulanger, Mathieu, Sert and Sisto [6] proved existence of the large deviation principle for random walks on geodesic hyperbolic spaces, thus encompassing the case of walks on Gromov-hyperbolic groups. The underlying strategy does not differ substantially from our approach, though it relies on deeper geometric considerations.
By taking $G_i = \mathbb{Z}$ for all $i=1,\dots,r$, we settle in particular the question of existence of the large deviation principle for random walks on free groups; in turn, this yields the result for nearest-neighbour random walks on locally finite regular trees (a straightforward adaptation of the proof of Theorem 1.4 allows one to deal with regular trees of odd degree as well). For the sake of simplicity, we state the corollary in the case relevant for applications to (possibly lazy²) simple random walks on trees.
Corollary 1.5. Let $G$ be a free group on $r\ge1$ generators, and let $S$ be a free set of generators. Assume $\mu$ is a probability measure on $G$ whose support is contained in $S\cup S^{-1}\cup\{e\}$, and let $(Y_n)_{n\ge0}$ be a right random walk on $G$ with increments distributed according to $\mu$. The sequence of random variables $\left(\frac1n\ell(Y_n)\right)_{n\ge1}$, where $\ell$ is the length function on $G$ determined by $S$, satisfies the large deviation principle with a proper, convex rate function, coinciding with the Fenchel-Legendre transform of the limiting logarithmic moment generating function of $\left(\frac1n\ell(Y_n)\right)_{n\ge1}$.
Notice that the case $r=1$ of Corollary 1.5 is not, in principle, covered by Theorem 1.4; on the other hand, this case is a well-known, elementary instance of Cramér's theorem (cf. [10, Thm. 2.2.3]) on deviations of the empirical mean of independent, identically distributed real random variables. Incidentally, our method would be readily applicable to this case as well, as we point out in Section 6, thus yielding an indirect proof of Cramér's theorem for simple random walks on $\mathbb{Z}$ (and $\mathbb{Z}^d$).
Remark 1.6. (1) A version of Grushko's theorem ([24]) asserts that every finitely generated group can be decomposed in an essentially unique way as a free product of finitely many groups, which are not further decomposable as non-trivial free products. Notwithstanding this structural result, the class of examples covered by Theorem 1.4 is restricted, because of the limitations imposed on the generating set $S$, whose peculiar structure is crucial to our approach (cf. Section 1.1 below). On the other hand, the pattern-avoiding assumption on the semigroup $\Gamma$ is by no means necessary for the result to hold; it is only a convenient manner of identifying a large class of examples to which our method applies³.
Therefore, it is reasonable to expect that a technical refinement of our method would allow one to weaken the assumption on the support of the driving measure, and to deal with the case in which no conjugate of the semigroup $\Gamma$ lies in a single factor. In this respect, see the proof of Lemma 4.2. Such a result would yield, notably, that existence of the LDP for the length function is stable under taking free products.
(2) The result in Corollary 1.5 might also be derived, when $2r = p+1$ for a prime $p$, from the large deviation principle for random walks on linear algebraic groups⁴ (see [35, Thm. 3.3]), by choosing an appropriate representation of the free group in the projective special linear group $\mathrm{PSL}_2(\mathbb{Q}_p)$⁵. Our approach is different in that it resorts to the intrinsic geometric properties of the free group, rather than appealing to a representation.
² A $G$-random walk $(Y_n)_n$ is customarily called lazy if $\mu(e)\ge 1/2$; here, for convenience, we employ the terminology to refer to the more general case $\mu(e)>0$.
³ It becomes clear from the proofs that the very same method also takes care of some cases such as $\operatorname{supp}\mu\subset\{(ab)^n : n\in\mathbb{Z}\}$ in $G=\langle a,b\rangle$, a free group on two generators, in which the semigroup generated by $\operatorname{supp}\mu$ is not pattern-avoiding. Ruling out such trivial examples, it does not seem unlikely that a failure of the pattern-avoidance condition actually forces a conjugate of $\Gamma$ to lie in one of the factors.
⁴ This has been pointed out to the author by C. Sert.
⁵ The rank-one algebraic group $\mathrm{PSL}_2(\mathbb{Q}_p)$ acts by isometries on its Bruhat-Tits tree $T$, which is regular of degree $p+1$ (for the construction, we refer to Serre's book [32]). Hyperbolic elements of $\mathrm{PSL}_2(\mathbb{Q}_p)$ act on $T$ as hyperbolic elements in the geometric sense (cf. [30, Sec. 6]). Choosing a base vertex $o\in T$, the translation distance from $o$ corresponds, up to a multiplicative factor, to the operator norm on $\mathrm{PSL}_2(\mathbb{Q}_p)$ derived from a choice of a $K$-invariant ultrametric norm on the local field $\mathbb{Q}_p$, where $K<\mathrm{PSL}_2(\mathbb{Q}_p)$ is the compact stabilizer of $o$. Selecting hyperbolic elements which generate a Zariski-dense free subgroup of $\mathrm{PSL}_2(\mathbb{Q}_p)$ amounts to defining an isometric embedding of the corresponding free group in $\mathrm{PSL}_2(\mathbb{Q}_p)$.
(3) Sharp large deviation estimates for the word-length functional of finite-range random walks on free groups are already present in the work of Lalley⁶ ([21, Thm. 7.2]). The techniques adopted there differ significantly from ours, hinging on an extension of the Perron-Frobenius theory of nonnegative matrices to certain inhomogeneous matrix products; they yield finer information on the rate function, notably strict convexity, but require the assumption of aperiodicity of the random walk (cf. [21]), which our method does not need.
⁶ We thank S. Müller for drawing our attention to this reference.
Remark 1.7. Our hypothesis on the support of $\mu$ is unrelated to the choice of the generating set $S$. This makes Theorem 1.4 applicable, for instance, to the following situation, in which the driving measure has a priori no connection with the generating set. Let $G$ be a finitely generated group, $H<G$ a finite-index subgroup (hence $H$ is finitely generated by Schreier's subgroup lemma), $S\subset H$ a finite generating set of $H$, $T\subset G$ a set of representatives of right cosets of $H$ in $G$, and $\tilde S = \{st : s\in S,\ t\in T\}$ the corresponding finite generating set of $G$. Suppose that $\tilde\mu$ is a probability measure on $G$ whose support is contained in $\tilde S\cup\tilde S^{-1}\cup\{e\}$, thus giving rise to a nearest-neighbour random walk $(Y_n)_{n\in\mathbb{N}}$ on the Cayley graph $\mathrm{Cay}(G,\tilde S)$ of $G$ with respect to $\tilde S$. Let $\tau_1<\tau_2<\cdots<\tau_n<\cdots$ be the strictly increasing sequence of stopping times defined by the successive instants at which the random walk visits $H$; they are all finite $\mathbb{P}$-almost surely, since $H$ has finite index in $G$. By an iterative application of the strong Markov property ([20, Chap. 17]) to the process $(Y_n)_{n\in\mathbb{N}}$, it follows that the $H$-valued process $(Y_{\tau_n})_{n\in\mathbb{N}}$ (where we agree that $Y_{\tau_0}=e$) is a right random walk on $H$ driven by a measure $\mu$ having finite moment-generating function with respect to the word length determined by $S$; if $H$ is a non-trivial free product of finitely generated groups, all conclusions of Theorem 1.4 hold. An example of interest is the arithmetic group $\mathrm{SL}_2(\mathbb{Z})$, which contains a multitude of finite-index free subgroups (cf. [17, Chap. II]).

1.1. Outline of the strategy. To illustrate the overarching strategy of our proof of Theorem 1.4, it is informative to recall the indirect approach to the proof of Cramér's theorem for i.i.d. real random variables put forward by Lanford ([22]). If $(X_n)_{n\ge1}$ is a sequence of i.i.d. $\mathbb{R}$-valued random variables and $S_n=\sum_{i=1}^n X_i$ denotes the sequence of partial sums, then, for every $x\in\mathbb{R}$ and $\varepsilon>0$, the limit
$$\lim_{n\to\infty}\frac1n\log\mathbb{P}\left(\frac1n S_n\in(x-\varepsilon,x+\varepsilon)\right)$$
exists in $[-\infty,0]$ by supermultiplicativity of the sequence $\mathbb{P}\left(\frac1n S_n\in(x-\varepsilon,x+\varepsilon)\right)$, which in turn follows from the additivity of the process $(S_n)_{n\ge1}$. The weak LDP now follows from a standard result in large deviation theory (see Proposition 3.4). Similarly, the weak LDP holds for any additive functional⁷ $\ell'$ of a random walk $(Y_n)_{n\in\mathbb{N}}$ on a group $G$.
The major obstacle, when attempting to transport this argument to our context, lies in the lack of additivity of length functions on discrete groups; subadditivity only ensures supermultiplicativity of the sequence $\mathbb{P}\left(\frac1n\ell(Y_n)\in I\right)$ for intervals of the form $I=(-\infty,x)$. Still, if the random walk can be restricted to subsets on which the length function is almost additive (cf. Lemma 4.2 and the terminology introduced thereunder) without sizeable loss in the exponential decay rate of the corresponding probabilities, then Lanford's approach carries over almost unchanged. Specifically, the structure of the generating set $S$, obtained by concatenating generating sets of the various factors, enables us to quantify neatly the lack of additivity in terms of the reduced-word expansion of the elements involved; the pattern-avoiding assumption on the semigroup $\Gamma$ can then be leveraged to confine the attention to subsets on which the length function is weakly additive, and which are attained by the random walk with sufficiently high probability on an exponential scale. This is detailed in Lemma 4.2. Once a uniform lower bound for the loss of additivity is achieved, it is possible to deduce that, if
$$\gamma = \limsup_{n\to\infty}\frac1n\log\mathbb{P}\left(\frac1n\ell(Y_n)\in(x-a,x+a)\right)$$
for given $x,a\in\mathbb{R}_{>0}$, then a bound of the form
$$\mathbb{P}\left(\frac{1}{n_k}\ell(Y_{n_k})\in(x-a,x+a)\right)\ge e^{n_k(\gamma-\eta)}$$
($\eta$ being an arbitrarily small parameter) holds for a non-lacunary sequence of integers $(n_k)_k$. The arithmetic nature of such a sequence allows us to deduce the lower bound $\liminf_{n}\frac1n\log\mathbb{P}\left(\frac1n\ell(Y_n)\in(x-b,x+b)\right)\ge\gamma-\eta$ for every $b>a$ (cf. Lemma 4.1).
As a concluding comment, let us point out that the strategy outlined here parallels arguments employed in [35] to deal with large deviations of the Cartan projection of random matrix products; in this context, a weak form of additivity for the Cartan projection is satisfied on $(r,\varepsilon)$-Schottky semigroups, as shown by Benoist ([2]). The restriction of the random walk to such semigroups is then made possible by a result of Abels-Margulis-Soifer ([1]), establishing the ubiquity of $(r,\varepsilon)$-proximal elements in Zariski-dense semigroups.
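Lanford's supermultiplicativity step can be checked exactly in the simplest setting. The sketch below assumes i.i.d. steps uniform on $\{-1,+1\}$ (an illustrative choice, not taken from the text) and verifies $a_{m+n}\ge a_m a_n$ for $a_n=\mathbb{P}\left(\frac1n S_n\in(x-\varepsilon,x+\varepsilon)\right)$. The inequality holds because $\frac{1}{m+n}S_{m+n}$ is a convex combination of $\frac1m S_m$ and $\frac1n(S_{m+n}-S_m)$: these two variables are independent, with laws giving probabilities $a_m$ and $a_n$ to the interval, and when both lie in the interval so does their convex combination. The helper name `prob_in_interval` is hypothetical.

```python
from fractions import Fraction
from math import comb


def prob_in_interval(n: int, x: Fraction, eps: Fraction) -> Fraction:
    """Exact P(S_n / n in (x - eps, x + eps)) for S_n a sum of n i.i.d. uniform {-1, +1} steps.

    S_n = 2j - n, where j (the number of +1 steps) is Binomial(n, 1/2).
    """
    total = sum(
        comb(n, j)
        for j in range(n + 1)
        if abs(Fraction(2 * j - n, n) - x) < eps
    )
    return Fraction(total, 2 ** n)


x, eps = Fraction(1, 2), Fraction(1, 10)
a = {n: prob_in_interval(n, x, eps) for n in range(1, 41)}

# Collect every pair (m, n) with a_{m+n} < a_m * a_n; exact rational arithmetic avoids rounding.
violations = [
    (m, n)
    for m in range(1, 21)
    for n in range(1, 21)
    if a[m + n] < a[m] * a[n]
]
print("supermultiplicativity violations:", violations)
```

An empty list is expected for any interval, since the argument only uses convexity. No such inequality is available for general intervals once $S_n$ is replaced by $\ell(Y_n)$, which is the obstacle discussed above.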

1.2. Outline of the article. We begin with some preliminaries on random walks on finitely generated groups in Section 2, which mainly serve the purpose of fixing notation and elucidating the nature of the pattern-avoiding assumption we impose on the semigroup $\Gamma$. In Section 3 we recall some standard terminology from the theory of large deviations, together with a few general facts which are employed in the proof of Theorem 1.4. Sections 4 and 5 are devoted to the proof of our main result, Theorem 1.4; specifically, in Section 4 we establish existence of the large deviation principle, while in Section 5 we prove convexity of the rate function, which, together with properness, allows us to identify it as the convex conjugate of a logarithmic moment generating function. Finally, in Section 6 we assemble ideas on possible generalizations of Theorem 1.4, list some open questions and formulate related conjectures.
Acknowledgements. [...] his gratitude for several insightful comments and enlightening conversations. Special thanks go to the referee for a thorough reading of the article, which tremendously helped improve its quality. Lastly, we would like to thank Manfred Einsiedler for valuable remarks on a preliminary version, as well as Sebastian Müller for providing many useful references and observations.

Random walks on groups
2.1. Word length and metric on a finitely generated group. Convenient sources for the material presented hereunder are [17,25,36].
Let $G$ be a finitely generated group with identity element $e$, and $S\subset G$ a finite generating set. Let $S^{-1}=\{s^{-1}:s\in S\}$ denote the set of inverses of the elements in $S$, so that every element of $G$ can be written as a finite product of elements of $S\cup S^{-1}$. We define the word length $\ell$ determined by the generating set $S$ as the function $\ell:G\to\mathbb{N}$ given by
$$\ell(g)=\inf\{n\in\mathbb{N} : \text{there exist } s_1,\dots,s_n\in S\cup S^{-1} \text{ such that } g=s_1\cdots s_n\}$$
for every $g\in G$, with the understanding that $\ell(e)=0$. Then $\ell$ is a length function, meaning that it satisfies the following properties:
• $\ell(g)\ge0$ for all $g\in G$, and $\ell(g)=0$ if and only if $g=e$;
• $\ell(g^{-1})=\ell(g)$ for all $g\in G$;
• $\ell(g_1g_2)\le\ell(g_1)+\ell(g_2)$ for all $g_1,g_2\in G$.
The word length $\ell$ determines a distance function $d$ on $G$, called the word metric associated to the generating set $S$, defined by $d(g_1,g_2)=\ell(g_1^{-1}g_2)$ for all $g_1,g_2\in G$. The word metric $d$ is invariant under the action of $G$ on itself by left translation, namely $d(gg_1,gg_2)=d(g_1,g_2)$ for all $g,g_1,g_2\in G$.
We denote by $\mathrm{Cay}(G,S)=(V,E)$ the Cayley graph of $G$ with respect to $S$; we recall that this is the simple, undirected graph whose vertex set $V$ is the group $G$, where two vertices $g_1,g_2\in V$ are connected by an edge $\{g_1,g_2\}\in E$ if and only if $d(g_1,g_2)=1$. In other words, there is an edge connecting $g_1$ to $g_2$ if and only if there is $s\in(S\cup S^{-1})\setminus\{e\}$ such that $g_2=g_1s$.
The graph $\mathrm{Cay}(G,S)$ is connected, transitive and locally finite of degree $|(S\cup S^{-1})\setminus\{e\}|$. Via the identification $V=G$, the word metric $d$ on $G$ corresponds to the path distance on the vertex set $V$ (cf. [25, Chap. 3]).
Let $B_G(T)=\{g\in G:\ell(g)\le T\}$ be the closed $d$-ball of radius $T$ centered at the identity, for any $T\in\mathbb{R}_{\ge0}$. As the sequence $\left(|B_G(n)|\right)_{n\ge1}$ is submultiplicative, the limit $\gamma_S=\lim_n|B_G(n)|^{1/n}$ exists; we say that $G$ has subexponential growth if $\gamma_S=1$, a property which is actually independent of the generating set $S$. Recall that a broad class of finitely generated groups with subexponential (in fact, polynomial) growth consists of nilpotent groups ([37]).
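As an illustration of the growth rate $\gamma_S$, the following sketch counts balls in the free group $F_2=\langle a,b\rangle$ with $S=\{a,b\}$ by breadth-first search in $\mathrm{Cay}(F_2,S)$, storing elements as reduced words; encoding $a^{-1},b^{-1}$ as `'A'`, `'B'` is an assumption of the sketch. Since the sphere of radius $n\ge1$ has $4\cdot3^{n-1}$ elements, one gets $|B_G(n)|=2\cdot3^n-1$ and $\gamma_S=3$. Note that Theorem 1.4 requires subexponential growth of the factors $G_i$ only: a non-trivial free product such as $F_2$ typically has exponential growth.

```python
from collections import deque

# Free group F_2 = <a, b>; 'A' and 'B' stand for a^{-1} and b^{-1}.
GENS = "aAbB"
INV = {"a": "A", "A": "a", "b": "B", "B": "b"}


def multiply(word: str, s: str) -> str:
    """Right-multiply a reduced word by a generator, cancelling if needed."""
    if word and word[-1] == INV[s]:
        return word[:-1]
    return word + s


def ball_sizes(radius: int) -> list[int]:
    """Return |B(n)| for n = 0..radius via breadth-first search from the identity."""
    dist = {"": 0}  # the empty word is the identity e
    queue = deque([""])
    while queue:
        g = queue.popleft()
        if dist[g] == radius:
            continue
        for s in GENS:
            h = multiply(g, s)
            if h not in dist:
                dist[h] = dist[g] + 1
                queue.append(h)
    # Sphere sizes, then cumulative sums to get ball sizes.
    counts = [0] * (radius + 1)
    for d in dist.values():
        counts[d] += 1
    sizes, total = [], 0
    for c in counts:
        total += c
        sizes.append(total)
    return sizes


sizes = ball_sizes(8)
print(sizes)
print([round(sizes[n] ** (1 / n), 3) for n in range(1, 9)])  # slowly approaches 3
```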
If $G=G_1*\cdots*G_r$ is the free product (cf. [7]) of finitely generated groups $G_1,\dots,G_r$, we shall always restrict our considerations to the following kind of generating sets (and corresponding word lengths): we fix generating sets $S_i\subset G_i$ for each factor $G_i$ of the free product, and take the union $S=\bigcup_{i=1}^r S_i$ as generating set for $G$.

2.2. Free products and pattern-avoiding subsets. Let $r\ge2$ be an integer, $G_1,\dots,G_r$ non-trivial finitely generated groups, and let $G=G_1*\cdots*G_r$ be the free product of the $G_i$'s. We shall identify each $G_i$, $1\le i\le r$, with its isomorphic copy embedded in $G$.
Lemma 2.1. For any non-trivial element $g\in G$, there exist uniquely determined non-trivial elements $x_1,\dots,x_m\in(G_1\cup\cdots\cup G_r)\setminus\{e\}$, $m\ge1$, such that $g=x_1\cdots x_m$ and any two consecutive elements $x_j,x_{j+1}$ belong to distinct factors.
Any product $x_1\cdots x_m$ as in Lemma 2.1 is referred to as a reduced word of type size $m$ in the free product; correspondingly, we shall also say that $g=x_1\cdots x_m$ is an element of type size $m$. For any $i\in\{1,\dots,m\}$, we call the element $x_i$ the $i$-th letter of the reduced word $x_1\cdots x_m$.
Remark 2.2. Suppose that we fix a generating set $S_i\subset G_i$ for each factor of the free product, and let $\ell_i$ denote the associated word length on $G_i$. Then, if $\ell$ is the word length determined by the generating set $S=\bigcup_{i=1}^r S_i\subset G$ and if $g,x_1,\dots,x_m$ are as in Lemma 2.1, it holds that $\ell(g)=\ell(x_1)+\cdots+\ell(x_m)$. Observe in particular that, while the word length of an element $g\in G$ depends on the choice of the generating sets for the factors, the type size of $g$ does not.
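The normal form of Lemma 2.1 and the additivity in Remark 2.2 can be sketched in the special case in which every factor is $\mathbb{Z}$ with a single generator $t_i$, so that $G$ is a free group and $\ell_i(t_i^k)=|k|$; this restriction, and the encoding of a letter $t_i^k$ as a pair `(i, k)`, are assumptions of the sketch. Multiplying two reduced words merges or cancels letters at the junction, which is exactly where additivity of $\ell$ can fail: below, $\ell(gh)=3<\ell(g)+\ell(h)=5$.

```python
# Hypothetical special case: every factor G_i is Z with one generator t_i,
# and the letter t_i^k (k != 0) is encoded as the pair (i, k).
Letter = tuple[int, int]


def reduce_word(letters: list[Letter]) -> tuple[Letter, ...]:
    """Normal form of Lemma 2.1: merge adjacent letters of one factor, drop trivial ones."""
    stack: list[Letter] = []
    for i, k in letters:
        if k == 0:
            continue
        if stack and stack[-1][0] == i:
            _, k0 = stack.pop()
            if k0 + k != 0:
                stack.append((i, k0 + k))
        else:
            stack.append((i, k))
    return tuple(stack)


def multiply(g: tuple[Letter, ...], h: tuple[Letter, ...]) -> tuple[Letter, ...]:
    return reduce_word(list(g) + list(h))


def type_size(g: tuple[Letter, ...]) -> int:
    return len(g)


def length(g: tuple[Letter, ...]) -> int:
    """Remark 2.2 with S_i = {t_i}: ell(g) is the sum of |k| over the letters of g."""
    return sum(abs(k) for _, k in g)


a, b = (0, 1), (1, 1)
g = reduce_word([a, b, (0, -1)])  # a b a^{-1}
h = reduce_word([a, b])           # a b
gh = multiply(g, h)               # a b a^{-1} a b = a b^2
print(gh, type_size(gh), length(gh))  # ((0, 1), (1, 2)) 2 3
```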
Let $\omega=y_1\cdots y_d$ be a reduced word of type size $d$, and $g\in G$ an element of type size at least 2, with reduced-word decomposition $g=x_1\cdots x_m$. Setting $k=\min\{d,\lfloor m/2\rfloor\}$, where $\lfloor a\rfloor$ indicates the integer part of a real number $a$, we shall say that $g$
• starts with $\omega$ if $x_1\cdots x_k=y_1\cdots y_k$, and
• ends with $\omega$ if $x_{m-k+1}\cdots x_m=y_1\cdots y_k$.
Notice that the definition is independent of any choice of generating sets for the factors $G_1,\dots,G_r$ of the free product.
Example 2.3. If $G=\langle a,b\rangle$ is a free group on two generators $a$ and $b$, then the element $abab$ starts with $ab$ and ends with $ab$, while the element $abab^{-1}a^{-1}$ starts with $ab$ and ends with $b^{-1}a^{-1}$. Also, according to our definition, the latter element starts with any word $ab\omega'$ obtained by juxtaposing a reduced word $\omega'$ to $ab$ in such a way that $ab\omega'$ is again a reduced word.
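The two predicates can be transcribed directly. The sketch below, whose function names and letter encoding (a letter $a^k$ written as `("a", k)`) are illustrative assumptions, checks the claims of Example 2.3. The comparison of the last $k$ letters of $g$ with $y_1\cdots y_k$ follows the definition as stated above.

```python
# A reduced word in the free group <a, b> = Z * Z, written as a tuple of letters;
# each letter is a pair (generator, exponent), e.g. ("b", -1) stands for b^{-1}.
Word = tuple[tuple[str, int], ...]


def _prefix_size(g: Word, omega: Word) -> int:
    """k = min(d, floor(m / 2)) from the definitions in Section 2.2."""
    if len(g) < 2:
        raise ValueError("g must have type size at least 2")
    return min(len(omega), len(g) // 2)


def starts_with(g: Word, omega: Word) -> bool:
    k = _prefix_size(g, omega)
    return g[:k] == omega[:k]


def ends_with(g: Word, omega: Word) -> bool:
    # As stated in the text: the last k letters of g are compared with y_1 ... y_k.
    k = _prefix_size(g, omega)
    return g[len(g) - k:] == omega[:k]


a, b, A, B = ("a", 1), ("b", 1), ("a", -1), ("b", -1)
abab = (a, b, a, b)
w = (a, b, a, B, A)  # the element a b a b^{-1} a^{-1}

print(starts_with(abab, (a, b)), ends_with(abab, (a, b)))  # True True
print(starts_with(w, (a, b)), ends_with(w, (B, A)))        # True True
print(starts_with(w, (a, b, ("a", 3))))                    # True: only k = 2 letters compared
```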
A subset $T\subset G$ is called pattern-avoiding if there exists a positive integer $D$ such that, for any reduced word $\omega=y_1\cdots y_D$ of type size $D$ in the free product, there exists $g\in T$ such that $g$ does not start with $\omega$ and does not end with $\omega^{-1}=y_D^{-1}\cdots y_1^{-1}$ (in particular, $g$ has type size at least 2). In case we need to keep track of the integer $D$, we shall say that $T$ avoids patterns of type size $D$. The examples presented below clarify the notion.
Example 2.4. (1) Let $G=\langle a,b,c\rangle$ be a free group on three generators $a$, $b$ and $c$. The sets [...] are pattern-avoiding, while the set [...] is not pattern-avoiding, as all its elements either start with $a$ or end with $a^{-1}$.
(2) If $S\subset gG_ig^{-1}$ for some $i\in\{1,\dots,r\}$ and some $g=x_1\cdots x_m\in G$, then the semigroup $\Gamma$ generated by $S$ is not pattern-avoiding: all its elements start with $x_1\cdots x_m$ and end with $(x_1\cdots x_m)^{-1}$.
(3) Suppose that $S\subset G$ intersects two distinct factors $G_i$ and $G_j$ non-trivially. Then the semigroup $\Lambda$ generated by $S$ is pattern-avoiding: if $x\in S\cap(G_i\setminus\{e\})$ and $y\in S\cap(G_j\setminus\{e\})$, then $\{xy,yx\}$ is pattern-avoiding and contained in $\Lambda$.
(4) The semigroup generated by $\{aba,a^2ba^2\}$ in $G=\langle a,b\rangle$ avoids patterns of type size 1, but does not satisfy the hypotheses of Proposition 1.3: its elements start and end in the factor $\langle a\rangle$.
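For $D=1$ the pattern-avoidance condition on a finite set $T$ reduces to a finite check. A minimal sketch, restricted to $D=1$ and to free groups written as free products of copies of $\mathbb{Z}$ (both assumptions; the function names are hypothetical): every relevant $g$ has type size $m\ge2$, so $\min\{1,\lfloor m/2\rfloor\}=1$, and $g$ avoids $\omega=y$ exactly when its first letter differs from $y$ and its last letter differs from $y^{-1}$. Only letters $y$ occurring as a first letter, or as the inverse of a last letter, can be obstructed. Since supersets of pattern-avoiding sets are pattern-avoiding, the first check also confirms item (4) for the whole semigroup.

```python
# Letters of a free group written as a free product of copies of Z:
# ("a", k) stands for a^k with k != 0.
Letter = tuple[str, int]


def inverse_letter(x: Letter) -> Letter:
    return (x[0], -x[1])


def avoids_patterns_size_one(T: list[tuple[Letter, ...]]) -> bool:
    """Check whether the finite set T avoids patterns of type size D = 1."""
    long_words = [g for g in T if len(g) >= 2]
    if not long_words:
        return False
    # Letters y that could be obstructed: first letters, and inverses of last letters.
    candidates = {g[0] for g in long_words} | {inverse_letter(g[-1]) for g in long_words}
    # Every other letter y is avoided by any element of long_words.
    return all(
        any(g[0] != y and g[-1] != inverse_letter(y) for g in long_words)
        for y in candidates
    )


a, b, a_inv = ("a", 1), ("b", 1), ("a", -1)
aba = (a, b, a)
a2ba2 = (("a", 2), b, ("a", 2))
print(avoids_patterns_size_one([aba, a2ba2]))          # True, cf. Example 2.4 (4)
print(avoids_patterns_size_one([(a, b), (b, a)]))      # True, cf. Example 2.4 (3)
print(avoids_patterns_size_one([(a, b), (b, a_inv)]))  # False: each starts with a or ends with a^{-1}
```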
Obviously, if $T'\subset T\subset G$ and $T'$ is pattern-avoiding, then so is $T$. Conversely, the following elementary observation is essential for our line of reasoning in Section 4: if $T$ is pattern-avoiding, then there exists a finite subset $T'\subset T$ which is also pattern-avoiding⁸.

2.3. Random walks on finitely generated groups. Let $\mu$ be a probability measure on the group $G$; equivalently, $\mu$ is a function defined on $G$ taking non-negative real values and satisfying $\sum_{g\in G}\mu(g)=1$. Then $\mu$ defines a right random walk on $G$ as follows: let $(X_n)_{n\ge1}$ be a sequence of independent, identically distributed $G$-valued random variables with common law $\mu$. Implicitly, we consider them to be defined over a probability space $(\Omega,\mathcal{F},\mathbb{P})$, which will be fixed hereinafter. We define a $G$-valued stochastic process $(Y_n)_{n\in\mathbb{N}}$ by setting $Y_0=e$ and $Y_n=X_1\cdots X_n$ for every integer $n\ge1$. The process $(Y_n)_{n\in\mathbb{N}}$ is called a right random walk on $G$, issued from the origin $e$ with increments distributed according to $\mu$. Equivalently, one may define the process $(Y_n)_{n\in\mathbb{N}}$ as a Markov chain on $G$ issued from $e$ with transition matrix $Q=(q(x,y))_{x,y\in G}$ given by $q(x,y)=\mu(x^{-1}y)$ for all $x,y\in G$ (cf. [...]).
Let $\operatorname{supp}\mu=\{g\in G:\mu(g)>0\}$ be the support of the measure $\mu$. If $\operatorname{supp}\mu\subset S\cup S^{-1}$, then the process $(Y_n)_{n\in\mathbb{N}}$ can also be interpreted as a nearest-neighbour random walk on the Cayley graph $\mathrm{Cay}(G,S)$, where the walker in position $x$ moves to $xs$ with probability $\mu(s)$, for all $s\in S\cup S^{-1}$, $x\in G$. Notice that we are not excluding the case $\mu(e)>0$, so that the walker may have positive probability of remaining where it is.
Let $\mathbb{E}[X]$ denote the expectation of a random variable $X:\Omega\to\mathbb{R}$ with respect to the probability measure $\mathbb{P}$. If $\mu$ has finite first moment, the sequence of averaged lengths $\left(\mathbb{E}[\ell(Y_n)]\right)_{n\ge1}$ is subadditive, since $\ell(Y_{m+n})\le\ell(Y_m)+\ell(Y_m^{-1}Y_{m+n})$ and $Y_m^{-1}Y_{m+n}=X_{m+1}\cdots X_{m+n}$ has the same law as $Y_n$; hence the renormalized averaged lengths $\frac1n\mathbb{E}[\ell(Y_n)]$ converge to a limit $\lambda\in\mathbb{R}_{\ge0}$, called the escape rate or speed of the random walk $(Y_n)_{n\in\mathbb{N}}$. As mentioned in the introduction (Theorem 1.1), $\mathbb{P}$-almost every trajectory $(y_n)_{n\ge0}\in G^{\mathbb{N}}$ of the random walk actually satisfies $\frac1n\ell(y_n)\xrightarrow[n\to\infty]{}\lambda$.
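Theorem 1.1 can be illustrated by simulation. For the simple random walk on $F_2=\langle a,b\rangle$, with $\mu$ uniform on $S\cup S^{-1}$ and $S=\{a,b\}$ (an assumption of the sketch), the distance $\ell(Y_n)$ is a birth-death chain which, away from the identity, moves outward with probability $3/4$ and inward with probability $1/4$, so that $\lambda=3/4-1/4=1/2$ (more generally $(r-1)/r$ on $F_r$). The sketch keeps the reduced word as a stack, so that $\ell(Y_n)$ is its length; the function name is hypothetical.

```python
import random

INV = {"a": "A", "A": "a", "b": "B", "B": "b"}  # 'A', 'B' stand for a^{-1}, b^{-1}


def simulate_length(n_steps: int, rng: random.Random) -> int:
    """Run the simple random walk on F_2 for n_steps and return ell(Y_n)."""
    word: list[str] = []  # reduced word of Y_n, used as a stack
    for _ in range(n_steps):
        s = rng.choice("aAbB")
        if word and word[-1] == INV[s]:
            word.pop()  # cancellation: the walk steps back towards e
        else:
            word.append(s)
    return len(word)


rng = random.Random(12345)
n = 20000
estimate = simulate_length(n, rng) / n
print(f"ell(Y_n)/n = {estimate:.3f}  (speed (r-1)/r = 0.5 for r = 2)")
```

The fluctuations of $\ell(Y_n)/n$ around $1/2$ are of order $n^{-1/2}$, in line with the central limit theorem recalled in the introduction.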
(1) We could equally well consider random walks issued from any initial vertex $g_0\in G$, by defining $Y'_0=g_0$ and $Y'_n=g_0X_1\cdots X_n$ for any $n\ge1$. It is then natural to consider the renormalized distance $\frac1n d(g_0,Y'_n)$ which, by invariance of $d$ under left translations, equals precisely $\frac1n d(e,X_1\cdots X_n)=\frac1n\ell(Y_n)$. Hence, for the purpose of our considerations, there is no loss of generality in assuming that the random walk starts at the origin.
(2) Similarly, restricting to right random walks does not result in any loss of generality: if $Y'_n=X_n\cdots X_1$, $n\ge1$, is a left random walk issued from the origin with driving measure $\mu$, then $(Y_n'^{-1})_{n\in\mathbb{N}}$ is a right random walk with driving measure $\iota_*\mu$, given by $\iota_*\mu(g)=\mu(g^{-1})$ for every $g\in G$, and $\ell(Y_n'^{-1})=\ell(Y'_n)$ for every $n\in\mathbb{N}$.

Large deviation principle
In this section, we briefly review some of the terminology that is usually employed in the theory of large deviations. For a comprehensive introduction to the subject, the reader is referred to [10].
Throughout this section, $X$ denotes a Hausdorff regular topological space, endowed with the Borel σ-algebra $\mathcal{B}$. Let $(\mu_n)_{n\ge1}$ be a sequence of Borel probability measures on $X$, and $I:X\to[0,\infty]$ a lower semicontinuous function. The effective domain of $I$ is the set $\mathcal{D}_I=\{x\in X:I(x)<\infty\}$.
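Definition 3.1. We say that the sequence $(\mu_n)_{n\ge1}$ satisfies the large deviation principle (or, in abridged form, LDP) with rate function $I$ if, for any Borel measurable set $\Lambda\subset X$,
$$-\inf_{x\in\Lambda^\circ}I(x)\le\liminf_{n\to\infty}\frac1n\log\mu_n(\Lambda)\le\limsup_{n\to\infty}\frac1n\log\mu_n(\Lambda)\le-\inf_{x\in\overline{\Lambda}}I(x),$$
where $\Lambda^\circ$ and $\overline{\Lambda}$ denote the interior and the closure of $\Lambda$, respectively.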
We observe in passing that, for a given sequence $(\mu_n)_{n\ge1}$, there is at most one lower semicontinuous function $I$ for which the LDP can hold ([10, Lem. 4.1.4]).
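In Definition 3.1, it is obviously equivalent to require that
$$\liminf_{n\to\infty}\frac1n\log\mu_n(U)\ge-\inf_{x\in U}I(x)\quad\text{for every open set } U\subset X, \tag{3.1}$$
$$\limsup_{n\to\infty}\frac1n\log\mu_n(F)\le-\inf_{x\in F}I(x)\quad\text{for every closed set } F\subset X. \tag{3.2}$$
If (3.1) holds for all open sets, while (3.2) holds just for all compact sets $K\subset X$, then we say that the sequence $(\mu_n)_{n\ge1}$ satisfies the weak large deviation principle (weak LDP) with rate function $I$. If $(Z_n)_{n\ge1}$ is a sequence of $X$-valued random variables, and $\mu_n$ denotes the law of $Z_n$ for every $n\ge1$, we shall say that $(Z_n)_{n\ge1}$ satisfies the (weak) LDP if the sequence $(\mu_n)_{n\ge1}$ satisfies the (weak) LDP.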
Under certain conditions, the full LDP can be retrieved from the weak LDP. The most common of these conditions involves the notion of exponential tightness.
Definition 3.2. We say that a sequence $(\mu_n)_{n\ge1}$ of Borel probability measures on $X$ is exponentially tight if, for every $\alpha\in\mathbb{R}_{\ge0}$, there exists a compact set $K\subset X$ such that
$$\limsup_{n\to\infty}\frac1n\log\mu_n(X\setminus K)<-\alpha.$$
In other words, the mass is concentrated on compact sets, on an exponential scale. It is intuitively clear that exponential tightness allows one to pass from the weak form of the LDP to the full one, as made precise in the following proposition (cf. [10, Lem. 1.2.18]).
Proposition 3.3. Let $(\mu_n)_{n\ge1}$ be an exponentially tight sequence of Borel probability measures on $X$. Assume that $(\mu_n)_{n\ge1}$ satisfies the weak LDP with rate function $I$. Then:
(1) $(\mu_n)_{n\ge1}$ satisfies the LDP with rate function $I$;
(2) $I$ is a proper function.
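Part (2) of Theorem 1.4 obtains the full LDP under a finite exponential moment, and Proposition 3.3 indicates one route: checking exponential tightness. The following short derivation, a standard Chernoff-bound argument given only as an illustration and not necessarily the argument used in Section 4, shows that the laws $\mu_n$ of $\frac1n\ell(Y_n)$ are exponentially tight whenever $M_\tau=\int_G e^{\tau\ell(g)}\,d\mu(g)<\infty$ for some $\tau>0$; the symbols $M_\tau$ and $R$ are introduced only for this sketch.

```latex
% Subadditivity of \ell gives \ell(Y_n) \le \sum_{i=1}^n \ell(X_i), with the X_i i.i.d. of law \mu.
\mu_n\big((R,\infty)\big) = \mathbb{P}\big(\ell(Y_n) > nR\big)
  \le e^{-\tau n R}\,\mathbb{E}\big[e^{\tau \ell(Y_n)}\big]
  \le e^{-\tau n R}\prod_{i=1}^{n}\mathbb{E}\big[e^{\tau \ell(X_i)}\big]
  = e^{-n(\tau R - \log M_\tau)} .
% Since \mu_n is supported on [0,\infty), the compact set K = [0,R] satisfies
\limsup_{n\to\infty}\frac{1}{n}\log\mu_n(\mathbb{R}\setminus K)
  \le -(\tau R - \log M_\tau) < -\alpha
  \quad\text{whenever } R > \frac{\alpha + \log M_\tau}{\tau}.
```

Combined with a weak LDP, Proposition 3.3 then yields the full LDP together with properness of the rate function.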
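The following statement establishes a criterion to determine whether the weak LDP holds, without knowing the rate function in advance. It will be the key tool to prove existence of the weak LDP in our context.
Proposition 3.4 (cf. [10, Thm. 4.1.11]). Let $\mathcal{A}$ be a base for the topology of $X$. For every $A\in\mathcal{A}$, define $\mathcal{L}_A=-\liminf_{n\to\infty}\frac1n\log\mu_n(A)$, and let $I(x)=\sup_{A\in\mathcal{A}:\,x\in A}\mathcal{L}_A$. Suppose that, for all $x\in X$,
$$I(x)=\sup_{A\in\mathcal{A}:\,x\in A}\left[-\limsup_{n\to\infty}\frac1n\log\mu_n(A)\right].$$
Then $(\mu_n)_{n\ge1}$ satisfies the weak LDP with rate function $I$.
Assume now that $X$ is a locally convex, Hausdorff topological vector space over $\mathbb{R}$, and let $X^*$ denote its topological dual. In case the sequence $(\mu_n)_{n\ge1}$ satisfies the LDP on $X$ with a proper, convex rate function $I$, it is possible to give an alternative expression for the rate function itself, provided that a certain logarithmic moment generating function exists. More precisely, define the logarithmic moment generating function of the measure $\mu_n$, for each integer $n\ge1$, as the function $\Lambda_n:X^*\to(-\infty,\infty]$ given by
$$\Lambda_n(\varphi)=\log\int_X e^{\langle\varphi,x\rangle}\,d\mu_n(x)\quad\text{for all }\varphi\in X^*,$$
where $\langle\cdot,\cdot\rangle$ denotes the standard dual pairing between $X^*$ and $X$. The limiting logarithmic moment generating function of the sequence $(\mu_n)_{n\ge1}$ is then defined as
$$\Lambda(\varphi)=\lim_{n\to\infty}\frac1n\Lambda_n(n\varphi),\quad\varphi\in X^*,$$
whenever the limit exists. Given a function $f:X\to(-\infty,\infty]$, not identically infinite, we define its Fenchel-Legendre transform $f^*:X^*\to(-\infty,\infty]$ by
$$f^*(\varphi)=\sup_{x\in X}\left\{\langle\varphi,x\rangle-f(x)\right\}.$$
If $g:X^*\to(-\infty,\infty]$ is a function defined on the dual space, we shall view its Fenchel-Legendre transform $g^*$ as a function defined just on $X$, rather than on the entire bidual $X^{**}$.
Theorem 3.5. Let $(\mu_n)_{n\ge1}$ be a sequence of Borel probability measures on a locally convex, Hausdorff topological vector space $X$. Assume the following:
(1) the limiting logarithmic moment generating function $\Lambda:X^*\to(-\infty,\infty]$ of the sequence $(\mu_n)_{n\ge1}$ is finite for every $\varphi\in X^*$;
(2) the sequence $(\mu_n)_{n\ge1}$ satisfies the LDP with a proper, convex rate function $I$.
Then the rate function $I$ is the Fenchel-Legendre transform of $\Lambda$, namely
$$I(x)=\Lambda^*(x)=\sup_{\varphi\in X^*}\left\{\langle\varphi,x\rangle-\Lambda(\varphi)\right\}\quad\text{for all }x\in X.$$
Theorem 3.5 reveals the importance of knowing a priori the existence of the LDP with a proper, convex rate function.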
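Theorem 3.5 identifies the rate function with $\Lambda^*$. In the elementary setting of Cramér's theorem mentioned after Corollary 1.5, with i.i.d. steps uniform on $\{-1,+1\}$ (an assumption of the sketch), the empirical mean $\frac1nS_n$ has $\Lambda(t)=\log\cosh t$, and $\Lambda^*(x)=\frac{1+x}{2}\log(1+x)+\frac{1-x}{2}\log(1-x)$ for $|x|<1$. The sketch approximates the supremum defining the Fenchel-Legendre transform by a grid search over a bounded range of $t$ (the helper names are hypothetical); this approximation is adequate only when the maximizer $t^*=\operatorname{artanh}x$ lies well inside that range.

```python
import math


def log_mgf(t: float) -> float:
    """Lambda(t) = log E[exp(t X)] for X uniform on {-1, +1}, i.e. log cosh t."""
    return math.log(math.cosh(t))


def legendre(f, x: float, t_min: float = -10.0, t_max: float = 10.0, steps: int = 20000) -> float:
    """Approximate f*(x) = sup_t (t x - f(t)) by a grid search over [t_min, t_max]."""
    best = -math.inf
    for j in range(steps + 1):
        t = t_min + (t_max - t_min) * j / steps
        best = max(best, t * x - f(t))
    return best


def cramer_rate(x: float) -> float:
    """Closed form of Lambda*(x) for |x| < 1."""
    return 0.5 * (1 + x) * math.log(1 + x) + 0.5 * (1 - x) * math.log(1 - x)


x = 0.5
print(legendre(log_mgf, x), cramer_rate(x))  # the two values agree to about 1e-7
```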

Existence of LDP
We now set out to prove our main Theorem 1.4. Specifically, the objective of the present section is twofold: in Proposition 4.3, we address existence of the weak LDP, with a certain rate function, under the pattern-avoiding assumption for the semigroup generated by the support of the driving measure, while in Proposition 4.4 the result is upgraded to the full LDP, under the additional requirement of finiteness of some exponential moment. Convexity of the rate function, and the ensuing identification of it as a Fenchel-Legendre transform, are dealt with in Section 5.
For a start, we briefly recall the setup. Let $G_1,\dots,G_r$ be a finite collection of non-trivial finitely generated groups of subexponential growth, and $G=G_1*\cdots*G_r$ their free product. For any $i\in\{1,\dots,r\}$, $S_i\subset G_i$ is a finite set of generators of $G_i$, so that $S=\bigcup_{i=1}^rS_i$ is a finite generating set for $G$, with associated word length $\ell:G\to\mathbb N$. Let $\mu$ be a probability measure on $G$, and $(Y_n)_{n\ge0}$ a right random walk on $G$ issued from the identity with steps distributed according to $\mu$. For every integer $n\ge1$, let $\mu_n$ be the law of the random variable $\frac1n\ell(Y_n)$. Henceforth, we shall denote by $B(y,\varepsilon)$ the open interval $(y-\varepsilon,y+\varepsilon)\subset\mathbb R$, for any $y\in\mathbb R$ and any $\varepsilon>0$. Furthermore, for any positive integer $k$, we let $kB(y,\varepsilon)=\{kz:z\in B(y,\varepsilon)\}$.
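To make the word length on a free product concrete, the following Python sketch encodes elements of a hypothetical example $G=\mathbb Z*\mathbb Z/3*\mathbb Z/4$ (cyclic factors have polynomial, hence subexponential, growth), with $S_i=\{s_i\}$ a single generator of each factor. An element is stored in reduced normal form as a sequence of syllables $s_i^k$, consecutive syllables lying in different factors; with respect to $S=\bigcup_iS_i$, its length is the sum of the lengths of its syllables in their own factors. The encoding and all function names are illustrative assumptions; the random test checks the double inequality $|\ell(g)-\ell(h)|\le\ell(gh)\le\ell(g)+\ell(h)$, which is used in the proof of Lemma 4.1.

```python
import random
from typing import List, Tuple

# Hypothetical example: G = Z * Z/3 * Z/4, each factor cyclic with generator s_i.
# ORDERS[i] is the order of s_i, with 0 meaning infinite order.
ORDERS = (0, 3, 4)

Syllable = Tuple[int, int]   # (factor index i, exponent k), standing for s_i^k
Word = Tuple[Syllable, ...]  # reduced normal form: adjacent syllables lie in different factors


def reduce_exp(i: int, k: int) -> int:
    """Canonical exponent of s_i^k (reduced modulo the order when it is finite)."""
    return k if ORDERS[i] == 0 else k % ORDERS[i]


def factor_length(i: int, k: int) -> int:
    """Word length of s_i^k in G_i with respect to S_i = {s_i}."""
    m = ORDERS[i]
    if m == 0:
        return abs(k)
    k %= m
    return min(k, m - k)


def length(g: Word) -> int:
    """Word length in G with respect to S = S_1 u ... u S_r: sum over syllables."""
    return sum(factor_length(i, k) for i, k in g)


def multiply(g: Word, h: Word) -> Word:
    """Product of two normal forms, merging (and possibly cancelling) at the junction."""
    out: List[Syllable] = list(g)
    for i, k in h:
        if out and out[-1][0] == i:
            _, k0 = out.pop()
            merged = reduce_exp(i, k0 + k)
            if merged != 0:
                out.append((i, merged))
        else:
            out.append((i, reduce_exp(i, k)))
    return tuple(out)


def inverse(g: Word) -> Word:
    return tuple((i, reduce_exp(i, -k)) for i, k in reversed(g))


def random_element(rng: random.Random, n_syllables: int) -> Word:
    """Random reduced word with the given number of syllables."""
    g: List[Syllable] = []
    for _ in range(n_syllables):
        i = rng.choice([j for j in range(len(ORDERS)) if not g or j != g[-1][0]])
        k = rng.choice([-2, -1, 1, 2]) if ORDERS[i] == 0 else rng.randrange(1, ORDERS[i])
        g.append((i, k))
    return tuple(g)


def check_double_inequality(trials: int = 2000, seed: int = 0) -> bool:
    """Test |l(g) - l(h)| <= l(gh) <= l(g) + l(h) on random pairs."""
    rng = random.Random(seed)
    for _ in range(trials):
        g = random_element(rng, rng.randint(0, 6))
        h = random_element(rng, rng.randint(0, 6))
        lgh = length(multiply(g, h))
        if not abs(length(g) - length(h)) <= lgh <= length(g) + length(h):
            return False
    return True
```

For instance, `multiply(((0, 2), (1, 1)), ((1, 1), (2, 3)))` merges the two $\mathbb Z/3$ syllables at the junction and yields an element of length $2+1+1=4$.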
We precede the statement of Proposition 4.3 by two technical lemmas which, taken together, essentially allow us to reduce the problem of establishing the LDP in this context to a setup akin to the standard case of i.i.d. real random variables, in which (almost-)additivity of the process can be put to good use.
The first of the two lemmas allows us to deduce a lower bound for the asymptotic exponential decay rate of the probabilities $\mu_n(B(x,b))$ from a uniform lower bound along a non-lacunary sequence of times. Lemma 4.1. Suppose that there exist $a>0$, $\gamma\in\mathbb R$ and a strictly increasing sequence $(n_k)_{k\ge1}$ of positive integers with $\lim_{k\to\infty}n_{k+1}/n_k=1$, such that $$\mu_{n_k}(B(x,a))\ge e^{n_k\gamma}\quad\text{for all }k\ge1.\qquad(4.1)$$ Then, for all $b>a$, $$\liminf_{n\to\infty}\frac1n\log\mu_n(B(x,b))\ge\gamma.$$
Proof. Choose a finite set $F\subset G$ such that $\sum_{g\in F}\mu(g)>1/2$. For any $k\ge1$, set and notice that the upper bound $M_k\le(n_{k+1}-n_k)M_1$ holds by subadditivity of $\ell$. Now let $N\ge n_1$ be arbitrary; there exists a unique $k=k(N)\ge1$ such that $n_k\le N<n_{k+1}$. As $b-a>0$, the assumption $n_{k+1}/n_k\to1$ implies that there exists $k_0\in\mathbb N$ such that this follows from the double inequality $|\ell(g)-\ell(h)|\le\ell(gh)\le\ell(g)+\ell(h)$, holding for every $g,h\in G$. Now, if $k\ge k_0$ and $N\in\{n_k,\dots,n_{k+1}-1\}$, we may estimate the last two inequalities being given, respectively, by independence and stationarity of the process $(X_n)_{n\ge1}$, and by the assumption of the lemma. Taking the logarithm and dividing by $N$, we obtain Taking the limit inferior as $N\to\infty$ on both sides, and observing that the assumption on $(n_k)_k$ implies $\lim_{N\to\infty}n_{k(N)}/N=1$, we complete the proof.
The next lemma expresses the possibility of restricting the random walk to subsets on which the length function $\ell$ is almost additive, without a significant loss in the exponential decay rate of the probabilities involved.
Observe that $T/\log\theta_T\to\infty$ as $T\to\infty$, due to the subexponential growth of $G_1,\dots,G_r$; as a consequence, the factor $(r\theta_T)^{-2D}$, quantifying the maximal loss in probability, is negligible on an exponential scale (cf. the proof of Proposition 4.3).
Proof. The proof consists of a repeated application of the union bound for ν, in order to extract various subsets of F with predetermined letters in their reduced-word expression.
To begin with, there exist $(i_1,j_1)\in\{1,\dots,r\}^2$ and $F_1\subset F$ such that $\nu(F_1)\ge r^{-2}\nu(F)$ and, for any $g\in F_1$, the first letter of $g$ is in $G_{i_1}$ and the last one is in $G_{j_1}$. If $i_1\ne j_1$, then $\ell(g_1\cdots g_k)=\ell(g_1)+\cdots+\ell(g_k)$ for any $g_1,\dots,g_k\in F_1$, so that $A=F_1$ fulfils the statement. If $i_1=j_1$, we may choose a subset $E_1\subset F_1$ and elements $y_1,z_1\in G_{i_1}$ such that $\nu(E_1)\ge\theta_T^{-2}\nu(F_1)$ and, for each $g\in E_1$, the first letter of $g$ is $y_1$ and the last one is $z_1$. We distinguish three cases.
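The role of the first and last letters in this case distinction can be seen on the free group $F_2=\langle a\rangle*\langle b\rangle$, with letters written as characters (A and B standing for $a^{-1}$ and $b^{-1}$). In the minimal Python sketch below (helper names hypothetical), products of random reduced words that all start in $\langle a\rangle$ and end in $\langle b\rangle$ have additive length, as in the case $i_1\ne j_1$, while words starting with $a$ and ending with $a^{-1}$, the situation $z_1=y_1^{-1}$, lose length at every junction. Words starting and ending with $a$ are additive as well, but only because $\ell(a\cdot a)=2$ in $\mathbb Z$; this need not happen in a general factor.

```python
import random


def reduce_word(w: str) -> str:
    """Free reduction over the alphabet a, b, A, B (A = a^-1, B = b^-1)."""
    stack = []
    for c in w:
        if stack and stack[-1] == c.swapcase():
            stack.pop()
        else:
            stack.append(c)
    return "".join(stack)


def length(w: str) -> int:
    """Word length in F_2 = <a> * <b> with respect to S = {a, b}."""
    return len(reduce_word(w))


def random_reduced(rng: random.Random, n: int, first: str, last: str) -> str:
    """Random reduced word of length n >= 3 with prescribed first and last letters."""
    while True:
        w = first + "".join(rng.choice("aAbB") for _ in range(n - 2)) + last
        if reduce_word(w) == w:
            return w


def products_are_additive(first_letters: str, last_letters: str,
                          trials: int = 300, k: int = 4, seed: int = 0) -> bool:
    """Check l(g_1 ... g_k) = l(g_1) + ... + l(g_k) for random g_i with given end letters."""
    rng = random.Random(seed)
    for _ in range(trials):
        ws = [random_reduced(rng, rng.randint(3, 8),
                             rng.choice(first_letters), rng.choice(last_letters))
              for _ in range(k)]
        if length("".join(ws)) != sum(len(w) for w in ws):
            return False
    return True
```

For example, `products_are_additive("aA", "bB")` holds, whereas `products_are_additive("a", "A")` fails, and indeed $\ell(aba^{-1}\cdot ab^{-1}a^{-1})=0$.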
The only remaining case is $\ell(y_1)\le L$, $z_1=y_1^{-1}$. We then carry out the same procedure, selecting $F_2\subset E_1$ and $(i_2,j_2)\in\{1,\dots,r\}^2$, with $\nu(F_2)\ge r^{-2}\nu(E_1)$ and so that, for each $g\in F_2$, the second letter of $g$ is in $G_{i_2}$ and the second-to-last one is in $G_{j_2}$. If $i_2\ne j_2$, then $\ell(g_1\cdots g_k)\ge\ell(g_1)+\cdots+\ell(g_k)-2kL$ for any $g_1,\dots,g_k\in F_2$. If instead $i_2=j_2$, then choose $E_2\subset F_2$ and elements $y_2,z_2\in G_{i_2}$ so that $\nu(E_2)\ge\theta_T^{-2}\nu(F_2)$ and, for each $g\in E_2$, the second letter of $g$ is $y_2$ and the second-to-last one is $z_2$. Notice that, by assumption, $T$ is not contained in any conjugate of any factor $G_i$ by any word $\omega$ of type size not exceeding $D$. Therefore, unless $\ell(y_2)\le L$ and $z_2=y_2^{-1}$, we can set $A=E_2$ and conclude as before. Proceeding in this way, we select, if needed at each successive step, nested subsets ; furthermore, there are letters $y_3,\dots,y_D,z_D$ such that, for any $g\in E_D$, the reduced-word expression of $g$ is $y_1\cdots y_D\cdots z_D\,y_{D-1}^{-1}\cdots y_1^{-1}$. It remains to deal with three possibilities, as above.
If a set A (resp. A · g) satisfies the conclusion of Lemma 4.2, then we say that A (resp. A · g) has the weak length additivity property of order LD.
We are now in a position to prove existence of the weak LDP.
Proposition 4.3. Let $G,S,\ell,\mu$ be as above, and $(Y_n)_{n\ge0}$ a right random walk on $G$ issued from the identity with increments distributed according to $\mu$. Suppose that the support of $\mu$ generates a pattern-avoiding semigroup $\Gamma\subset G$. Then the sequence of $\mathbb R$-valued random variables $\big(\frac1n\ell(Y_n)\big)_{n\ge1}$ satisfies the weak LDP with a rate function $I:\mathbb R_{\ge0}\to[0,\infty]$. Proof. We rely on the criterion phrased in Proposition 3.4, checking that the condition expressed therein is satisfied. Arguing by contradiction, suppose that there exists $x\in\mathbb R_{\ge0}$ such that As the left-hand side of (4.2) always dominates the right-hand side by definition, this yields Notice first that, necessarily, $x$ is strictly positive; indeed, for $x=0$ the criterion in Proposition 3.4 is trivially satisfied, as $\lim_n\frac1n\log\mu_n(B(0,\varepsilon))$ exists in $[-\infty,0]$ for every $\varepsilon>0$: by subadditivity of $\ell$, together with independence and stationarity of the increments, $\mu_{n+m}(B(0,\varepsilon))\ge\mu_n(B(0,\varepsilon))\,\mu_m(B(0,\varepsilon))$, so that Fekete's lemma applies to the superadditive sequence $\log\mu_n(B(0,\varepsilon))$.
As a consequence of (4.3), there exist $\delta,\eta>0$ such that To select an element $g$ of this sort, concatenate any letter $y'_D$ with $\omega$, in such a way that $\omega y'_D$ is a reduced word; using that $T$ avoids patterns of type size $D$, pick $g\in T$ not starting with $\omega y'_D$ nor ending with $(\omega y'_D)^{-1}$.
We claim that, if $j$ is taken to be sufficiently large, the inequality $\alpha\ge\beta_j-\eta$ holds, which is opposite to what is given by (4.5), giving the desired contradiction. The hypothesis on the semigroup $\Gamma$ ensures the existence of a finite subset $T\subset\Gamma\setminus\{e\}$ with the following property: there exists an integer $D>0$ such that, for any reduced word $\omega$ of type size $D$ in $G$, we can find $g\in T$ not starting with $\omega$ and not ending with $\omega^{-1}$ (cf. Section 2.2). For any $g\in T$, choose $t(g)\in\mathbb N_{\ge1}$ and $p(g)\in\mathbb R_{>0}$ such that the random walk attains $g$ in $t(g)$ steps with probability $p(g)$, that is $\mathbb P(Y_{t(g)}=g)=p(g)$. Define $L=\sup\{\ell(g):g\in T\}$, $p=\inf\{p(g):g\in T\}$, $t=\sup\{t(g):g\in T\}$. In keeping with our earlier notation, let $\theta_T=\sup\{|B_{G_i}(T)|:i=1,\dots,r\}$ for any $T\in\mathbb R_{\ge0}$. Now choose an integer $j_0\ge1$ so that this exists since $T/\log\theta_T\to\infty$ as $T\to\infty$, by the subexponential-growth assumption on the factors $G_1,\dots,G_r$. Define $F=\{g\in G:\ell(g)\in n_{j_0}B(x,\rho)\}$, so that $e^{\beta_{j_0}n_{j_0}}=\mathbb P(Y_{n_{j_0}}\in F)$ by (4.6). Notice also that $F$ does not contain the identity, as $n_{j_0}(x-\rho)>0$. Applying Lemma 4.2, with $\nu$ being the law of the random variable $Y_{n_{j_0}}$, we can manufacture a set $A\subset F$ and an element $g\in T$ such that
- $\mathbb P(Y_{n_{j_0}}\in A)\ge(r\theta_{n_{j_0}(x+\rho)})^{-2D}e^{\beta_{j_0}n_{j_0}}$, and
- either $A$ or $A\cdot g$ has the weak length additivity property of order $LD$.
We distinguish two cases.
• First case: $A$ has the weak length additivity property of order $LD$. Define the sequence $n_k=kn_{j_0}$, $k\ge1$. Since For such a choice of $\rho'$, we have that $\ell(g_1\cdots g_k)\in n_kB(x,\rho')$ whenever $g_1,\dots,g_k$ are chosen from $A$. Therefore, we may estimate, for each $k\ge1$, $\mu_{n_k}(B(x,\rho'))=\mathbb P(\ell(Y_{n_k})\in n_kB(x,\rho'))\ge\mathbb P(X_1\cdots X_{n_{j_0}}\in A,\dots,X_{n_{k-1}+1}\cdots X_{n_k}\in A)$ where the middle inequality is given by independence and stationarity of the process $(X_n)_{n\ge1}$, while the last one comes from our choice $n_1=n_{j_0}\ge2D\eta^{-1}(\log r+\log\theta_{n_{j_0}(x+\rho)})$. Lemma 4.1 gives as desired.
• Second case: $A\cdot g$ has the weak length additivity property of order $LD$.
Proposition 4.4. In the setting of Proposition 4.3, assume further that $\mu$ has a finite exponential moment. Then the rate function $I$ governing the weak LDP for the sequence $\big(\frac1n\ell(Y_n)\big)_{n\ge1}$ is proper, and the sequence $\big(\frac1n\ell(Y_n)\big)_{n\ge1}$ satisfies the full LDP with rate function $I$. Proof. As before, we let $\mu_n$ be the law of the random variable $\frac1n\ell(Y_n)$, for every $n\ge1$. In light of Proposition 3.3, it suffices to show that the sequence $(\mu_n)_{n\ge1}$ is exponentially tight. By assumption, there exists a real number $\tau>0$ such that $C:=\int_G\exp(\tau\ell(g))\,d\mu(g)<\infty$.
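Fix $M>0$. Then $$\mu_n(\mathbb R\setminus[0,M])=\mathbb P(\ell(Y_n)>nM)\le e^{-\tau nM}\,\mathbb E\big[e^{\tau\ell(Y_n)}\big],$$ the last upper bound being given by Markov's inequality. Subadditivity of the length function $\ell$, together with independence and stationarity of the process $(X_n)_{n\ge1}$, gives $$\mathbb E\big[e^{\tau\ell(Y_n)}\big]\le\mathbb E\Big[\prod_{i=1}^ne^{\tau\ell(X_i)}\Big]=C^n.$$ Combining the previous two estimates, taking the logarithm and dividing by $n$, we obtain $$\frac1n\log\mu_n(\mathbb R\setminus[0,M])\le\log C-\tau M\qquad\text{for all }n\ge1.$$ Given $\alpha\ge0$, it thus suffices to take $M>(\alpha+\log C)/\tau$ and $K=[0,M]$ to get $\limsup_{n\to\infty}\frac1n\log\mu_n(\mathbb R\setminus K)<-\alpha$, which establishes exponential tightness of the sequence $(\mu_n)_{n\ge1}$.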
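The Markov-inequality estimate can be checked exactly on a small example. The Python sketch below uses a hypothetical driving measure on $G=\mathbb Z$ (viewed as a free product with a single factor, so that $\ell(y)=|y|$), namely a two-sided geometric law truncated to $\{-10,\dots,10\}$. It computes the law of $Y_n$ by repeated convolution and compares the tail $\mu_n(\mathbb R\setminus[0,M])=\mathbb P(|Y_n|>nM)$ with the bound $e^{-\tau nM}C^n$, where $C=\mathbb E[e^{\tau|X_1|}]$. The parameters $\tau$, $M$ and the truncation level are arbitrary; the bound is only informative once $\tau M>\log C$.

```python
import math
from collections import defaultdict

# Hypothetical driving measure on G = Z (single factor, word length l(y) = |y|):
# a two-sided geometric law truncated to {-K, ..., K}.
K, DECAY = 10, 0.5
_weights = {k: DECAY ** abs(k) for k in range(-K, K + 1)}
_total = sum(_weights.values())
MU = {k: w / _total for k, w in _weights.items()}


def convolve(law: dict) -> dict:
    """Law of Y_{n+1} = Y_n + X_{n+1}, given the law of Y_n."""
    new = defaultdict(float)
    for y, py in law.items():
        for x, px in MU.items():
            new[y + x] += py * px
    return dict(new)


def tail(law: dict, n: int, M: float) -> float:
    """mu_n(R minus [0, M]) = P(|Y_n| > n M)."""
    return sum(p for y, p in law.items() if abs(y) > n * M)


def markov_bound(n: int, M: float, tau: float) -> float:
    """exp(-tau n M) * C^n with C = E[exp(tau |X_1|)], as in the proof."""
    C = sum(p * math.exp(tau * abs(k)) for k, p in MU.items())
    return math.exp(-tau * n * M) * C ** n


def compare(ns=(5, 10, 20, 30), Ms=(3.0, 4.0, 5.0), tau=0.5):
    """Return (n, M, exact tail, Markov bound) for each requested pair (n, M)."""
    law, rows = {0: 1.0}, []
    for n in range(1, max(ns) + 1):
        law = convolve(law)
        if n in ns:
            rows.extend((n, M, tail(law, n, M), markov_bound(n, M, tau)) for M in Ms)
    return rows


if __name__ == "__main__":
    for n, M, t, b in compare():
        print(f"n={n:2d} M={M}: tail={t:.3e} <= bound={b:.3e}")
```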

Convexity of the rate function
The chief aim of this section is the proof of convexity of the rate function associated to the LDP for the sequence $\big(\frac1n\ell(Y_n)\big)_{n\ge1}$. In the last part, we gather some further properties of the rate function, and deduce its characterization expressed in the last sentence of Theorem 1.4. As in the foregoing section, we let $\mu_n$ denote the law of the random variable $\frac1n\ell(Y_n)$, for $n\ge1$. Recall that, if $X$ is a real vector space, a function $f:X\to(-\infty,+\infty]$ is convex if, for any $x_1,x_2\in X$ and any $\lambda\in[0,1]$, $$f(\lambda x_1+(1-\lambda)x_2)\le\lambda f(x_1)+(1-\lambda)f(x_2);\qquad(5.1)$$ the function $f$ is mid-point convex if the previous inequality holds for $\lambda=1/2$, that is if $$f\Big(\frac{x_1+x_2}2\Big)\le\frac{f(x_1)+f(x_2)}2$$ for all $x_1,x_2\in X$.
Suppose now that $X$ is a topological (real) vector space. By iteration, a mid-point convex function $f$ satisfies inequality (5.1) for any $\lambda\in\{k/2^n:n\in\mathbb N,\ k\in\{0,\dots,2^n\}\}$. Since the latter set is dense in $[0,1]$, (5.1) can be extended to all $\lambda\in[0,1]$ by a standard approximation argument, provided that we know that $f$ is lower semicontinuous (approximate $\lambda$ by dyadic rationals $\lambda_m$ and use $f(\lambda x_1+(1-\lambda)x_2)\le\liminf_mf(\lambda_mx_1+(1-\lambda_m)x_2)$). To wrap up, a lower semicontinuous, mid-point convex function $f:X\to(-\infty,+\infty]$ is convex. Proposition 5.1. Let $G,S,\ell,\mu,(Y_n)_{n\ge0}$ be as in Proposition 4.3. Then the rate function $I$, governing the LDP for the sequence of $\mathbb R$-valued random variables $\big(\frac1n\ell(Y_n)\big)_{n\ge1}$, is convex. The proof bears a strong resemblance to the proof of Proposition 4.3; for the sake of conciseness, we shall omit a few details.
Proof. As observed in the previous paragraph, it suffices to show that $I$ is mid-point convex, since we already know that $I$ is lower semicontinuous. Again, we argue by contradiction: assume there exist $x_1<x_2\in\mathbb R$ such that $$I\Big(\frac{x_1+x_2}2\Big)>\frac{I(x_1)+I(x_2)}2.\qquad(5.2)$$ Recall that we have therefore, (5.2) implies that there exist $\delta,\eta>0$ such that lim sup for any $\rho_1,\rho_2>0$. Notice that this forces in particular $x_1,x_2\in\mathbb R_{\ge0}$. Choose $\rho:=\rho_1=\rho_2<\delta$. For a sufficiently large $n_0$ and every $n\ge n_0$, we claim that there exists $\varphi(n)\in\{2n,\dots,2n+t\}$ such that Letting $n$ vary over an arithmetic progression for which the corresponding sequence of $\varphi(n)$ is strictly increasing, it is clear that we obtain a contradiction to (5.3). It remains to prove the claim just stated. Let $T\subset\Gamma\setminus\{e\}$ be a finite set avoiding patterns of type size $D$, and fix $n\ge n_0$; let $F_i=\{g\in G:\ell(g)\in nB(x_i,\rho)\}$, $i=1,2$. Adapting the proof of Lemma 4.2 appropriately 10, we deduce that there are an element $g\in T$ and subsets $A_i\subset F_i$ such that
- $\mathbb P(Y_n\in A_i)\ge(r\theta_{n(x_i+\rho_i)})^{-D}\,\mathbb P(Y_n\in F_i)$, and
- either for any $g_1\in A_1$, $g_2\in A_2$ it holds $\ell(g_1g_2)\ge\ell(g_1)+\ell(g_2)-2LD$,
- or for any $g_1\in A_1$, $g_2\in A_2$, $\ell(g_1gg_2)\ge\ell(g_1)+\ell(g_2)-2LD$.
In the first case, we obtain inequality (5.4) for $\varphi(n)=2n$, by observing that $g_1\in A_1$, $g_2\in A_2$ imply $\ell(g_1g_2)\in2nB((x_1+x_2)/2,\delta)$; in the second case, we obtain it for $\varphi(n)=2n+t(g)$. We refer to the proof of Proposition 4.3 for the missing details.
5.1. Further properties of the rate function. We list below some additional properties of the rate function, emphasizing connections with other relevant quantities associated to the random walk, such as the rate of escape and the spectral radius.
(1) Since $\frac1n\ell(Y_n)$ converges to the escape rate $\lambda$ almost surely, $I$ has a zero at $x=\lambda$.
(2) Convexity of the rate function $I$ gives, as an immediate corollary, that its effective domain $D_I$ is a convex subset of $\mathbb R_{\ge0}$, hence a (possibly degenerate 11) sub-interval of the nonnegative half-line. Standard properties of convex functions defined on sub-intervals of the real line imply that, on the open interval $D_I^\circ$, the rate function $I$ is continuous, admits left and right derivatives at every point, and is differentiable outside a countable set of points. In particular, continuity on $D_I^\circ$ gives that If the measure $\mu$ is symmetric, that is $\mu(g)=\mu(g^{-1})$ for every $g\in G$, this quantity coincides with the spectral radius of the Markov operator associated with the random walk (cf. [25, Chap. 6]). For every $\delta>0$, we have As a consequence, we deduce that $0\in D_I$ provided that the spectral radius is strictly positive. This occurs, for instance, whenever the semigroup $\Gamma$ generated by $\operatorname{supp}\mu$ contains $e$: if $n_0\in\mathbb N$ is any integer for which $\mathbb P(Y_{n_0}=e)>0$, then It is worth mentioning that the equality $I(0)=-\log\rho$ actually holds 12, whenever the LDP for the word length functional is verified and the measure $\mu$ driving the random walk satisfies $\inf\{\mu(g):g\in\operatorname{supp}\mu\}>0$ (see [27, Lem. 2.8]). (4) As far as the least upper bound of $D_I$ is concerned, assume that the support of $\mu$ is bounded, and let $L=\sup\{\ell(g):g\in\operatorname{supp}\mu\}<\infty$. Then $I\equiv\infty$ on the open half-line $(L,\infty)$, as subadditivity of $\ell$ implies $\ell(Y_n)\le nL$ $\mathbb P$-almost surely for any $n\ge1$. Therefore, in this case, If no restriction is placed on the size of $\operatorname{supp}\mu$, then $\sup D_I$ may be infinite 13.
11 In general, the rate function $I$ can be as degenerate as possible: for instance, if $G=\langle a,b\rangle$ is a free group on two generators, and $\mu(a)=p=1-\mu(b)$ for some $p\in[0,1]$, then $I(1)=0$ and $I(x)=\infty$ for any $x\in\mathbb R_{\ge0}\setminus\{1\}$, as $\ell(Y_n)=n$ $\mathbb P$-almost surely for every $n$.
12 We thank S. Müller for communicating this fact.
13 Consider, once again, $G=\langle a,b\rangle$, a free group on two generators, and choose a measure $\mu$ with $\operatorname{supp}\mu=\langle a\rangle$. Then $\mathbb P(\ell(Y_n)=nk)\ge\mu(a^k)^n$ for all integers $n,k\ge1$, so that $I(k)<\infty$ for any $k\ge1$. In this example, we thus have $D_I=\mathbb R_{\ge0}$.
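For the symmetric simple random walk on the free group $F_r$, the distance $\ell(Y_n)$ is itself a birth-and-death chain: from $0$ it moves to $1$, and from $d\ge1$ it moves to $d+1$ with probability $(2r-1)/(2r)$ and to $d-1$ with probability $1/(2r)$. The Python sketch below (function names hypothetical) uses this to compute $\mathbb P(Y_n=e)$ exactly and compares $\frac1n\log\mathbb P(Y_n=e)$, for even $n$, with $\log\rho$, where $\rho=\sqrt{2r-1}/r$ is Kesten's value of the spectral radius; since the walk is symmetric, $\mathbb P(Y_n=e)\le\rho^n$ for every $n$. Convergence is slow, because of a polynomial correction factor.

```python
import math


def return_probability(r: int, steps: int) -> float:
    """P(Y_steps = e) for simple random walk on F_r, via the distance chain l(Y_n).

    From distance 0 the walk moves to distance 1; from d >= 1 it moves to d + 1
    with probability (2r - 1)/(2r) and to d - 1 with probability 1/(2r).
    """
    up, down = (2 * r - 1) / (2 * r), 1 / (2 * r)
    law = [1.0] + [0.0] * steps
    for _ in range(steps):
        new = [0.0] * (steps + 1)
        for d, pd in enumerate(law):
            if pd == 0.0:
                continue
            if d == 0:
                new[1] += pd
            else:
                new[d - 1] += down * pd
                if d + 1 <= steps:
                    new[d + 1] += up * pd
        law = new
    return law[0]


def spectral_radius(r: int) -> float:
    """Kesten's formula for the spectral radius of simple random walk on F_r."""
    return math.sqrt(2 * r - 1) / r


if __name__ == "__main__":
    r = 2
    for n in (100, 300, 600):
        print(n, math.log(return_probability(r, n)) / n, math.log(spectral_radius(r)))
```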

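5.2. The rate function as a Fenchel-Legendre transform.
It remains to prove the final statement of Theorem 1.4, under the assumption that $\mu$ has a finite moment-generating function, that is, all exponential moments of $\mu$ are finite. By virtue of Theorem 3.5, it suffices to prove that the limiting logarithmic moment generating function of the sequence $(\mu_n)_{n\ge1}$, given by $$\Lambda(z)=\limsup_{n\to\infty}\frac1n\log\int_{\mathbb R}e^{nzx}\,d\mu_n(x)=\limsup_{n\to\infty}\frac1n\log\mathbb E\big[e^{z\ell(Y_n)}\big],\qquad z\in\mathbb R,$$ is finite everywhere, where we have canonically identified $\mathbb R$ with its dual space, and the dual pairing with the standard product of real numbers. Fix $z\in\mathbb R_{\ge0}$; then $\mathbb E[e^{z\ell(Y_1)}]=\int_G\exp(z\ell(g))\,d\mu(g)<\infty$, since all exponential moments of $\mu$ are finite. Moreover, for any $n,m\ge1$, we have $$\mathbb E\big[e^{z\ell(Y_{n+m})}\big]\le\mathbb E\big[e^{z\ell(Y_n)}e^{z\ell(X_{n+1}\cdots X_{n+m})}\big]=\mathbb E\big[e^{z\ell(Y_n)}\big]\,\mathbb E\big[e^{z\ell(Y_m)}\big];$$ the inequality comes from subadditivity of the length function $\ell$, whereas the equality follows from independence and stationarity of the process $(X_n)_{n\ge1}$. Therefore, the sequence $$a_n=\log\mathbb E\big[e^{z\ell(Y_n)}\big],\qquad n\ge1,\qquad(5.6)$$ is subadditive, that is $a_{n+m}\le a_n+a_m$ for every $n,m\ge1$; Fekete's lemma ([25, Ex. 3.9]) gives $$\Lambda(z)=\lim_{n\to\infty}\frac{a_n}n=\inf_{n\ge1}\frac{a_n}n\le a_1<\infty.$$ If $z\in\mathbb R_{<0}$, a similar argument shows that the sequence (5.6) is superadditive, so that $\Lambda(z)=\sup_{n\ge1}a_n/n\ge a_1>-\infty$; moreover $a_n\le0$ for every $n$, since $e^{z\ell(Y_n)}\le1$, whence $\Lambda(z)\le0$, and $\Lambda(z)$ is finite all the same.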
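The subadditivity argument can be observed numerically. Assuming, as an illustrative example, the symmetric simple random walk on $F_r$, whose distance process is the birth-and-death chain moving up with probability $(2r-1)/(2r)$ away from $0$, the Python sketch below (names hypothetical) computes $a_n=\log\mathbb E[e^{z\ell(Y_n)}]$ exactly and checks that $(a_n)$ is subadditive for $z\ge0$ and superadditive for $z<0$. The values $z=\pm0.5$, $r=2$ and the horizon $n\le40$ are arbitrary.

```python
import math


def distance_laws(r: int, n_max: int):
    """Yield the law of l(Y_n), n = 1..n_max, for simple random walk on F_r."""
    up, down = (2 * r - 1) / (2 * r), 1 / (2 * r)
    law = [1.0]
    for _ in range(n_max):
        new = [0.0] * (len(law) + 1)
        for d, pd in enumerate(law):
            if d == 0:
                new[1] += pd
            else:
                new[d + 1] += up * pd
                new[d - 1] += down * pd
        law = new
        yield law


def log_mgf_sequence(z: float, r: int, n_max: int) -> list:
    """a[n] = log E[exp(z * l(Y_n))] for n = 0..n_max (with a[0] = 0)."""
    a = [0.0]
    for law in distance_laws(r, n_max):
        a.append(math.log(sum(pd * math.exp(z * d) for d, pd in enumerate(law))))
    return a


def is_subadditive(a: list, tol: float = 1e-9) -> bool:
    """Check a[n + m] <= a[n] + a[m] for all n, m >= 1 with n + m < len(a)."""
    n_max = len(a) - 1
    return all(a[n + m] <= a[n] + a[m] + tol
               for n in range(1, n_max) for m in range(1, n_max - n + 1))


if __name__ == "__main__":
    a = log_mgf_sequence(0.5, 2, 40)
    print([round(a[n] / n, 4) for n in (1, 10, 20, 40)])  # a_n / n decreases towards Lambda(0.5)
```

Negating a superadditive sequence gives a subadditive one, so the same helper covers the case $z<0$.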

Concluding remarks and open questions
6.1. Groups with strongly connected finite-state automata. We mention another class of examples to which our method would apply: finitely generated groups whose cone type automaton with respect to a given generating set is finite and strongly connected. Let $G$ be a finitely generated group, $S$ a finite set of generators, and $\ell$ the word length defined by $S$ on $G$. For every element $g\in G$, we define the cone type of $g$ as the set $$C(g)=\{h\in G:\ell(gh)=\ell(g)+\ell(h)\}.$$ Notice that the usual definition of cone type which appears in the literature ([9,12,28]) involves geodesic words in the alphabet $S$, rather than actual group elements of $G$; our definition is more convenient for the purposes of this discussion.
The cone type of an element selects those geodesic segments that can be attached (in algebraic terms, multiplied) to it on the right so that the concatenation is again a geodesic segment. Observe that it is precisely this notion that, implicitly, comes into play both in the proof of existence of LDP and in the proof of convexity of the rate function.
Cone types offer an algorithmic way to label geodesics in the group $G$, in other words to identify those strings $(s_1,\dots,s_n)$ of letters in the alphabet $S$ such that $\ell(s_1\cdots s_n)=n$. This is achieved through the construction of a finite state automaton (cf. [12]), called the cone type automaton of $G$ with respect to the language given by $S$. Assume there are only finitely many cone types $C_0=C(e),C_1,\dots,C_s$, which we view as vertices of a directed graph $\Delta$ whose edges are labelled by elements of $S$; more precisely, we connect the cone type $C(g)$ of an element $g$ to the cone type $C(gs)$, via a directed edge labelled by $s\in S$, if and only if $s\in C(g)$. This definition does not depend on the choice of $g$ but only on its cone type: indeed, if $s\in C(g)$, then $h\in C(gs)$ if and only if $sh\in C(g)$ and $\ell(sh)=1+\ell(h)$. If $e\notin S$, there is a one-to-one correspondence between edge-paths in the directed graph $\Delta$ starting at $C_0$ and finite sequences $(s_1,\dots,s_n)\in S^n$ such that $\ell(s_1\cdots s_n)=n$, that is, geodesic words in the alphabet $S$. Now, the conditions we need to impose in order for the arguments of Sections 4 and 5 to carry over unaffected are: (1) the finite directed graph $\Delta$ is strongly connected, meaning that there is a directed path joining any two of its vertices; (2) every element of $G$ belongs to the cone type of some non-trivial element; otherwise stated, for any geodesic word $\omega=(s_1,\dots,s_n)$ in the alphabet $S$, there is a cone type $C\ne C_0$ from which we can follow a directed path in the graph $\Delta$ according to the labelling given by $\omega$.
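The cone types of the free group $F_2$ with respect to $S=\{a,b,a^{-1},b^{-1}\}$ (written a, b, A, B) can be enumerated by brute force, intersecting $C(g)=\{h:\ell(gh)=\ell(g)+\ell(h)\}$ with a finite ball. In this Python sketch (names hypothetical, truncation radii arbitrary but large enough for $F_2$), the truncated cone types of the elements of the ball of radius 4 take $5=1+2r$ distinct values, namely $C(e)$ and one type per possible last letter, and the transition $(C(g),s)\mapsto C(gs)$, for $s\in C(g)$, is checked to depend only on $C(g)$ and $s$.

```python
LETTERS = "aAbB"  # generators a, b of F_2 and their inverses A, B


def reduce_word(w: str) -> str:
    """Free reduction of a word over LETTERS."""
    stack = []
    for c in w:
        if stack and stack[-1] == c.swapcase():
            stack.pop()
        else:
            stack.append(c)
    return "".join(stack)


def ball(radius: int) -> list:
    """All reduced words of length <= radius (each reduced word is geodesic in F_2)."""
    words, frontier = [""], [""]
    for _ in range(radius):
        frontier = [w + c for w in frontier for c in LETTERS
                    if not (w and w[-1] == c.swapcase())]
        words.extend(frontier)
    return words


def truncated_cone_type(g: str, test_set: list) -> frozenset:
    """C(g) intersected with test_set, where C(g) = {h : l(gh) = l(g) + l(h)}."""
    return frozenset(h for h in test_set
                     if len(reduce_word(g + h)) == len(g) + len(h))


def cone_type_automaton(radius: int = 4, k: int = 3):
    """Distinct truncated cone types on B(radius) and the labelled edges between them."""
    test_set = ball(k)
    types = {g: truncated_cone_type(g, test_set) for g in ball(radius)}
    edges, well_defined = {}, True
    for g in ball(radius - 1):
        for s in LETTERS:
            if s in types[g]:                 # s in C(g), so g + s is reduced
                target = types[g + s]
                if edges.setdefault((types[g], s), target) != target:
                    well_defined = False      # C(gs) would not be a function of (C(g), s)
    return set(types.values()), edges, well_defined


if __name__ == "__main__":
    vertices, edges, ok = cone_type_automaton()
    print(len(vertices), len(edges), ok)
```

The 16 edges are the 4 edges leaving $C(e)$ and, for each of the 4 non-trivial types, the 3 letters that do not cancel the last letter.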
Example 6.1 (Simple random walks on integer lattices). Consider $G=\mathbb Z^d$ with its standard symmetric set of generators $S=\{\pm e_i:1\le i\le d\}$. Any probability distribution $\mu$ with $\operatorname{supp}\mu\subset S$ gives rise to a simple random walk $(Y_n)_{n\in\mathbb N}$ on $\mathbb Z^d$. Since $h\in C(g)$ if and only if $g_ih_i\ge0$ for every $i$, the cone type of $g$ only depends on the sign pattern of its coordinates, and there are exactly $3^d$ different cone types, one for each $(\varepsilon_1,\dots,\varepsilon_d)\in\{-1,0,1\}^d$ (for $d=2$: the 4 quadrants, the 4 half-planes delimited by the coordinate axes, and the whole $\mathbb Z^2$). It takes a moment to realize that both conditions stated above are met. We thus recover, by elementary means, existence of the LDP with convex rate function for the process $\frac1n\|Y_n\|_1$ (where $\|(x_1,\dots,x_d)\|_1=|x_1|+\cdots+|x_d|$ for any $(x_1,\dots,x_d)\in\mathbb R^d$), which is usually seen as a straightforward consequence of Cramér's theorem for the empirical mean of i.i.d. random vectors (see [10, Thm. 2.2.30]).
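A similar brute-force count applies to this example. For $G=\mathbb Z^d$ with the $\ell^1$ word length, $h\in C(g)$ if and only if $g_ih_i\ge0$ for every coordinate, so the cone type of $g$ is determined by the sign pattern of $g$ in $\{-1,0,1\}^d$. The Python sketch below (names hypothetical) intersects each cone type with the cube $[-1,1]^d$, which is enough to tell sign patterns apart, and counts the distinct types for small $d$: one per sign pattern, that is $3^d$ in total.

```python
from itertools import product


def l1(v) -> int:
    """l^1 norm, i.e. the word length in Z^d with respect to {+-e_i}."""
    return sum(abs(c) for c in v)


def count_cone_types(d: int, radius: int = 2, k: int = 1) -> int:
    """Number of distinct sets C(g) intersected with [-k, k]^d, for g in [-radius, radius]^d."""
    test_set = list(product(range(-k, k + 1), repeat=d))
    types = set()
    for g in product(range(-radius, radius + 1), repeat=d):
        cone = frozenset(h for h in test_set
                         if l1(tuple(a + b for a, b in zip(g, h))) == l1(g) + l1(h))
        types.add(cone)
    return len(types)


if __name__ == "__main__":
    print([count_cone_types(d) for d in (1, 2, 3)])  # [3, 9, 27]
```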
Finiteness of the number of cone types appears to be an intrinsic requirement when attempting to establish the LDP via the strategy presented here, while the two additional conditions on the cone type automaton mentioned above can presumably be lifted through a refinement of the method.
A large class of finitely generated groups having only finitely many cone types, with respect to any finite generating set, is given by Gromov-hyperbolic groups; indeed, in such groups the cone type of an element only depends on its k-tail, for a fixed positive integer k depending only on the group (see [9]). Our considerations thus provide substance to the claim that Theorem 1.4 holds for any Gromov-hyperbolic group 14 .
6.2. Some open problems. Computing the exact expression of the rate function, in the cases treated by Theorem 1.4, is mostly out of reach; however, it is worth carrying through the computation in the easiest case of symmetric simple random walks on free groups, to get a flavour of what should happen in more general circumstances. This has already been performed in [33]: let $G$ be a free group on $r\ge1$ generators, $S=\{a_1,\dots,a_r\}$ a free generating set, and $\mu$ the uniform probability measure on $S\cup S^{-1}$, i.e. $\mu(a_i)=\mu(a_i^{-1})=(2r)^{-1}$ for any $i\in\{1,\dots,r\}$. The rate function governing the LDP for the sequence $\big(\frac1n\ell(Y_n)\big)_{n\ge1}$ is given by the following expression: $$I(x)=\begin{cases}\dfrac{1+x}2\log\dfrac{r(1+x)}{2r-1}+\dfrac{1-x}2\log\big(r(1-x)\big)&\text{if }x\in[0,1],\\[1ex]+\infty&\text{otherwise},\end{cases}$$ where we agree that $0\log0=0$. The function $I$ is analytic in $(0,1)$ and strictly convex in its effective domain $[0,1]$, and hence admits a unique zero at $\lambda=1-1/r$, corresponding to the escape rate of the random walk; as a consequence thereof, the probability $\mathbb P\big(|\frac1n\ell(Y_n)-\lambda|\ge\varepsilon\big)$ that the renormalized length deviates largely from the escape rate decays exponentially fast with $n$ for any $\varepsilon>0$. Furthermore, the value of $I$ at $0$ is equal (in absolute value) to the logarithm of the spectral radius, as expected. Lastly, we notice that the right derivative $I'(0)$ at $0$ is finite, while the left derivative $I'(1)$ at $1$ is infinite.
14 (Added in revision) Gouëzel has shown ([14, Lem. 2.4]) that a non-elementary hyperbolic group $G$ equipped with a word length $\ell$ satisfies the following geometric property: there exist constants $c,C>0$ such that, for any $x,y\in G$, there is an element $a\in G$ of length at most $C$ such that $\ell(xay)\ge\ell(x)+\ell(y)-c$. The result has been subsequently extended in [11, Lem. 5.3] to relatively hyperbolic groups. It can be used as a replacement for almost length additivity throughout the proof of Theorem 1.4, thereby proving its validity for irreducible random walks on any relatively hyperbolic group, with respect to any word length. The resulting argument simplifies the proof of [6, Thm. 1.2], which however addresses more general spaces and walks, and yields a finer result on the rate function.
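For the symmetric simple random walk on $F_r$, the closed form $I(x)=\frac{1+x}2\log\frac{r(1+x)}{2r-1}+\frac{1-x}2\log(r(1-x))$ on $[0,1]$ (with $0\log0=0$, and $I=+\infty$ elsewhere) is consistent with all the properties listed above; it is taken here as an assumption rather than quoted from [33]. The Python sketch below (names hypothetical) checks numerically that it vanishes at $\lambda=1-1/r$, that $I(0)=-\log(\sqrt{2r-1}/r)$, that its right derivative at $0$ equals the finite value $-\frac12\log(2r-1)$ while its slopes near $1$ blow up, and compares it with $-\frac1n\log\mathbb P(\ell(Y_n)=nx)$, computed exactly from the birth-and-death chain of distances.

```python
import math


def rate_free_group(x: float, r: int) -> float:
    """Assumed closed form of I(x) for simple random walk on F_r, with 0 log 0 = 0."""
    if not 0.0 <= x <= 1.0:
        return math.inf
    value = (1 + x) / 2 * math.log(r * (1 + x) / (2 * r - 1))
    if x < 1.0:
        value += (1 - x) / 2 * math.log(r * (1 - x))
    return value


def distance_law(r: int, n: int) -> list:
    """Exact law of l(Y_n), via the birth-and-death chain of distances."""
    up, down = (2 * r - 1) / (2 * r), 1 / (2 * r)
    law = [1.0]
    for _ in range(n):
        new = [0.0] * (len(law) + 1)
        for d, pd in enumerate(law):
            if d == 0:
                new[1] += pd
            else:
                new[d + 1] += up * pd
                new[d - 1] += down * pd
        law = new
    return law


if __name__ == "__main__":
    r, n = 2, 800
    law = distance_law(r, n)
    for x in (0.0, 0.25, 0.5, 0.75):          # n * x must have the parity of n
        empirical = -math.log(law[round(n * x)]) / n
        print(f"x={x:.2f}  I(x)={rate_free_group(x, r):.4f}  empirical={empirical:.4f}")
```

The empirical values approach $I(x)$ only up to an $O(\log n/n)$ correction coming from polynomial prefactors.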
This motivates the following questions: (1) Is the rate function I in Theorem 1.4 always strictly convex? In particular, does it always have a unique zero at x = λ? (2) What are the finer regularity properties of the rate function? What is the behaviour of the (one-sided) derivatives of I at the extreme points of its effective domain?
Assuming the validity of Theorem 1.4 for Gromov-hyperbolic groups, the same questions can obviously be phrased in this broader context as well.