A Central Limit Theorem for the Length of the Longest Common Subsequences in Random Words

Let $(X_i)_{i \geq 1}$ and $(Y_i)_{i\geq1}$ be two independent sequences of independent identically distributed random variables taking their values in a common finite alphabet and having the same law. Let $LC_n$ be the length of the longest common subsequences of the two random words $X_1\cdots X_n$ and $Y_1\cdots Y_n$. Under a lower bound assumption on the order of its variance, $LC_n$ is shown to satisfy a central limit theorem. This is in contrast to the limiting distribution of the length of the longest common subsequences in two independent uniform random permutations of $\{1, \dots, n\}$, which is shown to be the Tracy-Widom distribution.


1 Introduction
We explore here the asymptotic behavior, in law, of the length of the longest common subsequences of two random words. Although the study of this length is decades-old, and extensive from an algorithmic standpoint in various disciplines such as computer science, bioinformatics and statistical physics, mathematically rigorous results on it are rather sparse. Below, we obtain the first result on the limiting law of this length, when properly centered and scaled.
To begin with, let us present our framework. Throughout, let $X = (X_i)_{i\ge1}$ and $Y = (Y_i)_{i\ge1}$ be two infinite sequences whose coordinates take their values in $\mathcal{A}_m = \{\alpha_1, \alpha_2, \ldots, \alpha_m\}$, a finite alphabet of size $m$. Next, let $LC_n$ be the length of the longest common subsequences (LCSs) of the random words $X_1\cdots X_n$ and $Y_1\cdots Y_n$, i.e., $LC_n$ is the maximal integer $k \in \{1, \ldots, n\}$ such that there exist $1 \le i_1 < \cdots < i_k \le n$ and $1 \le j_1 < \cdots < j_k \le n$ for which
$$X_{i_s} = Y_{j_s}, \quad \text{for all } s = 1, 2, \ldots, k.$$
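For concreteness, here is a minimal Python sketch of the textbook quadratic-time dynamic program computing $LC_n$ (the function name lcs_length and the test words are ours, purely for illustration); the later sketches below reuse this routine.

```python
import random

def lcs_length(x, y):
    """Length of the longest common subsequences of the words x and y,
    via the textbook O(len(x) * len(y)) dynamic program."""
    n, p = len(x), len(y)
    L = [[0] * (p + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, p + 1):
            if x[i - 1] == y[j - 1]:
                L[i][j] = L[i - 1][j - 1] + 1
            else:
                L[i][j] = max(L[i - 1][j], L[i][j - 1])
    return L[n][p]

# Two iid uniform random words on a 4-letter alphabet.
rng = random.Random(0)
n = 200
x = [rng.choice("ABCD") for _ in range(n)]
y = [rng.choice("ABCD") for _ in range(n)]
print(lcs_length(x, y))
```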
As is well known, $LC_n$ is a measure of the similarity/dissimilarity of the two words/strings and is often used in pattern matching; e.g., in computer science the edit (or Levenshtein) distance, when restricted to indels (insertions/deletions), is the minimal number of indels needed to transform one string into the other and is therefore given by $2(n - LC_n)$. (The reader will find in [9], [32], [36] and [40] numerous examples of the relevance of longest common subsequences in various applications.)

The asymptotic study of $LC_n$ began with the well-known result of Chvátal and Sankoff [13] asserting, via a superadditivity argument, that
$$\lim_{n\to\infty} \frac{\mathbb{E} LC_n}{n} = \gamma^*_m, \qquad (1.1)$$
whenever, for example, $(X_i)_{i\ge1}$ and $(Y_i)_{i\ge1}$ are two independent sequences of independent identically distributed (iid) random variables having the same law. However, to this day, the exact value of $\gamma^*_m = \sup_{n\ge1} \mathbb{E} LC_n/n$ (which depends on the distribution of $X_1$ and on the size of the alphabet) is unknown, even in "simple cases" such as uniform Bernoulli random variables. Nevertheless, its asymptotic behavior, as the alphabet size grows, is known (see Kiwi, Loebl and Matoušek [26]) and given, for $X_1$ uniformly distributed, by
$$\lim_{m\to\infty} \sqrt{m}\,\gamma^*_m = 2. \qquad (1.2)$$
Chvátal and Sankoff's law of large numbers was further sharpened by Alexander [2], who proved that
$$\gamma^*_m n - K_A \sqrt{n \ln n} \le \mathbb{E} LC_n \le \gamma^*_m n, \qquad (1.3)$$
where $K_A > 0$ is a universal constant (which depends neither on $n$ nor on the distribution of $X_1$). Next, Steele [37] obtained via the Efron-Stein inequality the first upper bound on the variance of $LC_n$, proving, in particular, that
$$\operatorname{Var} LC_n \le n\Bigl(1 - \sum_{k=1}^m p_k^2\Bigr), \qquad (1.4)$$
where $p_k = \mathbb{P}(X_1 = \alpha_k)$, $k = 1, \ldots, m$. However, finding the order of a matching lower bound is much more elusive and remains unknown in many instances, in particular for iid uniform Bernoulli random variables. Some of the instances in which, and methods for which, a variance lower bound matching the linear upper bound has been obtained are further described below. Before doing so, let us state our main result:

Theorem 1.1 Let $(X_i)_{i\ge1}$ and $(Y_i)_{i\ge1}$ be two independent sequences of iid random variables with values in $\mathcal{A}_m = \{\alpha_1, \alpha_2, \ldots, \alpha_m\}$ and having the same law. Assume that $\operatorname{Var} LC_n \ge K n$, for some positive constant $K$ independent of $n \ge 1$. Let $0 < \eta < 1/10$; then, for all $n \ge 1$,
$$d_W\Bigl(\frac{LC_n - \mathbb{E} LC_n}{\sqrt{\operatorname{Var} LC_n}},\, G\Bigr) \le \frac{C}{n^{\eta}}, \qquad (1.5)$$
where $G$ is a standard normal random variable, $d_W$ is the Wasserstein distance, and $C > 0$ is a constant depending on $\eta$ but not on $n$.

Recall, further, that if $\mu_2$ is absolutely continuous with respect to the Lebesgue measure, with density $\mu_2(dx)/dx$ essentially bounded, i.e., such that $\|\mu_2(dx)/dx\|_\infty < +\infty$, then
$$d_K(\mu_1, \mu_2) \le \sqrt{2\,\|\mu_2(dx)/dx\|_\infty\, d_W(\mu_1, \mu_2)}, \qquad (1.6)$$
where $d_K$ is the Kolmogorov distance; e.g., see Ross [35] or the Appendix in [3]. Thus, Theorem 1.1 implies, via (1.6), that
$$d_K\Bigl(\frac{LC_n - \mathbb{E} LC_n}{\sqrt{\operatorname{Var} LC_n}},\, G\Bigr) \le \frac{C}{n^{\eta/2}},$$
and so, properly centered and normalized, $LC_n$ converges in distribution to a standard normal random variable as long as $\operatorname{Var} LC_n$ is assumed to be of linear order.
Let us carefully review and discuss the assumption on the variance of $LC_n$ present in the statement of our main theorem. As indicated in (1.4), $\operatorname{Var} LC_n \le n$; however, contradictory conjectures on the order of this variance have also appeared in the literature: a sublinear one (of order $o(n^{2/3})$) in [13] and a linear one in Waterman [39] (see also [2]). The linear order, which we believe to be the correct one, has been verified in a few situations that we briefly describe next:

• The linear lower bound is proved in [28] and [22] for iid random variables (Bernoulli or finite-alphabet ones) which are highly biased, in that a single letter is picked with very high (but fixed) probability. In that case, changing, in any configuration, a low-probability letter into the high-probability one is more likely to increase $LC_n$ by one unit than to decrease it by one unit. This change (which clearly has no effect for uniformly distributed letters) reduces variability, and the new longest common subsequences provide the variance lower bound.
• Beyond the strongly biased cases just mentioned, a linear order for the variance has been obtained in other situations closer to the iid uniform case: in particular, in frameworks where either a letter is missing or long blocks are added within the iid uniform framework, as well as in various other settings (see the many references given in [4], [6], [18], [21], [23]). Within these frameworks, modifications of the tools presented in our current approach would also lead to a central limit theorem, without any further assumption on the variance. In all these situations, the central $r$-th moments of $LC_n$, $r \ge 1$, can also be shown to be of order $n^{r/2}$ (see the concluding remarks in [22]). This last fact might hint at the asymptotic normality of $LC_n$, although similar moment estimates can lead to a non-Gaussian limiting law in a related problem, namely, in the study of $LCI_n$, the length of the longest common and increasing subsequences of two random words over a totally ordered finite alphabet (see [8], [16]).
• Early extensive simulations (with $n$ of order $10^4$) by Boutet de Monvel [7] seemed to indicate, in the uniform case, a variance of order at least $n^{2\omega}$, with $\omega \approx 0.418$, and even a normal asymptotic law. More recent extensive simulations (with $n$ of order $10^6$, the largest lengths to date, one hundred times those of [7]) seem to indicate, in both the uniform and non-uniform binary cases, that the variance is of order $n$ (see [29]). (A toy version of such an experiment is sketched right after this list.)

• As will become clear from the proof of the theorem just stated, a mere sublinear lower bound on the variance also leads to a normal limiting law; e.g., a lower bound of order at least $n^{9/10+\eta}$, $\eta > 0$, will do (although, again, it is our belief that the variance of $LC_n$ is linear in $n$; note, nevertheless, that $9/10 > 2\omega$). Note also that the proof of this theorem provides, for $\alpha$ (to be defined) such that $4/5 < \alpha < 1$, a rate of $1/n^{(1-\alpha)/2}$, while for $2/3 < \alpha < 4/5$, a different rate, of order $1/n^{1-3(1-\alpha/2)/2}$, can be obtained in a similar way (see (2.40)), under a linear variance lower bound.
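Here is a toy version of the simulation experiments just described, reusing lcs_length from above; at these small values of $n$ the numbers are only indicative, the cited studies use lengths up to $10^6$ (the helper name and parameters are ours):

```python
def lcs_mean_and_var(n, trials, alphabet="01", seed=1):
    """Monte Carlo estimates of E[LC_n]/n and Var(LC_n)/n for iid uniform letters."""
    rng = random.Random(seed)
    vals = []
    for _ in range(trials):
        x = [rng.choice(alphabet) for _ in range(n)]
        y = [rng.choice(alphabet) for _ in range(n)]
        vals.append(lcs_length(x, y))
    mean = sum(vals) / trials
    var = sum((v - mean) ** 2 for v in vals) / (trials - 1)
    return mean / n, var / n

for n in (50, 100, 200):
    # mean/n slowly increases toward gamma*_2 (conjectured to be about 0.81);
    # var/n is the ratio whose boundedness away from 0 is at stake.
    print(n, lcs_mean_and_var(n, trials=200))
```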
Remark 1.1 Theorem 1.1 is the first of its kind. It contrasts, in particular, with the corresponding result in the related Bernoulli matching problem where, as shown by Majumdar and Nechaev [31], the limiting law is the Tracy-Widom one. Both the LCS and Bernoulli matching models are directed last passage vertex/site percolation models, with respectively dependent and independent weights, which possibly explains the different limiting laws. In both cases, the expectation is linear in $n$, but the variance in the Bernoulli matching problem is sublinear (of order $n^{2/3}$), while in our LCS case it is assumed linear.

Let us describe how the LCS problem can be represented as a directed last passage percolation (LPP) problem with dependent weights. Let the set of vertices be $V := \{0, 1, 2, \ldots, n\} \times \{0, 1, 2, \ldots, n\}$, and let the set of oriented edges $E \subset V \times V$ contain horizontal, vertical and diagonal edges. The horizontal edges are oriented to the right, while the vertical edges are oriented upwards, both having unit length; the diagonal edges point up-right at a $\pi/4$ angle and have length $\sqrt{2}$. Hence,
$$E := \{(v, v + e_i) : v \in V,\ i = 1, 2, 3,\ v + e_i \in V\},$$
where $e_1 := (1,0)$, $e_2 := (0,1)$ and $e_3 := (1,1)$. With the horizontal and vertical edges we associate a weight of $0$, while with the diagonal edge from $(i,j)$ to $(i+1, j+1)$ we associate the weight $1$ if $X_{i+1} = Y_{j+1}$, and $0$ (or $-\infty$) otherwise. In this manner, $LC_n$ is equal to the total weight of the heaviest paths going from $(0,0)$ to $(n,n)$.
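As a quick sanity check of this representation, the following sketch (ours; memoized recursion over the grid, feasible only for small words) verifies that the heaviest-path weight coincides with the dynamic-programming value of $LC_n$:

```python
from functools import lru_cache

def heaviest_path_weight(x, y):
    """Maximal total weight over oriented paths from (0,0) to (n,n) with unit
    East/North steps (weight 0) and diagonal steps from (i,j) to (i+1,j+1)
    (weight 1 if x[i] == y[j], and 0 otherwise); memoized recursion."""
    n = len(x)

    @lru_cache(maxsize=None)
    def best(i, j):
        if i == n and j == n:
            return 0
        cands = []
        if i < n:
            cands.append(best(i + 1, j))                        # East, weight 0
        if j < n:
            cands.append(best(i, j + 1))                        # North, weight 0
        if i < n and j < n:
            cands.append(best(i + 1, j + 1) + (x[i] == y[j]))   # diagonal
        return max(cands)

    return best(0, 0)

x, y = "ABCAB", "BACBA"
assert heaviest_path_weight(x, y) == lcs_length(x, y)
```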
(Another directed LPP representation can be obtained via
$$LC_n = \max_{\pi \in SI} \sum_{(i,j) \in \pi} \mathbf{1}_{\{X_i = Y_j\}},$$
where $SI$ refers to the set of all paths with strictly increasing steps, i.e., paths with both coordinates strictly increasing from one step to the next, going from $(0,0)$ to the East, $x = n$, or North, $y = n$, boundary. A third representation would be as above but where now the paths going from $(0,0)$ to $(n,n)$ have either strictly increasing steps or North or East unit steps; again, with the strictly increasing steps the associated weight is $\mathbf{1}_{\{X_i = Y_j\}}$, while with the North as well as the East unit steps is associated a weight of $0$. As a final representation, one could still proceed with strictly increasing paths but with the requirement that the paths end with a $1$.) Note that the weights in our percolation representations are not "truly two-dimensional" and, in our opinion, this could be a further reason for the order of magnitude of the mean and the variance, as well as the limiting law, in the LCS problem to differ from those in other first/last passage related models.

Theorem 1.1 further contrasts with the corresponding limiting law for the length of the longest common subsequences in a pair of independent uniform random permutations of $\{1, \ldots, n\}$. Indeed, in sequence comparison problems, the emergence of the Tracy-Widom distribution has sometimes been contemplated/speculated, e.g., see [1]. We show, in the last section of the present paper, that this is indeed the case when analyzing the asymptotic behavior of the length of the longest common subsequences of two independent uniform random permutations of $\{1, \ldots, n\}$ (there, the expectation is of order $\sqrt{n}$ and the variance of order $n^{1/3}$).
Finally, let us remark that some of the ideas/techniques developed to prove lower bounds on $\operatorname{Var} LC_n$ have been further developed in the context of first passage percolation, providing, to date, the best lower bound available on the variance of the passage time (see [14]).
As far as the content of the paper is concerned, the lengthy next section contains the proof of Theorem 1.1, preceded by a discussion of some elements of its proof. Then, in the third section, various extensions and generalizations, as well as some related open questions, are discussed. In particular, the proof that the length of the longest common subsequences of two independent uniform random permutations of $\{1, \ldots, n\}$ converges, once properly centered and scaled, to the Tracy-Widom distribution is included there.
2 Proof of Theorem 1.1

The aim of this section is to provide a proof of the main theorem by a three-step method. The first step makes use of a relatively recent theorem of Chatterjee [10] on Stein's method (see [12] for an overview of the method, including Chatterjee's normal approximation results via exchangeable pairs); the second uses simple moment estimates for $LC_n$ derived from our lower bound variance assumption; and the third develops lengthy correlation estimates based, in part, on short string-length genericity results obtained in [24]. We start by fixing notation and recalling some preliminaries.
Throughout this section, $X = (X_i)_{i\ge1}$ and $Y = (Y_i)_{i\ge1}$ are two independent sequences whose coordinates are iid, with a common law, taking their values in $\mathcal{A}_m = \{\alpha_1, \alpha_2, \ldots, \alpha_m\}$, a finite alphabet of size $m$.
Let us continue by introducing some more notation, following that of [10]. Let $W = (W_1, W_2, \ldots, W_n)$ and $W' = (W'_1, W'_2, \ldots, W'_n)$ be two iid $\mathbb{R}^n$-valued random vectors whose components are also independent. For $A \subset [n] := \{1, 2, \ldots, n\}$, define the random vector $W^A$ by setting
$$W^A_j := \begin{cases} W'_j, & j \in A, \\ W_j, & j \notin A, \end{cases}$$
with, for $A = \{j\}$ and further ease of notation, $W^j$ short for $W^{\{j\}}$, while $W^\emptyset = W$. For a given Borel measurable function $f : \mathbb{R}^n \to \mathbb{R}$ and $A \subset [n]$, let
$$T_A := \sum_{j \notin A} \Delta_j f(W)\, \Delta_j f(W^A),$$
where
$$\Delta_j f(W^A) := f(W^A) - f(W^{A \cup \{j\}}),$$
and, again, $T_\emptyset = \sum_{j=1}^n (\Delta_j f(W))^2$. Finally, let
$$T := \frac{1}{2} \sum_{A \subsetneq [n]} \frac{T_A}{n \binom{n-1}{|A|}},$$
where $|A|$ denotes the cardinality of $A$, and where the sum above is taken over all the proper subsets (including the empty set) of $[n]$. Here is Chatterjee's result.
Theorem 2.1 [10] Let all the terms be defined as above, and let $0 < \sigma^2 := \operatorname{Var} f(W) < \infty$. Then,
$$d_W\Bigl(\frac{f(W) - \mathbb{E} f(W)}{\sigma},\, G\Bigr) \le \frac{\sqrt{\operatorname{Var} T}}{\sigma^2} + \frac{1}{2\sigma^3} \sum_{j=1}^n \mathbb{E}\,|\Delta_j f(W)|^3, \qquad (2.1)$$
where $G$ is a standard normal random variable.
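On a toy instance, the statistic $T$ can be computed by brute force over all proper subsets $A$; the sketch below (ours, not from [10], and assuming the normalization displayed above) reuses lcs_length and uses Chatterjee's identity $\mathbb{E} T = \operatorname{Var} f(W)$ as a sanity check.

```python
import itertools, math, random
from functools import lru_cache

N_HALF = 3                 # tiny word length; W has N = 2 * N_HALF coordinates
N = 2 * N_HALF

@lru_cache(maxsize=None)
def f(w):
    # f(W) = length of the LCSs of the two halves of W (reusing lcs_length)
    return lcs_length(w[:N_HALF], w[N_HALF:])

def chatterjee_T(w, wp):
    """Brute-force evaluation of T for the pair (W, W'); it loops over all
    proper subsets A of [N], so it is feasible only for tiny N."""
    def wA(A):
        return tuple(wp[j] if j in A else w[j] for j in range(N))
    def delta(j, A):
        # Delta_j f(W^A) = f(W^A) - f(W^{A union {j}})
        return f(wA(A)) - f(wA(A | {j}))
    total = 0.0
    for size in range(N):                       # proper subsets only
        for A in itertools.combinations(range(N), size):
            A = frozenset(A)
            TA = sum(delta(j, frozenset()) * delta(j, A)
                     for j in range(N) if j not in A)
            total += TA / (N * math.comb(N - 1, size))
    return total / 2.0

rng = random.Random(2)
Ts, fs = [], []
for _ in range(1000):
    w = tuple(rng.choice("01") for _ in range(N))
    wp = tuple(rng.choice("01") for _ in range(N))
    Ts.append(chatterjee_T(w, wp))
    fs.append(f(w))
mean_f = sum(fs) / len(fs)
var_f = sum((v - mean_f) ** 2 for v in fs) / (len(fs) - 1)
print(sum(Ts) / len(Ts), var_f)   # E[T] should approximate Var f(W)
```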
Remark 2.1 (i) In [10], the variance term, as displayed in (2.1), is actually replaced by $\operatorname{Var} \mathbb{E}(T \mid f(W))$; but the above bound, with the larger $\operatorname{Var} T$, also presented in [10], is enough for our purpose.
(ii) Our proof bounds the right-hand side of (2.1) and next, using (1.6), bounds the corresponding Kolmogorov distance. An alternate way to obtain convergence in distribution would be to first use a more recent result of Lachièze-Rey and Peccati [27], directly bounding the Kolmogorov distance, which could then be estimated by adapting the techniques presented below.
Two small comments are in order before beginning the proof of Theorem 1.1.
(1) In the proof, we do not keep track of the constants, since doing so would make the arguments a lot lengthier. Therefore, a constant $C$ may vary from one expression to another. Note, however, that $C$ will always be positive and independent of $n$.
(2) We do not worry about quantities (e.g., lengths of longest common subsequences of random words) like $n^\alpha$, $\ln n$, etc., which should actually be $\lfloor n^\alpha \rfloor$, $\lfloor \ln n \rfloor$, etc. This does not cause any problems, as we are interested in asymptotic bounds, and the proof can be revised, with minor changes (and some further notational burden), to make the statements completely precise.
Let us start with a sketch of the proof of Theorem 1.1. To do so, set
$$W := (X_1, \ldots, X_n, Y_1, \ldots, Y_n),$$
and set
$$f(W) := LC_n, \qquad \sigma^2 := \operatorname{Var} LC_n.$$
We begin by estimating the second term on the right-hand side of (2.1). To do so, recall our assumption:
$$\operatorname{Var} LC_n \ge K n.$$
Since changing a single coordinate of $W$ changes $LC_n$ by at most one, $|\Delta_j f(W)| \le 1$ for every $j \in [2n]$. Therefore,
$$\frac{1}{2\sigma^3} \sum_{j=1}^{2n} \mathbb{E}\,|\Delta_j f(W)|^3 \le \frac{n}{(Kn)^{3/2}} \le \frac{C}{\sqrt{n}}.$$
This last estimate takes care of the second term on the right-hand side of (2.1).
Next, let us move to the estimation of the variance term in (2.1). Expanding $T$, $\operatorname{Var} T$ can be expressed as a (suitably normalized) sum of covariance terms of the form
$$\operatorname{Cov}\bigl(\Delta_j f(W)\,\Delta_j f(W^A),\, \Delta_k f(W)\,\Delta_k f(W^B)\bigr),$$
over quadruples $(A, B, j, k)$, with $j \notin A$ and $k \notin B$; below, $S_1$ denotes the dominant such index set. Our strategy is now to further divide $S_1$ into two main pieces by conditioning on a, yet to be defined, high-probability event $E^n_{\varepsilon,s_1,s_2}$, ensuring that LCSs are made of an accumulation of relatively short strings; the precise decomposition is stated in Lemma 2.1. To estimate each of the two terms in that lemma, the following proposition (Proposition 2.1), and a conditional version of it, which easily follows from similar arguments, will be used repeatedly throughout the proof.
Next, expressing $\sum_{(A,B,j,k) \in S^*}$ in terms of $R$, using basic results about binomial coefficients, and performing some elementary manipulations lead to the bound $\operatorname{Var} T \le n^2$, which is suboptimal for our purposes; we therefore begin a detailed estimation study to improve this variance upper bound to $o(n^2)$.
To do so, we start by giving a slight variation of a result from [24], which can be viewed as a microscopic short-length genericity principle and which will turn out to be an important tool in our proof. This principle, valid not only for longest common subsequences but in much greater generality (see [24]), should prove useful in other contexts.
Assume that $n = vd$, where $v$ and $d$ are positive integers, and let the integers
$$0 = r_0 \le r_1 \le \cdots \le r_d = n \qquad (2.11)$$
be such that
$$LC_n = \sum_{i=1}^d LC\bigl(X_{(i-1)v+1} \cdots X_{iv};\, Y_{r_{i-1}+1} \cdots Y_{r_i}\bigr), \qquad (2.12)$$
i.e., the $r_i$'s break an optimal alignment into pieces, the $i$-th of which aligns the $X$-segment $X_{(i-1)v+1}\cdots X_{iv}$ with the $Y$-segment $Y_{r_{i-1}+1}\cdots Y_{r_i}$ (with the understanding that this length is zero if none of the letters of the $X$-part are aligned with letters of the $Y$-part, i.e., if the $X$-part is only aligned with gaps). Next, let $\varepsilon > 0$, and let $0 < s_1 < 1 < s_2$ be two reals such that $\gamma(s_1) < \gamma^*_m$ and $\gamma(s_2) < \gamma^*_m$, with $\gamma(\cdot)$ as in [24]. (See [24] for the existence of, and estimates on, $s_1$ and $s_2$.) Finally, let $E^n_{\varepsilon,s_1,s_2}$ be the event that, for all integer vectors $(r_0, r_1, \ldots, r_d)$ satisfying (2.11) and (2.12),
$$\#\{i \in \{1, \ldots, d\} : v s_1 \le r_i - r_{i-1} \le v s_2\} \ge (1 - \varepsilon)d. \qquad (2.13)$$
In words, on $E^n_{\varepsilon,s_1,s_2}$, every optimal alignment of $X_1\cdots X_n$ and $Y_1\cdots Y_n$ satisfying (2.11) and (2.12) is such that a proportion of at least $1 - \varepsilon$ of the integer intervals $[r_{i-1}+1, r_i] \cap \mathbb{N}$, $i = 1, 2, \ldots, d$, have their length between $v s_1$ and $v s_2$.
As stated next, $E^n_{\varepsilon,s_1,s_2}$ holds with high probability. Broadly, our next theorem asserts that, for any $\varepsilon > 0$, there exists $v$ large enough, but fixed, such that if $X$ is divided into segments of length $v$, then, typically (for at least a fraction $1 - \varepsilon$ of the segments), and with high probability, the LCSs match these segments to segments of similar length in $Y$.
Theorem 2.2 Let $\varepsilon > 0$ and let $0 < \delta < \delta^*$ (with $\delta^*$ as in Remark 2.3 below). Let the integer $v$ be such that (2.14) holds. Then (2.15) holds for all $n = n(\varepsilon, \delta)$ large enough.
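The phenomenon described by Theorem 2.2 can be observed empirically: the sketch below (ours) backtracks one optimal alignment, breaks it at the $X$-block boundaries $v, 2v, \ldots$, and reports the lengths $r_i - r_{i-1}$ of the matched $Y$-segments; the thresholds $s_1$, $s_2$ used here are illustrative placeholders, not the constants of [24].

```python
def lcs_pairs(x, y):
    """One optimal alignment: the matched index pairs (i, j), 1-based,
    recovered by backtracking the dynamic-programming table."""
    n, p = len(x), len(y)
    L = [[0] * (p + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        for j in range(1, p + 1):
            L[i][j] = (L[i - 1][j - 1] + 1 if x[i - 1] == y[j - 1]
                       else max(L[i - 1][j], L[i][j - 1]))
    pairs, i, j = [], n, p
    while i > 0 and j > 0:
        if x[i - 1] == y[j - 1]:
            pairs.append((i, j)); i -= 1; j -= 1
        elif L[i - 1][j] >= L[i][j - 1]:
            i -= 1
        else:
            j -= 1
    return pairs[::-1]

def y_segment_lengths(x, y, v):
    """Cut one optimal alignment at the X-block boundaries v, 2v, ... and
    return the lengths r_i - r_{i-1} of the corresponding Y-segments."""
    n = len(x); d = n // v
    pairs = lcs_pairs(x, y)
    r = [0] * (d + 1); r[d] = n
    for i in range(1, d):
        r[i] = max([j for (a, j) in pairs if a <= i * v], default=0)
    return [r[i] - r[i - 1] for i in range(1, d + 1)]

rng = random.Random(3)
n, v = 1000, 100                    # d = 10 blocks of length v
x = [rng.choice("01") for _ in range(n)]
y = [rng.choice("01") for _ in range(n)]
lengths = y_segment_lengths(x, y, v)
s1, s2 = 0.5, 2.0                   # illustrative thresholds, not those of [24]
print(lengths, sum(s1 * v <= ell <= s2 * v for ell in lengths) / len(lengths))
```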
Remark 2.2 In [24], instead of (2.11), the corresponding condition is
$$0 = r_0 < r_1 < \cdots < r_d = n, \qquad (2.16)$$
whose strict inequalities become the weak inequalities of (2.11).
The rationale for this difference is that, in general, there is no guarantee that there exists an optimal alignment, i.e., a longest common subsequence, satisfying both conditions (2.12) and (2.16); a simple counterexample already exists for very short words: any optimal alignment satisfying (2.12) must then have a piece (soon to be called a "cell") with no terms in the $Y$-part, and this is clearly incompatible with (2.16). (This counterexample can easily be extended to $n = 6$ and so on.) In general, there always exists an optimal alignment $(r_0, r_1, r_2, \ldots, r_d)$ satisfying both (2.11) and (2.12) with, say, $v = n^\alpha$, $0 < \alpha < 1$, as above.
(Consider any one of the longest common subsequences and choose the $r_i$'s so that these two conditions are satisfied.) Therefore, we slightly change the framework of [24], as the forthcoming arguments require the existence of an optimal alignment satisfying (2.12) for any value of $X$ and $Y$. However, the proof of Theorem 2.2 above proceeds as the proof of the corresponding result (there also Theorem 2.2) in [24], and is therefore omitted. (The only difference is that, counting the cases of equality, an upper estimate on the number of integer vectors $(0 = r_0, r_1, \ldots, r_{d-1}, r_d = n)$ satisfying (2.11) is needed.)

Remark 2.3 Following [24], one can easily verify that a suitable relation between $n$ and $\varepsilon$ (involving $\delta$ and $K_A$) is sufficient for (2.15) to hold, where $0 < \delta < \delta^* := \min(\gamma^*_m - \gamma(s_1), \gamma^*_m - \gamma(s_2))$ is a fixed positive quantity and $K_A$ is a positive constant such that $\gamma^*_m n - K_A \sqrt{n \ln n} \le \mathbb{E} LC_n$. (One can find explicit numerical estimates on $K_A$ using Rhee's [33] proof of (1.3).) In our context, here is how to choose $\varepsilon$ so that the estimate in (2.15) holds true for all $n \ge 1$ and $v = n^\alpha$, $0 < \alpha < 1$: for a suitably large constant $c_1 > 0$, independent of $n$, set
$$\varepsilon = \frac{c_1 (1 + \ln(1+v))}{v};$$
then (2.14) holds for $v = n^\alpha$, and therefore
$$\mathbb{P}\bigl(E^n_{\varepsilon, s_1, s_2}\bigr) \ge 1 - e^{-n^{1-\alpha}(1+\ln(1+n^\alpha))}. \qquad (2.18)$$
Let us return to the proof of Theorem 1.1 and to the estimation of (2.7). First, for notational convenience, below we write $\sum_{S_1}$ in place of $\sum_{(A,B,j,k) \in S_1}$. Also, for random variables $U$, $V$ and a random variable $Z$ taking its values in $\mathcal{R} \subset \mathbb{R}$, and with another abuse of notation, we write $\operatorname{Cov}_{Z=z}(U,V)$, $z \in \mathcal{R}$, for the covariance of $U$ and $V$ conditional on the event $\{Z = z\}$. Let now the random variable $Z$ be the indicator function of the event $E^n_{\varepsilon,s_1,s_2}$, where $\varepsilon = c_1(1+\ln(1+v))/v$, i.e., let $Z = \mathbf{1}_{E^n_{\varepsilon,s_1,s_2}}$, with $v = n^\alpha$ and with $c_1$ as in Remark 2.3. Then we arrive at the decomposition (2.19) of Lemma 2.1. The first term on the right-hand side of (2.19) involves conditioning on $\{Z = 0\}$ and is therefore exponentially small, by (2.18). For the second term on the right-hand side of (2.19), begin with the trivial bound on $\mathbb{P}(Z = 1)$ to get (2.21). Finer decompositions are then needed to handle this last summation, and for this purpose we specify an optimal alignment with certain properties.
In the sequel, $r$ denotes such a (fixed) optimal alignment, which also specifies the pairs, in the strings $X_1\cdots X_n$ and $Y_1\cdots Y_n$, contributing to the longest common subsequence. Such an alignment always exists, as just noted, and so we can define an injective map from $(X_1\cdots X_n, Y_1\cdots Y_n)$ to the set of alignments, making the various definitions below (such as the ones for $S_{1,1}$ and $S_{1,2}$) well defined. This abstract construction is enough for our purposes, since the argument below is independent of the choice of the alignment. Note also that, conditionally on the event $\{Z = 1\}$, $r$ satisfies (2.13).
To continue, we need another definition and some more notation.
Definition 2.1 For the optimal alignment $r = (r_0, r_1, \ldots, r_d)$, each of the sets
$$\{X_{(i-1)v+1}, \ldots, X_{iv}\} \cup \{Y_{r_{i-1}+1}, \ldots, Y_{r_i}\}, \qquad i = 1, \ldots, d,$$
is called a cell of $r$.
(For instance, the coordinate $W_8 = X_8$ belongs to the cell whose $X$-part contains $X_8$.) Returning to the proof of Theorem 1.1, define the following two subsets of $S_1$ with respect to the alignment $r$: $S_{1,1}$, the set of those $(A,B,j,k) \in S_1$ for which $W_j$ and $W_k$ belong to the same cell of $r$, and $S_{1,2}$, the set of those for which $W_j$ and $W_k$ belong to different cells of $r$. Decomposing the right-hand side of (2.21) according to these two subsets leads to (2.22). To glimpse into the proof, let us stop for a moment and present some of its key steps: our first intention is to show that, thanks to the conditioning on the event $E^n_{\varepsilon,s_1,s_2}$, the number of terms contained in $S_{1,1}$ is "small", while a further step will rest on estimates for the indices in $S_{1,2}$; there, since the letters are in different cells, we have enough independence (see the decomposition in (2.29)) to show that the contributions of the covariance terms from $S_{1,2}$ are also "small".
Let us now focus on the first term on the right-hand side of (2.22). Letting $g$ be as in (2.23), and using arguments similar to those used in the proof of Proposition 2.1, we obtain the bound (2.24), where
$$R := \{(j,k) \in [2n]^2 : W_j \text{ and } W_k \text{ are in the same cell of } r\}.$$
To estimate (2.24), for each $i = 1, \ldots, d$, let $|R_i|$ be the number of pairs of indices $(j,k) \in [2n]^2$ that are in the $i$-th cell, and let $T_i$ be the event that $s_1 n^\alpha \le r_i - r_{i-1} \le s_2 n^\alpha$. Then,
$$|R| = \sum_{i=1}^{d} |R_i| \mathbf{1}_{T_i} + \sum_{i=1}^{d} |R_i| \mathbf{1}_{T_i^c}. \qquad (2.25)$$
For the first term on the right-hand side of (2.25), note that, when $T_i$ holds true, the $X$-part of the $i$-th cell can contain at most $n^\alpha$ letters, while the $Y$-part can contain at most $s_2 n^\alpha$ of them. Thus,
$$|R_i| \mathbf{1}_{T_i} \le (1 + s_2)^2 n^{2\alpha},$$
and this leads to
$$\sum_{i=1}^{d} |R_i| \mathbf{1}_{T_i} \le (1+s_2)^2 n^{1+\alpha}. \qquad (2.26)$$
For the estimation of the second term on the right-hand side of (2.25), we first observe that, letting $I := \{i \in [n^{1-\alpha}] : T_i \text{ does not occur}\}$ and noting that $|R_i| \le 4n^2$, we have
$$\sum_{i=1}^{d} |R_i| \mathbf{1}_{T_i^c} \le 4 n^2 |I|.$$
Next, by definition, given that $Z = 1$, $|I| \le \varepsilon n^{1-\alpha}$, and so the conditional expectation of this second term can be estimated in turn. Thus, we obtain (2.27) and, when $\alpha > 2/3$, its right-hand side is $o(n^2)$. Hence, using also (2.26), it follows that (2.28) holds, which, in turn, yields, via (2.24), the desired estimate. This takes care of the first sum on the right-hand side of (2.22) since, again, the corresponding right-hand side is $o(n^2)$ when $2/3 < \alpha < 1$.
Hence, from here on, we assume that $\alpha$ is a real number satisfying $2/3 < \alpha < 1$.
We move next to estimating the second term on the right-hand side of (2.22), namely, the sum over $S_{1,2}$ of the corresponding conditional covariance terms.
To estimate the summands in the above expression, we decompose the covariance terms in such a way that (conditional) independence of certain random variables occurs, thereby simplifying the estimates themselves. For this purpose, for each $i \in [2n]$, let $P_i$ be the cell of $r$ containing $W_i$, let $f(P_i) = LC(P_i)$ be the length of the longest common subsequences of $P^1_i$ and $P^2_i$, the $X$- and $Y$-parts of that cell, and let $P'_i$ be the same as $P_i$ except that $W_i$ is now replaced with the independent copy $W'_i$; set $\tilde\Delta_i f(W) := LC(P_i) - LC(P'_i)$. In words, $\tilde\Delta_i f(W)$ is the difference between the length of the longest common subsequences of the two random words forming $P_i$ and the length of their modified versions at coordinate $i$, i.e., of the words forming $P'_i$. Now, for $(A,B,j,k) \in S_1$, the covariance term can be expanded telescopically as in (2.29), where, for any $i \notin A$, we also set $\tilde\Delta_i f(W^A) := LC(W^A_{P_i}) - LC(W^{A\cup\{i\}}_{P_i})$, with $W^A_{P_i}$ and $W^{A\cup\{i\}}_{P_i}$ being the restrictions of $W^A$ and $W^{A\cup\{i\}}$ to the cell $P_i$, respectively. Above, we used the bilinearity of $\operatorname{Cov}_{Z=1,S_{1,2}}$ to express the left-hand side as a telescoping sum. (Except for the conditioning step, this decomposition is akin to one developed in [11].)

Let us start by estimating the last term on the right-hand side of (2.29). Letting $\xi_j := \tilde\Delta_j f(W)\,\tilde\Delta_j f(W^A)$ and $\xi_k := \tilde\Delta_k f(W)\,\tilde\Delta_k f(W^B)$, with a slight abuse of notation as $\xi_j$ depends on $A$ while $\xi_k$ depends on $B$, we can expand the corresponding conditional covariance into a main (unconditional) term plus a remainder. The remainder is, in absolute value, at most $C e^{-n^{1-\alpha}(1+\ln(1+n^\alpha))}$ since, from (2.18), $\mathbb{P}(Z = 1) \ge 1 - 1/(2e)$ and $\mathbb{P}(Z = 0) \le e^{-n^{1-\alpha}(1+\ln(1+n^\alpha))}$. So we focus on the other term and evaluate $\mathbb{E}((\xi_j - \mathbb{E}\xi_j)(\xi_k - \mathbb{E}\xi_k)\mathbf{1}((A,B,j,k) \in S_{1,2}))$. By conditional independence (the coordinates $W_j$ and $W_k$ lying in different cells), this quantity equals
$$\mathbb{E}\bigl(\xi_j - \mathbb{E}\xi_j \mid (A,B,j,k) \in S_{1,2}\bigr)\, \mathbb{E}\bigl(\xi_k - \mathbb{E}\xi_k \mid (A,B,j,k) \in S_{1,2}\bigr)\, \mathbb{P}\bigl((A,B,j,k) \in S_{1,2}\bigr).$$
Combining these observations, and using again (2.18) together with $\mathbb{P}(Z = 1) \ge 1 - 1/(2e)$, $n \ge 1$, we can bound the last term on the right-hand side of (2.29) by $C\,\mathbb{P}((A,B,j,k) \in S_{1,2})$ times factors which are either exponentially small or controlled by the conditional expectations just described; summing over $S_1$, we therefore arrive at the desired estimate for this term, where, for the last step, (2.27) is used, as well as the estimates in (2.10) and (2.18).

We continue by obtaining upper bounds for the first four summands in (2.29), focusing on the estimation of the first of these four terms since the other three can be estimated in a similar way. Indeed, it will be clear from the discussion below that the third of these four terms can be estimated in exactly the same way as the first, while, with steps similar to those performed in estimating this first term, the estimation of the second and fourth terms reduces to the estimation of $\mathbb{E}_{Z=1,S_{1,2}}|\Delta_j f(W) - \tilde\Delta_j f(W)|$. (Again, and throughout, $\mathbb{E}_{Z=1}$ is short for the conditional expectation given $\{Z = 1\}$, while $\mathbb{E}_{Z=1,S_{1,2}}(\cdot) := \mathbb{E}_{Z=1}(\cdot\,\mathbf{1}_{(A,B,j,k) \in S_{1,2}})$.) Next, writing $S^A_{1,2}$ in place of $S_{1,2}$ when the sequence $W^A$ is used instead of $W$, the resulting inequality leads to (2.31). The first term on the right-hand side of (2.31) will be estimated further below, when working out the estimation of the first term of (2.29). For the second term in (2.31), noting that $|\Delta_j f(W) - \tilde\Delta_j f(W)| \le 2$ and that, by the iid assumption, $W$ and $W^A$ are identically distributed, the estimation again reduces to that of $\mathbb{E}_{S_{1,2}}|\Delta_j f(W) - \tilde\Delta_j f(W)|$.
But this last term on the right-hand side was already shown to be bounded by $Cn^{1+\alpha} + Cn^{3-3\alpha/2}(\ln n^\alpha)^{1/2}$ while establishing (2.28). Therefore, focusing on the estimation of $\mathbb{E}_{S_{1,2}}|\Delta_j f(W) - \tilde\Delta_j f(W)|/\mathbb{P}(Z=1)$ or, indeed, merely on the estimation of $\mathbb{E}_{S_{1,2}}|\Delta_j f(W) - \tilde\Delta_j f(W)|$, will suffice for our purposes as far as the second and fourth terms in (2.29) are concerned. This will be done while discussing the estimation of the first term below, as noted earlier.
So, we can now focus on estimating the first term in (2.29) (and, as already indicated, similar arguments will provide similar estimates for the other three terms). To do so, denote by $U$ and $V$ the two random variables whose conditional covariance $\operatorname{Cov}_{Z=1,S_{1,2}}(U,V)$ constitutes this first term; expanding this covariance produces four terms $T_1$, $T_2$, $T_3$, $T_4$, which are functions of $(A,B,j,k)$. We begin by estimating $T_1$, obtaining (2.33); a similar estimate also reveals that (2.34) holds. Next, for $T_3$ and $T_4$, since $|V| \le 1$, we get (2.35), where $\mathbb{E}_{Z=0}$ (resp. $\mathbb{E}_{Z=1}$) is short for the conditional expectation given $\{Z = 0\}$ (resp. given $\{Z = 1\}$), and where we used the trivial bound on $\mathbb{P}(Z = 1)$, as well as (2.18), for the last inequality. Now, denote by $h(A,B,j,k)$ the sum of the first four terms on the right-hand side of (2.29). Then, performing estimations as in obtaining (2.33), (2.34) and (2.35) for the first and third terms of this sum, and keeping in mind the discussion following (2.31), so that similar estimates also hold true for the second and fourth terms, we obtain the corresponding upper bound.
Noting that the sums involving the indices $k$ are identical to the sums involving the indices $j$, we rewrite this last upper bound accordingly.
As with previous computations, using (2.10) and (2.18), the third sum on the above right-hand side can itself be suitably upper-bounded.
It follows that we can just focus on estimating $\mathbb{E}_{Z=1}|\Delta_j f(W) - \tilde\Delta_j f(W)|$. To do so, the following simple proposition will be useful.

Proposition 2.2 For any $j \in [2n]$,
$$\Delta_j f(W) \le \tilde\Delta_j f(W).$$

Proof. Assume not, i.e., that $\Delta_j f(W) > \tilde\Delta_j f(W)$ for some $j$. Then either $\Delta_j f(W) = 1$ and $\tilde\Delta_j f(W) = 0$, or $\Delta_j f(W) = 0$ and $\tilde\Delta_j f(W) = -1$ (the remaining case, $\Delta_j f(W) = 1$ and $\tilde\Delta_j f(W) = -1$, is handled by either of the two arguments below). Consider the former. Then, changing the $j$-th coordinate does not affect the length of the longest common subsequence of the cell containing $j$; since the coordinates outside that particular cell have not been changed, the overall length of the longest common subsequence cannot decrease, that is, $\Delta_j f(W)$ cannot be $1$. The other case is similar.
Returning to the estimation of $\mathbb{E}_{Z=1}|\Delta_j f(W) - \tilde\Delta_j f(W)|$, using the domination property obtained in Proposition 2.2, we have
$$\mathbb{E}_{Z=1}\bigl|\Delta_j f(W) - \tilde\Delta_j f(W)\bigr| = \mathbb{E}_{Z=1}\bigl(\tilde\Delta_j f(W)\bigr) - \mathbb{E}_{Z=1}\bigl(\Delta_j f(W)\bigr).$$
We now claim that both terms on the right-hand side of the last expression are exponentially small in $n$. Let us first deal with $\mathbb{E}_{Z=1}(\Delta_j f(W))$; the other term, which is similar, is dealt with afterwards.
We have
$$\mathbb{E}_{Z=1}(\Delta_j f(W)) = \frac{\mathbb{E}\bigl(\Delta_j f(W)\mathbf{1}(Z=1)\mathbf{1}(Z_j=1)\bigr) + \mathbb{E}\bigl(\Delta_j f(W)\mathbf{1}(Z=1)\mathbf{1}(Z_j=0)\bigr)}{\mathbb{P}(Z=1)},$$
where $Z_j$ is the indicator random variable defined in the same way as $Z$, except that the $j$-th coordinate of $W$ is replaced by the independent copy $W'_j$. Note that, for any $j \in [2n]$, $Z$ and $Z_j$ are identically distributed, but they are certainly not independent. Looking first at the second term in the last expression, we have, with the help of (2.18) and since $Z$ and $Z_j$ are identically distributed,
$$\bigl|\mathbb{E}\bigl(\Delta_j f(W)\mathbf{1}(Z=1)\mathbf{1}(Z_j=0)\bigr)\bigr| \le \mathbb{P}(Z_j = 0) \le e^{-n^{1-\alpha}(1+\ln(1+n^\alpha))}.$$
Also, writing $\Delta_j f(W) = f(W) - f(W^j)$,
$$\mathbb{E}\bigl(f(W)\mathbf{1}(Z=1)\mathbf{1}(Z_j=1)\bigr) - \mathbb{E}\bigl(f(W^j)\mathbf{1}(Z=1)\mathbf{1}(Z_j=1)\bigr) = 0,$$
since, again, $Z$ and $Z_j$ are identically distributed (and exchanging $W_j$ and $W'_j$ swaps the roles of $W$ and $W^j$ as well as those of $Z$ and $Z_j$). These observations yield
$$\bigl|\mathbb{E}_{Z=1}(\Delta_j f(W))\bigr| \le C e^{-n^{1-\alpha}(1+\ln(1+n^\alpha))}.$$
Similarly, noting that the expectation is conditional on $\{Z = 1\}$ and replacing $n$ by $n^\alpha$, we have
$$\bigl|\mathbb{E}_{Z=1}(\tilde\Delta_j f(W))\bigr| \le C e^{-n^{\alpha(1-\alpha)}(1+\ln(1+n^{\alpha^2}))}.$$
(The reason for this last inequality is the fact that the configurations belong to $S_{1,2}$ and that, in that case, we just deal with a scaled version of the LCS problem.) Combining these last estimates with Proposition 2.1 takes care of the first term in (2.29) and, therefore, of the first four terms there. Thus, from (2.38) and the above estimates,
$$\operatorname{Var} T \le C\left(n^2 e^{-n^{1-\alpha}(1+\ln(1+n^\alpha))} + n^{1+\alpha} + n^{3-3\alpha/2}(\ln n^\alpha)^{1/2}\right). \qquad (2.40)$$
Theorem 2.1 and (2.5) then complete the proof of Theorem 1.1.

Remark 2.4 (i) The proof provides an explicit rate: for $\alpha > 4/5$, since then $1 + \alpha > 3 - 3\alpha/2$,
$$d_W\left(\frac{LC_n - \mathbb{E} LC_n}{\sqrt{\operatorname{Var} LC_n}},\, G\right) \le \frac{C}{n^{(1-\alpha)/2}}$$
holds for every $n \ge 1$, with $C > 0$ a constant independent of $n$. (ii) Of course, there is no reason for our rate $1/n^{(1-\alpha)/2}$ to be sharp (as previously mentioned, for $2/3 < \alpha < 4/5$, the rate $1/n^{1-3(1-\alpha/2)/2}$ is possible). Also, instead of the choice $v = n^\alpha$, a choice such as $v = h(n)$, for some optimal function $h$, would improve this rate. Can we conjecture that the optimal rate in Kolmogorov distance is $1/\sqrt{n}$? (iii) From a known duality between the length of a longest common subsequence of two random words and the length of a shortest common supersequence (see Dancík [15]), our result also implies a central limit theorem for this latter length.
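Regarding the duality invoked in (iii): for two words $x$ and $y$, the length of a shortest common supersequence satisfies $|SCS(x,y)| = |x| + |y| - LC(x,y)$, so that, for two words of length $n$, it equals $2n - LC_n$ and the central limit theorem transfers directly. A quick check (scs_length is our helper name, reusing lcs_length):

```python
def scs_length(x, y):
    """Length of a shortest common supersequence, by dynamic programming."""
    n, p = len(x), len(y)
    S = [[0] * (p + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(p + 1):
            if i == 0 or j == 0:
                S[i][j] = i + j
            elif x[i - 1] == y[j - 1]:
                S[i][j] = S[i - 1][j - 1] + 1
            else:
                S[i][j] = 1 + min(S[i - 1][j], S[i][j - 1])
    return S[n][p]

x, y = "10011", "01101"
assert scs_length(x, y) == len(x) + len(y) - lcs_length(x, y)
```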

3 Concluding Remarks
We conclude the paper with a brief discussion of longest common subsequences in random permutations and, in a final remark, present some potential extensions, perspectives and related questions we believe to be of interest. Theorem 1.1 shows that the Gaussian distribution appears as the limiting law for the length of the longest common subsequences of two random words. However, the Tracy-Widom distribution has also been hypothesized as the limiting law in sequence comparison problems, e.g., [1]. It turns out, as shown next, that this is indeed the case for certain distributions on permutations.

Let $S_n$ be the set of permutations of $\{1, \ldots, n\}$, and let $\rho$ and $\pi$ be two independent uniform random permutations in $S_n$, identified with the words $\rho(1)\cdots\rho(n)$ and $\pi(1)\cdots\pi(n)$. Let $LC_n(\rho,\pi)$ be the length of the longest common subsequences of these two words, and let $LI_n(\pi)$ be the length of the longest increasing subsequences of $\pi$. Since the common subsequences of $\rho$ and $\pi$ correspond to the increasing subsequences of $\pi^{-1} \circ \rho$,
$$LC_n(\rho, \pi) = LI_n(\pi^{-1} \circ \rho), \qquad (3.2)$$
and, therefore,
$$LC_n(\rho, \pi) \stackrel{d}{=} LI_n(\pi), \qquad (3.3)$$
where $\stackrel{d}{=}$ denotes equality in distribution.
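The identity (3.2) is easy to test numerically: below, lis_length implements patience sorting in $O(n \log n)$, and the composed word is $\pi^{-1} \circ \rho$ (function names and sizes are ours; lcs_length is reused from the first sketch).

```python
import bisect, random

def lis_length(seq):
    """Length of the longest increasing subsequences, via patience sorting."""
    tails = []
    for v in seq:
        k = bisect.bisect_left(tails, v)
        if k == len(tails):
            tails.append(v)
        else:
            tails[k] = v
    return len(tails)

rng = random.Random(4)
n = 300
rho = list(range(1, n + 1)); rng.shuffle(rho)
pi = list(range(1, n + 1)); rng.shuffle(pi)

inv_pi = [0] * (n + 1)
for pos, v in enumerate(pi, start=1):
    inv_pi[v] = pos
composed = [inv_pi[v] for v in rho]        # the word pi^{-1} o rho

assert lcs_length(rho, pi) == lis_length(composed)
print(lis_length(composed) / n ** 0.5)     # near 2 for large n
```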
Clearly, the identity (3.3), which, in fact, is easily seen to remain true if $\rho$ is a random permutation in $S_n$ with an arbitrary distribution, shows that the probabilistic behavior of $LC_n(\rho,\pi)$ is identical to that of $LI_n(\pi)$. Among the many results on $LI_n(\pi)$ presented in Romik [34], the mean asymptotic result of Vershik and Kerov [38], and Logan and Shepp [30], thus implies (and is equivalent to)
$$\lim_{n \to \infty} \frac{\mathbb{E} LC_n(\rho, \pi)}{\sqrt{n}} = 2.$$
Moreover, the distributional asymptotic result of Baik, Deift and Johansson [5] implies (and is equivalent to), as $n \to +\infty$,
$$\frac{LC_n(\rho,\pi) - 2\sqrt{n}}{n^{1/6}} \longrightarrow F_2, \quad \text{in distribution},$$
where $F_2$ is the Tracy-Widom distribution, whose cdf is given by
$$F_2(t) = \exp\left(-\int_t^{+\infty} (x - t)\, u^2(x)\, dx\right),$$
$u$ being the solution to the Painlevé II equation $u'' = 2u^3 + xu$ with $u(x) \sim \operatorname{Ai}(x)$, as $x \to +\infty$, where $\operatorname{Ai}$ is the Airy function.

To finish, let us list a few avenues for future research that we find of potential interest.

Remark 3.1 (i) First, the methods of the present paper can also be used to study sequence comparison with a general scoring function $S$; namely, $S : \mathcal{A}_m \times \mathcal{A}_m \to \mathbb{R}^+$ assigns a score to each pair of letters, the LCS corresponding to the special case where $S(a,b) = 1$ for $a = b$ and $S(a,b) = 0$ for $a \ne b$. This requires more work, but is possible, and is presented in a separate publication (see [17]), where multiple words are also tackled. Such a result requires, at first, variance estimates generalizing [23], as stated in the concluding remarks of [22], and then an extension to higher dimensions of the closeness-to-the-diagonal results obtained in [24].
(ii) Challenging is the loss of independence, both between and within the sequences, as well as the loss of identical distributions, again both within and between the sequences. Results for these types of frameworks will also be presented elsewhere. Already, for hidden Markov models (HMM), convergence results, with rates, are obtained for $\mathbb{E} LC_n/n$ in [19], while [20] shows how to transfer iid normal approximation results, such as Theorem 2.1, to the HMM setting.
(iii) It would, similarly, also be of interest to study the random permutation versions of (i) and (ii) above. As in the previous section, and as far as the multiple-sequence framework is concerned, the study of the length of the longest common subsequences of several random permutations reduces to the study of the length of the longest common and increasing subsequences of one fewer sequence; e.g., see [25].