Maximal Arithmetic Progressions in Random Subsets

Let U(N) denote the maximal length of arithmetic progressions in a random uniform subset of {0,1}^N. By an application of the Chen-Stein method, we show that U(N) - 2 log(N)/log(2) converges in law to an extreme type (asymmetric) distribution. The same result holds for the maximal length W(N) of arithmetic progressions (mod N). When considered in the natural way on a common probability space, we observe that U(N)/log(N) converges almost surely to 2/log(2), while W(N)/log(N) does not converge almost surely (and in particular, limsup W(N)/log(N) is at least 3/log(2)).


Introduction and Statement of Results
In this note we study the length of maximal arithmetic progressions in a random uniform subset of $\{0,1\}^N$. That is, let $\xi_1, \xi_2, \ldots, \xi_N$ be a random word in $\{0,1\}^N$, chosen uniformly.
Consider the (random) set $\Xi_N$ of elements $i$ such that $\xi_i = 1$. Let $U^{(N)}$ denote the maximal length of an arithmetic progression in $\Xi_N$, and let $W^{(N)}$ denote the maximal length of an aperiodic arithmetic progression (mod $N$) in $\Xi_N$. A consequence of our main result (Theorem 1) is that the expectation of both $U^{(N)}$ and $W^{(N)}$ is roughly $2\log N/\log 2$, twice the expectation of the longest run in $\Xi_N$, see [3], [4]. We also show that the limit law of the centered version of both $W^{(N)}$ and $U^{(N)}$ is of the same extreme type as that of the longest run in $\Xi_N$.
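To make the two quantities concrete, the following minimal sketch computes them by brute force for a given binary word. It is an illustration only: the function names are hypothetical, indices run over 0, ..., N-1 rather than 1, ..., N, and "aperiodic" is read here as "all terms of the progression are distinct residues mod N", which is an assumption about the intended definition.

```python
import random


def max_ap_length(bits):
    """Longest arithmetic progression (no wrap-around) inside {i : bits[i] == 1}."""
    n = len(bits)
    best = 0
    for s in range(n):
        if not bits[s]:
            continue
        best = max(best, 1)  # a single element counts as length 1
        for p in range(1, n):
            length, i = 1, s + p
            while i < n and bits[i]:
                length += 1
                i += p
            best = max(best, length)
    return best


def max_ap_length_mod(bits):
    """Longest progression s, s+p, s+2p, ... (mod n) with distinct terms, all in the set."""
    n = len(bits)
    best = 0
    for s in range(n):
        if not bits[s]:
            continue
        best = max(best, 1)
        for p in range(1, n):
            length, i = 1, (s + p) % n
            # The orbit of s under +p is a cycle, so the first repeated term is s itself.
            while i != s and bits[i]:
                length += 1
                i = (i + p) % n
            best = max(best, length)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    word = [rng.randint(0, 1) for _ in range(64)]
    print(max_ap_length(word), max_ap_length_mod(word))
```

Every progression counted by the first function is also counted by the second, so the wrap-around maximum is never smaller than the plain one.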
We observe two interesting phenomena:
• Theorem 1 states that the tails of the distribution of $W^{(N)}$ behave differently for positive and negative deviations from the mean. In particular, the probability that $W^{(N)}$ deviates by $x$ from its mean behaves roughly like $1-\exp(-2^{-(x+2)})$ for positive $x$, and like $\exp(-2^{-(x+2)})$ for negative $x$. Thus, on the positive side of the mean the tail decays exponentially, and on the negative side of the mean the tail decays doubly exponentially.
• One may construct the sets $\Xi_N$ on the same probability space by considering an infinite sequence of i.i.d. Bernoulli random variables $\{\xi_i\}_{i=1}^{\infty}$. Proposition 2 states that with such a construction, the sequence $W^{(N)}/\log N$ converges in probability to the constant $2/\log 2$, but a.s. convergence does not hold. This contrasts with the behavior of $U^{(N)}$, where a.s. convergence of $U^{(N)}/\log N$ to $2/\log 2$ holds. The seemingly small change of taking arithmetic progressions that "wrap around" the torus changes the behavior of the lim sup of the sequence.
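The asymmetry of the two tails can be seen numerically. The sketch below simply evaluates the two approximate expressions quoted in the first item; the function names are hypothetical and the code makes no claim about the exact centering or lattice effects.

```python
import math


def limit_cdf(x):
    """exp(-2^{-(x+2)}): approximate probability of a deviation of at most x."""
    return math.exp(-(2.0 ** -(x + 2)))


def upper_tail(x):
    """1 - exp(-2^{-(x+2)}), computed with expm1 for accuracy when the value is tiny."""
    return -math.expm1(-(2.0 ** -(x + 2)))


if __name__ == "__main__":
    for x in (2, 4, 8):
        # Upper tail: roughly halves with each unit increase of x.
        print(x, upper_tail(x))
    for x in (-2, -4, -8):
        # Lower tail: equals exp(-2^{|x|-2}), which collapses far faster.
        print(x, limit_cdf(x))
```

For large positive x the upper tail is close to $2^{-(x+2)}$, while for negative x the lower tail is the exponential of a quantity that itself grows exponentially in $|x|$.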
The notoriously hard extremal problem of showing that a set of integers of positive upper density contains arbitrarily long arithmetic progressions, together with its finite quantitative versions, is a well-studied topic, recently reviewed in [7].

Results
Throughout, we set $C = 2/\log 2$. Our first main result is the following extreme type limit theorem.
Similarly, let $\{y_N\}$ be a sequence such that $C\log N - \log(2C\log N) + y_N \in \mathbb{Z}$ for all $N$, and
In particular, both $W^{(N)}/\log N$ and $U^{(N)}/\log N$ converge in probability to $C$.
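A first-moment heuristic (not the argument used in this note, which relies on the Chen-Stein method) suggests why $C = 2/\log 2$ is the natural constant. There are at most $N^2$ pairs (starting point, difference), and a fixed progression of length $L$ lies in $\Xi_N$ with probability $2^{-L}$, so the expected number of progressions of length $L$ in $\Xi_N$ is at most of order $N^2 2^{-L}$. Setting this to 1 gives:

```latex
N^2 \, 2^{-L} = 1
\quad\Longleftrightarrow\quad
L = \frac{2 \log N}{\log 2} = C \log N .
```

The same computation for runs, where there are only about $N$ starting points, gives $L = \log N/\log 2$, which is consistent with the factor of two between progressions and runs.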
The dichotomy in the sequential behavior of W (N ) and U (N ) is captured in the following proposition.
In particular, $W^{(N)}/(C\log N)$ does not converge a.s. to 1.
The structure of the note is as follows. In the next section, we introduce dependency graphs and the Arratia-Goldstein-Gordon version of the Chen-Stein method, and perform preliminary computations. After these preliminary computations are in place, the short Section 3 is devoted to the proof of Theorem 1. Section 4 is devoted to the proof of Proposition 2.

Preliminaries and auxiliary computations
We introduce the notion of dependency graphs and the Chen-Stein method for proving Poisson convergence, both of which play an important role in our proof.

Dependency Graphs
Let $X_1, X_2, \ldots, X_N$ be $N$ random variables. Let $G$ be a graph with vertices $1, 2, \ldots, N$. We use the notation $i \sim j$ to denote two vertices connected by an edge. As $X_i$ is not independent of itself, we define $i \sim i$ for all $i$ (this can be thought of as requiring $G$ to have a self loop at each vertex).

The notion of dependency graphs has been introduced in connection with the Lovász Local Lemma, see [1], Chapter 5. Some other results concerning dependency graphs are [5], [6]. We emphasize that there can be many dependency graphs associated to a collection of random variables.

We define two quantities associated with a dependency graph $G$ of The following is a simplified version of Theorem 1 in [2], which in turn is an effective way to apply the Chen-Stein method: , and define $B_1$ and $B_2$ as in (4) and (5). Let $Z$ be a Poisson random variable with mean $E[Z] = \lambda$. Then, for any $A \subset \mathbb{N}$,

Theorem 3 is useful in proving convergence of sums of "almost" independent variables to the Poisson distribution.
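As an illustration of how such a bound is evaluated, the sketch below computes the two dependency-graph sums for a toy family of indicators. It assumes the usual Arratia-Goldstein-Gordon form of the two quantities, namely $B_1 = \sum_i \sum_{j \sim i} E[X_i]E[X_j]$ and $B_2 = \sum_i \sum_{j \sim i,\, j \ne i} E[X_i X_j]$; this form is an assumption here, not a quotation of (4) and (5). The toy family (windows of $m$ consecutive ones in a fair coin word) and the function name are hypothetical and chosen only because every expectation is explicit.

```python
def chen_stein_terms(n, m):
    """Return (lam, b1, b2) for X_i = 1{bits i, ..., i+m-1 are all 1}, i = 0..n-m.

    Dependency graph: i ~ j iff |i - j| < m, i.e. the two windows share a bit.
    Windows that share no bit depend on disjoint coin flips, hence are independent.
    """
    p = 2.0 ** -m                      # E[X_i] for a fair coin
    indices = range(n - m + 1)
    lam = len(indices) * p             # mean of the sum of indicators
    b1 = 0.0
    b2 = 0.0
    for i in indices:
        for j in indices:
            d = abs(i - j)
            if d >= m:
                continue               # not neighbours in the dependency graph
            b1 += p * p                # E[X_i] E[X_j], the case j == i included
            if d > 0:
                # Both windows are all ones iff their union (m + d bits) is all ones.
                b2 += 2.0 ** -(m + d)
    return lam, b1, b2


if __name__ == "__main__":
    print(chen_stein_terms(1000, 12))
```

In this toy family the second sum is dominated by overlapping windows shifted by one position, which is the usual clumping effect; conditioning each window on being preceded by a zero is a standard way to reduce it.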

Auxiliary Calculations
and $W' = \max_{s,p} W'_{s,p}$. That is, we take truncated versions of $W_{s,p}$ and $W$.
and set be the arithmetic progression corresponding to $I_{s,p}$.
Let $G$ be the graph with vertex set $\{(s,p)\}_{s,p=1}^{N}$, and edges defined by the relations Fix $x \in \mathbb{R}$ such that $x < \varepsilon\log N$ (for large enough $N$ this is always possible). Note that Define $D_{s,p}(k)$ to be the number of pairs $t, q$ with $q \ne p$ such that $|A(s,p)\cap A(t,q)| = k$.
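For small parameters the counts $D_{s,p}(k)$ can be tabulated directly. The sketch below is a brute-force illustration under stated assumptions: $A(s,p)$ is taken to be $\{s + jp \pmod N : 0 \le j \le M\}$, residues are represented as 0, ..., N-1, the difference $q$ ranges over 1, ..., N-1 with $q \ne p$, and the function names are hypothetical.

```python
from collections import Counter


def progression(s, p, M, N):
    """The set {s + j*p mod N : 0 <= j <= M} (assumed form of A(s, p))."""
    return {(s + j * p) % N for j in range(M + 1)}


def intersection_profile(s, p, M, N):
    """Map k -> number of pairs (t, q), q != p, with |A(s,p) & A(t,q)| == k."""
    base = progression(s, p, M, N)
    counts = Counter()
    for q in range(1, N):
        if q == p:
            continue
        for t in range(N):
            counts[len(base & progression(t, q, M, N))] += 1
    return counts


if __name__ == "__main__":
    profile = intersection_profile(0, 1, 4, 31)
    for k in sorted(profile):
        print(k, profile[k])
```

Most pairs intersect the base progression in zero or one point, and large intersections are rare; this is the qualitative behavior that a bound on $D_{s,p}(k)$ quantifies.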
The following combinatorial proposition proves to be useful.
Proposition 4. For all $s, p$ the following holds: We have the following constraints: Since there are at most $\frac{M}{k-1}$ choices for $a$ and for $b$, and since a choice of $a, b$ determines $q$, we have at most $\frac{M^2}{(k-1)^2}$ choices for $q$. Remark. This can be improved to 2 k+1 · $\frac{M^2}{(k-1)^2}$, with a slightly more careful analysis. We will not need this improvement.
Since $t = x_1 - iq = s + jp - iq$ for some $0 \le i, j \le M$, there are at most $(M+1)^2$ choices for $t$, once we have fixed $q$.
Fix $s, p$. There is at most one value of $t$ such that $|A(s,p)\cap A(t,p)| = k$. Hence, the number of pairs $t, q$ such that $|A(s,p)\cap A(t,q)| = k$ is at most $D_{s,p}(k) + 1$. Thus, for all $\delta > 0$.
For $s, p$ and $t, q$ such that |A(s, Also, if $q = p$ and $A(s,p)\cap A(t,q) \ne \emptyset$, then either $t \in A(s,p)$ or $s \in A(t,q)$. Thus, if $t \ne s$, Hence, for all $\delta > 0$. □

Arithmetic Progressions: Proof of Theorem 1
Since the proofs are very similar, we only consider the slightly harder case of $W^{(N)}$. We write $W$ for $W^{(N)}$ whenever no confusion can occur.
We begin with the following lemma: Lemma 6. The sequence $W^{(N)}/(C\log N)$ converges to 1 in probability; i.e. for any $\delta > 0$, $\lim_{N\to\infty} P\left[\left|\frac{W^{(N)}}{C\log N} - 1\right| > \delta\right] = 0$. Further, the convergence is almost sure on the subsequence $N_k = 2^k$. Finally, the statements hold with $U^{(N)}$ replacing $W^{(N)}$.
Proof of Lemma 6. Again, we consider only $W^{(N)}$. Fix $\varepsilon > 0$. Note that Thus, Now let $x = -\varepsilon\log N$, and let $Z(x)$ be a Poisson random variable with mean Note that $\{W \le (C-\varepsilon)\log N\}$ implies that $\{W' \le (C-\varepsilon)\log N\}$, so using Theorem 3 and Proposition 5, for $\varepsilon < \frac{1}{2\log 2}$. So for any positive $\delta < \frac{1}{4}$, we get from (7) and (8) that Further, from the same estimates one has that with $Y_k = W^{(2^k)}/(C\log(2^k))$, for any positive One then deduces from the Borel-Cantelli lemma the claimed almost sure convergence. □

Proof of Theorem 1. As in the proof of Lemma 6, for $x \in \mathbb{R}$, let $Z(x)$ be a Poisson random variable with mean Note that $W' > C\log N + x$ iff $S(x) > 0$. By Theorem 3 and Proposition 5, We also have the equality Thus, for $0 < \delta < 1$, Let $\{x_N\}$ be a sequence such that $C\log N + x_N \in \mathbb{Z}$ for all $N$. If $\inf_N x_N \ge b \in \mathbb{R}$, then $\exp(\lambda(x_N))$ is a bounded sequence. Thus,

Convergence in Probability vs. a.s. Convergence
We begin with the following easy consequence of Lemma 6 applied to $U^{(N)}$.
Proof. The main observation is that $U^{(N)}$ is a monotone increasing sequence. That is, a.s. for all $N$, $U^{(N)} \le U^{(N+1)}$. Thus, a.s. for all $N \ge 2$, setting $k = \lfloor\log_2 N\rfloor$, we have
$$\frac{U^{(2^k)}}{\log(2^{k+1})} \le \frac{U^{(N)}}{\log N} \le \frac{U^{(2^{k+1})}}{\log(2^k)}.$$
Since $U^{(2^k)}/\log(2^k)$ converges a.s. to $C$ and $\log(2^k)/\log(2^{k+1})$ converges to 1, we get a.s. $\lim_{N\to\infty} U^{(N)}/\log N = C$.

We turn to the
Proof of Proposition 2. In view of Proposition 7, it remains only to consider the statement concerning W (N ) . Toward this end, fix 0 < β < 1.
That is, $I(s,p,N)$ is the indicator function of the event that $\xi_s = 0$ and Thus, the proof of the lemma is based on controlling the cardinality of the collection of triples $(s',p',N') \in L_n$ whose associated arithmetic progression intersects, in a prescribed number of points, the arithmetic progression associated with a given triple $(s,p,N) \in L_n$. We divide our estimates into three cases: intersection at one point, intersection at two points or more, and intersection at $2C\log(2n)/5$ points or more.
For $N, N' \in [n, 2n]$ and $p \in [1, N]$ define $T(N,N',p)$ to be the set of all triples $(s,s',p')$ such that Similarly, define $S(N,N',p)$ to be the set of all such triples $(s,s',p')$ such that Finally, define $U(s,p,N)$ to be the set of all triples $(s',p',N') \in L_n$ such that We have the following estimates. Thus, Since By the Borel-Cantelli lemma, we get that a.s.

□
Proof of Proposition 9. If $(s,s',p') \in T(N,N',p)$ then (10) implies that there exist i ∈ There are at most $(2n)^2$ choices for $s'$ and $p'$. There exists some universal constant $K$ such that there are at most $K\log(n)$ choices for each of $i, j, k_i, k'_j$. Choosing $s', p', i, j, k_i, k'_j$ determines $s$. Thus, we have shown that $|T(N,N',p)| \le 4K^4 n^2\log^4(n) \le n^2\log^5(n)$ for large enough $n$.
Note that for any i ∈ [1, Similarly, for any j ∈ [1, Plugging this into (18), and subtracting equations, we get that there exist There exists some universal constant $K$ such that there are at most $K\log(n)$ choices for each of $i, r, j, \ell, k_i, k_r, k'_j, k'_\ell$, and $2n$ choices for $s$. After choosing $i, r, j, \ell, k_i, k_r, k'_j, k'_\ell, s$, (18) and (19) determine $s'$ and $p'$. Thus, we have shown that for large enough $n$, $|S(N,N',p)| \le 2n(K\log(n))^8 \le n\log^9(n)$.
For $i = 1, 6, 11, \ldots$, let This is a partition of the arithmetic progression into packets of five elements. We then have, by the definition of $U(s,p,N)$, So there exists some set $Z_i$ such that $|Z_i| \ge 2$. This implies that there exist $x < y \in A \cap [1, N']$, $i \in [1, M_{N'}]$, and $r \in [1, 4]$ such that $s + (i+r)p' \pmod{N'} = y$.
Subtracting equations, and using the fact that $rp' < N'$, we get that Moreover, (12) also implies that there must exist an integer $j$ (perhaps negative) with $\frac{1}{5}M_{N'} \le |j| \le M_{N'}$, and $z \in A \cap [1, N']$, such that $s + (i+j)p' \pmod{N'} = z$.
Since $kr \ne 0$, equations (21) and (23) have at most one solution for $p', N'$ in terms of $x, y, z, r, j$ and $k$. Since there are at most $|A|^3 \le M_N^3$ choices for $x, y$ and $z$, at most 4 choices for $r$, and at most $4M_{N'}^2$ choices for $j$ and $k$, we get that there are at most 16|A|