Deviation probabilities for arithmetic progressions and irregular discrete structures

Let the random variable $X\, :=\, e(\mathcal{H}[B])$ count the number of edges of a hypergraph $\mathcal{H}$ induced by a random $m$-element subset $B$ of its vertex set. Focussing on the case that the degrees of vertices in $\mathcal{H}$ vary significantly, we prove bounds on the probability that $X$ is far from its mean. These results can be applied to discrete structures such as the set of $k$-term arithmetic progressions in $\{1,\dots, N\}$. Furthermore, our main theorem allows us to deduce results for the case $B\sim B_p$, in which $B$ is generated by including each vertex independently with probability $p$. In this setting our result on arithmetic progressions extends a result of Bhattacharya, Ganguly, Shao and Zhao \cite{BGSZ}. We also mention connections to related central limit theorems.


Introduction
Let $W_3$ be the number of 3-term arithmetic progressions in a random set $B_p \subseteq [N] = \{1, \dots, N\}$ in which each element is included independently with probability $p$. What can be said about the upper tail of the random variable $W_3$? This question has been extensively studied in recent years and we shall use a discussion of this problem to motivate our results.
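As a concrete illustration, the random variable $W_3$ can be simulated directly. The following sketch is ours, purely for illustration (the names `count_3aps` and `sample_W3` do not come from the paper): it counts the 3-APs contained in a given set, and draws one sample of $W_3$ in the $B_p$ model.

```python
import random

def count_3aps(B):
    """Count 3-term arithmetic progressions a, a+d, a+2d (with d >= 1)
    whose three elements all lie in the set B."""
    S = set(B)
    count = 0
    for a in S:
        for b in S:
            # b plays the role of the middle term a+d
            if b > a and a + 2 * (b - a) in S:
                count += 1
    return count

def sample_W3(N, p, rng=random):
    """One sample of W_3: the number of 3-APs in a p-random subset of [N]."""
    B = {x for x in range(1, N + 1) if rng.random() < p}
    return count_3aps(B)
```

For instance, `count_3aps({1,2,3,4,5})` returns 4 (three progressions of difference 1 and one of difference 2).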
Given sequences $p = p_N$ and $\delta = \delta_N$ we may define a rate associated with the corresponding deviation by $r(N, p, \delta) := -\log \mathbb{P}\left(W_3 \geq (1+\delta)\mathbb{E}[W_3]\right)$. The case that $\delta > 0$ is a fixed constant is known as the large deviations regime. The asymptotic value $r(N, p, \delta) = (1 + o(1))\,\delta^{1/2} p^{3/2} N \log(1/p)$ was obtained by Harel, Mousset and Samotij [13] for all $\delta > 0$ and across the whole range of densities $N^{-2/3}(\log N)^{2/3} \ll p \ll 1$. This improved on earlier results of Bhattacharya, Ganguly, Shao and Zhao [5] (which covered the cases $p \geq N^{-1/36}(\log N)^{1/3}$) and Warnke [25] (which gave the value of $r(N, p, \delta)$ up to a multiplicative constant).
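The rate $\delta^{1/2}p^{3/2}N\log(1/p)$ can be motivated by a standard planting heuristic (a back-of-the-envelope sketch, not the argument of [13]): conditioning an interval $I$ of length $\ell$ to lie in $B_p$ has probability cost $p^{\ell}$ and creates roughly $\ell^2/4$ additional 3-APs. Matching this gain to the required excess $\delta\,\mathbb{E}[W_3]$ gives

```latex
\[
  \frac{\ell^2}{4} \;\approx\; \delta\,\mathbb{E}[W_3] \;\approx\; \frac{\delta\, p^3 N^2}{4}
  \qquad\Longrightarrow\qquad \ell \;\approx\; \delta^{1/2} p^{3/2} N,
\]
\[
  -\log \mathbb{P}\bigl(I \subseteq B_p\bigr) \;=\; \ell \log(1/p)
  \;\approx\; \delta^{1/2} p^{3/2} N \log(1/p).
\]
```

Optimising over $\ell$ in this way recovers the order of the rate stated above.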
In this context our contribution is to prove the same asymptotic value provided $\log N/pN \ll \delta_N \ll p^{1/2}$.
The following figure illustrates these regions. We consider here the cases $p = N^{\gamma}$ and $\delta_N = N^{\theta}$, where $\gamma, \theta \leq 0$.
We remark that the form of the expression for $r(N, p, \delta)$ is $(1+o(1))\,\delta_N^2\,\mathbb{E}[W_3]^2/2\mathrm{Var}(W_3)$ in both the Normal and Poisson regimes; the only distinction is the leading term of the variance, which is $(1+o(1))\,7p^5(1-p)N^3/12$ in the Normal regime. Our results also apply to the lower tail. It would be of interest to determine the asymptotic rate for the lower tail in general. The order of magnitude $\Theta(1)\min\{p^3N^2, pN\}$ may be obtained using Harris' inequality [14] and Janson's inequality, see Theorem 2.14 of [17]. See [19] and [22] for more detailed discussions of lower tail problems. In Figure 1, the grey lines on the left represent deviations of the order of the standard deviation and so are covered by the central limit theorem [3]. The blue areas ought to belong to the "Normal" regime, in which we would expect $r(N, p, \delta)$ to be of the form $(1 + o(1))\,3\delta_N^2 pN/56(1 - p)$. Our results show this in the light blue and turquoise regions, while Bhattacharya, Ganguly, Shao and Zhao [5] covered the turquoise and dark blue regions. The question remains open in the striped regions. In the red striped region, the "Poisson" regime, we would expect $r(N, p, \delta) = (1 + o(1))\,\delta_N^2\,\mathbb{E}[W_3]/2 = (1 + o(1))\,\delta_N^2 p^3 N^2/8$. In the green striped region, the "localisation" regime, we would expect $r(N, p, \delta) = (1 + o(1))\,2\delta_N p^{3/2} N \log(1/p)$. The known cases in this regime are also due to Bhattacharya, Ganguly, Shao and Zhao [5] and are represented by the dark green triangle. The right hand side of the figure represents large deviations, which have been resolved by Harel, Mousset and Samotij [13] as discussed above. We have not included Warnke's moderate deviation results (which give the value of $r(N, p, \delta)$ up to a multiplicative constant) in the figure; however, we have marked the range of $\delta$ covered by his results using an orange line on the $\theta$-axis.

Moderate deviations in the model B p in general
We shall prove results analogous to those discussed above in a much more general setting. Given a $k$-uniform hypergraph $H$ with vertex set $[N]$ we may consider the random variable $N_H(B_p)$ which counts the number of edges of $H$ in the random subset $B_p \subseteq [N]$. Many results on deviation probabilities apply in this setting. In particular, the incredibly useful and versatile inequality of Kim and Vu [20] is now a fundamental tool of probabilistic combinatorics.
Many results have focussed on large deviations. Janson and Ruciński [18] determined (under certain conditions) the log probability $\log \mathbb{P}(X > (1 + \delta)\mathbb{E}[X])$ up to a factor of order $\log(1/p)$. Warnke [26] determined (under certain conditions) the log probability up to a constant factor. And in recent work Bhattacharya and Mukherjee [4] proved that a framework for studying large deviations, introduced by Chatterjee and Varadhan [8] (in the context of subgraph counts), may also be used in this setting, and serves as a basis for analysing questions such as symmetry breaking.
Our results in this setting depend on some basic parameters of the hypergraph. We write $N_H(B)$ for the number of edges of $H$ contained in the set $B$, and we write $d(H)$ and $\sigma^2(H)$ for the average degree and the degree variance of $H$ respectively.
We may extend the definition of degree to sets, so that $d_H(R)$ is the number of edges containing the set $R$. We then define $\Delta_r(H)$ to be the maximum $r$-degree of $H$, i.e., $\Delta_r(H) := \max\{d_H(R) : |R| = r\}$.
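For a finite hypergraph given as a list of edges, $\Delta_r(H)$ can be computed directly from its definition; the following sketch (the helper name `max_r_degree` is ours, for illustration only) enumerates the $r$-element subsets of each edge.

```python
from itertools import combinations

def max_r_degree(edges, r):
    """Delta_r(H): the maximum, over r-element vertex sets R, of the
    number of edges of H that contain R."""
    deg = {}
    for e in edges:
        for R in combinations(sorted(e), r):
            deg[R] = deg.get(R, 0) + 1
    return max(deg.values(), default=0)
```

For example, for the 3-uniform hypergraph with edges $\{1,2,3\}, \{1,2,4\}, \{1,2,5\}$ the pair $\{1,2\}$ lies in all three edges, so $\Delta_2 = 3$.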
Our conditions on $H$ will simply be that, for some $r \geq 2$, we have $\Delta_r(H) = O(1)$ and $\sigma^2(H) = \Omega(N^{2r-2})$. In particular this asks that the standard deviation of the degrees be of the same order $N^{r-1}$ as the maximum degree. We may now state our main result in this setting. In fact it makes no difference to the proof to state the result for weighted hypergraphs, in which a positive weight is associated to each edge. Naturally, all parameters count edges with weights. For example $e(H)$ then refers to the sum of all weights and a degree $d_H(x)$ refers to the sum of weights of edges containing $x$.
Given a sequence $p = p_N$, which may be a constant in $(0, 1/2)$ or converge to $0$, let $\delta_N$ be a sequence such that (1) holds. Then the stated bound on the upper tail probability holds, and the same result holds for the lower tail probability. In fact we shall work almost exclusively in another model $B_m$, a uniformly random $m$-element subset of $[N]$. Theorem 1.1 will be a consequence of a similar theorem for the $B_m$ model, see Theorem 1.2, together with standard results about the tail of the binomial distribution.

Moderate deviations in the model B m in general
We now consider the uniform model $B \sim B_m$, where $B_m$ is a uniformly random $m$-element subset of $[N]$. It has been less common to work with this model; however, we remark that Warnke [26] shows that his results also hold in the $B_m$ model. Furthermore, moderate deviations were studied recently in the $B_m$ model for hypergraphs which are regular or close to regular [23].
We define $D_H(B_m) := N_H(B_m) - \mathbb{E}[N_H(B_m)]$ to be the deviation of $N_H(B_m)$ from its mean.
We set $t := m/N$ to be the density of the random set. The result makes sense with $t \in (0, 1/2]$ a constant or a sequence. Furthermore the same holds for the lower tail probability.

Remark 1. It is natural to ask whether the interval of deviations is best possible. The lower bound is necessary, except perhaps for the $(\log N)^{1/2}$ factor, as $t^{k-1/2}N^{r-1/2}$ is the order of the standard deviation. The upper bound is essentially best possible in the case $r = k$, but not for $r < k$. In particular, we believe that it may be extended by an extra polylog factor by making changes to Section 3, for example, using Warnke's inequality [26] instead of a moment bound similar to that of Janson and Ruciński [18]. However, it may well be possible to extend the interval even further, perhaps as far as $t^{r(2k-1)/(2r-1)}N^r$, which would be best possible (up to polylog terms).
A discussion of the proof of Theorem 1.2. It is clear that the variance of the degrees plays a central role: the more the degrees vary, the more likely deviations become. This is because deviations are driven by the degrees of the vertices selected. It may seem counterintuitive that deviations of the random variable $D_H(B_m)$, which in principle could depend on many properties of the hypergraph $H$, in the end depend almost entirely on the degree distribution of $H$. However, we will make this assertion concrete by showing that $D_H(B_m)$ is generally very well approximated by a process which considers only the degrees.
Let us now define this "degree" process. We may suppose that $B_m$ is generated as $B_m = \{b_1, \dots, b_m\}$, where $b_1, \dots, b_N$ is a uniformly random permutation of the elements of $[N]$. We set $t := m/N$ and $s := i/N$. We define the process $\Lambda_H(B_m)$ in terms of the variables $X_1^H(B_i)$, where $X_1^H(B_i)$ is defined to be the deviation of the degree of the $i$th element $b_i$ from its conditional expectation. The proof of Theorem 1.2 now partitions naturally into two tasks. First, we must show that $D_H(B_m)$ is generally very close to $\Lambda_H(B_m)$. This is achieved by considering the full martingale representation for $D_H(B_m)$ given in Section 2.1 (and previously in [23]) and showing that all the terms $X_\ell^H(B_i)$ which occur are predictable in terms of $X_1^H(B_i)$. It is in proving this approximation that we make use of previous large deviation results; in particular we use and adapt the result of Janson and Ruciński [18].
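The ingredient $X_1^H(B_i)$ can be simulated from the degree sequence alone. The sketch below is ours (names illustrative; the deterministic coefficients defining $\Lambda_H$, which appear in the omitted display, are not included): it draws a random permutation and records the deviation of each chosen vertex's degree from the average degree of the vertices still available, which is its conditional expectation given $B_{i-1}$.

```python
import random

def degree_deviations(degrees, m, rng=random):
    """Generate B_m as the first m elements of a random permutation of the
    vertex set, and return the sequence X_1(B_i): the degree of the i-th
    chosen vertex minus the average degree of the not-yet-chosen vertices."""
    order = list(degrees)              # vertex labels
    rng.shuffle(order)
    remaining_sum = sum(degrees.values())
    remaining_cnt = len(degrees)
    devs = []
    for i in range(m):
        b = order[i]
        devs.append(degrees[b] - remaining_sum / remaining_cnt)
        remaining_sum -= degrees[b]
        remaining_cnt -= 1
    return devs
```

Note that for a regular hypergraph every deviation is zero, consistent with the heuristic that deviations are driven by the variance of the degrees.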
The second task is to prove a deviation result for $\Lambda_H(B_m)$ (Proposition 7.1). We do so using Freedman's inequalities for martingale deviations. This turns out to be relatively straightforward. We simply need to control the contributions $\mathbb{E}[X_1^H(B_i)^2 \mid B_{i-1}]$ to the quadratic variation (see Section 6).

Returning to arithmetic progressions
Let us now state more formally our result for 3-term arithmetic progressions. Let $H_3$ be the 3-uniform hypergraph with vertex set $[N]$ and edges corresponding to 3-APs in $[N]$. The relevant degree parameters follow from simple calculations (see [24]). It is also easy to verify that $\Delta_2(H_3) = O(1)$, and so we can apply Theorem 1.1 with $r = 2$ to obtain the following result.
Corollary 1.3. Given a sequence $p = p_N$, which may be a constant in $(0, 1/2)$ or converge to $0$, let $\delta_N$ be a sequence satisfying $\log N/pN \ll \delta_N \ll p^{1/2}$. Then the conclusion of Theorem 1.1 holds for the count of 3-APs. Results for longer arithmetic progressions and another example (additive quadruples) are given in Section 9.

Central limit theorems
Let us mention that our results also give central limit theorems for $D_H(B_m)$ and $D_H(B_p)$. We remark that the central limit theorem (and even a bivariate version) is already known for the special case of arithmetic progressions [3]. However, we are not aware of a central limit theorem in the more general context of random induced subhypergraphs. We state first our result for $D_H(B_m)$.
Since it is not the focus of the article, let us mention now how one may prove the theorem using results of this paper. Classic papers on the martingale central limit theorem include [6], [15] and [9], although it is also worth consulting more recent results such as [21], and the references therein.
One may easily prove that $\Lambda_{H_N}(B_m)/\sigma_1$ converges in distribution to a standard Gaussian using some version of the martingale central limit theorem together with Proposition 6.1 to control the quadratic variation of the process. The result then follows immediately from the fact that the difference between $D_H(B_m)/\sigma_1$ and $\Lambda_H(B_m)/\sigma_1$ converges to $0$ in probability, by Proposition 5.1.
One may then easily deduce the corresponding result for the binomial $B_p$ model.
Then for every sequence $p = p_N$, which may be a constant in $(0, 1/2)$ or converge to $0$ while satisfying $p \gg (\log N/N)^{(r-1)/(k-1)}$, the corresponding central limit theorem holds. By standard properties of the normal distribution it suffices to show that $D_{H_N}(B_p)$ may be expressed as a sum of two asymptotically independent, asymptotically Gaussian terms. It is easy to find such a representation: defining the random variable $M = |B_p|$, the number of elements in the random set $B_p$, we may decompose $D_H(B_p)$ into two bracketed quantities. The first bracketed quantity is exactly $D_H(B_m)$ for the corresponding value $m$ of $M$; it is easily verified (using Theorem 1.4) that after dividing by $\sigma_1$ this converges in distribution to a standard Gaussian. The second bracketed quantity corresponds to $L_H(M) - p^k e(H)$. Using the fact that $M \sim \mathrm{Bin}(N, p)$ is asymptotically normally distributed, it is easily verified that $(L_H(M) - p^k e(H))/\sigma_2$ also converges in distribution to a standard Gaussian, as required.

Notation
Let us emphasise that $t$ denotes $m/N$, the density of the random set $B_m \subseteq [N]$, throughout the article. Likewise $s$ denotes $i/N$.

Overview of the paper
In Section 2 we give important auxiliary results and well-known inequalities which will be useful throughout the article. In Section 3 we adapt to our context the approach of Janson and Ruciński [18] to bound large deviations, and at the end of Section 3 we state a corollary of these results for link hypergraphs. Using this corollary we show in Section 4 that $X_\ell^H(B_i)$ may be well approximated by a multiple of $X_1^H(B_i)$. This result will be sufficient to show that $D_H(B_m)$ is well approximated by $\Lambda_H(B_m)$ in Section 5. This represents the first major task in proving Theorem 1.2.
In Section 6 we study the quadratic variation of the martingale representation of $\Lambda_H(B_m)$. Finally, in Section 7 we prove deviation bounds for $\Lambda_H(B_m)$, making use of our control of the quadratic variation, and we deduce Theorem 1.2. In Section 8 we show how Theorem 1.1 for the model $B \sim B_p$ follows. In Section 9 we give applications of Theorems 1.2 and 1.1 to arithmetic progressions and solutions of the Sidon equation in $\{1, \dots, N\}$.

Preliminaries and tools
In this section we lay out some preliminary results which we will rely on in the rest of the paper. In Section 2.1 we state a martingale representation for the deviation $D_H(B_m)$, given first in [23] (see also [12] for a similar result in the context of subgraph counts). We track the evolution of the number of edges of $H$ contained in $B_i$ for $i \leq m$. In fact it is useful to also consider partially filled edges: for $1 \leq \ell \leq k$ we consider the corresponding counts. We shall study the one-step increments of these quantities, and finally we define centred versions of these increments. We may now state the martingale representation of $D_H(B_m)$ given in [23].
Lemma 2.2 (Hoeffding-Azuma inequality). Let $(S_i)_{i=0}^m$ be a martingale with increments $(X_i)_{i=1}^m$, and let $c_i = \|X_i\|_\infty$ for each $1 \leq i \leq m$. Then, for each $a > 0$,
\[
\mathbb{P}(S_m - S_0 > a) \;\leq\; \exp\left(\frac{-a^2}{2\sum_{i=1}^m c_i^2}\right).
\]
Furthermore, the same bound holds for $\mathbb{P}(S_m - S_0 < -a)$.
Probabilistic intuition would suggest that variance (or rather quadratic variation) should be more relevant than $\sum_i \|X_i\|_\infty^2$. Freedman's inequality [11] allows us to replace $\sum_i \|X_i\|_\infty^2$ by (essentially) the quadratic variation of the process up to that point, plus a term which is often negligible.

Lemma 2.3 (Freedman's inequality). Let $(S_i)_{i=0}^m$ be a martingale with increments $(X_i)_{i=1}^m$ with respect to a filtration $(\mathcal{F}_i)_{i=0}^m$, let $R \in \mathbb{R}$ be such that $\max_i |X_i| \leq R$ almost surely, and let $V(j) := \sum_{i=1}^{j} \mathbb{E}[X_i^2 \mid \mathcal{F}_{i-1}]$. Then, for every $\alpha, \beta > 0$, we have
\[
\mathbb{P}\left(S_j - S_0 \geq \alpha \text{ and } V(j) \leq \beta \text{ for some } j \leq m\right) \;\leq\; \exp\left(\frac{-\alpha^2}{2(\beta + R\alpha)}\right).
\]
Freedman also proved a converse for this inequality, which requires some new notation. Define the stopping time $\tau_\alpha$ to be the least $j$ such that $S_j > S_0 + \alpha$. The following is the converse of Freedman's inequality [11].

A large deviations bound and application to link hypergraphs
The main aim of this section is to prove a bound on the probability of a deviation in link hypergraphs $H(x)$ of a hypergraph $H$; see Corollary 3.3. Since $H(x)$ is in some sense "just some hypergraph", we obtain the bound by simply applying the following proposition, which bounds the probability of large deviations. The proposition is proved using Janson's inequality for the lower tail and a moment argument similar to that of Janson and Ruciński [18] for the upper tail.
It will be useful to allow the edges of the hypergraph to have a non-negative weight. Given such a weighted hypergraph we naturally include the weight in all related parameters, so that $e(H)$ denotes the sum of all weights, the degree of a vertex is interpreted as the sum of weights of edges containing that vertex, and, for example, $N_H(B_m)$ is the sum of the weights of edges of $H$ contained in $B_m$.

Proposition 3.1. Let $1 \leq r \leq k$ and $C$ be integers and let $\varepsilon > 0$; then there is a constant $c = c(k, C, \varepsilon) > 0$ such that the following holds. Let $H$ be a $k$-uniform (weighted) hypergraph with $\Delta_r(H) \leq C$. Let $t \geq N^{-r/k}$, and let $m = tN$. Then, for all $i \leq m$, the stated deviation bound holds.

Remark 2. A stronger bound, with $t^{k/r}N$ replaced by $\min\{t^k N^r, t^{k/r} N \log N\}$, ought to follow by instead using Warnke's results [26] to bound the upper tail. Doing so would allow one to extend the intervals of deviations considered in Theorem 1.2 and Theorem 1.1 by a polylog factor.
We now state a lemma which bounds moments of the count of edges of $H$ in the random set. In fact we work in the $B_p$ model, in which each element of $[N]$ is included in $B_p$ independently with probability $p$. The same result holds in the $B_m$ model, but it is marginally easier to prove in the $B_p$ model. The result differs only in a minor way from the bound proved by Janson and Ruciński [18], and we use essentially the same proof.

Lemma 3.2. Let $1 \leq r \leq k$ and $C$ be integers and let $\varepsilon > 0$; then there is a constant $c = c(k, C, \varepsilon) > 0$ such that the following holds. Let $H$ be a $k$-uniform (weighted) hypergraph with $\Delta_r(H) \leq C$. Let $t \in (0, 1)$. Then, for all $p \leq t$, the stated moment bound holds for all positive integers $\ell \leq ct^{k/r}N$, where $\mu := p^k e(H)$.
Proof. We begin by proving that for any set $U$ of at most $k\ell$ elements of $[N]$ the required bound holds, provided $\ell \leq ct^{k/r}N$, for a sufficiently small constant $c = c(k, C, \varepsilon) > 0$.
Let us count how many edges of $H$ have certain intersections with the set $U$. We begin by bounding the number of edges $e$ with $|e \cap U| \geq r$. By our condition that $\Delta_r \leq C$, there are clearly at most $C\binom{k\ell}{r} \leq Ck^r\ell^r$ such edges. For $1 \leq j < r$ we have $\Delta_j \leq N^{r-j}\Delta_r \leq CN^{r-j}$, and so there are at most $\binom{k\ell}{j}CN^{r-j} \leq Ck^j\ell^jN^{r-j}$ edges $e$ with $|e \cap U| = j$. Finally, we simply use $e(H)$ as the upper bound for the number of edges not intersecting $U$.
Armed with these bounds we obtain the required estimate, where we assume in the last line that our choice of $c$ will be at most $1$. The exact form of the argument now depends on the value of $\mu = p^k e(H)$.
If $\mu \geq \varepsilon t^k N^r/4$ then the first bound applies, while if $\mu \leq \varepsilon t^k N^r/4$ the second applies. The lemma now follows by a straightforward induction argument. Note that the last inequality was obtained using our bound (3).
We now deduce Proposition 3.1.
Proof of Proposition 3.1. Let us set $s := i/N$. Note that $s \leq t = m/N$. Throughout the proof we set $\mu = \mathbb{E}[N_H(B_s)] = s^k e(H)$.
It may be easily verified (using Stirling's approximation, for example) that $\mathbb{P}(\mathrm{Bin}(N, s) = i) \geq 1/N$, and so it suffices to prove the bounds (4) and (5) for some $c = c(k, C, \varepsilon) > 0$. We have used here that the difference between the means in the $B_i$ and $B_s$ models is $O(t^{k-1}N^{r-1})$ and so is negligible.
We begin by proving (4), the bound on the lower tail. We do so using Janson's inequality, see Theorem 2.14 of [17]. We note that the bound on $\Delta_r$ gives us that $\Delta_j \leq CN^{r-j}$ for all $1 \leq j \leq r$ and $\Delta_j \leq C$ for all $j \geq r$. We shall use these bounds to control the quantity $\overline{\Delta}$ which occurs in Janson's inequality, obtaining a bound with constant $C' = C'(C, k)$. We may now apply Janson's inequality to obtain (4) for some $c = c(k, C, \varepsilon) > 0$, where for the last inequality we used that $k \geq r$ and $t \geq N^{-r/k}$. For the upper tail, (5), we use Lemma 3.2, which (applied with $\varepsilon/C$) gives a moment bound with $\ell = c't^{k/r}N$ for some constant $c' > 0$. We now simply apply Markov's inequality. We consider two cases based on which term is larger, noting that $\mu \leq s^k e(H)$. If the maximum is the first term then by Markov's inequality we obtain a bound of the form $\exp(-c\ell) = \exp(-ct^{k/r}N)$, as required.
If the maximum is the second term then the required bound holds for some constant $c > 0$, as required.
We now state the application which will be of interest to us. In fact we require a result not only for link hypergraphs but also for some hypergraphs derived from them. Given a $k$-uniform hypergraph $H$ on vertex set $[N]$ and a vertex $x \in [N]$, the link hypergraph $H(x)$ is the $(k-1)$-uniform hypergraph with vertex set $V \setminus \{x\}$ and an edge $e \setminus \{x\}$ for each edge $e \in E(H)$ containing $x$. In the case of a weighted hypergraph, $e \setminus \{x\}$ inherits the weight of $e$.
Let us also consider the operation in which each edge is replaced by all of its $j$-element subsets. Given a $k$-uniform hypergraph $H$ and $j \leq k$, we write $H_j$ for the (weighted) $j$-uniform hypergraph in which each edge is replaced by its $j$-element subsets (with multiplicity). That is, the edges of $H_j$ are the $j$-element subsets which are contained in at least one edge of $H$, and the weight associated with an edge $f$ is $|\{e \in E(H) : f \subseteq e\}|$, the number of edges of $H$ which contain it.
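The construction of $H_j$ is simple to carry out explicitly; the following sketch (the helper name `shadow_weights` is ours) records, for each $j$-set, the number of edges of $H$ containing it, i.e. its weight in $H_j$.

```python
from itertools import combinations

def shadow_weights(edges, j):
    """H_j: replace each edge of a k-uniform hypergraph by its j-element
    subsets; the weight of a j-set f is the number of edges containing f."""
    w = {}
    for e in edges:
        for f in combinations(sorted(e), j):
            w[f] = w.get(f, 0) + 1
    return w
```

For a single edge of size $k$ this produces $\binom{k}{j}$ edges of weight $1$; overlapping edges produce $j$-sets of larger weight.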
We will consider applying Proposition 3.1 to link hypergraphs $H(x)$ and the hypergraphs $H(x)_j$ obtained from them. Given an element $x \in [N]$ we write $B^{(x)}$ for the corresponding random set.

Corollary 3.3. Let $1 \leq r \leq k$ and $C$ be integers and let $\varepsilon > 0$; then there is a constant $c > 0$ such that the following holds. Let $H$ be a $k$-uniform hypergraph with $\Delta_r(H) \leq C$. Let $t \geq N^{-(r-1)/(k-1)}$, and let $m = tN$. There is probability at most $2kN^3\exp(-ct^{(k-1)/(r-1)}N)$ that the stated inequality occurs for some $i \leq m$ and some $1 \leq j \leq k-1$.
Proof. It clearly suffices to prove the corresponding bound for fixed choices of $i \leq m$, $j \leq k-1$ and $x \in [N]$. Let us fix such $i$, $j$ and $x$. Since the event requires $x \in B_{i-1}$, it suffices to prove the bound with $B^{(x)}_{i-1}$ in place of $B_{i-1}$, where $B^{(x)}_{i-1}$ is a uniformly random set of $i-1$ elements of $[N] \setminus \{x\}$. During the proof we will use Proposition 3.1. To avoid confusion with the parameters $r, k, C, \varepsilon$ of the corollary, we shall write $r', k', C'$ and $\varepsilon'$ for the parameters in the condition of the proposition and $c = c(k', C', \varepsilon')$ for the constant which appears in the result. We consider two cases depending on the value of $j$. Let us first observe that $H(x)$ inherits certain properties from $H$. In particular $H(x)$ is $(k-1)$-uniform and has $\Delta_{r-1} \leq C$. In the case of $H(x)_j$ we note that $H(x)_j$ is $j$-uniform and we have the bounds $\Delta_{r-1} \leq C2^k$ (for $j \geq r-1$) and $\Delta_j \leq C2^kN^{r-j-1}$ (for $j \leq r-2$).
Case I: If $r-1 \leq j \leq k-1$ then we apply Proposition 3.1 with suitable parameters to obtain the required bound.
Case II: If $1 \leq j \leq r-2$ then we let $H'$ be the weighted hypergraph obtained from $H(x)_j$ by dividing all its edge weights by $N^{r-j-1}$. It then follows that $H'$ is a $j$-uniform weighted hypergraph with $\Delta_j \leq C2^k$. We apply Proposition 3.1 with parameters $r' = j$, $k' = j$, $C' = C2^k$ and $\varepsilon' = \varepsilon$ to the hypergraph $H'$ to obtain the required bound.

In this section we shall see that all the contributions $X_\ell(B_i)$ to the martingale increment are (with very high probability) close to a deterministic multiple of $X_1(B_i)$. The main result of the section is as follows. We denote by $Y_\ell(B_{i-1}, \bullet)$ the random variable $Y_\ell(B_i)$ conditioned on the first $i-1$ elements, which is a $B_{i-1}$-measurable quantity.
Proposition 4.1. Let $2 \leq r \leq k$ and $C$ be integers and let $\varepsilon > 0$; then there is a constant $c > 0$ such that the following holds. Let $H$ be a $k$-uniform hypergraph with $\Delta_r(H) \leq C$. Let $t \geq N^{-(r-1)/(k-1)}$, and let $m = tN$. Then, except with probability at most $2kN^3\exp(-ct^{(k-1)/(r-1)}N)$, the stated approximation holds.

The alert reader will notice the similarity with our bound, Corollary 3.3, on deviations in link hypergraphs $H(x)$ and the related hypergraphs $H(x)_j$. This is no coincidence. Let $E(\varepsilon)$ be the event that the corresponding maximum is small for all $i \leq m$ and $1 \leq j \leq k-1$. By Corollary 3.3 we have $\mathbb{P}(E(\varepsilon)) \geq 1 - 2kN^3\exp(-ct^{(k-1)/(r-1)}N)$.
And so it suffices to prove the conditional statement (7). This is the approach we will take to proving Proposition 4.1.
Proof of Proposition 4.1. As discussed above, it suffices to prove (7). Let us recall the definition of $Y_\ell(B_i)$ and expand. Using the notation $T_\ell(B_{i-1}, \bullet)$ for the random variable $T_\ell(B_i)$ conditioned on the first $i-1$ elements, it clearly suffices to prove the corresponding bound for $T_\ell$, and so it remains only to control the quantity $A_\ell$. The conditions (i)-(iii) correspond to $(\ell-1)$-sets $S' = S \setminus \{x\}$ in edges $e \setminus \{x\}$ of the link hypergraph $H(x)$; in this context, condition (iv) asserts the corresponding containment.

As we discussed in the introduction, a major task on our path to proving Theorem 1.2 involves approximating $D_H(B_m)$ by $\Lambda_H(B_m)$, which depends only on the degrees of the vertices selected for the set $B_m$. We may now state explicitly the sense in which $\Lambda_H(B_m)$ approximates $D_H(B_m)$. We use the notation $\omega(1)$ for a function which tends to $\infty$ as $N \to \infty$.

Proposition 5.1. Let $2 \leq r \leq k$ and $C$ be integers, and let $H_N$ be a sequence of (weighted) $k$-uniform hypergraphs satisfying the conditions above. To lighten the notation we drop $N$ from the notation $H_N$ in this section.
Let us now give an idea of the proof of Proposition 5.1. We recall that we have a martingale expression for each of $D_H(B_m)$ and $\Lambda_H(B_m)$, with coefficients $\kappa'(i, m)$. We will now express $D_H(B_m)$ in a similar form. We recall the random variables defined in the previous section and note the corresponding expansion, so that the difference between $D_H(B_m)$ and $\Lambda_H(B_m)$ may be expressed as in (8). The following lemma will be useful.
Lemma 5.2. For all $i \leq m \leq N/2$ the stated estimate holds, so that it suffices to prove the corresponding bound. This is now straightforward, since the terms in the first expression are at most $k$ and all remaining terms are $O(t^{k-2}N^{-1})$, as required.
We now prove Proposition 5.1.
Proof of Proposition 5.1. Let $S^{(1)}(m)$ and $S^{(2)}(m)$ denote the two terms on the right hand side of (8). By the triangle inequality, the union bound, and the expression (8) for the difference $D_H(B_m) - \Lambda_H(B_m)$, it suffices to prove the corresponding bound for each of the two terms and all $t, \alpha_N$ satisfying the conditions.
We begin with $S^{(1)}(m)$. This will allow us to bound the maximum change and the quadratic variation of the martingale $(S^{(1)}(j) : j = 0, \dots, m)$, whose increments have absolute value at most $C_1 t^{k-2}N^{r-2}$ almost surely. By the Hoeffding-Azuma inequality (Lemma 2.2) we obtain the required bound on $\mathbb{P}(S^{(1)}(m) \geq \alpha_N)$, where we have used that $t \gg (\log N/N)^{(r-1)/(k-1)} \gg N^{-1}$.
Let $\varepsilon > 0$. We now consider $S^{(2)}(m)$. It will be convenient to work with a truncated version: by Proposition 4.1 the approximation holds for all $i \leq m$ and $\ell \leq k$, except with probability at most $O(N^3)\exp(-\Omega(t^{(k-1)/(r-1)}N))$. It follows that the truncated martingale $S^{(2),*}(m)$ is equal to $S^{(2)}(m)$ except with probability at most $O(N^3)\exp(-\Omega(t^{(k-1)/(r-1)}N))$. Since the coefficient of $Y^*_\ell(B_i)$ is at most $t^{k-\ell}$, it is easily checked that the increments of the martingale $S^{(2),*}(m)$ are all at most $\varepsilon k t^{k-1}N^{r-1}$ almost surely. We now apply the Hoeffding-Azuma inequality to obtain the corresponding bound. Since $\varepsilon$ is arbitrary, and the second probability is much smaller (by the lower bound on $t$ and the upper bound on $\alpha_N$), this completes the proof.

The quadratic variation of the process
Given a hypergraph $H$ we may write $X^H(B_i)$ (or simply $X(B_i)$) for the martingale increment. Since our eventual aim is to control deviation probabilities using Freedman's inequality, the behaviour of the quadratic variation of the process $\Lambda_H(B_m)$ is of particular importance. We prove the following.
Proposition 6.1. Let $2 \leq r \leq k$ and $C$ be integers and let $\varepsilon > 0$; then there is a constant $c > 0$ such that the following holds. Let $H$ be a $k$-uniform hypergraph with $\Delta_r(H) \leq C$. Let $t \leq 1/2$, and let $m = tN$. Then, except with probability at most $4N^2\exp(-ctN)$, the stated bound on the quadratic variation holds.

The proof of Proposition 6.1 will be relatively straightforward once we have proved the following lemma on the likely behaviour of the conditional second moments. To streamline notation we drop $H$ from the notation for the remainder of the section, writing simply $V(m)$, $X(B_i)$, $X_1(B_i)$, etc.

Lemma 6.2. Let $2 \leq r \leq k$ and $C$ be integers and let $\varepsilon > 0$; then there is a constant $c > 0$ such that the following holds. Let $H$ be a $k$-uniform hypergraph with $\Delta_r(H) \leq C$. Let $t \leq 1/2$, and let $m = tN$. Except with probability at most $4N^2\exp(-ctN)$, the stated estimate holds for all $i \leq m$.
Proof. We begin by expanding $X_1(B_i)$ according to its definition in terms of $A_1(B_i)$, which is the degree of the vertex added at step $i$. We now consider each of the two resulting terms and show that each is very likely to be close to a certain value.
Let $H_{\deg}$ be the 1-uniform hypergraph on $[N]$ in which every vertex is an edge with weight given by the degree of that vertex in $H$, and let $H'_{\deg}$ be obtained by dividing all these weights by $N^{r-1}$.
We will relate the deviation of $\sum_x d(x)$ to a deviation for the hypergraph $H'_{\deg}$.
Since $i \leq m = tN \leq N/2$ we have $N - i + 1 \geq N/2$, and so this deviation may be controlled using Proposition 3.1. We have that $H'_{\deg}$ is a 1-uniform hypergraph with $\Delta_1 \leq C$, and so there is a constant $c(C, \varepsilon') > 0$ such that the required bound holds. The same argument may be used to control the second deviation, for some constant $c(C^2, \varepsilon') > 0$.
Taking $\varepsilon' = \varepsilon/4C$, the result now follows from the triangle inequality: if neither of the deviations above occurs then the required estimate holds. We may now deduce Proposition 6.1.
Proof of Proposition 6.1. Let us begin by considering a sum related to the coefficients $t^{k-1}(1-t)/(1-s)$ which occur in the definition of $X(B_i)$; in particular, let us study the sum of the squares of these coefficients. We are now ready to deduce the proposition from Lemma 6.2. By the lemma there exists a constant $c > 4\varepsilon^{-1}$ such that, except with probability at most $4N^2\exp(-ctN)$, the conditional second moments satisfy the stated estimate for all $i \leq m$. If this is the case then the required bound on the quadratic variation follows, as required.

Proof of Theorem 1.2
As we mentioned in the introduction, the proof of Theorem 1.2 divides naturally into the task of proving that $D_H(B_m)$ is generally very well approximated by $\Lambda_H(B_m)$, which we have now achieved in the form of Proposition 5.1, and the task of controlling the probability of deviations for $\Lambda_H(B_m)$. We now state the required result for $\Lambda_H(B_m)$; it will then be relatively straightforward to complete the proof of Theorem 1.2.

Proposition 7.1. Let $\log N/N \ll m/N = t \leq 1/2$, and let $a_N$ be a sequence satisfying the stated conditions. Then the stated deviation bound holds, and furthermore the same holds for the lower tail probability.

Proof. We remark that the statement on the lower tail follows from the same proof with the obvious minor adjustments. We omit the subscript $N$ from $H_N$ during the proof.
The result consists of an upper bound and a lower bound on $\mathbb{P}(\Lambda_H(B_m) \geq a_N)$. We begin with the upper bound, which will be proved using Freedman's inequality (Lemma 2.3). We consider the martingale which has initial value $0$ and final value $S_m = \Lambda_H(B_m)$. Note that the quadratic variation of this process is exactly the quantity $V_H(m)$ studied in Section 6.
Let $\varepsilon > 0$, and let $\beta = t^{2k-1}(1-t)\sigma^2(H)N + \varepsilon t^{2k-1}N^{2r-1}$ and $R = Ct^{k-1}N^{r-1}$. By Proposition 6.1 we have that $V_H(m)$, the quadratic variation of the process, is at most $\beta$, except with probability at most $4N^2\exp(-ctN)$, for some constant $c > 0$. We may also observe that $|X_1(B_i)| \leq CN^{r-1}$ deterministically, and so the increments are at most $Ct^{k-1}N^{r-1}$ deterministically. The upper bound now follows by an application of Freedman's inequality (Lemma 2.3).

To deduce Theorem 1.1 we use a local limit estimate for the binomial distribution, $b_{N,p}(m)$, which adapts an argument of Bahadur [2]; here $x(m) = (m - pN)/\sqrt{pqN}$, and the estimate holds provided $N^{-1} \leq p \leq 1/2$. It is useful to classify the possible choices of $m$ as follows. For $\eta \in [0, 1]$ we define a corresponding value $m_\eta$. We may think of (9) as offering us various ways to achieve the required deviation. The term $m = m_0$ corresponds to a case where the number of points in $B_p$ is equal to its expected value, $pN$, and all the work of achieving the deviation must be done in the $m$-model. On the other hand, $m = m_1$ corresponds to a large enough deviation in the number of points in $B_p$ that no (significant) deviation is required in the $m$-model, as $L_{H_N}(m_1) \approx (1 + \delta_N)h$. So we will be interested in the choice of $\eta \in [0, 1]$ which minimises the total "cost" of the deviation; in fact this will be achieved by a particular value, which we denote $\eta^\bullet$ (with corresponding $m^\bullet$), as a fairly simple computation for $\eta \in [0, 1]$ shows. We are now ready to prove Theorem 1.1.
Proof of Theorem 1.1. Note that, under the given conditions, any additive error of order $O(\log N)$ in the exponent may be absorbed into the little-$o$ term. As this is equivalent to a multiplicative $N^{O(1)}$ term in front of the exponential, it suffices to prove a result for the maximum contribution in (9) (as there are only $N$ values of $m$). In other words, we must prove the corresponding bound. We consider three regimes of $m$: (i) $m \leq m_0$, (ii) $m_0 \leq m \leq m^\bullet$ and (iii) $m \geq m^\bullet$.
Regime (i): Since $b_{N,p}(m) \leq 1$, the probability of the event may be bounded using (11). The conditions on $\delta_N$ allow us to apply Theorem 1.2, and so we obtain the required bound, where we have used the estimate (10).

Applications to arithmetic structures
In this section we specialise Theorems 1.2 and 1.1 to arithmetic progressions (Section 9.1) and additive quadruples (Section 9.2) in $\{1, \dots, N\}$.

9.1. Arithmetic progressions. We denote by $H_k$ the hypergraph encoding increasing $k$-APs in $[N]$. Since applying Theorems 1.2 and 1.1 requires control of the average degree and of the variance of the degrees, the following lemma will be useful. Its proof is given in [24] (we remark that similar computations were given in [5]).
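The number of edges of $H_k$ is elementary to compute exactly: for each common difference $d \geq 1$ there are $N - (k-1)d$ increasing $k$-APs in $[N]$ with that difference. A short sketch (the function name is ours, for illustration):

```python
def count_k_aps(N, k):
    """Number of increasing k-term arithmetic progressions in {1, ..., N},
    i.e. the number of edges of H_k."""
    count = 0
    d = 1
    while 1 + (k - 1) * d <= N:
        count += N - (k - 1) * d   # choices of the first term a
        d += 1
    return count
```

For $k = 3$ this gives roughly $N^2/4$ edges, consistent with $\mathbb{E}[W_3] \approx p^3N^2/4$ in the $B_p$ model.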
It is also easy to check that $\Delta_2(\mathcal{H}_k) = O(1)$, and so we can apply Theorems 1.2 and 1.1 with $r = 2$. In the $m$-model, we obtain the following result.

Theorem 9.2. Let $0 \le m \le N$ be such that $t = m/N \le 1/2$, and let $a_N$ be a sequence such that $t^{k-1/2}N^{3/2}(\log N)^{1/2} \ll a_N \ll t^{3k/2-1}N^2$. Then the corresponding tail bound holds, where $\theta_k$ is defined in (13).
Before stating the result for the $p$-model, we introduce the relevant notation; Theorem 1.1 then gives us the following.
Theorem 9.3. Given a sequence $p = p_N$ which may be a constant in $(0, 1/2)$ or converge to $0$, let $\delta_N$ be a sequence satisfying $\log N/(pN) \ll \delta_N \ll p^{k/2-1}$. Then the corresponding tail bound holds.

This result is applicable when $p \gg (\log N/N)^{1/(k-1)}$. We remark that this theorem was obtained by Bhattacharya, Ganguly, Shao and Zhao [5] under the following conditions on $p$ and $\delta_N$: $p \to 0$, $\delta_N = O(1)$, $\delta_N^{-3} p^{k-2}(\log(1/p))^2 \to \infty$, and $\min\{\delta_N p^k, \delta_N^2 p\} \ge N^{-1/(6(k-1))} \log N$. For $k = 3$ the right-hand side can be relaxed to $N^{-1/6}(\log N)^{7/6}$, and for $k = 4$ it can be relaxed to $N^{-1/12}(\log N)^{13/12}$. Our theorem extends the range of $p$ and $\delta_N$ for which the result is valid.

9.2. Sidon equation. We now turn to solutions of the Sidon equation $x + y = z + w$ in $[N]$. We shall focus on solutions in which $x$, $y$, $z$ and $w$ are distinct, and state our results in the context of the 4-uniform hypergraph $\mathcal{H}_S$ with vertex set $[N]$ and an edge $\{x, y, z, w\}$ whenever $x + y = z + w$. We remark that it would be straightforward to extend our results to include solutions with a repeated element: one may simply apply Proposition 3.1 to the corresponding 3-uniform hypergraph to bound its contribution. The following lemma provides information on the average degree and the variance of the degrees of $\mathcal{H}_S$.

Figure 1. The grey lines on the left represent deviations of the order of the standard deviation and so are covered by the central limit theorem [3]. The blue areas ought to belong to the "Normal" regime, in which we would expect $r(N, p, \delta)$ to be of the form $(1 + o(1))\,3\delta_N^2 pN/(56(1 - p))$. Our results show this in the light blue and turquoise regions, while Bhattacharya, Ganguly, Shao and Zhao [5] covered the turquoise and dark blue regions. The question remains open in the striped regions. In the red striped region, the "Poisson" regime, we would expect $r(N, p, \delta) = (1 + o(1))\delta_N^2 \mathbb{E}[W_3]/2 = (1 + o(1))\delta_N^2 p^3 N^2/8$. In the green striped region, the "localisation" regime, we would expect $r(N, p, \delta) = (1 + o(1))\,2\delta\dots$

2.1. Martingale representation. In order to state the result we require some notation. Let us define $B_m = \{b_1, \dots, b_m\}$, where $b_1, \dots, b_N$ is a uniformly random permutation of $[N]$.
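A minimal simulation of the process $m \mapsto e(\mathcal{H}_3[B_m])$ (our own illustration, not code from the paper; the names are ours): expose a uniformly random permutation of $[N]$ one element at a time and track the number of 3-APs among the exposed elements.

```python
import random

def new_3aps(b, exposed, N):
    """Number of 3-APs in [N] containing b whose other two points lie in `exposed`."""
    c = 0
    for d in range(1, N):
        # b as first, middle, and last term of the progression, respectively.
        if b + 2 * d <= N and b + d in exposed and b + 2 * d in exposed:
            c += 1
        if b - d >= 1 and b + d <= N and b - d in exposed and b + d in exposed:
            c += 1
        if b - 2 * d >= 1 and b - d in exposed and b - 2 * d in exposed:
            c += 1
    return c

def ap3_count_process(N, seed=0):
    """counts[m] = e(H_3[B_m]) along a uniformly random permutation b_1, ..., b_N."""
    rng = random.Random(seed)
    perm = list(range(1, N + 1))
    rng.shuffle(perm)
    exposed, counts, current = set(), [0], 0
    for b in perm:
        current += new_3aps(b, exposed, N)  # every new 3-AP must contain b
        exposed.add(b)
        counts.append(current)
    return counts

counts = ap3_count_process(24, seed=0)
assert counts[-1] == 23 ** 2 // 4  # B_N = [N], so all 3-APs are eventually counted
```

The increments of the associated Doob martingale are controlled by quantities of exactly the kind that $\Delta_r$ and $\sigma^2$ bound in the main theorems.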

Proposition 7.1. Let $2 \le r \le k$ and $C$ be integers. Let $\mathcal{H}_N$ be a sequence of (weighted) $k$-uniform hypergraphs with $V(\mathcal{H}_N) = [N]$, $\Delta_r(\mathcal{H}_N) \le C$ and $\sigma^2(\mathcal{H}_N) \ge N^{2r-2}/C$ for all $N$.
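For orientation (our reading of the notation, not a verbatim definition from the paper): $\Delta_r$ presumably denotes the maximum $r$-vertex codegree, while $d(\mathcal{H})$ and $\sigma^2(\mathcal{H})$ are the mean and the variance of the vertex degrees, consistent with the phrase "variance of the degrees" in Section 9 and with the identity $d = 4e(\mathcal{H}_S)/N$ used there:

```latex
% Presumed definitions:
d(\mathcal{H}) \;=\; \frac{1}{N} \sum_{x \in V(\mathcal{H})} d(x),
\qquad
\sigma^2(\mathcal{H}) \;=\; \frac{1}{N} \sum_{x \in V(\mathcal{H})}
  \big( d(x) - d(\mathcal{H}) \big)^2 .
```

The hypothesis $\sigma^2(\mathcal{H}_N) \ge N^{2r-2}/C$ is then a non-degeneracy condition: the degrees must vary on the natural scale $N^{r-1}$.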

Lemma 9.4. Let $\mathcal{H}_S$ be the hypergraph corresponding to pairwise distinct solutions of the Sidon equation $x + y = z + w$ in $[N]$. Then $d(\mathcal{H}_S) = (1 + o(1))N^2/3$ and $\sigma^2(\mathcal{H}_S) = (1 + o(1))N^4/720$.

We conclude by obtaining the following result in the binomial model, where we use that $e(\mathcal{H}_S) = (1 + o(1))N^3/12$, which follows from Lemma 9.4 and the identity $d(\mathcal{H}_S) = 4e(\mathcal{H}_S)/N$, valid since the hypergraph is 4-uniform.

Theorem 9.6. Given a sequence $p = p_N$ which may be a constant in $(0, 1/2)$ or converge to $0$, let $\delta_N$ be a sequence satisfying $\log N/(pN) \ll \delta_N \ll p^{1/4}$. Then $$\mathbb{P}\big(D_{\mathcal{H}_S}(B_p) \ge \delta_N p^4 e(\mathcal{H}_S)\big) = \exp\Big(-(5 + o(1))\,\frac{\delta_N^2 pN}{162(1 - p)}\Big).$$

Simply proceed as above, except with the hypergraph $\mathcal{H}_{\deg,2}$ with weights given by $d(x)^2$ and a renormalised version $\mathcal{H}'_{\deg,2}$ obtained by dividing all weights by $N^{2r-2}$.
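As a numerical sanity check (our own illustration; the names are ours), one can count the edges of $\mathcal{H}_S$ in two ways: by brute force over 4-element subsets, and via the pair-sum counts $p_s = \#\{(x, y) : x < y,\ x + y = s\}$. Two distinct pairs with equal sum are automatically disjoint, so $e(\mathcal{H}_S) = \sum_s \binom{p_s}{2}$:

```python
from itertools import combinations
from collections import Counter
from math import comb

def sidon_edges_bruteforce(N):
    """4-sets {a,b,c,d} in [N] supporting a solution of x+y = z+w, all distinct."""
    count = 0
    for a, b, c, d in combinations(range(1, N + 1), 4):  # a < b < c < d
        if a + d == b + c:  # the only balanced pairing of a sorted 4-set
            count += 1
    return count

def sidon_edges_by_sums(N):
    """e(H_S) = sum over s of C(p_s, 2), with p_s = #{x < y : x + y = s}."""
    pair_sums = Counter(x + y for x, y in combinations(range(1, N + 1), 2))
    return sum(comb(ps, 2) for ps in pair_sums.values())

N = 30
assert sidon_edges_bruteforce(N) == sidon_edges_by_sums(N)
# e(H_S) = (1 + o(1)) N^3 / 12, so this ratio approaches 1 as N grows:
ratio = sidon_edges_by_sums(N) / (N ** 3 / 12)
```

The second method runs in $O(N^2)$ time, which is what makes moderate values of $N$ feasible for checking the asymptotics $d(\mathcal{H}_S) \approx N^2/3$ and $e(\mathcal{H}_S) \approx N^3/12$.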