Scale-free graphs with many edges

We develop tail estimates for the number of edges in a Chung-Lu random graph with regularly varying weight distribution. Our results show that the most likely way to have an unusually large number of edges is through the presence of one or more hubs, i.e.\ vertices with degree of order $n$.


Introduction and main results
We analyze a sequence of random graphs, introduced in [5,13], which is constructed as follows.
Let $n$ be the number of vertices and let $X_i$, $i \geq 1$, be an i.i.d. sequence of non-negative random variables with mean $\mu = E[X_i]$ and a right tail that is regularly varying with index $\alpha > 1$: for $x > 0$,
$$P(X_i > x) = L(x)\, x^{-\alpha},$$
with $L(yx)/L(x) \to 1$ for $y > 0$ as $x \to \infty$. $X_i$ can be interpreted as a weight for vertex $i$: a vertex with a high weight tends to have more edges. The probability $p_{ij}$ that an edge is present between vertices $i$ and $j$ equals
$$p_{ij} = \min\{X_i X_j/(\mu n),\, 1\}.$$
Given i.i.d. uniform $[0,1]$ random variables $U_{ij}$, $1 \leq i < j \leq n$, we define the total number of edges $E_n$ in the graph as
$$E_n = \sum_{1 \leq i < j \leq n} 1\{U_{ij} \leq p_{ij}\},$$
where $1\{\cdot\}$ denotes the indicator function. The mean of $E_n$ grows as $\mu n/2$. The specific purpose of this study is to investigate the probability that the graph has significantly more edges than usual, i.e.
$P(E_n > (\mu/2 + a)n)$ for some fixed $a > 0$. Our broader aim is to contribute to a better understanding of large-deviations properties of random graphs with power-law degrees. In the past decade there has been increased activity in establishing large deviations for random graphs. There now exist various large-deviations results for dense graphs and for sparse graphs with light-tailed degrees [6,9,10,14,20], but these do not cover scale-free graphs. The typical behavior of scale-free graphs is the subject of intense research activity [25,26], while their large-deviations analysis has so far been restricted to the PageRank functional [12,21] and to cluster sizes for critical random graphs [27].
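To make the model concrete, the following simulation sketch (our own illustration, not part of the formal development; the pure Pareto weights with $\alpha = 2.5$, the vertex count $n = 2000$, and the random seed are arbitrary choices, and we assume the standard Chung-Lu connection probability $p_{ij} = \min\{X_iX_j/(\mu n), 1\}$) samples one graph and compares the realized number of edges $E_n$ with its typical value $\mu n/2$:

```python
import numpy as np

rng = np.random.default_rng(0)

alpha, n = 2.5, 2000
mu = alpha / (alpha - 1)          # mean of a Pareto(alpha) weight with x_m = 1

# i.i.d. Pareto weights: P(X > x) = x^{-alpha}, x >= 1
X = rng.pareto(alpha, size=n) + 1.0

# Chung-Lu connection probabilities p_ij = min(X_i X_j / (mu n), 1)
P = np.minimum(np.outer(X, X) / (mu * n), 1.0)

# keep only pairs i < j and draw the edge indicators 1{U_ij <= p_ij}
iu = np.triu_indices(n, k=1)
U = rng.random(iu[0].size)
E_n = int(np.sum(U <= P[iu]))

print(E_n, mu * n / 2)            # E_n concentrates around mu*n/2
```

For these parameter choices $E_n$ typically stays close to $\mu n/2$; the heavy tail shows up only through occasional high-weight vertices.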
To describe our main results, we introduce additional notation. Denote the mean of $E_n$, conditional on the weights $X_1, \ldots, X_n$, by
$$M_n = E[E_n \mid X_1, \ldots, X_n] = \sum_{1 \leq i < j \leq n} p_{ij},$$
and set $S_n = \mu n M_n$, i.e.
$$S_n = \sum_{1 \leq i < j \leq n} \min\{X_i X_j,\, \mu n\}.$$
We now give a description of our main results. A key parameter is
$$k(a) = \lceil a/\mu \rceil.$$
Assuming that $a/\mu$ is not an integer, we show that the most likely way for $S_n$ to reach a value exceeding $(\mu^2/2 + a)n^2$ is by $k$ large (of order $n$) values of $X_i$, an event which has probability of order $n^k P(X_1 > n)^k$. In particular, if $X_1, \ldots, X_k$ equal $a_1 n, \ldots, a_k n$, the remaining $X_i$, $i > k$, have a typical value, and $k \ll n$ is fixed, then invoking the weak law of large numbers yields
$$S_n/n^2 \to \mu^2/2 + C(a_1, \ldots, a_k),$$
with $C$ as defined below. Following the intuition from large deviations for heavy-tailed random variables (see e.g. [23]), we need to choose $k$ as the smallest number such that there exist constants $a_1, \ldots, a_k$ with
$$C(a_1, \ldots, a_k) \geq a.$$
This leads to the choice $k = k(a)$. A transition in the number of required hubs appears when $a/\mu$ is an integer, which then also changes the scaling of $n^k P(X_1 > n)^k$. Precisely at this transition point it is difficult to obtain precise statements, which is why we work under the assumption that $a/\mu$ is non-integer. A more technical discussion of this topic can be found at the end of Section 3. To state our results formally, we define
$$C(a_1, \ldots, a_k) = \sum_{i=1}^k E[\min\{a_i X_1, \mu\}],$$
and we let, for $b > 0$, $X_i^b$, $i \geq 1$, be an i.i.d. sequence such that $P(X_i^b > x) = (x/b)^{-\alpha}$, $x \geq b$. Informally, the distribution of $X_i^b$ is that of $X_i$ conditioned on the event $\{X_i \geq b\}$ when the slowly varying function $L(x) = 1$. In our context, it emerges as the limit of $P(X_i/bn > x \mid X_i > bn)$ as $n \to \infty$. Set $\eta(a)$ as the smallest value $\eta$ for which $(k(a) - 1)\mu + E[\min\{\eta X_1, \mu\}] \geq a$. Observe that $\eta(a) > 0$, and also $\eta(a) < \infty$ if $a/\mu$ is not an integer. Define the constant
$$C = \frac{1}{k!}\, \eta(a)^{-\alpha k}\, P\big(C(X_1^{\eta(a)}, \ldots, X_k^{\eta(a)}) \geq a\big).$$
We first state our main result on $S_n$. With $f(n) \sim g(n)$ we denote that the ratio of $f$ and $g$ converges to $1$ as $n \to \infty$.
Proposition 1.1. Assume that $a/\mu$ is not an integer. Then
$$P(S_n > (\mu^2/2 + a)n^2) \sim C\, n^k P(X_1 > n)^k, \qquad k = k(a).$$
$S_n$ only involves randomness from the vertex weights $X_i$, while $E_n$ also involves randomness from the uniform random variables in (3). Our main result, derived from Proposition 1.1, shows that the tail of $E_n$ behaves in the same way as that of $M_n$:

Theorem 1.2. Suppose that $a$ is not an integer. Then
$$P(E_n > (\mu/2 + a)n) \sim P(M_n > (\mu/2 + a)n) \sim K(\mu a)\, n^{\lceil a \rceil} P(X_1 > n)^{\lceil a \rceil}.$$
Therefore, $P(E_n > (\mu/2 + a)n)$ is regularly varying of index $-\lceil a \rceil(\alpha - 1)$. In particular,
$$\frac{\log P(E_n > (\mu/2 + a)n)}{\log n} \to -\lceil a \rceil(\alpha - 1).$$
The intuition behind this result is similar to the intuition given for $S_n$, combined with the insight that the additional randomness generated by the uniform random variables $U_{ij}$ is of lesser importance: the event that the number of edges exceeds $(\mu/2 + a)n$ is caused by $k = \lceil a \rceil$ hubs, i.e. vertices with weight of order $n$. More particularly, our proofs give the insight that the weights of the $k$ hubs, normalized by $n$, converge weakly to $(X_1^{\eta}, \ldots, X_k^{\eta})$ conditioned on $\{C(X_1^{\eta}, \ldots, X_k^{\eta}) \geq \mu a\}$, with $\eta = \eta(\mu a)$. To prove Theorem 1.2, we use well-known concentration bounds for non-identically distributed Bernoulli random variables to show that $E_n$ and $M_n$ are close, facilitated by an estimate for the lower tail of $S_n$. It is difficult to get rid of the integrality condition in Theorem 1.2, as this is where a transition occurs in the number of hubs that are needed. We are, however, able to derive a weaker result, namely a large-deviations principle. Define $I(x) = (\alpha - 1)\lceil x \rceil$ if $x \geq 0$ and $I(x) = \infty$ otherwise. Although $I$ is discontinuous on its effective domain, it is lower semi-continuous, so that $I$ is a rate function. Define $\bar{E}_n = E_n/n - \mu/2$.

Corollary 1.3. $\bar{E}_n$, $n \geq 1$, satisfies a large-deviations principle with speed $\log n$ and rate function $I$, i.e. for any Borel set $A$,
$$-\inf_{x \in A^\circ} I(x) \leq \liminf_{n \to \infty} \frac{\log P(\bar{E}_n \in A)}{\log n} \leq \limsup_{n \to \infty} \frac{\log P(\bar{E}_n \in A)}{\log n} \leq -\inf_{x \in \bar{A}} I(x).$$
Our results constitute another case where a rare event in the presence of heavy tails is caused by multiple big jumps. Other heavy-tailed systems exhibiting rare events caused by multiple big jumps are exit problems [4], fluid networks [11,28], multi-server queues [3,15,16], and reinsurance problems [1]. For sample-path large deviations of heavy-tailed random walks, see [23].
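The quantities $k(a)$ and $\eta(a)$ are easy to compute for concrete weight distributions. The sketch below is our own numerical illustration: the pure Pareto tail $P(X_1 > x) = x^{-\alpha}$, $x \geq 1$, and the level $a = 2.2$ are assumptions made for the example, and we take $k(a) = \lceil a/\mu \rceil$ as in the heuristic above. It uses the closed form of $E[\min\{\eta X_1, \mu\}]$ and bisection to find the smallest $\eta$ with $(k(a)-1)\mu + E[\min\{\eta X_1, \mu\}] \geq a$:

```python
import math

alpha = 2.5
mu = alpha / (alpha - 1)           # mean of Pareto(alpha) with x_m = 1
a = 2.2                            # exceedance level (arbitrary, a/mu non-integer)

def trunc_mean(eta):
    """E[min{eta*X, mu}] for X ~ Pareto(alpha), x_m = 1 (closed form)."""
    if eta <= 0.0:
        return 0.0
    xs = mu / eta                  # threshold where eta*x reaches mu
    if xs <= 1.0:                  # eta*X >= mu almost surely
        return mu
    head = eta * alpha / (alpha - 1) * (1.0 - xs ** (1.0 - alpha))
    return head + mu * xs ** (-alpha)

k = math.ceil(a / mu)              # number of hubs k(a)
target = a - (k - 1) * mu          # need E[min{eta*X, mu}] >= target

# bisection for the smallest eta with trunc_mean(eta) >= target
lo, hi = 0.0, mu
for _ in range(80):
    mid = 0.5 * (lo + hi)
    if trunc_mean(mid) >= target:
        hi = mid
    else:
        lo = mid
eta = hi

print(k, eta)
```

For $\alpha = 2.5$ (so $\mu = 5/3$) and $a = 2.2$ this gives $k(a) = 2$ and $\eta(a) \approx 0.33$.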
In all our asymptotic results, the slowly varying function $L(x)$ plays no essential role; our techniques allow us to treat the case of a general slowly varying function without any significant additional effort. The probability of a hub of weight at least $\varepsilon n$ is dominated by the power-law part of the distribution. However, $L(x)$ does appear implicitly in our results, for example in $P(X_1 > n)$, but also in the definition of $C$ in (8) in Proposition 1.1.
The rest of this article is organized as follows. In Section 2 we gather some preliminary results from the literature needed for our proofs. The proof of Proposition 1.1 is developed in Section 3, and the proof of Theorem 1.2 is presented in Section 4. The proof of Corollary 1.3 is given in Section 5, and we end with a short discussion of our results.

Preliminary results
The following lemma, a key estimate for sums of truncated heavy-tailed random variables, is a reformulation of Lemma 3 in [22].
Lemma 2.1. For every $\delta > 0$ and $\beta < \infty$ there exists an $\varepsilon > 0$ such that
$$P\Big(\sum_{i=1}^n X_i 1\{X_i \leq \varepsilon n\} > (\mu + \delta)n\Big) = o(n^{-\beta}).$$
We proceed by stating a version of Chernoff's bound for sums of independent Bernoulli random variables. The statement is a variation of Theorem A.1.4 in [2].

Lemma 2.2. Let $B_i$, $i \geq 1$, be a sequence of independent Bernoulli random variables with success probabilities $p_i$, and set $\mu_n = \sum_{i=1}^n p_i$. Then, for $b \geq 0$,
$$P\Big(\sum_{i=1}^n B_i \geq (1+b)\mu_n\Big) \leq e^{-\mu_n I_B(b)},$$
and, for $-1 < b \leq 0$,
$$P\Big(\sum_{i=1}^n B_i \leq (1+b)\mu_n\Big) \leq e^{-\mu_n I_B(b)},$$
where $I_B(b) = (1+b)\log(1+b) - b$.

We finally state an elementary tail bound for binomially distributed random variables.
Lemma 2.3. Suppose $B(n,p)$ has a binomial distribution with parameters $n$ and $p$. Then
$$P(B(n,p) \geq m) \leq \binom{n}{m} p^m.$$
Proof. Write $B(n,p) = \sum_{i=1}^n B_i$ with i.i.d. Bernoulli($p$) variables $B_i$, note that
$$P(B(n,p) \geq m) = P(\exists\, i_1 < \cdots < i_m \text{ s.t. } B_{i_1} = \cdots = B_{i_m} = 1),$$
and apply the union bound.
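The bound of Lemma 2.3 can be checked numerically against the exact binomial tail. The short sketch below (an illustration only, with arbitrarily chosen parameters $n = 30$, $p = 0.1$) verifies $P(B(n,p) \geq m) \leq \binom{n}{m}p^m$ for all $m$:

```python
from math import comb

def binom_tail(n, p, m):
    """Exact P(B(n, p) >= m) for a binomial distribution."""
    return sum(comb(n, j) * p**j * (1 - p) ** (n - j) for j in range(m, n + 1))

n, p = 30, 0.1
for m in range(n + 1):
    bound = comb(n, m) * p**m          # union bound over m-subsets of trials
    assert binom_tail(n, p, m) <= bound + 1e-12

print("union bound verified for n =", n)
```

The bound is crude for small $m$ (for $m = 0$ it just says the tail is at most $1$) but captures the correct $p^m$ decay for large $m$, which is all that is needed below.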

Proof of Proposition 1.1
Throughout this section, we fix $a$ such that $a/\mu$ is not an integer and write $k(a) = k$, $\eta(a) = \eta$. Define, for $\varepsilon > 0$, the number of $\varepsilon$-hubs
$$N_{n,\varepsilon} = \#\{i \leq n : X_i > \varepsilon n\}.$$
The idea of the proof is to rule out, in turn, the events $N_{n,\varepsilon} < k$ and $N_{n,\varepsilon} > k$. After that, we condition on $N_{n,\varepsilon} = k$ to work out the remaining technical details. This is the focus of the next three lemmas, which together form the proof of Proposition 1.1.
Lemma 3.1. There exists $\varepsilon > 0$ such that
$$P\big(S_n > (\mu^2/2 + a)n^2,\ N_{n,\varepsilon} < k\big) = o\big(n^k P(X_1 > n)^k\big).$$
Proof. We prove this lemma by suitably upper bounding $S_n$ in order to invoke Lemma 2.1. Let $m \leq k$.
Consequently, there exists a $\zeta > 0$ such that, for sufficiently large $n$ and suitably small $\varepsilon$, we can bound (21) by applying Lemma 2.1. Invoking (20) and summing the resulting estimates over $m = 1, \ldots, k$ gives the desired result.
We are left to consider the remaining case.

Proof. Decompose the probability under consideration using the event $A_{k+1}$ as in (19). To analyze $P(S_n > (\mu^2/2 + a)n^2;\ A_{k+1})$, define the random variable $S_n(x_1, \ldots, x_k)$ as $S_n$ conditioned on $X_i = x_i n$, $i = 1, \ldots, k$, where the $X_i$ for $i > k$ remain random and distributed as (1). More precisely,
$$S_n(x_1, \ldots, x_k) = \sum_{1 \leq i < j \leq k} \min\{x_i x_j n^2, \mu n\} + \sum_{i=1}^{k} \sum_{j=k+1}^{n} \min\{x_i n X_j, \mu n\} + \sum_{k < i < j \leq n} \min\{X_i X_j, \mu n\}.$$
Recall that
$$C(x_1, \ldots, x_k) = \sum_{i=1}^k E[\min\{x_i X_{k+1}, \mu\}].$$
From the weak law of large numbers, it follows that $S_n(x_1, \ldots, x_k)/n^2 \to C(x_1, \ldots, x_k) + \mu^2/2$ in probability. Consequently, $P(S_n(x_1, \ldots, x_k) > (\mu^2/2 + a)n^2)$ converges to $1$ if $C(x_1, \ldots, x_k) > a$ and to $0$ if $C(x_1, \ldots, x_k) < a$. Next, recall that $X_i^\varepsilon$, $i \geq 1$, are i.i.d. random variables with support on $[\varepsilon, \infty)$ such that $P(X_i^\varepsilon > x) = (x/\varepsilon)^{-\alpha}$, $x \geq \varepsilon$. Since $P(X_i/n \leq x_i \mid X_i > \varepsilon n)$ converges to the continuous distribution function $P(X_i^\varepsilon \leq x_i)$, we can ignore the contribution to the integral of all $(x_1, \ldots, x_k)$ such that $C(x_1, \ldots, x_k) = a$. To see this, note that $a \mapsto E[\min\{aX_{k+1}, \mu\}]$ is strictly increasing on $[0, \mu/u^*)$, where $u^* = \sup\{u \geq 0 : P(X_{k+1} \geq u) = 1\}$ is the lower endpoint of the support of $X_{k+1}$ (with $[0, \mu/u^*)$ read as $[0, \infty)$ if $u^* = 0$); if $x_i \geq \mu/u^*$ for each $i$, then $C(x_1, \ldots, x_k) = \mu k = \mu \lceil a/\mu \rceil > a$, since $a/\mu$ is not an integer. Consequently, the set of $(x_1, \ldots, x_k)$ for which $C(x_1, \ldots, x_k) = a$ has Lebesgue measure $0$ in $\mathbb{R}^k$. Thus, we can apply (27) to show that the integral in the last display converges to $P(C(X_1^\varepsilon, \ldots, X_k^\varepsilon) \geq a)$. This probability is strictly positive, since $a/\mu$ is non-integer. To eliminate the auxiliary parameter $\varepsilon$, note that $C(x_1, \ldots, x_k) < a$ as soon as there exists some $i$ such that $x_i < \eta$, as the other terms contribute at most $(k-1)\mu$ to the sum. Therefore, if $\varepsilon < \eta$,
$$P(C(X_1^\varepsilon, \ldots, X_k^\varepsilon) \geq a) = P(X_1^\varepsilon \geq \eta)^k\, P(C(X_1^\eta, \ldots, X_k^\eta) \geq a) = (\varepsilon/\eta)^{\alpha k}\, P(C(X_1^\eta, \ldots, X_k^\eta) \geq a).$$
Furthermore, by regular variation,
$$P(X_1 > \varepsilon n) \sim \varepsilon^{-\alpha} P(X_1 > n).$$
Putting everything together, we conclude that
$$P\big(S_n > (\mu^2/2 + a)n^2,\ N_{n,\varepsilon} = k\big) \sim \binom{n}{k} P(X_1 > \varepsilon n)^k\, P(C(X_1^\varepsilon, \ldots, X_k^\varepsilon) \geq a) \sim \frac{n^k}{k!}\, \eta^{-\alpha k}\, P(X_1 > n)^k\, P(C(X_1^\eta, \ldots, X_k^\eta) \geq a).$$
The lemma now follows from (25) and the fact that $\binom{n}{k} \sim n^k/k!$.

We close this section with some technical comments on the integrality condition on $a/\mu$. From the heuristics given so far, it is clear that this condition makes it possible to distinguish between scenarios involving $k(a)$ or $k(a) + 1$ jumps, and it guarantees that the pre-factor $P(C(X_1^\eta, \ldots, X_k^\eta) \geq a)$ is strictly positive. If $a/\mu$ is an integer and $u^* = 0$, then $P(C(X_1^\eta, \ldots, X_k^\eta) \geq a) = 0$ and the true asymptotics will change. We conjecture that either $a/\mu$ or $a/\mu + 1$ hubs are needed. If $u^* > 0$ and $a/\mu$ is an integer, then $P(C(X_1^\eta, \ldots, X_k^\eta) \geq a) > 0$. In that case we expect that the dominant scenario is $a/\mu$ hubs, but the above proof method breaks down, and we would need to understand (at least) the second-order properties of $S_n(x_1, \ldots, x_k)$ as $n \to \infty$. Developing a complete understanding of each of the cases $u^* > 0$ and $u^* = 0$ requires methods which are beyond the scope of this study. For example, one may obtain a central limit theorem for the number of edges by extending some of the results in [18] to take truncation into account, as is done for sums of truncated i.i.d. heavy-tailed random variables in [8]. Also in the next section the non-integrality assumption plays an important role.
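The dominant hub scenario behind Proposition 1.1 can be seen directly in simulation. The sketch below is our own illustration, not part of the proof; the Pareto weights, the level $a = 2.2$, the planted hub weight $5n$, and all other parameters are arbitrary choices, and we take $k = \lceil a/\mu \rceil$ as in the heuristic of Section 1. It computes $S_n = \sum_{i<j} \min\{X_i X_j, \mu n\}$ for typical weights and again after planting $k$ hubs, showing that only the planted sample crosses the threshold $(\mu^2/2 + a)n^2$:

```python
import math
import numpy as np

rng = np.random.default_rng(1)
alpha, n, a = 2.5, 1000, 2.2
mu = alpha / (alpha - 1)
k = math.ceil(a / mu)                     # hubs needed in the heuristic

def S(X):
    """S_n = sum over pairs i < j of min{X_i X_j, mu*n}."""
    M = np.minimum(np.outer(X, X), mu * n)
    iu = np.triu_indices(len(X), k=1)
    return M[iu].sum()

X = rng.pareto(alpha, size=n) + 1.0       # typical Pareto weights
s_typical = S(X) / n**2

X_hub = X.copy()
X_hub[:k] = 5.0 * n                       # plant k hubs of weight 5n
s_planted = S(X_hub) / n**2

threshold = mu**2 / 2 + a
print(s_typical, s_planted, threshold)
```

Each planted hub contributes approximately $\mu$ to $S_n/n^2$, since all of its pair terms are capped at $\mu n$, so the planted configuration sits near $\mu^2/2 + k\mu > \mu^2/2 + a$, while the typical configuration stays near $\mu^2/2$.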

Proof of Theorem 1.2
The proof of Theorem 1.2 is based on suitably bounding the difference between $E_n$ and its conditional mean $M_n = S_n/(\mu n)$, using the concentration bounds in Lemma 2.2. For this procedure to work, we need an asymptotic estimate for the lower tail of $S_n$. Since the $X_i$, $i \geq 1$, are non-negative random variables, this estimate is considerably easier to obtain than the upper tail.

Lemma 4.1. For each $a > 0$, there exists a $\delta > 0$ such that, for $n$ sufficiently large,
$$P\big(S_n \leq (\mu^2/2 - a)n^2\big) \leq e^{-\delta n}.$$
The estimate (32) follows by an application of Chernoff's bound.

Proof of Theorem 1.2. Conditional on $X_1, \ldots, X_n$, the variables $B_{ij}$, $i < j$, indicating whether there is an edge between vertices $i$ and $j$, are independent. Therefore, observing that $E_n = \sum_{i<j} B_{ij}$ has conditional mean $M_n$, we can apply Lemma 2.2 to obtain that, for $b > 0$, almost surely,
$$P\big(|E_n - M_n| \geq b M_n \mid X_1, \ldots, X_n\big) \leq 2 e^{-M_n J(b)},$$
with $J(b) = \min\{I_B(b), I_B(-b)\}$. Now, write, for fixed $\varepsilon > 0$,
$$P(E_n > (\mu/2 + a)n) \leq P(M_n > (\mu/2 + a - \varepsilon)n) + P(E_n - M_n > \varepsilon n).$$
Invoking Lemma 4.1 to control the event that $M_n$ is atypically small, the second term on the RHS of (35) is smaller than $e^{-\delta n}$ for some $\delta > 0$ depending on $\zeta > 0$, the latter chosen suitably small. We conclude that (making $\delta$ smaller than $\zeta J(b)$ if needed)
$$P(E_n > (\mu/2 + a)n) \leq P(M_n > (\mu/2 + a - \varepsilon)n) + e^{-\delta n}.$$
We use this estimate to prove asymptotic lower and upper bounds which together complete the proof of Theorem 1.2. Invoking (37) and Proposition 1.1 for $M_n = S_n/(\mu n)$, we see that
$$P(E_n > (\mu/2 + a)n) \leq P\big(S_n > (\mu^2/2 + \mu(a - \varepsilon))n^2\big) + e^{-\delta n} \sim K(\mu(a - \varepsilon))\, n^{\lceil a - \varepsilon \rceil} P(X_1 > n)^{\lceil a - \varepsilon \rceil}.$$
In the last step we have used that $k(\mu a) = k(\mu(a - \varepsilon))$, which holds for small $\varepsilon$ because $a$ is non-integer. This property also implies that the last expression converges, as $\varepsilon \downarrow 0$, to the asymptotics claimed in Theorem 1.2, providing the upper bound. The lower bound uses that
$$P(E_n > (\mu/2 + a)n) \geq P(M_n > (\mu/2 + a + \varepsilon)n) - P(M_n - E_n > \varepsilon n).$$
The second term on the RHS is exponentially small in $n$, as shown in (36). Consequently, invoking Proposition 1.1,
$$P(E_n > (\mu/2 + a)n) \geq P\big(S_n > (\mu^2/2 + \mu(a + \varepsilon))n^2\big) - e^{-\delta n} \sim K(\mu(a + \varepsilon))\, n^{\lceil a + \varepsilon \rceil} P(X_1 > n)^{\lceil a + \varepsilon \rceil}.$$
In the last step we used that $k(\mu a) = k(\mu(a + \varepsilon))$, which holds because $a$ is non-integer, implying that the last expression converges, as $\varepsilon \downarrow 0$, to the asymptotics claimed in Theorem 1.2, providing the lower bound.
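The concentration step in the proof can also be illustrated numerically: conditional on the weights, $E_n$ is a sum of independent Bernoulli variables with conditional mean $M_n$, so its fluctuations around $M_n$ are of order $\sqrt{n}$ rather than $n$. The sketch below (our own illustration with arbitrary parameters, assuming the connection probabilities $p_{ij} = \min\{X_iX_j/(\mu n), 1\}$) resamples the edge indicators for one fixed weight vector:

```python
import numpy as np

rng = np.random.default_rng(2)
alpha, n = 2.5, 1000
mu = alpha / (alpha - 1)

X = rng.pareto(alpha, size=n) + 1.0            # one fixed weight vector
iu = np.triu_indices(n, k=1)
p = np.minimum(np.outer(X, X)[iu] / (mu * n), 1.0)

M_n = p.sum()                                  # conditional mean of E_n
deviations = []
for _ in range(50):                            # resample edges, weights fixed
    E_n = int((rng.random(p.size) <= p).sum())
    deviations.append(abs(E_n - M_n))

print(M_n, max(deviations))                    # deviations are much smaller than n
```

The observed deviations are a small fraction of $n$, consistent with the exponential bounds of Lemma 2.2 that drive the proof.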

Proof of Corollary 1.3
As a first step, we show that the left tail of $E_n$ is lighter than any polynomial.
Lemma 5.1. For each $a > 0$, there exists a $\delta > 0$ such that, for $n$ sufficiently large,
$$P\big(E_n \leq (\mu/2 - a)n\big) \leq e^{-\delta n}.$$
Proof. Without loss of generality, we can assume $a < \mu/2$. Note that $P(E_n \leq (\mu/2 - a)n)$ can be upper bounded by
$$P\big(E_n \leq (\mu/2 - a)n,\ M_n > (\mu/2 - a/2)n\big) + P\big(M_n \leq (\mu/2 - a/2)n\big).$$
The second term is exponentially small in $n$ due to Lemma 4.1. To analyze the first term, note that, conditional on the weights, $E_n$ is a sum of independent Bernoulli variables with mean $M_n$. Thus, by conditioning on $M_n = yn$ with $y > \mu/2 - a/2$, we can apply Lemma 2.2 to obtain
$$P\big(E_n \leq (\mu/2 - a)n \mid M_n = yn\big) \leq e^{-ny I_B(-(1 - (\mu/2 - a)/y))} \leq e^{-n(\mu/2 - a/2) I_B(-a/(\mu - a))}.$$
The second inequality follows by noting that $I_B$ is non-negative, strictly convex, and $0$ at $0$; therefore $y I_B(-(1 - (\mu/2 - a)/y))$ is increasing in $y$ on $[\mu/2 - a/2, \infty)$, so that we obtain the second inequality by replacing $y$ with $\mu/2 - a/2$. Consequently, the first term in (40) is exponentially small in $n$ as well. We have shown that both terms in (40) are exponentially small in $n$, completing the proof.
Proof of Corollary 1.3. Consider first a closed set $A$. If $0 \in A$, the upper bound is trivial. If $0 \notin A$, we can write $A = A_- \cup A_+$, with $a_- = \sup A_- < 0$ and $a_+ = \inf A_+ > 0$. Since $A$ is closed and $0 \notin A$, both $a_-$ and $a_+$ are elements of $A$, and $a_- < 0 < a_+$. Next, note that
$$P(\bar{E}_n \in A) \leq P(\bar{E}_n \leq a_-) + P(\bar{E}_n \geq a_+).$$
Invoking Lemma 5.1, the first term is exponentially small in $n$. By Theorem 1.2, the second term is regularly varying of index $-(\alpha - 1)\lceil a_+ \rceil$ if $a_+$ is not an integer. If $a_+$ is an integer, we can make $a_+$ slightly smaller, while keeping $\lceil a_+ \rceil$ fixed, preserving the upper bound for $P(\bar{E}_n \in A)$. This yields, using that $\log L(n)/\log n \to 0$, and abbreviating the $n$-independent constant in Theorem 1.2 with $a = a_+$ by $K$,
$$\limsup_{n \to \infty} \frac{\log P(\bar{E}_n \in A)}{\log n} \leq -(\alpha - 1)\lceil a_+ \rceil = -\inf_{x \in A} I(x).$$
Assume now that $A$ is open. If $\sup A \leq 0$ the result is straightforward, so assume that $\sup A > 0$.
For every $\varepsilon > 0$, we can pick a suitable subset of $A$: take $a$ such that $a \in A$ and $\inf_{x \in A} I(x) \geq I(a) - \varepsilon$. Since $A$ is open, we may modify $a$ slightly so that it is non-integer. Next, take a sufficiently small constant $b$ such that the ball around $a$ with radius $b$ lies in $A$, such that $a - b/2$ and $a + b/2$ are both non-integer, and such that $\lceil a - b/2 \rceil = \lceil a + b/2 \rceil = \lceil a \rceil$. Now observe that
$$P\big(\bar{E}_n \in (a - b/2, a + b/2)\big) = P(\bar{E}_n > a - b/2) - P(\bar{E}_n \geq a + b/2),$$
with $K(\cdot)$ as in (9). Using (29) with a fixed $\delta \in (0, \eta(a\mu))$, and since $\lceil a \rceil$ is constant in a neighborhood of $a$, we see that $K(a\mu)$ is strictly decreasing in a neighborhood of $a$. Consequently, $K(\mu(a - b/2)) - K(\mu(a + b/2)) > 0$ and we can apply Theorem 1.2 to conclude that $P(\bar{E}_n \in (a - b/2, a + b/2))$ is regularly varying of index $-I(a)$. Therefore, since $\log L(n)/\log n \to 0$,
$$\liminf_{n \to \infty} \frac{\log P(\bar{E}_n \in A)}{\log n} \geq -I(a) \geq -\inf_{x \in A} I(x) - \varepsilon.$$
Letting $\varepsilon \downarrow 0$ completes the proof.

Discussion
In this paper, we have studied the probability of a large number of edges in a heavy-tailed random graph model. We have shown that the most likely way to obtain at least $an$ more edges than expected is through $\lceil a \rceil$ hubs with weight of order $n$.
While this paper focuses on the Chung-Lu version of the inhomogeneous random graph, defined in (2), there is a wide class of connection probabilities $p_{ij}$ with similar properties [17]. We therefore believe that our results can be extended to other connection probabilities in this class. Some of these connection probabilities yield random graphs that are similar to the erased configuration model or the uniform random graph, suggesting that large deviations of the number of edges behave similarly in these models.
In the random geometric graph, on the other hand, large deviations of the number of edges are caused by one large clique, due to the geometric nature of the model [10]. Here we have shown that power-law random graphs are instead more likely to contain a large number of edges due to the presence of hubs. It would therefore be interesting to investigate large deviations of edge counts for models with both geometry and power-law degrees, such as the hyperbolic random graph [19] or geometric inhomogeneous random graphs [7].
While the number of edges is one of the simplest graph statistics, we believe that there is a much wider class of graph statistics for which the randomness of the i.i.d. weights does not play a role in the large-deviations properties, similarly to Theorem 1.2. Proving this will be more involved for more complex statistics, however: for properties that depend on more than one edge, dependencies between the presences of these edges arise through their random weights. For relatively simple statistics, such as triangle counts, this is possible through an exhaustive enumeration of different cases [24], but a more comprehensive method for a wider class of statistics would be an interesting direction for further research.