A spectral signature of breaking of ensemble equivalence for constrained random graphs

For random systems subject to a constraint, the microcanonical ensemble requires the constraint to be met by every realisation ("hard constraint"), while the canonical ensemble requires the constraint to be met only on average ("soft constraint"). It is known that for random graphs subject to topological constraints breaking of ensemble equivalence may occur when the size of the graph tends to infinity, signalled by a non-vanishing specific relative entropy of the two ensembles. We investigate to what extent breaking of ensemble equivalence is manifested through the largest eigenvalue of the adjacency matrix of the graph. We consider two examples of constraints in the dense regime: (1) fix the degrees of the vertices (= the degree sequence); (2) fix the sum of the degrees of the vertices (= twice the number of edges). Example (1) imposes an extensive number of local constraints and is known to lead to breaking of ensemble equivalence. Example (2) imposes a single global constraint and is known to lead to ensemble equivalence. Our working hypothesis is that breaking of ensemble equivalence corresponds to a non-vanishing difference of the expected values of the largest eigenvalue under the two ensembles. We verify that, in the limit as the size of the graph tends to infinity, the difference between the expected values of the largest eigenvalue in the two ensembles does not vanish for (1) and vanishes for (2). A key tool in our analysis is a transfer method that uses relative entropy to determine whether probabilistic estimates can be carried over from the canonical ensemble to the microcanonical ensemble, and illustrates how breaking of ensemble equivalence may prevent this from being possible.


Introduction
Background. Spectral properties of random graphs have been studied intensively in recent years. A non-exhaustive list of key contributions is [3, 4, 7, 9-13, 16, 25]. Both the adjacency matrix and the Laplacian matrix have been popular. Scaling properties have been derived for the spectral distribution and the largest eigenvalue, with focus on central limit and large deviation behaviour. Most papers deal with random graphs whose edges are drawn independently. Different types of behaviour show up in the dense regime (where the number of edges is of the order of the square of the number of vertices), in the sparse regime (where the number of edges is of the order of the number of vertices), and in between.
In this paper we focus on the largest eigenvalue of the non-normalized and non-centred adjacency matrix for a class of constrained random graphs. The largest eigenvalue is a highly non-linear functional of the entries of the adjacency matrix and therefore carries global information about the structure of the graph. Constraints are natural in the framework of statistical mechanics and Gibbs ensembles. Typically, they introduce a dependence between the edges that makes the spectral analysis challenging.

Breaking of ensemble equivalence (BEE). One of the interesting phenomena exhibited by certain classes of constrained random graphs is Breaking of Ensemble Equivalence (BEE). To understand what this is, we recall that in statistical physics different microscopic descriptions are available for a system that is subjected to a constraint, referred to as Gibbs ensembles. In the microcanonical ensemble the constraint is hard, i.e., each microscopic realisation of the system matches the constraint exactly. In the canonical ensemble the constraint is soft, i.e., is met only on average. For finite systems the two ensembles are clearly different, since they represent different physical situations (energetic isolation, respectively, thermal equilibrium with a reservoir at an appropriate temperature). However, the general belief is that this discrepancy vanishes in the thermodynamic limit. This expectation, referred to as Equivalence of Ensembles (EE), permeates the theory of Gibbs ensembles. It turns out that for many physical systems EE holds, but not for all. We refer to [23] for more background.
For interacting particle systems, EE has been studied at three different levels: thermodynamic, macrostate and measure. It was shown in [23] that these levels are equivalent. The present paper uses the measure level, which is based on the vanishing of the specific relative entropy. In [8,14,20,21], the phenomenon of BEE was studied for random graphs subject to different types of constraints. It was found that, interestingly, BEE is the rule rather than the exception for constraints that are either extensive in the number of vertices or frustrated. An overview can be found in [18].
Spectral signature of BEE. Let A be the adjacency matrix of a random graph on n vertices, i.e., A = {a_ij}_{i,j∈[n]} with a_ij = 1_{{i∼j}}. Let λ_1(n) denote its largest eigenvalue. For i ∈ [n], let k_i be the degree of vertex i. Write E_can and E_mic to denote expectation with respect to the canonical, respectively, microcanonical ensemble. Put

∆_n = E_can[λ_1(n)] − E_mic[λ_1(n)], ∆_∞ = lim_{n→∞} ∆_n. (1.1)

Our working hypothesis is that BEE corresponds to ∆_∞ ≠ 0. The goal of the present paper and future work is to verify when this working hypothesis is valid and to identify what are the exceptional constraints (see Remark 1.4 below). We will verify the working hypothesis for two specific examples of constraints: (1) fix the degrees of the vertices (= the degree sequence); (2) fix the sum of the degrees of the vertices (= twice the number of edges). Example (1) corresponds to the so-called configuration model. We consider the particular case where all the degrees are fixed at a common value d(n), in which case the microcanonical ensemble becomes the d(n)-regular random graph, for which λ_1(n) = d(n) with probability 1. For this case, BEE is known to occur for all choices of d(n) ∉ {0, n − 1}, and we will see that ∆_∞ ≠ 0 except in the ultra-dense regime where lim_{n→∞} n^{−1} d(n) = 1. The failure of our working hypothesis in this regime is a consequence of the saturation of the adjacency matrix. Indeed, the largest eigenvalue becomes ineffective in detecting BEE when the two ensembles asymptotically concentrate around the complete graph, for which the largest eigenvalue achieves the maximal value n − 1. In contrast, relative entropy manages to detect BEE because the two ensembles still look different in the ultra-dense regime, where the number of achievable graphs scales as the exponential of n^2. For Example (2) we will see that no BEE occurs and that ∆_∞ = 0. For both examples the canonical ensemble coincides with the Erdős-Rényi random graph with an appropriate retention probability [20].
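The following is a minimal numerical sketch, not part of the paper, illustrating the working hypothesis for constraint (1) in the dense regime. Under the microcanonical ensemble (the d-regular random graph) λ_1 = d exactly, while under the canonical ensemble (Erdős-Rényi with p = d/(n − 1)) Theorem 1.1 suggests E_can[λ_1] ≈ (n − 1)p + (1 − p); the sample sizes and parameters below are arbitrary choices for illustration.

```python
import numpy as np

# Estimate Delta_n = E_can[lambda_1] - E_mic[lambda_1] for the degree
# constraint with common degree d = (n-1)p. Microcanonically lambda_1 = d
# with probability 1, so only the canonical expectation needs sampling.
rng = np.random.default_rng(0)
n, p = 400, 0.5
d = (n - 1) * p  # common degree fixed by the constraint

samples = []
for _ in range(20):
    upper = np.triu(rng.random((n, n)) < p, 1)   # independent edges above diagonal
    A = (upper + upper.T).astype(float)          # symmetric 0/1 adjacency, empty diagonal
    samples.append(np.linalg.eigvalsh(A)[-1])    # largest eigenvalue

delta_n = np.mean(samples) - d                   # should be close to 1 - p
print(delta_n)
```

The non-vanishing gap of roughly 1 − p is the spectral signature of BEE discussed above.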
To state (1.4), we removed diagonal entries so as to get simple graphs, as explained in Section 5.1. Theorem 1.1 shows that the largest eigenvalue of the Erdős-Rényi random graph is a perturbative correction around the mean degree d(n) = (n − 1)p(n). In the dense regime p(n) ≡ p ∈ (0, 1) we recover the classical result from [13]. In the ultra-dense regime, where the complementary graph is sparse, we can still use [12, Definition 2.1]. The lower bound on p(n) in (1.3) implies that we do not capture the sparse regime below the connectivity threshold: a crossover in the scaling behaviour of λ_1(n) occurs when d(n) ≍ log n, as proved in [3]. Theorem 1.1 leads us to our main result (1.5). The restriction that nd(n) is even is needed to make the constraint graphical, i.e., there exist simple graphs that meet the constraint. Note the remarkable fact that both E_mic[λ_1(n)] and E_can[λ_1(n)] tend to infinity as n → ∞ while their difference remains bounded. As shown in [14,20], BEE occurs in Example (1) and EE in Example (2), and hence Theorem 1.2 supports our working hypothesis that BEE corresponds to a non-vanishing difference of the expected largest eigenvalues under the two ensembles.

Remark 1.3. In [16] a general technique is used that also covers the regime 0 < p(n) < n^{−1}(log n)^β. However, as stated by the authors in their conclusions, their method does not allow for a derivation of the asymptotics of E[λ_1(n)]. Nevertheless, it is worth mentioning that when p(n) = c/n, c ∈ (0, ∞), the asymptotic behaviour of λ_1(n) in the Erdős-Rényi model G(n, p(n)) is of order √(log n / log log n) with high probability. Interestingly, in view of the results in Section 4, this suggests that (1.5) may have limit ∞ in this regime.

Remark 1.4.
In [14] it is shown that BEE occurs for three regimes of the constant degree d(n): (I) the sparse regime, (II) the dense regime, and (III) the ultra-dense regime, where n − 1 − d(n) = o(√n). The scaling of the specific relative entropy is n for regimes (I) and (II), and n log n for regime (III). Theorem 1.2(1) shows that our working hypothesis holds in regime (I) (subject to d(n) ≥ (log n)^β) and (II), but fails in regime (III). The reason is that, while the specific relative entropy is invariant under the map where edges are replaced by non-edges and vice versa, the same is not true for the largest eigenvalue. In the ultra-dense regime, other spectral quantities may be better candidates to look at than the maximal eigenvalue. This is no surprise: in [23] it was shown that the relative entropy is the most sensitive global quantity to detect BEE, while other global quantities may detect BEE in certain settings and fail to do so in others. For instance, if the constraint is that the maximal eigenvalue takes a prescribed value, then clearly ∆_∞ = 0 while BEE may still be possible.
Outline. The remainder of this paper is organised as follows. In Section 2 we recall the definition of the microcanonical and the canonical ensemble in the setting of constrained random graphs. Section 3 describes our main tool: a transfer method based on relative entropy, which carries over estimates on rare events from the canonical ensemble to the microcanonical ensemble, and describes its role in the general framework of BEE. In Section 4 we prove Theorem 1.2(1), and in Section 5 we prove Theorem 1.2(2).

Gibbs ensembles for constrained random graphs
Consider the discrete probability space (G_n, B, P), with G_n the set of all simple graphs on n vertices, B = 2^{G_n} the power set of G_n consisting of all the subsets of G_n, and P a probability measure.
A constraint is defined to be a vector-valued function C : G_n → R^d. Fix a value C⋆ that is graphical, i.e., C(g) = C⋆ for at least one g ∈ G_n. Define

Γ_{C⋆} = {g ∈ G_n : C(g) = C⋆}.

The microcanonical ensemble is the uniform probability distribution on Γ_{C⋆}:

P_mic(g) = 1/|Γ_{C⋆}| if g ∈ Γ_{C⋆}, and P_mic(g) = 0 otherwise.

The canonical ensemble is defined via the Hamiltonian H(g, θ) = ⟨θ, C(g)⟩ (where ⟨·,·⟩ denotes the scalar product), namely,

P_can(g) = e^{−H(g,θ)} / Z(θ), Z(θ) = Σ_{g′∈G_n} e^{−H(g′,θ)},

with Z(θ) called the partition function. Note that both P_mic and P_can depend on n, but we suppress this dependence. The parameter θ is set to the particular value θ⋆ that realises the constraint on average:

E_can[C] = Σ_{g∈G_n} C(g) P_can(g) = C⋆. (2.4)

The constraint C⋆, apart from being graphical, must also be irreducible, i.e., no subset of the constraint is redundant [21]. Once these conditions are met, the value of θ⋆ that solves (2.4) is unique, and so the canonical ensemble is well defined (see the appendices in [21] for further details). The relative entropy of P_mic w.r.t. P_can is defined as

S_n(P_mic ‖ P_can) = Σ_{g∈G_n} P_mic(g) log [P_mic(g)/P_can(g)] = log [P_mic(g⋆)/P_can(g⋆)],

where we use the convention 0 log 0 = 0 and g⋆ is any graph in Γ_{C⋆}. EE in the measure sense is defined as the vanishing of the relative entropy density, i.e., lim_{n→∞} n^{−1} S_n(P_mic ‖ P_can) = 0 (see [23]).
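As a toy illustration of these definitions (not taken from the paper), the following Python sketch builds both ensembles by brute-force enumeration on n = 4 vertices with the edge-count constraint C⋆ = 3, for which the canonical measure solving (2.4) is Erdős-Rényi with p⋆ = C⋆/6:

```python
import itertools, math

n, C_star = 4, 3
pairs = list(itertools.combinations(range(n), 2))          # the 6 possible edges
N = len(pairs)
graphs = list(itertools.product([0, 1], repeat=N))         # all 2^6 simple graphs

# Microcanonical ensemble: uniform on the graphs meeting the constraint exactly.
Gamma = [g for g in graphs if sum(g) == C_star]
P_mic = {g: 1 / len(Gamma) for g in Gamma}

# Canonical ensemble: H(g, theta) = theta * C(g); for the edge-count constraint
# the solution of E_can[C] = C_star is Erdos-Renyi with p_star = C_star / N.
p_star = C_star / N
P_can = {g: p_star**sum(g) * (1 - p_star)**(N - sum(g)) for g in graphs}
assert abs(sum(P_can[g] * sum(g) for g in graphs) - C_star) < 1e-12  # soft constraint

# Relative entropy S_n = log(P_mic(g*)/P_can(g*)) for any g* in Gamma.
g_star = Gamma[0]
S = math.log(P_mic[g_star] / P_can[g_star])
print(S)   # strictly positive: the ensembles differ at finite n
```

Here S = log(2^6 / |Γ_{C⋆}|) = log(64/20) > 0, showing concretely that the two ensembles differ at finite n even when EE holds in the limit.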

Transfer method
Comparison of the two ensembles. The additional freedom in the canonical ensemble implies that there is less dependence between the constituent random variables. In our case these random variables are the edges of the graph. For example, if the constraint is on the degree sequence, then the microcanonical ensemble corresponds to the hard configuration model (which in the case of constant degrees becomes the regular random graph), while the canonical ensemble corresponds to the soft configuration model (which is a special case of the generalized random graph model). The former requires an algorithm that randomly pairs half-edges and creates dependencies, while the latter is constructed via a sequence of independent random trials (which results in a multivariate Poisson-Binomial distribution for the degrees of the vertices [14]). Consequently, in the canonical ensemble calculations are carried out more easily. For example, a lot is known about spectral properties of adjacency matrices of random graphs under the canonical ensemble: because the entries of the adjacency matrix are independent, powerful tools from random matrix theory can be used. The challenge is to transfer properties from the canonical ensemble to the microcanonical ensemble without performing elaborate combinatorial computations.
Transfer principle. We start by noting that, for every g ∈ Γ_{C⋆},

P_mic(g) = P_can(g | Γ_{C⋆}) = P_can(g)/P_can(Γ_{C⋆}).

The latter holds because g ↦ H(g, θ⋆) and g ↦ P_can(g) are constant on the support of P_mic, i.e., all microcanonical realisations have the same probability under the canonical ensemble. In particular,

P_mic(E) = P_can(E ∩ Γ_{C⋆})/P_can(Γ_{C⋆}), E ∈ B,

where again B = 2^{G_n}. Consequently, we have the following transfer principle.
Distinguishing sets. Let E_P ∈ B be the subset of G_n on which a property P holds:

E_P = {g ∈ G_n : property P holds for g}. (3.3)

Write [E_P]^c to denote the complementary event. The crucial step in the argument underlying the transfer method is to find the right event [E_P]^c that asymptotically implies failure of the property P that we want to transfer from the canonical ensemble to the microcanonical ensemble. For the remainder, two events are important: E_P ∩ Γ_{C⋆} and [E_P]^c ∩ Γ_{C⋆}. These represent the sets in the support of P_mic on which property P holds and fails, respectively. Since P_mic([E_P]^c ∩ Γ_{C⋆}) ≤ P_mic([E_P]^c), if we are able to prove that lim_{n→∞} P_mic([E_P]^c) = 0, then we also have lim_{n→∞} P_mic([E_P]^c ∩ Γ_{C⋆}) = 0, and we say that the property defining the set E_P holds with high probability as n → ∞. As explained in Section 2,

P_mic([E_P]^c) ≤ P_can([E_P]^c)/P_can(Γ_{C⋆}), (3.4)

and so if we manage to prove that P_can([E_P]^c) = o(P_can(Γ_{C⋆})), then we obtain lim_{n→∞} P_mic([E_P]^c) = 0.
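The transfer identity and the bound (3.4) can be verified exactly on a toy example (illustrative only, not from the paper): n = 4 vertices, edge-count constraint C⋆ = 3, and P the hypothetical property "the graph is triangle-free", so that [E_P]^c is the triangle event.

```python
import itertools

n, C_star = 4, 3
pairs = list(itertools.combinations(range(n), 2))
N = len(pairs)
graphs = list(itertools.product([0, 1], repeat=N))
p_star = C_star / N
P_can = {g: p_star**sum(g) * (1 - p_star)**(N - sum(g)) for g in graphs}
Gamma = set(g for g in graphs if sum(g) == C_star)

def has_triangle(g):
    edges = {pairs[k] for k in range(N) if g[k]}
    return any({(i, j), (i, k), (j, k)} <= edges
               for i, j, k in itertools.combinations(range(n), 3))

E = set(g for g in graphs if has_triangle(g))       # the "bad" event [E_P]^c
P_can_Gamma = sum(P_can[g] for g in Gamma)
P_mic_E = len(E & Gamma) / len(Gamma)                        # uniform on Gamma
transfer = sum(P_can[g] for g in E & Gamma) / P_can_Gamma    # exact identity
bound = sum(P_can[g] for g in E) / P_can_Gamma               # upper bound (3.4)
assert abs(P_mic_E - transfer) < 1e-12 and P_mic_E <= bound
```

Dropping the intersection with Γ_{C⋆} only enlarges the numerator, which is exactly why (3.4) is an inequality rather than an identity.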

Role of relative entropy and BEE. Equation (3.4) sets the scale at which the transfer method is effective. This scale is given by the denominator

P_can(Γ_{C⋆}) = e^{−S_n(P_mic ‖ P_can)}. (3.5)

This leads to an interesting connection between BEE and the transferability of a property P: if P_can([E_P]^c) = o(e^{−S_n(P_mic ‖ P_can)}), then lim_{n→∞} P_mic([E_P]^c) = 0. Since EE coincides with S_n(P_mic ‖ P_can) = o(n), when the ensembles are equivalent it is easier to transfer. Our proof of Theorem 1.2(2) makes use of precisely this fact, with P a certain concentration inequality for the largest eigenvalue of the adjacency matrix. By contrast, BEE makes the transfer more difficult. Indeed, Theorem 1.2(1) can be seen as an example where the same concentration inequality P cannot be transferred because the relative entropy is of higher order, namely, S_n(P_mic ‖ P_can) = Θ(n log n) [14,20].
Largest eigenvalue. We know from the results in [23] that, whenever BEE occurs, there must exist quantities whose macrostate expectation is different under the two ensembles. Clearly, not all macroscopic quantities are good candidates for this. For instance, any linear combination of the constraints necessarily has the same expected value under the two ensembles. What we propose as a candidate is the largest eigenvalue of the adjacency matrix of the graph, because this is a highly nonlinear function of the imposed constraints and is sensitive to the global structure of the graph. In Sections 4-5 we will consider two examples of constraints in the dense regime: (1) fix the degrees of all the vertices; (2) fix the total number of edges. For the former we focus on the special case where all the degrees are equal.
The largest eigenvalue λ_1 is a convex function of the entries of the matrix A, which by Jensen's inequality means that for both ensembles E[λ_1(A)] ≥ λ_1(E[A]). Taking into account the results of Theorem 1.1 and Section 4, we get the corresponding bounds for the two ensembles. If, on top of the constraint on the degree sequence, we add more (compatible) constraints, then by exchangeability we still have the same bound. This shows that λ_1 is particularly sensitive to the moments of the underlying degree sequence (as can also be seen from the power method used in [12,13]; see (5.4) and (5.35) below). We may therefore expect that our working hypothesis holds in all those cases where BEE forces the degree sequence to assume either a different mean or a different variance in the two ensembles, as in the case under study.
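The convexity bound E[λ_1(A)] ≥ λ_1(E[A]) can be checked numerically; the following sketch (illustrative, with arbitrary parameters) does so by Monte Carlo under the canonical ensemble G(n, p), where E[A] has entries p off the diagonal and 0 on it, so λ_1(E[A]) = (n − 1)p.

```python
import numpy as np

# Monte Carlo check of Jensen's inequality E[lambda_1(A)] >= lambda_1(E[A]).
rng = np.random.default_rng(1)
n, p = 200, 0.3
lam1 = []
for _ in range(30):
    U = np.triu(rng.random((n, n)) < p, 1)
    A = (U + U.T).astype(float)
    lam1.append(np.linalg.eigvalsh(A)[-1])

A_bar = p * (np.ones((n, n)) - np.eye(n))          # E[A]: p off-diagonal, 0 on diagonal
lam1_of_mean = np.linalg.eigvalsh(A_bar)[-1]       # equals (n - 1) p
assert np.mean(lam1) >= lam1_of_mean
```

The gap between the two sides is of order 1 − p in this regime, consistent with Theorem 1.1.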

Proof of Theorem 1.2(1): constraint on the degree sequence
In what follows we suppress the n-dependence from p(n), d(n), λ_1(n), writing p, d, λ_1. The d-regular random graph with n vertices, written G_{n,d}, coincides with the microcanonical ensemble with constraint C⋆ = (d, . . . , d) on the degree sequence, where we allow d = d(n). The largest eigenvalue of the adjacency matrix of G_{n,d} equals d, irrespective of n. The Erdős-Rényi random graph with retention probability p = d/(n − 1) coincides with the canonical ensemble with the same constraint. In order to understand the difference in behaviour of λ_1 under the two ensembles, we need Theorem 1.1. Indeed, the result in (1.4), which actually holds for a generic symmetric random matrix subject to specific regularity conditions, can be interpreted as follows. The adjacency matrix A associated with G(n, p) consists of elements {a_ij}_{i,j∈[n]} that are identically 0 when i = j and Bernoulli random trials (a_ij = 0, 1) with success probability p when i ≠ j. The largest eigenvalue of the deterministic matrix Ā whose entries are ā_ij = E_can[a_ij] = p when i ≠ j and ā_ij = 0 when i = j is given by λ_1(Ā) = (n − 1)p. Hence, compared to λ_1(Ā), λ_1 is shifted by a random variable whose expected value is (1 − p) and which is distributed as N(1 − p, 2p(1 − p)) under certain conditions on d (see [12, equation 6.10]), plus an error term whose order depends on the regime considered (O(1/√n) for constant p). It is important to note that the parameters of this shift depend on p only.
In [12,13] it is shown that (1.4) relies on the fact that in the canonical ensemble the eigenvector v_1 corresponding to the largest eigenvalue λ_1 is very close to the vector 𝟙 = (1, . . . , 1) (i.e., the norm of the projection of v_1 onto 𝟙 is much larger than the norm of the projection of v_1 onto the orthogonal complement 𝟙^⊥).
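The microcanonical side of this comparison is elementary: any d-regular graph has constant row sums, so 𝟙 is an eigenvector with eigenvalue d, which is also the largest by Perron-Frobenius. A quick sketch (using a deterministic circulant d-regular graph rather than a uniform sample, which suffices for this particular point):

```python
import numpy as np

# Circulant d-regular graph: vertex i is joined to its d/2 nearest
# neighbours on each side of a cycle. Row sums are all d, so lambda_1 = d.
n, d = 60, 8                       # d even by construction
A = np.zeros((n, n))
for i in range(n):
    for s in range(1, d // 2 + 1):
        A[i, (i + s) % n] = A[i, (i - s) % n] = 1.0

assert (A.sum(axis=1) == d).all()  # d-regular
lam1 = np.linalg.eigvalsh(A)[-1]
assert abs(lam1 - d) < 1e-8        # largest eigenvalue equals the degree
```

This is why E_mic[λ_1] = d holds exactly, with no fluctuation at all, in contrast to the canonical ensemble.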

Proof of Theorem 1.2(2): constraint on the total number of edges
Consider the case where the constraint is on the total number of edges: C(g) = C⋆ = p n(n − 1)/2 for some p ∈ (0, 1). Then the canonical ensemble is still the Erdős-Rényi random graph with parameter p. It was proved in [20] that the two ensembles are asymptotically equivalent on scale n. In particular, it was shown that S_n(P_mic ‖ P_can) = log n + Θ(1). The canonical probability of drawing a microcanonical realisation is therefore given by (3.5):

P_can(Γ_{C⋆}) = e^{−S_n(P_mic ‖ P_can)} = Θ(1/n). (5.1)

Together with (3.4), this tells us that if we can find an event [E_P]^c such that P_can([E_P]^c) = o(n^{−1}), then we know that lim_{n→∞} P_mic(E_P) = 1. Our goal is to use the results in [12] to apply (3.4) with (5.1). In Section 5.1 we show how our results follow from [12, Theorem 6.2] both in the dense and in the non-dense regime. In Sections 5.2 and 5.3 we focus on the dense regime and show how our results follow by making the concentration inequalities used in [13] tighter. In particular, we will find that the approach depends heavily on the ability to identify good concentration inequalities for the degree sequence, which is a special case of the bounds presented in [12]. The heavy dependence on the degree sequence is further evidence of what was said in Remark 3.2. In Section 5.2 we prove a concentration inequality for the degrees under the canonical ensemble (Lemma 5.1) that is of independent interest. In Section 5.3 we use this to prove a concentration inequality for a functional of the degrees that approximates the largest eigenvalue well in the dense regime (Lemma 5.3). In Section 5.4 we transfer the results from the previous sections to the microcanonical ensemble (Lemma 5.4), and show that this leads to a negligible shift of the expected largest eigenvalue.
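The scale Θ(1/n) in (5.1) can be checked directly: under G(n, p) the number of edges is Binomial(N, p) with N = n(n − 1)/2, so P_can(Γ_{C⋆}) is the binomial pmf at its mean, which by the local central limit theorem is ≈ 1/√(2πNp(1 − p)) ≈ const/n. An illustrative computation (log-gamma is used to avoid overflow; n is chosen so that Np is an integer):

```python
import math

# n * P(Bin(N, p) = Np) should stabilise near 1/sqrt(pi * p * (1 - p)).
def log_binom_pmf(N, m, p):
    return (math.lgamma(N + 1) - math.lgamma(m + 1) - math.lgamma(N - m + 1)
            + m * math.log(p) + (N - m) * math.log(1 - p))

p = 0.5
vals = []
for n in (64, 128, 256):            # ensures N * p is an integer
    N = n * (n - 1) // 2
    m = N // 2                      # the constrained edge count N * p
    vals.append(n * math.exp(log_binom_pmf(N, m, p)))

print(vals)   # roughly constant in n, i.e. P_can(Gamma) = Theta(1/n)
```

For p = 1/2 the limiting constant is 2/√π ≈ 1.128, matching S_n = log n + Θ(1).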

Proof of Theorem 1.2(2) via [12]
In [12, Chapter 6] the largest eigenvalue is studied for random matrices of the form A = A_0 + f |e⟩⟨e|, where e = n^{−1/2}(1, 1, . . . , 1)^T and 1 + ε_0 ≤ f ≤ n^C, with ε_0, C ∈ (0, ∞) constants. In order to consider only adjacency matrices A_s of simple graphs, we have to get rid of the diagonal of A. This can easily be done by considering A_s = A − pI = A_0 + f |e⟩⟨e| − pI, where p = p(n) is the retention probability that appears in Theorem 1.1 subject to (1.3), and f = np. We note that if λ_1 is the largest eigenvalue of A, then λ_1 − p is the largest eigenvalue of A_s, so it suffices to study the largest eigenvalue of A.
Note that e^{−ν(log n)^ξ} = o(n^{−1}) whenever ν > 0 and ξ > 1. Thus, if an event E_P of the type described in (3.3) holds with (ξ, ν)-high probability under P_can, then by (3.4) and (5.1) it holds also under P_mic. Starting from the eigenvalue equation for λ_1, where v is the eigenvector associated with λ_1 and I is the identity matrix, after multiplying by (I − Ã_0/λ_1)^{−1} and projecting onto e we obtain a series expansion (5.4) for λ_1. We see that for the series to converge we need ‖Ã_0‖/λ_1 < 1. From [12, Lemma 4.3] (see also [2,19,22,24]) and the leading order of (5.4) (see also [12, Eq. (6.5)]) we have that ‖Ã_0‖/λ_1 < 1 with (ξ, ν)-high probability (which also holds for the microcanonical ensemble). Iterating (5.4), we get with (ξ, ν)-high probability an expansion for λ_1 in which q is the parameter defined in Theorem 1.1.

Concentration for the degrees under the dense canonical ensemble
For the remainder of the paper we take p ∈ (0, 1) constant and A to be the unnormalised adjacency matrix. For i ≠ j, E_can[a_ij] = p and Var_can[a_ij] = p(1 − p). In what follows we abbreviate µ = p and σ² = p(1 − p). We write 𝟙 = v_1 + r with ⟨v_1, r⟩ = 0 and A v_1 = λ_1 v_1. Following the power method in [17], we define

K = A𝟙,

which is the vector of row sums of the matrix A, i.e., the vector of degrees of the vertices (the degree sequence). Centering K by Θ𝟙 with Θ = E[K_i] = (n − 1)p and using 𝟙 = v_1 + r, we get

K − Θ𝟙 = (λ_1 − Θ) v_1 + A r − Θ r.

Our key step is the following lemma.
Lemma 5.1. With σ² denoting p(1 − p), there exist two constants c_1, c_2 ∈ (0, ∞) such that (5.8) holds, where b_ij = a_ij − p are the centred entries of the adjacency matrix.

Proof. Note that

K_i − (n − 1)p = Σ_{j≠i} b_ij, so that Σ_{i=1}^n (K_i − (n − 1)p)² = Σ_{i=1}^n Σ_{j≠i} Σ_{k≠i} b_ij b_ik. (5.9)

Straightforward counting shows that the sum in (5.9) contains O(n³) different terms. Let us represent b_ij = b_ji by a variable X_α, α ∈ [N] with N = n(n − 1)/2. Then (5.9) can be rewritten in the form

⟨X, HX⟩ = Σ_{α,β∈[N]} h_αβ X_α X_β (5.12)

for a suitable symmetric matrix H. Because there is a one-to-one correspondence between the terms in (5.12) and (5.9), we can conclude that H has O(n³) non-zero entries, whose values are either 1 (off-diagonal) or 2 (diagonal). We can apply to (5.12) the Hanson-Wright inequality (see [15] or [1, Theorem 1.4, item 6]).

Theorem 5.2. Let X = (X_1, . . . , X_N) be mean-zero square-integrable random variables taking values in R, and let ξ > 0 be such that

‖X_α‖_{ψ_2} ≤ ξ, α ∈ [N]. (5.13)

Then, for every t > 0,

P( |⟨X, HX⟩ − E⟨X, HX⟩| > t ) ≤ 2 exp[ −C min( t²/(ξ⁴ ‖H‖²_HS), t/(ξ² ‖H‖) ) ], (5.14)

where C is a suitable constant, ‖H‖²_HS = Σ_{α,β∈[N]} h²_αβ is the Hilbert-Schmidt norm of H, and

‖H‖ = sup{ Σ_{α,β∈[N]} h_αβ x_α y_β : ‖x‖_2 = ‖y‖_2 = 1 }

is the ℓ²_N → ℓ²_N norm of H.

In our setting, N = n(n − 1)/2. Since |X_α| < 1, we have ‖X_α‖_{ψ_2} ≤ 1/√(log 2), so that (5.13) applies with ξ = 1/√(log 2). Since H has bounded entries, we have ‖H‖²_HS = O(n³). Moreover, by the Cauchy-Schwarz inequality we have ‖H‖ ≤ ‖H‖_HS = O(n^{3/2}), and so the exponent in the right-hand side of (5.14) is bounded below by c_3 min( t²/(ξ⁴ n³), t/(ξ² n^{3/2}) ), where c_3 is a suitable constant. Taking c_1 ≤ c_3/C, with C the constant appearing in (5.14), we obtain (5.8).
We end this section with an immediate consequence of Lemma 5.1. Picking t = σ²n² and using that, for appropriately chosen constants C_1, C_2, C_3, C_4, the corresponding estimates hold, we find that there are constants c̃ ≤ C_4/C and C̃ such that the resulting concentration bound holds.
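The scale appearing in Lemma 5.1 can be illustrated numerically (a sketch, not part of the proof): under the canonical ensemble, Σ_i (K_i − (n − 1)p)² has mean n(n − 1)σ² and fluctuations of smaller order, so the normalised statistic concentrates around 1.

```python
import numpy as np

# Monte Carlo illustration of the concentration scale in Lemma 5.1.
rng = np.random.default_rng(2)
n, p = 500, 0.4
sigma2 = p * (1 - p)
stats = []
for _ in range(10):
    U = np.triu(rng.random((n, n)) < p, 1)
    A = (U + U.T).astype(float)
    K = A.sum(axis=1)                              # degree sequence
    stats.append(np.sum((K - (n - 1) * p) ** 2) / (n * (n - 1) * sigma2))

print(np.mean(stats))   # close to 1
```

The relative fluctuations of this statistic decay like n^{−1/2}, in line with the Hanson-Wright bound with ‖H‖²_HS = O(n³).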

Concentration for the largest eigenvalue under the dense canonical ensemble
After applying A once to 𝟙, we must find a suitable normalisation in order to isolate λ_1. This is given by the ratio

Σ_{i=1}^n K_i² / Σ_{i=1}^n K_i = ⟨A𝟙, A𝟙⟩ / ⟨𝟙, A𝟙⟩.

In [13] it was shown that Σ_{i=1}^n K_i² / Σ_{i=1}^n K_i approximates λ_1 with high probability, in the sense that for any x > 0 a concentration inequality of the form (5.23) holds. To analyse the first ratio in (5.25), we proceed as follows.
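A numerical sketch of this approximation (illustrative, single sample, parameters arbitrary): in the dense regime the ratio Σ_i K_i²/Σ_i K_i, the largest eigenvalue λ_1, and the degree functional n^{−1} Σ_i K_i + σ²/µ all agree to within O(1).

```python
import numpy as np

# Compare lambda_1 with the power-method ratio and the degree functional.
rng = np.random.default_rng(3)
n, p = 600, 0.5
mu, sigma2 = p, p * (1 - p)
U = np.triu(rng.random((n, n)) < p, 1)
A = (U + U.T).astype(float)
K = A.sum(axis=1)                        # degree sequence

lam1 = np.linalg.eigvalsh(A)[-1]
ratio = np.sum(K ** 2) / np.sum(K)       # <A1, A1> / <1, A1>
target = np.mean(K) + sigma2 / mu        # n^{-1} sum K_i + (1 - p)

print(lam1, ratio, target)               # all near (n - 1)p + (1 - p)
```

Note that Σ K_i²/Σ K_i = mean(K) + Var_emp(K)/mean(K), and Var_emp(K)/mean(K) ≈ (n − 1)σ²/((n − 1)p) = σ²/µ, which is where the constant 1 − p comes from.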

Transfer to the dense microcanonical ensemble
Next we use the transfer method to pass the property characterised by the event in (5.23) to the microcanonical ensemble. Using the notation of Section 3, we identify the complement of this event as [E_P]^c, i.e., the set of graphs that do not possess the property that we would like to pass on. The fact that P_can([E_P]^c) tends to zero faster than P_can(Γ_{C⋆}) (as n → ∞, that is) tells us that P_mic([E_P]^c) also tends to zero, and implies the analogous concentration under the microcanonical ensemble. Thus, in the microcanonical ensemble Σ_{i=1}^n K_i² / Σ_{i=1}^n K_i concentrates around the sum n^{−1} Σ_{i=1}^n K_i + σ²/µ with an error of order 1/√n. However, we also need to see what n^{−1} Σ_{i=1}^n K_i + σ²/µ is in the microcanonical ensemble. The term σ²/µ, a constant equal to 1 − p, is in accordance with the constraint in the microcanonical ensemble. For the other term we have n^{−1} Σ_{i=1}^n K_i = (n − 1)p. The two together give precisely the expected value in the canonical ensemble, as follows from Theorem 1.1. Hence we only need to show that Σ_{i=1}^n K_i² / Σ_{i=1}^n K_i concentrates around λ_1 also in the microcanonical ensemble, for which we can once more use the transfer method.
Proof. We need to show that the last term in (5.21) is small. First we show that r is bounded in probability; here ζ_1 denotes a suitable constant. Since max_{i>1} |λ_i| ≥ λ_2 ≥ 0, we can bound λ_2 ≤ βσ√n with high probability. Using (5.20), we obtain the required bound on r. Now, using the trivial deterministic bound λ_1 ≤ max_i Σ_j |a_ij| < n and Hoeffding's inequality on Σ_{i=1}^n K_i = 2 Σ_{j>i} a_ij, we can conclude that, for any η > 0, an exponential bound holds with suitable constants ζ and Λ. Thus, recalling (5.21), we have settled (5.35).
We thus find that the probability in the canonical ensemble of the event in (5.35) is o(1/n), which confirms the results of Section 5.1. In particular, we have shown that the central object is the ratio Σ_{i=1}^n K_i² / Σ_{i=1}^n K_i.

Remark 5.5. The constants in the right-hand side of (5.23) can be chosen freely. By Lemma 5.4, this means that for any choice of constraint for which S_n(P_mic ‖ P_can) = O(log n) and the canonical ensemble is the Erdős-Rényi random graph, we have that λ_1 is close to n^{−1} Σ_{i=1}^n K_i + σ²/µ in both ensembles. If the constraint does not prevent E_mic[n^{−1} Σ_{i=1}^n K_i + σ²/µ] from taking the value (n − 1)p + (1 − p), then we have the same result as in Theorem 1.2(2), which supports the working hypothesis put forward in Section 1. Indeed, as shown in Section 3, S_n(P_mic ‖ P_can) = o(n) is the condition for EE. In the sparse regime, instead, we have to rely on events that hold with (ξ, ν)-high probability, where ξ is in principle allowed to vary with n, and the condition has to be checked for the specific value of p(n).