Stochastic Block Model in a new critical regime and the Interacting Multiplicative Coalescent

This work exhibits a novel phase transition for the classical stochastic block model (SBM). In addition, we study the SBM in the corresponding near-critical regime, and find the scaling limit for the component sizes. The two-parameter stochastic process arising in the scaling limit, an analogue of Aldous' standard multiplicative coalescent, is interesting in its own right. We name it the (standard) Interacting Multiplicative Coalescent. To the best of our knowledge, this object has not yet appeared in the literature.


Introduction
The multiplicative coalescent is a process constructed in [1]. Aldous' standard multiplicative coalescent is the scaling limit of near-critical Erdős–Rényi graphs. The entrance boundary for the multiplicative coalescent was exhibited in [2,13].
Informally, the multiplicative coalescent takes values in the space of collections of blocks with mass (a number in (0, ∞)) and evolves according to the following dynamics:

each pair of blocks of masses x and y merges at rate xy into a single block of mass x + y. (1.1)

We will soon recall its connection to the Erdős–Rényi random graph [10], viewed in continuous time.
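The dynamics (1.1) can be simulated directly for finitely many blocks. The following is a minimal Gillespie-style sketch (not from the paper; the function name and interface are illustrative): at total rate Σ_{i<j} x_i x_j some pair merges, and the merging pair is chosen with probability proportional to x_i x_j.

```python
import random

def multiplicative_coalescent(masses, t_max, seed=0):
    """Gillespie simulation of dynamics (1.1): every pair of blocks of
    masses x and y merges at rate x*y into a single block of mass x + y."""
    rng = random.Random(seed)
    x = sorted(masses, reverse=True)
    t = 0.0
    while len(x) > 1:
        s = sum(x)
        # total merge rate = sum over unordered pairs of x_i * x_j
        total_rate = (s * s - sum(m * m for m in x)) / 2.0
        t += rng.expovariate(total_rate)
        if t > t_max:
            break
        # pick an unordered pair {i, j} with probability proportional to x_i * x_j
        while True:
            i = rng.choices(range(len(x)), weights=x)[0]
            j = rng.choices(range(len(x)), weights=x)[0]
            if i != j:
                break
        xi, xj = x[i], x[j]
        x = [m for k, m in enumerate(x) if k not in (i, j)]
        x.append(xi + xj)
        x.sort(reverse=True)
    return x
```

Total mass is conserved and the state stays decreasingly ordered, mirroring the ord(·) convention used below.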
(The coordinates of x become listed in decreasing order in ord(x); this is well defined provided that, for any finite M, there are at most finitely many coordinates of x which are larger than M. We implicitly assume this property here, as it will hold almost surely in the sequel.)
The multiplicative coalescent is a process taking values in ℓ². If it starts from an initial value x ∈ ℓ¹, it will reach the constant state (Σ_i x_i, 0, . . .) in finite time; this is not considered an interesting behavior. If it starts from an initial value x ∉ ℓ², it will immediately collapse to (∞, 0, . . .), which is even less interesting. Finally, if it starts from x ∈ ℓ² \ ℓ¹, it will stay in this state space forever after. It is a Markov process with a generator that we prefer to describe in words: if the current state is x, then for any two different i, j the jump to ord((x_i + x_j, x^{−i,−j})) happens at rate x_i x_j. Here x^{−i,−j} is the vector obtained from x by deleting the i-th and the j-th coordinates.
The multiplicative coalescent has interesting entrance laws which live at all times (or rather, are parametrized by R). The first and most well-known such law is called the Aldous standard (eternal) multiplicative coalescent. Following the tradition set in [1], we denote it here by (X*(t), t ∈ R). The law of X* is closely linked (see [1,3,8]) to that of the family of diffusions with non-constant drift W^t(s) = W(s) + ts − s²/2, s ≥ 0, where W is a standard Brownian motion. There are uncountably many different non-standard extreme eternal multiplicative coalescent entrance laws; they have been classified in [2], and linked further in [15,14] to an analogous family of Lévy-type processes. We discuss and use these links in detail in [11].

The rest of the paper is organized as follows. In Section 1.1 we introduce the notion of restricted multiplicative merging (RMM), and state its basic properties. Section 2 is devoted to the continuity of RMM, which is used for the proof of the scaling limit of the stochastic block model in Section 3. A brief discussion of the phase transition for the stochastic block model, and of the Markov property of the stochastic block model and the interacting multiplicative coalescent, is given in Section 4.

Restricted multiplicative merging and first consequences
A key initial observation in [1] is that the Erdős–Rényi graph, viewed in continuous time, is a (finite-state space) multiplicative coalescent. We extend this construction to fit our purposes in Section 3.
For any triangular table (or matrix) a = {a_{i,j} : i, j ∈ N, i < j} of non-negative real numbers, any symmetric relation R on N and any t ≥ 0, we define the restricted multiplicative merging (with respect to (a, R)) RMM_t(·; a, R) : ℓ² → ℓ as follows: to an element x ∈ ℓ² associate a labelled graph, denoted by G_t(x; a, R),
• with vertex set N and edge set {{i, j} ∈ R : a_{i,j} ≤ x_i x_j t}, where it is natural to extend the definition by a_{i,j} := a_{j,i} whenever j < i;
• for each i, assign the label x_i to vertex i; we also call this label the mass of i;
• each connected component (in the usual graph-theoretic sense) of G_t(x; a, R) is then endowed with its total mass, the sum of the masses x_· over all of its vertices.
Then RMM_t(x; a, R) is defined as the vector of the decreasingly ordered masses of the components of G_t(x; a, R). Note that RMM_t(·; a, R) is a deterministic map, and it is clearly measurable: for each n, RMM_t(·; a, R) : R^n → ℓ (defined in the same way as above, except with finitely many initial blocks) is continuous except on the set {y : ∃ i, j ∈ [n] s.t. a_{i,j} = y_i y_j t}, a union of finitely many demi-hyperbolas (in particular, a set of measure 0); moreover, if π_n(x) := (x_1, x_2, . . . , x_n), then RMM_t(π_n(·); a, R) converges pointwise in ℓ² to RMM_t(·; a, R).
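For finitely many blocks, the map just defined is elementary to compute. The sketch below (illustrative, not from the paper) builds G_t(x; a, R) with a union–find structure and returns the decreasingly ordered component masses; with a_{i,j} i.i.d. exponential (rate 1) and R maximal, this reproduces the time-t marginal of the multiplicative coalescent, as recalled next.

```python
from itertools import combinations

def rmm(x, a, R, t):
    """Restricted multiplicative merging RMM_t(x; a, R), finite version.
    x: list of masses; a: dict {(i, j): a_ij} for i < j; R: set of pairs
    (i, j), i < j.  The edge {i, j} in R is open iff a_ij <= x_i * x_j * t.
    Returns the decreasingly ordered component masses."""
    n = len(x)
    parent = list(range(n))          # union-find over the vertex set [n]

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(n), 2):
        if (i, j) in R and a[(i, j)] <= x[i] * x[j] * t:
            parent[find(i)] = find(j)   # merge the two components

    mass = {}
    for i in range(n):
        r = find(i)
        mass[r] = mass.get(r, 0.0) + x[i]
    return sorted(mass.values(), reverse=True)
```

Note that the map is monotone in t and in R, the finite analogue of Lemma 1.1 below.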
Furthermore, if A = (A_{i,j})_{i,j} is a family of i.i.d. exponential (rate 1) random variables, and the relation R* is maximal (meaning {i, j} ∈ R* for all i ≠ j), then RMM_t(x; A, R*) has the law of the multiplicative coalescent (from the introduction) started from ord(x) and evaluated at time t. In particular, P(RMM_t(x; A, R*) ∈ ℓ²) = 1. Even more is true: {RMM_t(x; A, R*), t ≥ 0} is equivalent to the Aldous graphical construction of the multiplicative coalescent process started from ord(x) at time 0.
It now seems natural to extend the above graphical construction with any infinite relation R as the third parameter. For our study a family of conveniently chosen relations will be particularly interesting, as explained in Section 3.
For two elements x = (x_k)_{k≥1}, y = (y_k)_{k≥1} ∈ ℓ² we will write x ≤ y if x_k ≤ y_k for every k ≥ 1. The following lemma is a trivial consequence of the definitions (and the inequality x² + y² ≤ (x + y)², x, y ≥ 0).

Lemma 1.1. For any two symmetric relations R_1 ⊆ R_2, any x ≤ y and any two times t_1 ≤ t_2, we have ‖RMM_{t_1}(x; a, R_1)‖ ≤ ‖RMM_{t_2}(y; a, R_2)‖, where we extend the definition of ‖·‖ to infinity on ℓ \ ℓ².
From now on we use the notation A = (A_{i,j})_{i,j} to indicate a family of i.i.d. exponential (rate 1) random variables.
Remark 1.2. For x ∉ ℓ² and t > 0, it is not hard to check that almost surely RMM_t(x; A, R*) = (∞, 0, 0, . . .); however, the above lemma makes sense also for stronger restrictions R_1 and R_2, for which the ℓ²-norm of one or both of the restricted multiplicative mergings is finite.
We can also observe the following.
Lemma 1.3. Let A = (A_{i,j})_{i,j} be defined on a probability space (Ω, F, P). Then for every x ∈ ℓ² we have that P(RMM_t(x; A, R) ∈ ℓ² for every symmetric relation R and every t ≥ 0) = 1.
Proof. Apply Lemma 1.1 together with the observations made above about the graphical construction of the multiplicative coalescent.

Remark 1.4. Note that one cannot strengthen the statement of Lemma 1.3 so that the almost sure event stays universal over x ∈ ℓ², even for a single relation R, if R relates some i ∈ N (for example i = 1) to infinitely many other numbers i_l, l ∈ N. Indeed, we could define a random X ∈ ℓ² as follows: due to the elementary properties of i.i.d. exponentials, with probability 1, for each m ≥ 1, the value 1/m appears exactly once in the above sequence (and there are infinitely many zeroes, which we can ignore). One can now conclude easily from the definition of RMM that RMM_1(X; A, R) contains a component of infinite mass, almost surely.
Mimicking the notation of [13], Section 2.1, let us denote, for any (think large) m ∈ N, x^{[m↑]} := (x_{m+1}, x_{m+2}, . . .), a^{[m↑]}_{i,j} := a_{m+i,m+j} and R^{[m↑]} := {{i, j} : {m + i, m + j} ∈ R}. Note that R^{[m↑]} is not the restriction of R to {m + 1, m + 2, . . .}, but rather a "shift of R".

Lemma 1.5. For every m ∈ N, x ∈ ℓ², a and R as specified above, ‖(RMM_t(x; a, R))^{[m↑]}‖ ≤ ‖RMM_t(x^{[m↑]}; a^{[m↑]}, R^{[m↑]})‖, where we again allow the value infinity on both sides of the inequality.
Proof. Let us first look at how G_t(x^{[m↑]}; a^{[m↑]}, R^{[m↑]}) can be constructed from G_t(x; a, R). The vertices with masses x_1, . . . , x_m are removed, as well as any of the edges in G_t(x; a, R) connecting any of these vertices to any other vertex. The other vertices and edges remain, after relabelling i ↦ i − m; it is precisely the shift of R that ensures this (deterministic) coupling of the two graphs.
The number of connected components of G_t(x; a, R) changed by the just-described procedure is k_m ≤ m. Let us assume that these components have indices i_1, . . . , i_{k_m}. On the other hand, G_t(x; a, R)^{[m↑]} is obtained from G_t(x; a, R) via removal of all the vertices and edges in the first (and largest) m components of G_t(x; a, R). The claim now follows since, for all j ≤ k_m, the j-th largest component of G_t(x; a, R) has mass larger than or equal to that of its i_j-th largest component.

Basic notation and formulation of the statement
In this section, we will state a type of continuity of the map RMM in the first variable. This result will be used later for the proof of the SBM scaling limit.
Proposition 2.1. Let x^{(n)}, x ∈ ℓ² be such that x^{(n)} → x in ℓ², let t_n → t > 0, and set Z^{(n)}(s) := RMM_s(x^{(n)}; A, R) and Z(s) := RMM_s(x; A, R). Then Z^{(n)}(t_n) → Z(t) in ℓ² in probability as n → ∞.

Auxiliary statements and proof of Proposition 2.1
In this section, a family of i.i.d. exponential (rate 1) random variables A = (A_{i,j})_{i,j} and a symmetric relation R on N will be fixed.
Lemma 2.2. Let x^{(n)}, Z^{(n)}, x, Z be defined as in Proposition 2.1. Assume that there exists m ∈ N such that x^{(n)}_k = 0 for all k > m and n ≥ 1. Then Z^{(n)}(t_n) → Z(t) in ℓ² a.s. as n → ∞.
Proof. The claim follows immediately from the fact that P(A_{i,j} = x_i x_j t) = 0 for all i, j ∈ N: on the complement of ∪_{i,j}{A_{i,j} = x_i x_j t}, the graphical construction of Z^{(n)}(t_n) clearly converges to that of Z(t).
For two natural numbers i, j we will write i ∼ j if the vertices i and j belong to the same connected component of the graph G_t(x; A, R). Note that i ∼ j if and only if there exists a finite path of edges {i_0, i_1}, {i_1, i_2}, . . . , {i_{k−1}, i_k} in G_t(x; A, R) with i_0 = i and i_k = j. Recall that ‖·‖ denotes the usual ℓ² norm.
Proof. The estimate can be obtained (using the inequality 1 − e^{−x} ≤ x, x ≥ 0) by summing, over all finite paths joining i to j, the probability that every edge of the path is open; the resulting series can be uniformly estimated.
Lemma 2.4. Let x^{(n)}, x ∈ ℓ², and assume that x^{(n)} → x in ℓ². Then for every T > 0 and ε > 0 there exists m ∈ N such that the corresponding uniform tail estimate holds for all n ≥ 1. Proof. We fix n and write i ∼ j if the vertices i and j belong to the same connected component.

Proof. The statement follows directly from Lemma 2.4 and Lemma 1.5.
For m ∈ N we introduce the notation R^{[↓m↑]} for the relation obtained from R by removing all pairs {i, j} with i, j ≤ m, so that blocks with indices in {1, . . . , m} can merge only through blocks with larger indices. We next prove an analogue of Lemma 2.3 in the special case where R = R^{[↓m↑]}. From now on we will write i ∼_m j, for two natural numbers i, j, if the vertices i and j belong to the same connected component of the graph G_t(x; A, R^{[↓m↑]}); note that t is omitted from the notation. For i, j ≤ m, i ∼_m j if and only if there exists a finite path of edges in G_t(x; A, R^{[↓m↑]}) joining i and j whose intermediate vertices all exceed m.
Proof. We only write the details of the proof in the case i, j ≤ m; the other cases can be shown analogously. As in the proof of Lemma 2.3 we can estimate by summing over paths, where in the second line of the above expressions we used the fact that, for each k, i_0 = i and i_k = j.
Remark 2.8. Clearly P_t(s) is a polynomial in the two variables t and s. The degree in t is not important for our purposes. However, the fact that the degree in s equals two is reflected in the proof of Proposition 2.1 given below; in particular, the fourth moment estimate of Lemma 2.10 is necessary for our argument.
Proof. For simplicity of notation we set a := ‖x^{[↓m]}‖², b := ‖x^{[m↑]}‖² and c := 1/(1 − t²ab). We first assume that t²ab ≤ 1/2, so that c ≤ 2. The claim then follows from Chebyshev's inequality and Lemma 2.6.

For x ∈ ℓ² we set len(x) to be the index of the last non-zero coordinate of x, in case it exists, and otherwise len(x) = +∞. Recall (2.1).

Proof. The claim says that, conditionally on (Z^{≤m}(t), Z^{>m}(t)), the infinite random vector Z^m(t) can be constructed via the procedure (2.2). Its proof follows from properties of the exponential distribution and the definition of RMM. We provide three figures to help any interested reader construct a detailed argument. In the figures there are nine blocks in total, and m equals five. The general setting (with infinitely many blocks, and arbitrary finite m) is analogous.

Proof. The statement directly follows from Lemma 1.1 and Theorem A.4 in the appendix.
We now use Lemmas 2.7 and 2.9 to obtain the following important uniform bound.
Lemma 2.11. Let Z(t) = RMM_t(x; A, R) and let Z^{≤m}(t), Z^{>m}(t) be as in Lemma 2.9. Then for every ε > 0 and m ≥ 1 the corresponding uniform bound holds, where the polynomial P_t is defined in Lemma 2.7.
Proof. Let Z̃_m(t) = RMM_t(x; A, R̃_m) and let Ỹ_m(t) be defined as before. The final identity follows from the independence of Z^{≤m} and Z^{>m}.

Then y ∈ ℓ² and x^{(n_l)}_k ≤ y_k for every k ≥ 1; indeed, the fact that y belongs to ℓ² follows from the above estimate. Hence, by Lemmas 1.1 and 2.10, we obtain (2.5) and (2.6). Next, by Lemma 2.2, there exists L ∈ N such that the corresponding bound holds for all l ≥ L. We can conclude that the desired estimate holds for all l ≥ L, where we applied (2.5), (2.6) and Lemma 17 of [1] in order to bound from above the right-hand side in the first and the third lines.
This implies that ‖Z(t) − Z^{(n_l)}(t_{n_l})‖ → 0 in probability as l → ∞. So, we have shown that for any subsequence {n_i}_{i≥1} there exists a further subsequence {n_{i_l}}_{l≥1} such that ‖Z(t) − Z^{(n_{i_l})}(t_{n_{i_l}})‖ → 0 in probability as l → ∞. Therefore ‖Z(t) − Z^{(n)}(t_n)‖ → 0 in probability as n → ∞.

Scaling limit of near-critical stochastic block models
Let n, m ≥ 2 be given, and let G^m_{n;p,q} be the random graph generated by the stochastic block model (SBM) G^m(n, p, q), with m classes of size n. The structure of G^m_{n;p,q} was described in the Introduction. Recall that the edges are drawn independently at random: each intra-class edge is present (or open) with probability p, while each inter-class edge is present (or open) with probability q.
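A direct sampler for G^m(n, p, q) is straightforward; the sketch below (illustrative, not from the paper) takes the classes to be consecutive blocks of n vertices and returns the decreasingly ordered component sizes, which is the statistic studied throughout this section.

```python
import random

def sbm_component_sizes(n, m, p, q, seed=0):
    """Sample G^m(n, p, q): m classes of n vertices each; each intra-class
    edge is open with probability p, each inter-class edge with probability
    q, independently.  Returns decreasingly ordered component sizes."""
    rng = random.Random(seed)
    N = m * n
    parent = list(range(N))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(N):
        for j in range(i + 1, N):
            # vertices i and j are in the same class iff i // n == j // n
            prob = p if i // n == j // n else q
            if rng.random() < prob:
                parent[find(i)] = find(j)

    sizes = {}
    for i in range(N):
        r = find(i)
        sizes[r] = sizes.get(r, 0) + 1
    return sorted(sizes.values(), reverse=True)
```

Taking p = 1 − e^{−t} and q = 1 − e^{−u} recovers the continuous-time parametrization used in the graphical construction below.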
Here we introduce some additional notation. Note that ρ m 1 is not commutative.
The main point of RMM, and of the study conducted in Section 2, is that a graphical construction of the connected component sizes of G^m_{n;1−e^{−t},1−e^{−u}} (completely analogous to the one for the multiplicative coalescent discussed in the introduction) can be conveniently given in several steps; in the third step, due to the elementary properties of independent exponentials, one applies RMM_u(·; A′, R′) for an appropriate exponential family A′ and relation R′. For t ∈ R, u ≥ 0 and n ≥ n_0 sufficiently large, let ζ̂^{(n)}(t, u) denote the vector of decreasingly ordered component sizes of G^m_{n; n^{−1}+t n^{−4/3}, u n^{−4/3}}. Let also ζ^{(n)}(t, u) = n^{−2/3} ζ̂^{(n)}(t, u), t ∈ R, u ≥ 0, n ≥ n_0.
We consider ζ^{(n)}(t, u) to be a random element of ℓ² (for this we append infinitely many zero entries). As discussed above, the identity in law (3.5) holds for all n ≥ n_0. Note that t_n and u_n are chosen according to the identities

1 − e^{−t_n n^{−2/3} · n^{−2/3}} = 1/n + t n^{−4/3} and 1 − e^{−u_n n^{−2/3} · n^{−2/3}} = u n^{−4/3},

and that x^{(n)} differs from x, defined above, by the normalization factor n^{−2/3}. Note that the multipliers in the exponent are compatible with the restricted multiplicative merging, since the mass of each particle is now n^{−2/3}. In the original graphical construction all the masses were equal to 1, and this is implicit in the expression for the first connectivity parameter (which equals 1 − e^{−t} in the construction comprising (3.1)–(3.4)).
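The two identities above determine t_n and u_n explicitly, and for large n one has t_n ≈ n^{1/3} + t. A quick numeric sketch (function names are ours, for illustration):

```python
import math

def t_n(n, t):
    """Solve 1 - exp(-t_n * n**(-4/3)) = 1/n + t * n**(-4/3) for t_n."""
    return -n ** (4.0 / 3.0) * math.log(1.0 - 1.0 / n - t * n ** (-4.0 / 3.0))

def u_n(n, u):
    """Solve 1 - exp(-u_n * n**(-4/3)) = u * n**(-4/3) for u_n."""
    return -n ** (4.0 / 3.0) * math.log(1.0 - u * n ** (-4.0 / 3.0))
```

For n = 10^6 and t = 1, t_n is within 10^{-3} of n^{1/3} + t = 101, confirming that the time shift induced by the normalization is asymptotically additive.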
Let {Z_1(t), t ∈ R}, . . . , {Z_m(t), t ∈ R} be independent standard multiplicative coalescents, and set Z(t) := (Z_1(t), . . . , Z_m(t)). We thus rewrite (3.5) as (3.7), n ≥ n_0. It remains to show that the right-hand side of (3.7) converges in distribution to ζ(t, u) as n → ∞. As a corollary of Theorem 3 of [1] and independence, we have that Z^{(n)}(t) → Z(t) in (ℓ²)^m in distribution as n → ∞. Since ρ^m_1 is a continuous map, the convergence in law extends to ρ^m_1(Z^{(n)}(t)). The rest is a standard application of the continuity of the RMM operation from Section 2; we include an argument for the sake of completeness.
By the Skorokhod representation theorem, we can choose a probability space (Ω, F, P) and a sequence of random elements Ẑ^{(n)}(t), n ≥ n_0, and Ẑ(t) in (ℓ²)^m converging almost surely. Then one can conclude that ζ^{(n)}(t, u_n) =_d ζ̂^{(n)}(t, u_n) for n ≥ n_0, that ζ(t, u) =_d ζ̂(t, u), and that ζ̂^{(n)}(t, u_n) → ζ̂(t, u) in probability as n → ∞, with respect to the ℓ² norm. Indeed, for ε > 0, the almost sure convergence of the random sequence (of probabilities) inside the expectation is due to Proposition 2.1, and the final conclusion is due to the dominated convergence theorem. As already explained, this completes the proof of the theorem.

Concluding remarks Phase transition of the SBM
We recall that if f and g are two sequences, then f(n) ≫ g(n) (or equivalently g(n) ≪ f(n)) means that lim_n g(n)/f(n) = 0, and f(n) ∼ g(n) means that lim_n g(n)/f(n) = 1.
Let C(n, p_n, q_n) denote the size of the largest component of G(n, p_n, q_n). We can conclude from Theorem 3.1 in the previous section that (i) if p_n − 1/n ∼ t n^{−4/3} and q_n ∼ u n^{−4/3}, n → ∞, then for all M ∈ (0, ∞), lim_{n→∞} P(n^{−2/3} C(n, p_n, q_n) > M) ∈ (0, 1). We remark that in the case p_n − 1/n ∼ t n^{−4/3}, q_n ≪ n^{−4/3}, n → ∞, the scaling limit of the stochastic block model G(n, p_n, q_n) is described by a family of m independent standard multiplicative coalescents without interaction. Hence, (4.1) remains true.
We also have, from the purely homogeneous graph setting, that if p_n − 1/(mn) ∼ t n^{−4/3} and q_n − 1/(mn) ∼ t n^{−4/3}, then the normalized vector of ordered sizes of connected components of G(n, p_n, q_n) converges to a value of the standard multiplicative coalescent at time m^{4/3} t; in particular, (4.1) is again satisfied. In addition, the main result of Bollobás et al. [6], applied to the SBM, says that if p_n ∼ c/(mn) and q_n ∼ d/(mn), then: (i) if c + (m − 1)d > m, then (1/n) C(n, p_n, q_n) converges to a non-zero number in probability; (ii) if c + (m − 1)d ≤ m, then (1/n) C(n, p_n, q_n) converges to zero in probability.
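One heuristic reading of the Bollobás et al. threshold (our interpretation, not a claim of the paper): with p_n ∼ c/(mn) and q_n ∼ d/(mn), each vertex has about n·c/(mn) intra-class and (m−1)n·d/(mn) inter-class neighbours, so the mean degree is (c + (m − 1)d)/m, and the condition c + (m − 1)d > m is exactly mean degree exceeding 1.

```python
def expected_degree(c, d, m):
    """Asymptotic mean degree in G^m(n, c/(mn), d/(mn)) as n -> infinity:
    n * c/(mn) intra-class plus (m - 1) * n * d/(mn) inter-class edges."""
    return (c + (m - 1) * d) / m

# A giant component emerges precisely when the mean degree exceeds 1,
# i.e. when c + (m - 1) * d > m.
```

For instance, c = 3, d = 0, m = 3 sits exactly at the threshold (mean degree 1), matching case (ii) above.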

Markov property of the interacting multiplicative coalescent
Recall the notation of Section 3, and in particular the construction resulting in (3.4). It should be clear that R_t, t ≥ 0, is a Markov process. However, for any fixed u > 0, the corresponding interacting process in t ≥ 0 no longer has the Markov property. The main obstacle is the "loss of information" on the class membership once the restricted merging RMM_u is applied. For the same reason, for any fixed t, the process in u is no longer Markov. These observations are made on the discrete level, before passing to the limit, and the same remains true for the interacting multiplicative coalescent. Nevertheless, the first process above and its scaling limit, given in Section 3, are not far from being Markov (they are hidden Markov), and they are still amenable to analysis. In a forthcoming work [11] we construct an excursion representation of the interacting multiplicative coalescent, analogous to those obtained in [1,2,3,8,15,14], but more complicated; its complexity increases with m.

A Appendix
This auxiliary material is included for the reader's benefit. The multiplicative coalescent properties proved below are interesting in their own right, and our intention is to obtain their generalizations in a separate work in progress [12].

A.1 Preliminaries
We rely on the notation introduced above. In particular, if x is a vector in ℓ², then ‖x‖ is its ℓ²-norm. We reserve the notation X := (X(t), t ≥ 0) for a multiplicative coalescent process, where its initial state will be clear from the context. Recall that, if n ∈ N, then [n] = {1, 2, . . . , n}. Here and below, A denotes a matrix (or equivalently, a two-parameter family) of i.i.d. exponential (rate 1) random variables.
Let (G t (x; A, R * )) t,x be the family of evolving random graphs on (Ω, F, P) as constructed in the introduction. We now fix x ∈ l 2 and t > 0, and describe a somewhat different construction of the random graph G t (x; A, R * ).
We also define the product σ-field F_0 = 2^{Ω_0} and the product measure P^0_{x,t} = ⊗ P_{i,j}, where P_{i,j} is the law of a Bernoulli random variable with success probability P_{i,j}({1}) = P(A_{i,j} ≤ x_i x_j t). Elementary events from Ω_0 specify a family of open edges in G_t(x; A, R*). More precisely, a pair of vertices {i, j} is connected in G_t(x; A, R*) by an edge if and only if ω_{i,j} = 1 for ω = (ω_{i,j})_{i<j} ∈ Ω_0. In other words, P^0_{x,t} is an "inhomogeneous percolation process on the complete infinite graph (N, {{i, j} : i, j ∈ N})" (we include the loops connecting each i to itself on purpose). It is clear that the law of the thus-obtained random graph G_t(x; A, R*) is the same (modulo the loops {i, i}) as that of the graph constructed in the introduction. Note that R* is the maximal relation, so these are all graphical constructions of the multiplicative coalescent, equivalent to the Aldous [1] original one.
For two i, j ∈ N we write {i ↔ j} = {{i, j} is an edge of G_t(x; A, R*)}, and we may also write this event as {{i, j} is open}. We also write {i ∼ j} for the event that i and j belong to the same connected component of the graph G_t(x; A, R*). Then we have, ω-by-ω, that i ∼ j if and only if there exists a finite path of open edges joining i and j. Part of our argument relies on disjoint occurrence. We follow the notation from [4], since they work on infinite product spaces. We will use an analogue of the van den Berg–Kesten inequality [17]; the theorem cited below is an analogue of Reimer's theorem [16].
Given a finite family of events A_k, k ∈ [n], from F_0, we define the event □_{k=1}^{n} A_k = {A_k, k ∈ [n], jointly occur for disjoint reasons}.
Readers familiar with percolation can skip the next paragraph and continue reading either at Lemma A.1 or Section A.2.
For ω ∈ Ω_0 and K ⊂ N²_<, consider the thin cylinder specified through K. The event □_{k=1}^{n} A_k can then be written as a union over finite disjoint subsets J_k, k ∈ [n], of N²_<.
Let i_k, j_k ∈ N with i_k ≠ j_k, k ∈ [n]. Then clearly □_{k=1}^{n} {i_k ∼ j_k} = {for each k ∈ [n], i_k ∼ j_k via mutually disjoint paths}.
The following lemma follows directly from Theorem 11 of [4]; but since the events in question are simple (and monotone increasing in t), it could also be derived directly, in a manner analogous to [17].
Lemma A.1. For any i_k, j_k ∈ N with i_k ≠ j_k, k ∈ [n], we have P^0_{x,t}(□_{k=1}^{n} {i_k ∼ j_k}) ≤ ∏_{k=1}^{n} P^0_{x,t}(i_k ∼ j_k).

A.2 Some auxiliary statements
Recall Lemma 2.3. The goal of this section is to obtain a similar estimate for triples and four-tuples of vertices.
Proposition A.2. There exists a constant C such that for every x = (x_k)_{k≥1} ∈ ℓ² and t ∈ (0, 1/‖x‖²), the connection probabilities for triples and four-tuples of vertices satisfy the corresponding estimates. Remark A.3. We conjecture analogous estimates for k-tuples of vertices.
Proof of Proposition A.2. We will focus on the proof of the second inequality; the proof of the first one is similar and simpler. Let I := {i_1, . . . , i_4} and I^c := N \ I. We consider {i_1 ∼ · · · ∼ i_4} as an event in the probability space (Ω_0, F_0, P^0_{x,t}) and observe that it can be decomposed according to the disjoint-occurrence structure of the connecting paths. Adding over all five terms above gives the claim.

A.3 Finiteness of the fourth moment of the multiplicative coalescent
Let X(t) = RMM t (x; A, R * ), t ≥ 0, be a multiplicative coalescent starting from x ∈ l 2 .
The main goal of this section is to prove the following theorem.
In order to prove the theorem, we first show the finiteness of the fourth moment of the multiplicative coalescent for small t, and then extend this result to all t.
Lemma A.5. There exists a constant C > 0 such that for every x ∈ ℓ² and t ∈ (0, 1/‖x‖²), the corresponding fourth-moment bound holds. Proof. For convenience of notation we use the natural convention that, for each i, i ∼ i almost surely, as indicated in Section A.1. The claim follows from Proposition A.2 and the fact that t < 1/‖x‖². This finishes the proof of the lemma.

Recall that the multiplicative coalescent X(t), t ≥ 0, is a Markov process taking values in ℓ². Using its generator (in particular, applying it to ‖X(t)‖²), one concludes that the process M, defined in (A.1), is a local martingale (see also equality (68) in [2]). We will use this fact in order to show the finiteness of the fourth moment of the multiplicative coalescent at small times.
Proposition A.6. There exists a constant C such that for every x ∈ ℓ² and t ∈ [0, 1/‖x‖²), the bound (A.2) holds. In particular, E‖X(t)‖⁴ < +∞.
Proof. Let (τ_n)_{n≥1} be a localizing sequence of stopping times for M. Then M(t ∧ τ_n), t ≥ 0, is a martingale for every n ≥ 1, where M is defined by (A.1). Consequently, the corresponding identity holds for all n ≥ 1. Using Lemma A.5, the monotonicity of ‖X(t)‖ in t (see for example Lemma 1.1), and the estimate for the second moment of the multiplicative coalescent, which can be obtained in a way similar to the proof of Lemma A.5 (see also Lemma 2.4), we obtain the required uniform bound. By Fatou's lemma, we derive (A.2). The finiteness of E‖X(t(1 − δ))‖⁴ for any small positive δ now follows again from the monotonicity of E‖X(t)‖⁴ in t, and this in turn implies the stated claim.
Let A′ = (A′_{i,j})_{i,j} be an independent copy of A = (A_{i,j})_{i,j}. As in Section A.1, let (ω′_{i,j})_{i<j} be an independent family of Bernoulli random variables, where ω′_{i,j} has success probability P′_{i,j}({1}) = P(A′_{i,j} ≤ x_i x_j s). We say that "{i, j} is open via A′" on the event {ω′_{i,j} = 1}. Note that this does not exclude {ω_{i,j} = 1} from happening. Similarly, we say that "{i, j} is open via A" on the event {ω_{i,j} = 1}. We will write i ↔_A j whenever {i, j} is open via A.
Denote by G̃_{t,s}(x; A, A′, R*) the graph (in fact, a multigraph) constructed by superimposing the edges open via A′ onto G_t(x; A, R*). Elementary properties of independent exponentials imply that the vector of ordered component sizes of G̃_{t,s}(x; A, A′, R*) is equal in law to RMM_{t+s}(x; A, R*). Due to the reasoning of the paragraph above (A.3), the vector of ordered connected component masses of the graph on the right-hand side is distributed as RMM_{3t/2}(x; A, R*). Denote by Y the vector of ordered connected component sizes of G̃_{t,t/2}(x_g; A, A′, R*), and observe that Y =_d RMM_{3t/2}(x_g; A, R*) for the very same reason. Therefore, a contradiction.
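The superimposition identity rests on a one-line computation: an edge is open via A (parameter t) or via the independent copy A′ (parameter s) with probability 1 − e^{−x_i x_j t}·e^{−x_i x_j s} = 1 − e^{−x_i x_j (t+s)}, which is exactly the single-family opening probability at time t + s. A quick numeric sketch (helper names are ours):

```python
import math

def open_prob(xi, xj, t):
    """P(A_ij <= xi * xj * t) for A_ij ~ Exp(1), i.e. 1 - exp(-xi*xj*t)."""
    return 1.0 - math.exp(-xi * xj * t)

def superimposed_prob(xi, xj, t, s):
    """Probability that {i, j} is open via A (parameter t) or via the
    independent copy A' (parameter s): complement of both staying closed."""
    return 1.0 - (1.0 - open_prob(xi, xj, t)) * (1.0 - open_prob(xi, xj, s))
```

Since the edge events are independent across pairs, this per-edge identity yields the equality in law of the whole graphs, and in particular of their ordered component-size vectors.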