Interacting diffusions on sparse graphs: hydrodynamics from local weak limits

We prove limit theorems for systems of interacting diffusions on sparse graphs. For example, we deduce a hydrodynamic limit and the propagation of chaos property for the stochastic Kuramoto model with interactions determined by Erdős-Rényi graphs with constant mean degree. The limiting object is related to a potentially infinite system of SDEs defined over a Galton-Watson tree. Our theorems apply more generally, when the sequence of graphs ("decorated" with edge and vertex parameters) converges in the local weak sense. Our main technical result is a locality estimate bounding the influence of far-away diffusions on one another. We also numerically explore the emergence of synchronization phenomena on Galton-Watson random trees, observing rich phase transitions from synchronized to desynchronized activity among nodes at different distances from the root.


Introduction
The results of this paper were inspired by a concrete problem. Let n ∈ N and 0 < p(n) ≤ 1. Define the Erdős-Rényi random graph G_n = G(n, p(n)) as the random graph with vertex set [n] := {1, . . . , n} where two vertices are adjacent with probability p(n), independently of all other pairs. Write i ∼(n) j if i, j ∈ [n] are adjacent and let d^{(n)}_i denote the degree of i in G_n. We consider the stochastic Kuramoto model [16] over each realization of the graph G_n, which is defined as a system of interacting diffusions indexed by i ∈ [n], solutions of the following system of Itô Stochastic Differential Equations (SDEs) in the time interval [0, T]:

dθ^{(n)}_i(t) = (1/d^{(n)}_i) Σ_{j : j ∼(n) i} sin(θ^{(n)}_j(t) − θ^{(n)}_i(t)) dt + ω_i dt + dB_i(t).   (1.1)

Here the B_i are independent Brownian motions, and the initial positions θ^{(n)}_i(0) and "natural frequencies" ω_i are sampled from some product measure independently from the B_i and G_n. We adopt the convention that the first term in the RHS of (1.1) is zero in the case that d^{(n)}_i = 0. The following question arises.
Problem: What is the bulk behavior of this system when n → +∞ for different choices of p(n)?
More precisely, we want to understand the behavior of the empirical measure of particle trajectories over a time interval [0, T]:

L_n := (1/n) Σ_{i∈[n]} δ_{θ^{(n)}_i(·)},

a random measure over C([0, T]; R), the space of continuous functions from [0, T] to R. Our problem is potentially interesting because the graph G_n can be very different depending on p(n). For instance, when n → +∞, G_n is typically connected if p(n) ≫ log n/n and typically disconnected if p(n) ≪ log n/n. As it turns out, all that matters for our problem is the behavior of np(n) as n → +∞, which is the expected degree of a vertex in G_n (up to a small error). In a recent paper [26], we proved that np(n) → +∞ implies that L_n has the same a.s. limit and obeys the same large deviations principle as in the case p(n) ≡ 1 of a complete interaction graph. In particular, the limit of L_n is the law of a McKean-Vlasov diffusion, a Markovian process with trajectories in C([0, T]; R).
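For concreteness, the system (1.1) is easy to simulate. Below is a minimal Euler-Maruyama sketch on G(n, p); the step size, the distributions of ω_i and θ_i(0), and all function and variable names are our own illustrative choices, not the paper's.

```python
import numpy as np

def simulate_kuramoto_er(n, p, T=1.0, dt=1e-3, seed=0):
    """Euler-Maruyama for the stochastic Kuramoto model (1.1) on G(n, p).
    The interaction of vertex i is (1/d_i) * sum over neighbours j of
    sin(theta_j - theta_i), taken to be zero when d_i = 0."""
    rng = np.random.default_rng(seed)
    upper = np.triu(rng.random((n, n)) < p, k=1)     # Erdős-Rényi edges
    adj = (upper | upper.T).astype(float)            # symmetric, no loops
    deg = adj.sum(axis=1)
    inv_deg = np.where(deg > 0, 1.0 / np.maximum(deg, 1.0), 0.0)

    theta = rng.uniform(-np.pi, np.pi, size=n)       # initial phases
    omega = rng.normal(0.0, 1.0, size=n)             # natural frequencies
    for _ in range(int(T / dt)):
        diff = np.sin(theta[None, :] - theta[:, None])   # sin(theta_j - theta_i)
        drift = inv_deg * (adj * diff).sum(axis=1) + omega
        theta = theta + drift * dt + np.sqrt(dt) * rng.normal(size=n)
    return theta

phases = simulate_kuramoto_er(200, 3.0 / 200)        # mean degree ~ 3
```

The empirical measure L_n is then the uniform distribution over the n simulated trajectories.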
In this paper we complement the result for np(n) → +∞ by describing what happens when np(n) → c ∈ R + . We prove that L n converges to the law of a non-Markovian process, which is described by a system of the form (1.1) on a potentially infinite Galton-Watson (GW) tree. The mechanism behind this fact is a general theorem relating the local weak convergence of networks to the hydrodynamics of systems of diffusions on these networks. Remark 1.1. To the best of our knowledge, the preprint version of this paper was the first work to explicitly relate systems of interacting diffusions to local weak convergence. However, an earlier paper by Maclaurin [22] obtained stronger large deviation results in the specific setting of diffusions over discrete tori, which converge to a system over Z d .
In that sense, Maclaurin's result anticipates some aspects of our work. A few months after the present paper appeared on the arXiv, Lacker et al. [17,18] independently arrived at more systematic (and essentially more general) results of the same kind. More details about this related work are discussed in Section 3.2.
Finally, we numerically investigate synchronization phase transitions for the stochastic Kuramoto Model on GW trees. In particular, we compute synchronization levels among nodes at different distances from the root, by varying the coupling strength between oscillators, as well as their natural frequencies and initial conditions. In contrast with the full interaction case of the complete graph, we generally observe the emergence of desynchronization phenomena at distant nodes in the sparse setting.
In section 2, we give an informal description of our main results. In section 3, we make comments about the proofs, review past results, and give the outline of the remainder of the paper. EJP 25 (2020), paper 110.

Infinite networks and interacting diffusions
We will need the concept of a network. Informally, this is an object of the form N = (G, µ, ω, θ(0)), where:
1. G = (V, E) is a locally finite graph with countable vertex set V and edge set E.
2. µ = (µ_e)_{e∈E} is a vector of positive weights µ_e > 0 for the edges of G.
3. ω = (ω_v)_{v∈V} is a vector of "media variables" ω_v ∈ R associated with the vertices.
4. θ(0) = (θ_v(0))_{v∈V} is a vector of initial conditions θ_v(0) ∈ R for each vertex.
We will call µ the edge marks and ω, θ(0) the vertex marks. We will say that a network N is finite if the graph G is finite. We will often abuse notation and write "v ∈ N" instead of "v ∈ V". We also write µ_vu = µ_uv := µ_e for the weights of pairs e = {v, u} ∈ E, and set µ_vu = 0 if vu is not an edge.
Suppose N is given. Let ψ : R² → R and φ : R⁴ → R be Lipschitz functions, with φ also bounded. Define for each v ∈ V the total weight µ_v := Σ_{u∈V} µ_uv, and consider the system of SDEs

dθ^N_v(t) = (1/µ_v) Σ_{u∈V} µ_uv φ(θ^N_u(t), θ^N_v(t); ω_v, ω_u) dt + ψ(θ^N_v(t); ω_v) dt + dB_v(t),   (2.1)

in the time interval [0, T] and with initial conditions (θ_v(0))_{v∈N}. Hereafter, we adopt the convention that the first term in the RHS of (2.1) is 0 whenever µ_v = 0. When N is finite, our conditions on ψ and φ are more than sufficient to imply existence and uniqueness for this problem. Our first finding (Theorem 2.1) is that the same holds for infinite networks with at-most-exponential growth.
We next recall the notion of local weak convergence: roughly speaking, two rooted networks are close in this topology if large neighbourhoods of their roots can be matched nearly exactly. Note that, for this to make sense, we need to consider these networks up to "rooted isomorphisms"; see Section 5 for details. Now, given a sequence of finite networks {N_n = (G_n, µ_n, ω_n, θ(0)_n)}_{n∈N}, let o_n be a node of N_n chosen uniformly at random, for each n ∈ N. Let L(N_n, o_n) denote the law of the random rooted network (N_n, o_n); all the randomness comes from the choice of o_n. We say that {N_n}_{n∈N} converges in the local weak sense to a distribution ν over rooted networks if the probability laws L(N_n, o_n) converge weakly to ν.

Remark 2.2. If we forget about the marks µ_n, ω_n and θ(0)_n, this is nothing but the better-known concept of local weak convergence of graphs. In this case, it is known, e.g., that n-cycles converge to the deterministic rooted graph δ_(Z,0); that the Erdős-Rényi graph G(n, c/n) a.s. converges to a Poisson GW tree with parameter c; and that random d-regular graphs on n vertices a.s. converge to the infinite d-regular tree. We show in Section 5.2.1 that if the marks are chosen independently, with each vector µ_n, ω_n, θ(0)_n i.i.d., then the corresponding networks converge a.s. in the local weak sense. Now note that, for an arbitrary network N, if we can define a system of interacting diffusions over N, this gives rise to a random network where the initial conditions are replaced by the particle trajectories in the time interval [0, T]. (We are abusing notation by calling two different classes of objects by the name "network".)
Coming back to the sequence of finite networks (N_n)_{n∈N}, the next theorem relates the local weak convergence of N_n^θ to that of N_n. Theorem 2.3 (Loose statement of Theorem 6.8 and Corollary 6.10). Assume that ν is a probability measure on rooted networks which is supported on pairs (N, o) satisfying the assumptions of Theorem 2.1. Then for almost all samples (N, o) ∼ ν we can solve the system of interacting diffusions (cf. (2.1)) as in Theorem 2.1 and consider (N^θ, o). Now consider a sequence of networks {N_n}_{n∈N}, each N_n with n vertices, which converges in the local weak sense to ν. Assume also that the largest vertex degree in N_n is n^{o(1)} for large n. Then, almost surely, the sequence {N_n^θ}_{n∈N} of networks marked with the diffusions converges locally weakly to the law of (N^θ, o) when (N, o) ∼ ν. As a consequence, the empirical measures converge almost surely to the distribution of θ^N_o(·) when (N, o) is sampled from ν.
Our results also imply a propagation-of-chaos property (see Corollary 6.12).

Synchronization phenomena and sparsity
We come back to the particular case of the stochastic Kuramoto model. Theorems 2.1 and 2.3 motivate us to explore synchronization phenomena on finite GW trees, since these trees appear as the limit objects of sequences of Erdős-Rényi random graphs. Denoting by T the random GW tree with m vertices rooted at vertex 1, we consider the system of SDEs

dθ^T_i(t) = K Σ_{j∈[m]} a_ij sin(θ^T_j(t) − θ^T_i(t)) dt + ω_i dt + dB_i(t),

for each i ∈ [m], where θ^T_j(t) and ω_j represent the angular phase and natural frequency of the oscillator indexed by j ∈ {1, 2, . . . , m}, respectively. The parameter K ∈ R_+ represents the coupling strength between nodes, and a_ij = 1 if nodes i and j are connected in T, with a_ij = 0 otherwise. In our numerical analysis, we do not divide the summation over neighbours of i by the degree of i. We sample the initial conditions θ^T_j(0) and natural frequencies ω_j from distinct distributions and, by varying the coupling strength between nodes, we compute synchronization levels between the root and nodes at different distances. In our simulations, we chose two different models for generating the GW trees:
1. Binomial model: the offspring distribution is Bin(n, p).

2. C-regular model: the root node has C children, while every other node has exactly C − 1 children (so all vertices have degree C).
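The experiment just described can be sketched as follows. This is a simplified illustration: the truncation depth, time horizon, noise levels, and the distributions of ω and θ(0) are our own choices, not the paper's actual simulation settings.

```python
import numpy as np

def sample_gw_tree(n_trials, p, max_depth, rng):
    """Binomial(n, p) Galton-Watson tree truncated at depth max_depth.
    Returns the parent array (parent[0] = -1 for the root) and the depth
    of each vertex."""
    parent, depth = [-1], [0]
    frontier = [0]
    for d in range(max_depth):
        nxt = []
        for v in frontier:
            for _ in range(rng.binomial(n_trials, p)):
                parent.append(v)
                depth.append(d + 1)
                nxt.append(len(parent) - 1)
        frontier = nxt
    return np.array(parent), np.array(depth)

def kuramoto_on_tree(parent, K=2.0, T=5.0, dt=1e-2, seed=1):
    """Euler-Maruyama for the unnormalised Kuramoto dynamics on the tree:
    each edge {i, j} contributes K * sin(theta_j - theta_i) to the drift of i
    (no division by the degree, as in the numerical section)."""
    rng = np.random.default_rng(seed)
    m = len(parent)
    theta = rng.uniform(-np.pi, np.pi, size=m)
    omega = rng.normal(0.0, 0.5, size=m)
    edges = [(v, parent[v]) for v in range(1, m)]
    for _ in range(int(T / dt)):
        drift = omega.copy()
        for v, u in edges:
            s = np.sin(theta[u] - theta[v])
            drift[v] += K * s
            drift[u] -= K * s
        theta = theta + drift * dt + np.sqrt(dt) * rng.normal(size=m)
    return theta

rng = np.random.default_rng(0)
parent, depth = sample_gw_tree(10, 0.3, max_depth=5, rng=rng)
theta = kuramoto_on_tree(parent)
# Synchronisation level between the root and the nodes at distance d:
sync = {d: np.abs(np.exp(1j * (theta[depth == d] - theta[0])).mean())
        for d in range(1, int(depth.max()) + 1)}
```

Each value sync[d] lies in [0, 1]; values near 1 indicate phases at distance d locked to the root, and the desynchronization discussed below shows up as sync[d] decaying in d.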
In Section 10, we describe our numerical methods and results in detail. Interestingly, we observe how desynchronization emerges among distant nodes, depending on the choice of the model parameters. These findings improve our understanding of synchronization in complex networks and pave the way for new phase transition studies on Kuramoto dynamics.

Comment on proofs
We now briefly comment on our proofs. The key step is to show that our system (cf. (2.1)) satisfies a locality property, Lemma 7.1 below. Loosely speaking, this property states that information does not propagate too fast over the graph in systems like (2.1). To prove this lemma, we rely on a linear Gronwall argument, which leads to a matrix exponential. A nice wrinkle in the proof is that this exponential can be related to a heat kernel for a random walk over a network, which we can analyze via the Carne-Varopoulos bound. With this lemma in hand, our main results follow easily from general principles, including the definition of weak convergence.
One last comment is that it seems clear that our result is an exemplar of a more general principle. One can gather from our arguments that "local" systems of particles on graphs should have a "local hydrodynamic limit" whenever the sequence of underlying graphs converges. In this sense, our main technical contributions consist of formulating this principle precisely and proving the required locality estimate in our setting.

Discussion
As stated above, our motivation was to understand what happens to interacting diffusions in the simple case of an Erdős-Rényi graph with a constant average degree. Our recent preprint [26] showed that the entire regime of a diverging average degree has the same behavior, even at the level of large deviations, as the complete graph (mean-field interactions). Of course, proving an LDP in the setting of the present paper is an interesting topic for further study.
We continue with a very brief review of the literature, referring to [26, Section 1.2] for more details. The study of our class of systems over complete graphs is a classical topic; see e.g. [28,25] for early results. Recent papers have obtained hydrodynamic limits in settings with singular interactions [20,21] or Gaussian couplings and delays [9,7,8]. More recently, several authors [11,23,14] have explicitly considered the case of relatively sparse random graphs. A recent preprint by Coppini et al. [13] obtains
an LDP under a stronger degree condition than in our paper [26], but with otherwise weaker or incomparable assumptions.
To the best of our knowledge, the present work is the first paper to explore how interacting diffusions behave in the bulk of relatively general graphs of constant average degree. However, we have recently become aware of earlier work [22] by Maclaurin. In that paper, the author proved a large deviation principle for interacting diffusions indexed by subsets of the lattice Z d . In the present paper, we only obtain a law of large numbers for the diffusion. However, Maclaurin's methods rely on the specific structure of Z d , whereas our results apply more broadly.
A few months after the first version of our paper appeared on the arXiv, Lacker, Ramanan and Wu [17,18] posted two preprints where they consider related systems of interacting diffusions over sparse graphs. Except for our consideration of weighted edges, their results greatly generalize ours, by allowing the drift and diffusion coefficients of a vertex to depend nonlinearly on the empirical measures of neighboring vertices. Additionally, they make weaker requirements on the sequences of graphs. One technical difference is that our proofs give more quantitative estimates on correlation decay, whereas they rely on "softer" weak convergence tools. Their preprint [18] obtains closed-form descriptions of the non-Markovian dynamics of a vertex and its neighbors.
From the synchronization viewpoint, our study introduces novel results for the Kuramoto model on GW trees. Over the past years, many studies have analyzed synchronization phenomena on various network topologies [2,19], yet little attention has been given to sparse random trees. More recently, Chiba et al. [12] studied transitions to synchronization for a large family of random graphs, relating their onset of synchronization to the well-known phase transition for the fully connected network. With a more computational approach, Sokolov and Ermentrout [27] related network structure to global stability of phase-locked solutions. For power-law random networks, Medvedev and Tang [24] studied the effects of scale-free connectivity and compared the synchronization thresholds with dense graphs. In contrast with all those recent findings, our analysis on GW trees allows us to investigate the emergence of desynchronization among nodes that are distant from the root, which illustrates how full synchronization is not always achievable by increasing the coupling strength beyond a fixed value.

Organization of the paper
The remainder of the work is organized as follows. Section 4 reviews notation for functions and measures, and also presents some preliminaries about graphs. Section 5 reviews networks and local weak convergence; the reader familiar with these topics need only read this section to learn the notation we adopt here. Section 6 states our main results in full detail.
We prove the Locality lemma in Section 7, and the other main results are derived from this lemma in subsequent sections. We solve the infinite system of SDEs in Section 8, and we address the hydrodynamic limit in Section 9.
Finally, in Section 10 we present numerical simulations to discuss the synchronization phenomena. Auxiliary results are found in the Appendix, starting at Section A.

Preliminaries
In this section we fix notation and briefly review some important concepts.

Numbers
N is the set of nonnegative integers. For a natural number n ∈ N \ {0}, we let [n] := {1, . . . , n}. We denote the maximum and minimum of two numbers x, y ∈ R by x ∨ y and x ∧ y, respectively. We define R_+ := {x ∈ R : x > 0}.

Functions and spaces of probability measures
Let (S, d) be a Polish metric space. We define C(S; R) := {h : S → R : h is continuous}, and for a map h : S → R we consider the norms

‖h‖_∞ := sup_{x∈S} |h(x)|,   ‖h‖_Lip := sup_{x≠y} |h(x) − h(y)|/d(x, y),   ‖h‖_BL := ‖h‖_∞ + ‖h‖_Lip.

We let P(S) denote the set of probability measures over (the Borel sets of) S. If X ∈ S is a random element, we denote by δ_X ∈ P(S) the Dirac measure at X, which is a random measure.
Given a measure µ ∈ P(S) and a Borel function h : S → R we write µ(h) := ∫_S h dµ whenever the integral is well defined. If X ∈ S is a random element and E[·] is the expectation in the probability space where X is defined, we write L(X) ∈ P(S) for its law, so that L(X)(h) = E[h(X)]. The topology of weak convergence in P(S) is metrized by the bounded-Lipschitz metric, defined for µ, ν ∈ P(S) by

d_BL(µ, ν) := sup{|µ(h) − ν(h)| : ‖h‖_BL ≤ 1}.

If X and Y are random elements in S defined on the same probability space with L(X) = µ and L(Y) = ν, then d_BL(µ, ν) ≤ 2 E[d(X, Y) ∧ 1].
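Since the bounded-Lipschitz metric is a supremum over infinitely many test functions, in practice one estimates it from both sides. A small numerical sketch is below; we use the coupling inequality d_BL(µ, ν) ≤ 2 E[d(X, Y) ∧ 1] under the normalisation ‖h‖_BL = ‖h‖_∞ + ‖h‖_Lip, where the constant 2 is our convention and may differ from the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=400_000)    # X ~ N(0, 1), so mu = L(X)
y = x + 0.1                     # Y = X + 0.1: a coupling with nu = L(Y)

# Upper bound from the coupling inequality: d_BL(mu, nu) <= 2 E[|X - Y| ∧ 1].
upper = 2.0 * np.minimum(np.abs(x - y), 1.0).mean()   # = 0.2 for this coupling

# Lower bound: h(t) = sin(t)/2 has ||h||_inf + ||h||_Lip <= 1, so the
# difference of its integrals against mu and nu is at most d_BL(mu, nu).
h = lambda t: 0.5 * np.sin(t)
lower = abs(h(y).mean() - h(x).mean())
```

Here `lower` ≈ 0.03 and `upper` = 0.2 sandwich the actual bounded-Lipschitz distance between N(0, 1) and N(0.1, 1).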

Graphs
In this paper, a graph G = (V, E) has vertex set V and unoriented edge set E. The set V is either finite or countably infinite. We write x ∼ y to denote that xy = yx := {x, y} ∈ E. Notice that we allow x ∼ x (i.e. a loop edge). The degree d x of x ∈ V is the number of y ∈ V with y ∼ x. When we need to specify the dependency on G we write V G , E G , x ∼ G y, and d G x . We always assume G is locally finite, i.e. d x < +∞ for all x ∈ V . We write |G| and e(G) for the number of vertices and edges in G, respectively.
Given a sub-graph H ⊂ G we define its boundary ∂H as the set of vertices of H with at least one neighbour of G outside of H. We write dist(v, u) for the distance between v, u ∈ V, i.e., the length of a shortest path between v and u in G, with dist(v, u) = +∞ if no such path exists. We will also consider weighted graphs. For a graph G = (V, E), the vector µ = (µ_e)_{e∈E} is a vector of weights for G if µ_e > 0 for every e ∈ E. To each vector of weights µ we can associate a matrix (µ_vu)_{v,u∈V} such that for v, u ∈ V:
• if e = {v, u} ∈ E then µ_vu = µ_uv = µ_e, and
• if {v, u} ∉ E then µ_vu = µ_uv = 0.
We write µ_v := Σ_{u∈V} µ_vu for the total weight of v. We say that (G, µ) is a weighted graph and we identify the vector µ with the associated matrix (µ_vu)_{v,u∈V}.

Models of random graphs
Some examples of our theory are related to random graph models. Given n ∈ N, p ∈ [0, 1], the Erdős-Rényi random graph G(n, p) is the random graph with vertex set [n] with no loops, where any two distinct x, y ∈ [n] are adjacent with probability p, independently of all other pairs. We consider (as is customary) sequences of random graphs G(n, p) where p = p(n) may depend on n.
Given n ∈ N and d = (d_1, . . . , d_n) ∈ N^n, we let G(n, d) denote a random graph with degree sequence d, i.e. a graph chosen uniformly at random from the set of graphs G with V(G) = [n], no loops, and d^G_x = d_x for all x ∈ [n]. This makes sense only for certain sequences d. One important particular case is that of random d-regular graphs, where d = (d, d, . . . , d) for some d ≥ 3 and we only need to assume dn even.

Local metrics and weak convergence of networks
In this section, we review the basic aspects of the local topology and local weak convergence of networks. We start with the case of graphs, which is better known. We then discuss the case of networks in more detail. Our main references are the survey by Aldous and Steele [1], the lecture notes by Bordenave [5], and the paper [4].

Rooted graphs and local weak convergence
When we consider sparse random graphs, we will need to consider their local weak limits.
A rooted graph (G, o) consists of a (countable, locally finite) graph G with a distinguished vertex o ∈ V G . Two rooted graphs (G, o), (H, p) are rooted isomorphic ((G, o) ∼ = (H, p)) if there exists a bijection f : V G → V H mapping o to p and preserving edges. The space G * of rooted graphs considered up to isomorphisms can be endowed with a metrizable "local topology" that makes it a Polish space. Therefore, we may speak of random elements in this space (we will define a more general metric on networks below).
Given r ∈ N, (G, o)_r is the rooted graph with root o that contains the vertices x ∈ V_G within distance r from o, and all the edges between these vertices. Given a rooted graph (G, v), we write [G, v] for its equivalence class up to isomorphism.
Definition 5.1. For each finite graph G we define the empirical neighbourhood distribution

U(G) := (1/|G|) Σ_{v∈V_G} δ_{[G(v), v]},

where G(v) denotes the connected component of v in G. We say that a sequence of finite graphs G_n converges locally weakly to the measure ρ ∈ P(G*) if U(G_n) → ρ in the weak topology of P(G*).
If the sequence of finite graphs G_n is random, then we say G_n converges almost surely to ρ in the local weak sense if the local weak convergence of G_n to ρ holds on a set of probability 1 with respect to the law of the sequence G_n.
Example 5.2. Cycle graphs C_n with n vertices locally weakly converge to δ_(Z,0). Example 5.3. Suppose that, for each n, G_n has the law of the Erdős-Rényi random graph G(n, c/n) for c > 0 constant. Then for almost all realizations of G_n, the sequence G_n locally weakly converges to the rooted GW tree with Poisson offspring distribution with mean c.
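Example 5.2 can be checked mechanically: in C_n, every root sees the same r-ball (a path with the root in the middle), which is also the r-ball of (Z, 0). A toy sketch, using BFS layer sizes as a crude isomorphism invariant (the approach and names are ours):

```python
def ball_canon(adj, root, r):
    """BFS layer sizes of the r-ball around root: a crude isomorphism
    invariant, sufficient to compare these simple examples."""
    seen, frontier, layers = {root}, [root], [1]
    for _ in range(r):
        nxt = list(dict.fromkeys(u for v in frontier for u in adj[v]
                                 if u not in seen))
        seen.update(nxt)
        frontier = nxt
        layers.append(len(nxt))
    return tuple(layers)

n, r = 100, 3
cycle = {v: [(v - 1) % n, (v + 1) % n] for v in range(n)}
# Every root of C_n sees the same r-ball: a path with the root in the middle.
classes = {ball_canon(cycle, v, r) for v in range(n)}

# The r-ball of (Z, 0), truncated to {-10, ..., 10}, has the same invariant.
zline = {v: [u for u in (v - 1, v + 1) if abs(u) <= 10] for v in range(-10, 11)}
z_ball = ball_canon(zline, 0, r)
```

For n > 2r + 1 the set `classes` contains a single element, equal to `z_ball`, reflecting that U(C_n) concentrates on the class of (Z, 0).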
Example 5.4. Suppose that for each n ∈ N we have a vector d_n = (d_{1,n}, . . . , d_{n,n}) ∈ N^n. Assume the sequence d_n satisfies max_{1≤i≤n} d_{i,n} ≤ n^{ε_n} with ε_n → 0, and that the empirical degree distributions (1/n) Σ_{i=1}^n δ_{d_{i,n}} converge weakly to some P with finite first moment. If we sample G_n from G(n, d_n), then for almost all realizations of G_n, G_n locally weakly converges to the unimodular rooted GW tree UGW(P), where the root has offspring distribution P, and all other nodes have the size-biased offspring distribution P̂ given by P̂(k) ∝ (k + 1)P(k + 1). In particular, if d_n = (d, d, d, . . . , d) for all n, UGW(P) is the infinite (deterministic) d-regular tree rooted at a node.
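A depth-one sanity check of Example 5.3 (the sampling shortcut and parameter values are our own): the degree of a uniformly chosen root of G(n, c/n) is Binomial(n − 1, c/n), which for large n is close to the Poisson(c) offspring law of the limiting GW tree.

```python
import numpy as np

rng = np.random.default_rng(0)
c, n, trials = 3.0, 10_000, 200_000
# The degree of a uniformly chosen root of G(n, c/n) is Binomial(n-1, c/n),
# so we can sample it directly instead of building the whole graph.
root_deg = rng.binomial(n - 1, c / n, size=trials)
pois = rng.poisson(c, size=trials)
# Depth-one local weak convergence: root degree ≈ Poisson(c) offspring count.
gap = max(abs((root_deg == k).mean() - (pois == k).mean()) for k in range(15))
```

The total-variation-style gap between the two empirical distributions is below one percent here; the full statement of Example 5.3 concerns balls of every radius, not just the root degree.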

Rooted networks and local weak convergence
Roughly speaking, a network is a graph G = (V, E) with parameters (or marks) associated to the vertices and edges of G. The parameters (or marks) lie in some metric space.
More specifically, let (Υ, d_Υ) and (Ξ, d_Ξ) be two Polish metric spaces. A network N = (V, E, υ, ξ) is a graph G = (V, E) together with vectors υ = (υ_v)_{v∈V} ∈ Υ^{|V|} and ξ = (ξ_e)_{e∈E} ∈ Ξ^{|E|} that give marks to the vertices and edges of G, respectively. We write N_(Υ,Ξ) for the space of all these networks with the mark spaces fixed.
A sub-network N' = (V', E', υ', ξ') of N is given by a sub-graph (V', E') of G, where the vector υ' is the restriction of υ to V' and ξ' is the restriction of ξ to E'. In this case we also say that the sub-network N' is induced by the sub-graph (V', E').
When we state a graph property for a network N = (V, E, υ, ξ), it is implicitly assumed that this property holds for the underlying graph. For example, the boundary of a sub-network N' of N is the boundary of the corresponding sub-graph.
Consider two networks N = (V, E, υ, ξ) and N' = (V', E', υ', ξ') belonging to N_(Υ,Ξ). A network isomorphism Ψ between N and N' is a bijection Ψ : V → V' between the vertex sets that preserves edges and marks: for all v ∈ V and e = {u, w} ∈ E, we have {Ψ(u), Ψ(w)} ∈ E', υ'_{Ψ(v)} = υ_v and ξ'_{{Ψ(u),Ψ(w)}} = ξ_e. We now define a notion of distance over rooted networks up to isomorphism. This is not the exact same notion as in [5], but it is equivalent to it, as a simple calculation shows.
Given r ∈ N and δ > 0, we say that two rooted networks (N, o) and (N', o') are (r, δ)-close if there is a rooted graph isomorphism between the neighbourhoods (N, o)_r and (N', o')_r such that the corresponding marks are close by δ: vertex marks and edge marks matched by the isomorphism are within distance δ in Υ and Ξ, respectively. The local distance between (N, o) and (N', o') is then obtained by optimizing over the pairs (r, δ) for which such a matching exists. Given two classes of rooted networks, we can define the distance between them using any representatives of these classes. This is well defined since the distance between rooted networks is invariant under rooted isomorphism.
Sometimes we identify a rooted network with its equivalence class.
For a network N and a vertex v ∈ V we associate the rooted network (N (v), v) that is the network induced by the connected component of v rooted at v.
We write [N(v), v] for the equivalence class of the rooted network (N(v), v). For a finite network N with vertex set V we define the empirical neighbourhood measure

U(N) := (1/|V|) Σ_{v∈V} δ_{[N(v), v]}.

Definition 5.5 (Local weak convergence). Consider a sequence of finite networks {N_n}_{n∈N} in N_(Υ,Ξ) and let ρ ∈ P(N*_(Υ,Ξ)). We say that N_n converges locally weakly to ρ if U(N_n) → ρ in the sense of weak convergence. If the sequence of finite networks N_n is random, then we say that N_n converges almost surely to ρ in the local weak sense if the local weak convergence of N_n to ρ holds on a set of probability 1 with respect to the law of the sequence N_n.

Networks with i.i.d. marks that converge locally weakly
Now we give examples of networks that satisfy Definition 5.5.
Let G = (V, E) be a graph. In Section 2 we introduced the networks where we have the vector of weights µ = (µ e ) e∈E ∈ R |E| + that are marks for the edges of G, environment or "media" variables ω = (ω v ) v∈V ∈ R |V | , and a vector of initial conditions θ(0) = (θ v (0)) v∈V ∈ R |V | that are marks for the vertices of G.
Definition 5.6. We write N for the collection of networks N = (G, µ, ω, θ(0)) with edge marks in R + and vertex marks in R × R. When we distinguish a root for N we write N * for the collection of rooted networks up to isomorphism.
Our first goal is to show that for a sequence of graphs (G_n)_{n∈N}, and under some additional conditions, we can construct interesting examples of random networks N_n ∈ N and a probability measure ν ∈ P(N*) such that the sequence (N_n)_{n∈N} converges to ν in the local weak sense, for almost all realizations of the marks.
When we need to make explicit the dependency on G we write µ^G, ω^G, and θ(0)^G.
Fix the measures π, λ ∈ P(R). For a fixed vertex v, π is the distribution of the media variables ω v and λ is the distribution of the initial conditions θ v (0). Fix a measure µ ∈ P(R + ) for the distribution of the weights µ e for an edge e ∈ E.
We write N_G for the random network obtained from G by adding these random marks.
In this way we have a transition kernel M that associates to each rooted graph (G, o) the law of the random rooted network (N_G, o):

M((G, o); h) := E[h(N_G, o)],

where E[·] is the expectation in the probability space where the marks for G were defined and h : N* → R is a bounded measurable function. With this kernel at hand, for any probability measure ρ over G* we can define ρM ∈ P(N*) via the formula

(ρM)(h) := ∫ M((G, o); h) ρ(d[G, o])

for any bounded measurable function h : N* → R. The proofs of the following results are given in the appendix because they are easier versions of our main results (Theorem 6.7 and Theorem 6.8, respectively).

Interacting diffusions on finite networks
In this section, we introduce our main objects of interest, and give formal versions of Theorem 2.1 and Theorem 2.3.
Fix functions ψ : R² → R and φ : R⁴ → R. In our model, ψ represents a drift term that depends on the current position and the media variable, whereas φ describes the pairwise interaction between the particles. As we can always suppose V ⊂ N, we consider a collection (B_v)_{v∈N} of i.i.d. standard Brownian motions.
Definition 6.1. Let N = (G, µ, ω, θ(0)) be a network belonging to the set N in Definition 5.6. A system of interacting diffusions on the network N (with the choice of functions ψ, φ) is a random vector θ(·) = (θ_v(·))_{v∈V} which is a strong solution of the following system of Itô Stochastic Differential Equations:

dθ_v(t) = (1/µ_v) Σ_{u∈V} µ_uv φ(θ_u(t), θ_v(t); ω_v, ω_u) dt + ψ(θ_v(t); ω_v) dt + dB_v(t),  v ∈ V,   (6.1)

where the first term in the RHS of (6.1) is zero if µ_v = 0, and the initial conditions are given by θ(0) = (θ_v(0))_{v∈V}. When we need to make explicit the dependency on the network we will write θ(·) =: θ^N(·).
When the network N is finite, the standard theory of Itô SDEs guarantees that the system (6.1) has a unique strong solution with continuous trajectories whenever ψ and φ are Lipschitz-continuous (see [15], Theorem 2.9, Chapter 5). In the remainder of this work, ψ is assumed to be Lipschitz-continuous and φ is assumed to be Lipschitz-continuous and bounded. We will argue that, under certain conditions, we can also solve (6.1) on infinite networks, i.e. with infinitely many simultaneous equations.
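On a finite network, (6.1) can be integrated numerically. A hedged sketch follows; the discretisation scheme, parameter values, and all names are our own choices, and the convention that the interaction term vanishes when µ_v = 0 is handled explicitly.

```python
import numpy as np

def simulate_system(mu, omega, theta0, phi, psi, T=1.0, dt=1e-3, seed=0):
    """Euler-Maruyama sketch of the system (6.1) on a finite weighted network.

    mu  : symmetric (m, m) matrix of edge weights (mu[v, u] = mu_uv);
    phi : interaction term phi(theta_u, theta_v, omega_v, omega_u) (scalar);
    psi : vectorised drift term psi(theta, omega).
    The interaction is dropped for vertices with total weight mu_v = 0."""
    rng = np.random.default_rng(seed)
    m = len(omega)
    mu_v = mu.sum(axis=1)
    inv = np.where(mu_v > 0, 1.0 / np.maximum(mu_v, 1e-300), 0.0)
    theta = np.array(theta0, dtype=float)
    for _ in range(int(T / dt)):
        inter = np.array([sum(mu[v, u] * phi(theta[u], theta[v],
                                             omega[v], omega[u])
                              for u in range(m)) for v in range(m)])
        drift = inv * inter + psi(theta, omega)
        theta = theta + drift * dt + np.sqrt(dt) * rng.normal(size=m)
    return theta

# Kuramoto-type example: phi = sin(theta_u - theta_v), psi = 0.
mu = np.array([[0.0, 2.0, 0.0], [2.0, 0.0, 0.5], [0.0, 0.5, 0.0]])
out = simulate_system(mu, omega=np.zeros(3), theta0=[0.0, 1.0, 2.0],
                      phi=lambda x, y, wv, wu: np.sin(x - y),
                      psi=lambda th, w: 0.0 * th)
```

Taking mu to be the (0/1) adjacency matrix scaled by the degree recovers the model (1.1) from the introduction.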
For each network N , if there exists a system of interacting diffusions over N , we can replace the initial conditions θ(0) by the vector of random continuous functions θ N (·) ∈ C([0, T ]; R) as new marks. This brings us to the following definitions. Definition 6.2. Given a network N = (G, µ, ω, θ(0)) ∈ N and a vector of continuous functions α(·) = (α v (·)) v∈G ∈ C([0, T ]; R) |G| we write C for the collection of networks (G, µ, ω, α(·)) with continuous functions replacing the vector θ(0) as new marks. That is, C has edge marks in R + and vertex marks in R × C([0, T ]; R). When we distinguish a root for (G, µ, ω, α(·)) we write C * for the collection of these rooted networks up to isomorphism. Definition 6.3. When N ∈ N is such that a system of interacting diffusions θ N (·) can be defined, we let N θ = (G, µ, ω, θ N (·)) be the corresponding random element of C defined above. Note that the law of N θ is invariant by network isomorphisms.
To present our main results, we need some additional definitions. We first restrict ourselves to the space of finite networks. Given a finite rooted network (N, o), we let Θ[N, o] := L(N^θ, o), the law of the random rooted network (N^θ, o); this is well defined on equivalence classes because the law of N^θ is invariant under isomorphisms of N. Therefore, Θ defines a map from N*_f (the space of finite rooted networks) to P(C*) (the space of probability measures over C*).

Main results
In this section we state our main results. Our results hold when the sequence {N_n}_{n∈N} converges to something "nice". In our next result, we use Theorem 6.7 to identify hydrodynamic limits of systems of interacting diffusions. We first note that, if ν is a probability measure over B*, the continuity (in particular, measurability) of Θ allows us to define νΘ by the formula

(νΘ)(h) := ∫ Θ[N, o](h) ν(d[N, o]),

where h : C* → R is a bounded measurable function and the expectation implicit in Θ is with respect to the Brownian motions. Using Examples 5.3 and 5.4 of almost sure (with respect to the graph randomness) local weak convergence, we can construct networks N_n that satisfy the assumptions of Theorem 6.8 for almost all realizations of marks and almost all realizations of the sequence G_n. In particular, the theorem applies to the case of Erdős-Rényi graphs and GW trees with probability 1 with respect to the law of the Erdős-Rényi graphs.
The hydrodynamic limit is in general stated for the empirical measure over particle trajectories. We can obtain θ^N_o as a projection of (N^θ, o). Using that this projection is continuous, we obtain the following corollary.

The locality lemma
In this section, we introduce the main technical tool in our proofs, the Locality lemma. It appears in the proofs of all the results in Section 6.1.
The Locality lemma basically establishes that our interacting diffusions over a finite subset H_0 of vertices are nearly indifferent to parts of the network that are far away from H_0. Following Definition 6.1, assume that we can define a system of interacting diffusions θ^N(·) over N from the (B_v)_{v∈V} (with ψ and φ fixed above and in the time interval [0, T]). Also build a system θ^{N'}(·) over the sub-network N' induced by a finite sub-graph H ⊂ G, with the same Brownian motions (this works because N' is finite). Then the following holds: there exist C, r_0 > 0 depending only on T, ‖ψ‖_Lip and ‖φ‖_BL such that, almost surely in the Brownian motion randomness, the discrepancy sup_{t≤T} |θ^N_v(t) − θ^{N'}_v(t)| decays exponentially in dist(v, ∂H) once dist(v, ∂H) ≥ r_0. The rest of the section will be devoted to the proof of Lemma 7.1. However, if the reader so wishes, they can skip the proof and go directly to Section 8. In this section the only randomness comes from the diffusions, so every "almost surely" statement in this section is with respect to the law of the Brownian motions.
The general idea of the proof is the following. In §7.1 we rewrite the diffusions θ^N(·) and θ^{N'}(·) so that they can be compared easily. In §7.2, we prove a linear Gronwall inequality for the difference between the two systems. The proof is finished in §7.3. At this step, we need to analyze a certain matrix exponential. We do so via the theory of continuous-time random walks, most importantly the Carne-Varopoulos heat kernel bound [3, Section 5.1].
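The locality phenomenon is easy to observe numerically. Below is a sketch with a toy setup of our own: a Kuramoto-type system on a path, solved once with all edges and once with edges outside a ball around the centre removed, driven by the same Brownian increments in both runs, as in the coupling used by the lemma.

```python
import numpy as np

def run_chain(L, radius, steps, dt, dW, theta0, omega, K=1.0):
    """Kuramoto-type dynamics on a path of 2L+1 vertices.  Only edges whose
    endpoints both lie within `radius` of the centre are kept, mimicking the
    truncated sub-network N' of the lemma; every vertex still feels its own
    drift and noise."""
    m = 2 * L + 1
    centre = L
    active = np.abs(np.arange(m) - centre) <= radius
    theta = theta0.copy()
    for s in range(steps):
        drift = omega.copy()
        for v in range(m - 1):
            if active[v] and active[v + 1]:        # edge kept in N'
                val = np.sin(theta[v + 1] - theta[v])
                drift[v] += K * val
                drift[v + 1] -= K * val
        theta = theta + drift * dt + dW[s]
    return theta

rng = np.random.default_rng(0)
L, steps, dt = 20, 200, 5e-3            # time horizon T = 1
m = 2 * L + 1
dW = np.sqrt(dt) * rng.normal(size=(steps, m))   # shared Brownian increments
theta0 = rng.uniform(-np.pi, np.pi, size=m)
omega = rng.normal(size=m)
full = run_chain(L, L, steps, dt, dW, theta0, omega)      # whole path
trunc = run_chain(L, 10, steps, dt, dW, theta0, omega)    # radius-10 ball
centre_gap = abs(full[L] - trunc[L])    # influence of vertices at distance > 10
```

The discrepancy at the centre is tiny, while vertices near the removed edges may differ substantially, which is exactly the picture the Locality lemma quantifies.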
As noted in the introduction, Lacker et al. [17,18] essentially bypass locality estimates in their proofs. We strongly believe that the same locality estimates can be proven in their framework.

Preliminaries
To avoid cumbersome notation, we adopt the following conventions. Objects related to the network N = (G, µ, ω, θ(0)) are written without primes. Degrees of vertices are denoted by d_v. We define a matrix P indexed by the vertices V of N via P_uv := µ_uv/µ_v if µ_v > 0, and P_uv := 0 otherwise. With this notation, the interacting diffusions over N may be written as

dθ_v(t) = Σ_{u∈V} P_uv φ(θ_u(t), θ_v(t); ω_v, ω_u) dt + ψ(θ_v(t); ω_v) dt + dB_v(t).

The network N' = (H, µ|_H, ω|_H, θ(0)|_H) is induced by the sub-graph H ⊂ G. We will write the corresponding process somewhat differently. Define µ'_uv := µ_uv 1{v, u ∈ H} and µ'_v := Σ_u µ'_uv, and set P'_uv := µ'_uv/µ'_v if µ'_v > 0, and P'_uv := 0 otherwise. This matrix is in general different from P at entries (u, v) where v or u are either outside of H or in ∂H.
We may define another system of diffusions θ̄ satisfying: for each v ∈ V (and not just the vertices in H), with initial conditions θ̄_v(0) = θ_v(0). With this definition, the diffusions inside H do not interact with those outside H. Since H is finite, the system inside H has a unique strong solution, so the θ̄_v(·) with v ∈ H correspond exactly to the θ^N̄_v(·) in the statement of the Lemma. Our goal is then to bound sup_{t≤T} |θ_v(t) − θ̄_v(t)| for v ∈ H_0. More specifically, it suffices to prove the next result.

A Gronwall bound
The next proposition will be used in our Gronwall argument. To state it, we let H̄ denote the set of vertices within distance at most 1 of H, that is, Also, we write I for the identity matrix.
where C > 0 depends only on ‖ψ‖_Lip and ‖φ‖_BL.
Proof. Since the initial conditions are coupled to be equal, they cancel when we compute the difference. First observe that the sums involving P and P̄ for u ∈ G are in fact over u ∈ H̄, since v ∈ H: these matrices have non-zero entries P_{·,v}, P̄_{·,v} only for neighbors of v. In this way we have the following bounds: Now we bound the RHS of the first inequality. For this, we note that if v ∉ ∂H or µ̄_v = 0, then P̄_uv = P_uv for all u. On the other hand, if v ∈ ∂H and µ̄_v > 0, then: EJP 25 (2020), paper 110.
In any case we have that Combining these bounds, we obtain the result.

End of proof
We now finish the proof of Lemma 7.2, which implies Lemma 7.1, by applying the Linear Gronwall Inequality (Corollary C.2).
Going back to Proposition 7.3, we observe that H̄ is finite. In particular, the matrices P and I considered are finite-dimensional, so we can apply Corollary C.2 with: 3. M(t)_uv = M_uv = C(P + I)_uv, for u, v ∈ H̄; this matrix does not depend on time t, is entry-wise non-negative, and is finite-dimensional.
In vector notation, we obtain: and the Corollary says that u(t) ≤ exp(tM)a entrywise. This is the same as saying that To bound this last expression, we note that exp(Ct(P − I))_uv = µ_u q_Ct(u, v), where q_Ct(u, v) is the heat kernel at time Ct of a continuous-time random walk over H with transition rates equal to 1 and reversible transition probabilities P_uv (reversibility follows from the symmetry of µ_uv). The Carne–Varopoulos bound for the heat kernel (Theorem 5.17 of [3, Section 5.1]) implies that for any time s ≥ 0 and any v, u ∈ H with R = dist(v, u) ≥ es (here e is Euler's number), We apply this with s = Ct ≤ CT and obtain that, if R = dist(v, ∂H) ≥ eCT, then: So we finish by taking r_0 = eCT and adjusting C accordingly.
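The heat-kernel decay driving this step is easy to check numerically. The sketch below is our own illustration: it computes q_t = exp(t(P − I)) for the simple random walk on a path graph via the Taylor series of the matrix exponential, and confirms that q_t(u, v) is negligible once dist(u, v) is much larger than t:

```python
import math

def transition_matrix_path(n):
    """Simple random walk on the path 0-1-...-(n-1): P[i][j] = 1/deg(i)."""
    P = [[0.0] * n for _ in range(n)]
    for i in range(n):
        nbrs = [j for j in (i - 1, i + 1) if 0 <= j < n]
        for j in nbrs:
            P[i][j] = 1.0 / len(nbrs)
    return P

def heat_kernel(P, t, terms=80):
    """q_t = exp(t(P - I)) = e^{-t} sum_k (tP)^k / k!, truncated Taylor series."""
    n = len(P)
    q = [[math.exp(-t) if i == j else 0.0 for j in range(n)] for i in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # P^0
    coef = math.exp(-t)
    for k in range(1, terms):
        # power <- power @ P, coef <- e^{-t} t^k / k!
        power = [[sum(power[i][m] * P[m][j] for m in range(n)) for j in range(n)]
                 for i in range(n)]
        coef *= t / k
        for i in range(n):
            for j in range(n):
                q[i][j] += coef * power[i][j]
    return q

q = heat_kernel(transition_matrix_path(21), t=1.0)
# q[0] is a probability vector; its mass at distance 15 is astronomically small
```

The entry q[0][15] is bounded by e^{-1}/15!, illustrating the super-exponential decay at distances R ≥ es that the Carne–Varopoulos bound captures.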

Interacting diffusions over infinite graphs
In this section, we prove Theorem 6.7. We first give a sketch of the argument.
1. The balls (N, o)_r are finite. We will use the fact that (N, o) is nice (cf. Definition 6.6) and Lemma 7.1 to show that the sequence (N, o)^θ_r (considering o as the root) has a limit in distribution.
2. We will then show that the limit is the rooted network (N^θ, o) obtained by replacing the initial conditions by θ^N(·), the unique strong solution of the infinite system of SDEs in (6.1) for N.
3. To prove the continuity of the map (N, o) nice → Law of (N^θ, o), we will also need the fact that for finite networks the map (N, o) finite → Law of (N^θ, o) is continuous.
We give the formal proof of Theorem 6.7 over the next subsections.

Proof of existence
Throughout the proof, we will assume that we are given i.
with the weights µ = (µ_vu)_{vu∈E_G}, media variables ω = (ω_v)_{v∈V_G} and initial conditions θ^(r)(0) = (θ_v(0))_{v∈V_G} determined by the network (N, o)_r. We also use the notation µ The fact that the RHS goes to 0 for fixed s as r, r′ → +∞ implies that, for each v ∈ V, θ^(r)_v(·) is a Cauchy sequence (in the uniform norm over [0, T]) and converges on [0, T] to a continuous function θ^N_v(·). We also have the estimate: and we will now show that this estimate implies that θ^N = (θ^N_v(·))_{v∈V} is a system of interacting diffusions over N in the sense of Definition 6.1. To do this, first observe that µ^(r)_v = µ_v for large r. For any fixed r ≥ 1, we have from (8.1) as we wanted.

Proof of uniqueness
The above implies the existence of a system of interacting diffusions over N. Uniqueness of such a process is also easy to obtain. Indeed, suppose β^N is another strong solution to the same system of equations, defined in terms of the same Brownian motions (B_v)_{v∈V}. Then an application of the locality result, Lemma 7.1, with H_0 = {v}, H = (N, o)_r for large r, and the network N, reveals that (8.2) must also hold with β^N_v replacing θ^N_v. Therefore,

Proof of continuity
What we have seen so far is that for each nice rooted network (N, o) one may uniquely define a system of interacting diffusions θ^N(·). Let [N^θ, o] ∈ C* be the random rooted network that results when one replaces the initial conditions by the trajectories of the diffusions as the new vertex marks (in the sense of Definition 6.3), and let Θ[N, o] denote the law of [N^θ, o] ∈ C* (Definition 6.5). The uniqueness statement above implies that this map Θ extends the definition of Θ over finite networks.
We must now show that Θ is a continuous map from B * (the set of nice networks) to P(C * ) (the set of probability measures over C * with the BL metric). We start with some preliminaries. We note once again that, due to Lemma 7.1, we have the more precise estimate: The important point is that (

The triangle inequality gives: When n → +∞, the first term on the RHS shrinks to 0. Using (8.5) to bound the other two terms, we obtain: Since r ≥ r_0 + s is arbitrary (and r_0 is constant), we may take s = r/2 and let r → +∞.

Hydrodynamic limit
In this section we will prove Theorem 6.8.
Our goal is the following: given that U(N_n) → ν in the space P(N*), and under some additional conditions, we want to show that U(N^θ_n) → νΘ in P(C*) almost surely.
In this section, "almost surely" means "almost surely with respect to the law of the Brownian motions". By a standard argument, it suffices to show that for any test function h : C* → R with ‖h‖_BL ≤ 1, with respect to the law of the Brownian motions. It will be useful to consider the intermediate expression: where the expectation is with respect to the Brownian motions. Since Θ : B* → P(C*) is continuous (by Theorem 6.7) and U(N_n) → ν weakly, one may easily show that Therefore, our goal is tantamount to showing that: with respect to the law of the Brownian motions, where we recall that ‖h‖_BL ≤ 1. The idea for proving (9.2) is to use the fact that U(N^θ_n)(h) is a function of independent Brownian motions. If we can control the effect of replacing one of the Brownian motions, then we can prove concentration via Azuma's inequality. To make this work, we will need to consider, for fixed r, a truncated process given by the networks: (N_n(v), v)^θ_r
obtained by replacing the initial conditions by the interacting diffusions θ^{(N_n(v),v)_r}(·). With this motivation, we define the r-neighborhood empirical measure After considering these networks for fixed r, we will need to return to our original network. In what follows, we will use Azuma's inequality.
Assume that the function f : X^n → R satisfies the bounded differences assumption:
Lemma 9.2. For any fixed r ∈ N, the following holds almost surely with respect to the law of the Brownian motions:
Proof. Recall that the assumptions on our networks imply: max_{v∈[n]} d^{G_n}_v ≤ n^{ε_n}, where ε_n → 0. (9.4) The strong solution assumption implies that for each v ∈ [n] there exists a measurable function g_v such that Therefore, U(N_n)_r(h) may be written as a function of the Brownian motions, =: f(B_1(·), · · · , B_n(·)).
Now fix a vertex w ∈ [n] and suppose we change the function B_w(·). Then, whenever w ∉ (N_n(v), v)_r, the function g_v((B_z(·))_{z∈(N_n(v),v)_r}) is unchanged. That is, the only functions g_v that change are those with v ∈ (N_n(w), w)_r. Using that ‖h‖_∞ ≤ 1, we conclude that f satisfies the bounded differences assumption (Theorem 9.1) with c_w = |(N_n(w), w)_r|/n.
From (9.4), the size of (N_n(w), w)_r is bounded by n^{(r+1)ε_n} (to see this, write the ball as the union of spheres). In this way, Σ_{w∈[n]} c_w^2 ≤ n^{2(r+1)ε_n}/n.
Theorem 9.1 applied twice implies This bound is summable in n for any fixed r ≥ 1 and t > 0, because ε_n → 0. Therefore, the Borel–Cantelli lemma finishes the proof.
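As a quick sanity check on the bounded-differences mechanism, the toy example below (our own illustration, not part of the paper) compares the empirical tail of a function of independent variables — the mean of n uniform samples, for which each c_w = 1/n — against the corresponding Azuma/McDiarmid bound 2 exp(−2t²/Σ_w c_w²):

```python
import math
import random

random.seed(0)

def empirical_tail(n, t, trials=2000):
    """Fraction of trials in which the mean of n Uniform[0,1] samples
    deviates from its expectation 1/2 by more than t."""
    exceed = 0
    for _ in range(trials):
        mean = sum(random.random() for _ in range(n)) / n
        if abs(mean - 0.5) > t:
            exceed += 1
    return exceed / trials

n, t = 400, 0.05
observed = empirical_tail(n, t)
# changing one coordinate moves the mean by at most c_w = 1/n, so sum c_w^2 = 1/n
azuma_bound = 2.0 * math.exp(-2.0 * t ** 2 * n)
```

The observed tail frequency sits well below the bound, which is itself summable in n for fixed t once Σ_w c_w² decays polynomially — exactly the situation engineered in the proof above.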
To continue, we must compare the truncated network in this lemma with the original N_n. We first bound: which is an average of differences: Let r_0 be the constant of Lemma 7.1. From this lemma, we have the bound: whenever r ≥ s + r_0. Averaging over v ∈ [n], we conclude for all r ≥ s + r_0. We know that N_n converges in the local weak sense to ν, and the measure ν is supported on nice networks. That is, we have ν(∪_{a=1}^{+∞} B*_a) = 1, with We bound the terms in the RHS of (9.6) according to whether |∂(N_n, v)_r| is bounded by ae^{ar} or not; in the latter case the terms are simply bounded by 2. We deduce that,

Introduction
In this section, we explain how we performed the numerical simulations for the stochastic Kuramoto model on Galton–Watson (GW) trees. The equations are given by For this system, θ_j(t) and ω_j represent the angular phase and natural frequency of the oscillator indexed by j ∈ {1, 2, . . . , N}. The parameter K ∈ R_+ represents the coupling strength between nodes, and a_ij = 1 if nodes i and j are connected, while a_ij = 0 otherwise. We assume that {W^j_t}_{t≥0} are independent Brownian motions for each node j, while the noise intensity is given by ε > 0. In our simulations, we chose two different models for generating the GW trees:
1. Binomial model: the offspring is a binomial random variable with distribution Bin(n, p).
2. D-Regular model: the offspring number is deterministic, given by a constant C.
In what follows, we fixed n = 3 and p = 2/3 for the Binomial model (mean offspring equal to 2) and C = 3 for the D-Regular model. We also considered the time interval t ∈ [0, 100] (arbitrary units) with step ∆t = 0.01, and noise intensity ε = 0.05.
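A minimal sketch of the tree generation might read as follows (our own code, with our function names; reading the D-Regular model as constant offspring C is our assumption):

```python
import random

random.seed(1)

def binomial_offspring(n=3, p=2.0 / 3.0):
    """Bin(3, 2/3) offspring: mean 2, as in the Binomial model."""
    return sum(1 for _ in range(n) if random.random() < p)

def regular_offspring(C=3):
    """Deterministic offspring, one reading of the D-Regular model."""
    return C

def gw_tree(offspring, generations):
    """Generate a GW tree breadth-first.

    Returns adjacency lists `adj` and the generation (distance from
    the root) of each node; node 0 is the root.
    """
    adj, depth, frontier = [[]], [0], [0]
    for g in range(1, generations + 1):
        next_frontier = []
        for v in frontier:
            for _ in range(offspring()):
                w = len(adj)          # index of the new child
                adj.append([v])
                adj[v].append(w)
                depth.append(g)
                next_frontier.append(w)
        frontier = next_frontier
    return adj, depth

adj, depth = gw_tree(binomial_offspring, generations=5)
```

With `regular_offspring` the construction is deterministic, so `gw_tree(regular_offspring, 3)` always yields 1 + 3 + 9 + 27 = 40 nodes.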

The synchronization level
We define the synchronization level Sync = Sync(h, K) between the root and the nodes at distance h, given the coupling strength K. If N denotes the total number of nodes and j = 1 is the root index, we define the set D(h) = {j ∈ {2, . . . , N} : node j is at distance h from the root} and the order parameter where # denotes set cardinality.
Our variable Sync is then defined as a time average of the last 5% of the values assumed by r_h(t). More precisely, if we have a total of t_n time steps, the set of the last 5% of time indices is given by J = {⌈0.95 t_n⌉, . . . , t_n}. Therefore, we define Sync := (1/#J) Σ_{j∈J} r_h(t_j) as our synchronization level parameter.
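In code, this computation can be sketched as follows (our own illustration; we take node 0 as the root and use the standard Kuramoto order parameter r_h(t) = |(1/#D(h)) Σ_{j∈D(h)} e^{iθ_j(t)}|, which is our assumption for the display above):

```python
import cmath

def sync_level(trajectory, depth, h, frac=0.05):
    """Time average of the order parameter r_h(t) over the last `frac`
    of the trajectory.

    `trajectory` is a list of phase vectors (one per time step) and
    `depth[j]` is the distance of node j from the root (node 0).
    """
    D_h = [j for j in range(1, len(depth)) if depth[j] == h]
    if not D_h:
        return 0.0
    tail = trajectory[int((1.0 - frac) * len(trajectory)):]
    r_h = [abs(sum(cmath.exp(1j * theta[j]) for j in D_h)) / len(D_h)
           for theta in tail]
    return sum(r_h) / len(r_h)

# perfectly synchronized phases give Sync = 1
phases = [[0.3] * 4 for _ in range(100)]
assert abs(sync_level(phases, [0, 1, 1, 2], h=1) - 1.0) < 1e-12
```

Antipodal phases within D(h) drive the order parameter, and hence Sync, to 0, so the quantity indeed interpolates between full desynchronization and full synchronization.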

Numerical simulations
We present the details of our numerical simulations for the Binomial and D-Regular models. In both cases, our goal is to compute an average synchronization parameter ⟨Sync⟩(h, K) between the root and the nodes at different distances h, for distinct values of the coupling strength K. For all simulations, we considered GW trees with 13 generations (h ∈ {1, 2, · · · , 13}) and K ∈ [0, 10] with steps ∆K = 0.2. We performed all numerical integrations with the classic Euler–Maruyama scheme for stochastic differential equations. In what follows, we describe the step-by-step algorithm for each GW model.
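The Euler–Maruyama step itself can be sketched as below (our own code; we normalize the coupling by the degree d_i, matching the convention in (1.1) that the interaction term vanishes for isolated vertices — the exact prefactor used in the simulations is the one in the display above):

```python
import math
import random

random.seed(2)

def simulate_kuramoto(adj, K, eps, t_max=1.0, dt=0.01):
    """Euler-Maruyama integration of the stochastic Kuramoto model
    on the graph given by adjacency lists `adj`.

    Each step: theta_i += drift_i * dt + eps * sqrt(dt) * N(0, 1).
    """
    n = len(adj)
    omega = [random.gauss(0.0, 1.0) for _ in range(n)]       # natural frequencies
    theta = [random.uniform(0.0, 2.0 * math.pi) for _ in range(n)]
    trajectory = [theta[:]]
    sqrt_dt = math.sqrt(dt)
    for _ in range(int(round(t_max / dt))):
        new = []
        for i in range(n):
            coupling = sum(math.sin(theta[j] - theta[i]) for j in adj[i])
            if adj[i]:
                coupling *= K / len(adj[i])  # degree normalization; none if d_i = 0
            new.append(theta[i] + (coupling + omega[i]) * dt
                       + eps * sqrt_dt * random.gauss(0.0, 1.0))
        theta = new
        trajectory.append(theta[:])
    return trajectory

traj = simulate_kuramoto([[1, 2], [0, 2], [0, 1]], K=2.0, eps=0.05)
```

With ∆t = 0.01 and t ∈ [0, 100], a production run performs 10,000 such steps per simulation; the short horizon above is only for illustration.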
• Binomial model. We produce a total of 100 simulations. In each of them, we generate a GW tree with N nodes and sample the natural frequencies ω_i as well as the initial conditions θ_i(0), i ∈ {1, 2, . . . , N}. We then simulate the stochastic Kuramoto model with the Euler–Maruyama scheme, computing Sync(h, K) for the chosen h and K values (see above). Finally, we average the synchronization levels across the 100 simulations, obtaining ⟨Sync⟩(h, K).
• D-Regular model. In this case, the GW tree is not random, so we produce a total of 100 simulations re-sampling only the initial conditions θ_i(0) and natural frequencies ω_i, i ∈ {1, 2, . . . , N}. For each simulation, we compute the synchronization level Sync(h, K) for the selected K and h values. We then compute the average synchronization level ⟨Sync⟩(h, K) across the 100 simulations.
For both the Binomial and D-Regular GW models, the average level of synchronization ⟨Sync⟩ decreased significantly as the distance h from the root increased, for all considered choices of initial conditions θ_i(0) and natural frequencies ω_i. The parameter regions of high synchronization (red-colored in the heatmaps, corresponding to Sync > 0.8) were only found for low h values (h ≤ 4 in the Binomial model and h ≤ 2 in the D-Regular model). It is important to notice that the synchronization levels did not substantially increase even for higher values of the coupling strength K, which in our study was allowed to range from 0 to 10. This result indicates that desynchronization may be a predominant phenomenon in GW trees with the chosen average offspring values. From the GW model perspective, a simple visual inspection indicates that the Binomial model promoted higher synchronization at low h values than the D-Regular model. For each of the five combinations of initial conditions and natural frequencies, the comparison between models shows significantly higher synchronization in the Binomial model. Further analysis would be required to understand the causes of this effect; however, we hypothesize that the variability in the offspring number present in the Binomial model could be a major factor explaining this increase in synchronization.

A Weak local convergence and nice networks
The goal of this section is to prove Proposition 5.7 and Theorem 5.8, and to exhibit examples that satisfy the assumptions of Theorem 6.8 (cf. Remark 6.9).

A.1 Proof of Proposition 5.7
Proposition 5.7 is a simpler version of Theorem 6.7.
Start with a sequence of rooted graphs {(G_k, o_k)}_{k∈N} that converges in the local topology to (G, o) ∈ G*. We want to show that the respective laws of the random rooted networks (N^{G_k}, o_k) also converge to the law of (N^G, o), that is, When we construct the random networks N^G and N^{G_k}, we can couple the marks to be equal in (

A.2 Proof of Theorem 5.8

We have a sequence of graphs {G_n}_{n∈N} that converges locally weakly to the measure ρ ∈ G*. We want to show that, defining the random networks N_n := N^{G_n} by adding the i.i.d. marks, we have U(N_n) → ρM almost surely in P(N*). We are assuming that max_{v∈[n]} d^{G_n}_v ≤ n^{ε_n} with ε_n → 0 as n → ∞. Let E[·] be the expectation on the space where the marks of the sequence of networks (N_n)_{n∈N} are defined. We can easily check that In particular, from the continuity of M it is simple to deduce that E[U(N_n)] → ρM. Throughout the remainder of this section, we fix a bounded measurable test function h. The same arguments as in Section A.1 show that the convergence also holds in mean: This is a function of the independent variables (ω_v, θ_v(0))_{v∈[n]} and (µ_e)_{e∈E_n}, where E_n is the edge set of G_n. If we change one of these marks on a vertex w, the only networks that change are those [N_n(v), v]_r with w ∈ (G_n(v), v)_r, that is, v ∈ (G_n(w), w)_r. Therefore, U(N_n)_r(h) changes by at most the size of (G_n(w), w)_r over n. By assumption, |(G_n(w), w)_r| ≤ n^{(r+1)ε_n}. Observe that U(N_n)(h) is a function of 2n + |E_n| independent random variables.
From Azuma's inequality, we have that To see that this bound is summable, we use that the number of edges is one half of the sum of the degrees to bound: Using the triangle inequality several times together with the previous bounds, we see that for any h with ‖h‖_BL ≤ 1.

A.3 Good examples
In this section we exhibit examples that satisfy the assumptions of Theorem 6.8.
By Theorem 5.8, Assumption 1 of Theorem 6.8 is satisfied almost surely for any of the graphs in Examples 5.2, 5.3 and 5.4 when we add i.i.d. marks.
Assumption 3 of Theorem 6.8 is satisfied trivially for graphs with bounded degree (Example 5.2). In Example 5.4, we impose that uniform random graphs with a given degree sequence also satisfy max_{v∈[n]} d^{G_n}_v ≤ n^{ε_n} with ε_n → 0 as n → ∞. For Erdős–Rényi graphs, a union bound gives P(max_{v∈[n]} d^{G_n}_v > n^{ε_n}) ≤ n P(d^{G_n}_1 > n^{ε_n}) ≤ exp(−n^{ε_n} log n^{ε_n}) for n big enough. Since this bound is summable (we can impose n^{ε_n} → ∞ as n → ∞), a sequence of Erdős–Rényi graphs satisfies the assumption almost surely. It remains to check that in Examples 5.2, 5.3 and 5.4 the local weak limit is supported on nice graphs, that is, that the local weak limit of these sequences is supported on rooted graphs satisfying |∂(G, o)_r| ≤ ae^{ar} for some a > 0. (A.2)
It is clear that (A.2) holds when the local weak limit measure is supported on graphs with uniformly bounded degree, so we are done with Example 5.2. We now check (A.2) for the GW tree. This handles Examples 5.3 and 5.4, because a unimodular GW tree is equal to a GW tree after the first generation.
Given a probability distribution P on N with mean µ ∈ (0, +∞), consider GW(P), the GW distribution on the set of rooted trees: in each generation, each individual has i children with probability P(i), independently of the other individuals. Let Z_n be the number of individuals at generation n, with Z_0 = 1. We know that Z_n/µ^n is a martingale; in particular, E[Z_n] = µ^n. We also have that |∂(G, o)_n| = Z_n if (G, o) has distribution GW(P). Therefore, P(Z_r > (2µ)^r) ≤ E[Z_r]/(2µ)^r = 2^{−r}, which is summable in r. Therefore, |∂(G, o)_r| ≤ e^{(log 2µ)r} for r big enough, almost surely. Choosing a big enough, we see that |∂(G, o)_r| ≤ ae^{ar} for all r ≥ 1 and almost all realizations of (G, o).
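The identity E[Z_n] = µ^n behind this bound is easy to check by simulation. The sketch below (our own illustration) averages Z_6 over many Bin(3, 2/3) GW trees, for which µ = 2 and hence E[Z_6] = 2^6 = 64:

```python
import random

random.seed(3)

def offspring():
    """Bin(3, 2/3): mean number of children mu = 2."""
    return sum(1 for _ in range(3) if random.random() < 2.0 / 3.0)

def generation_size(n):
    """Sample Z_n, the number of individuals in generation n (Z_0 = 1)."""
    z = 1
    for _ in range(n):
        z = sum(offspring() for _ in range(z))
    return z

trials = 4000
mean_z6 = sum(generation_size(6) for _ in range(trials)) / trials
# E[Z_6] = mu^6 = 64; the empirical mean should be close to that
```

The martingale property says more: conditionally on survival, Z_n/µ^n stabilizes, which is what makes the Markov bound above summable and the exponential growth bound on |∂(G, o)_r| hold almost surely.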

B Appendix -Continuity for finite graphs
The goal of this section is to justify (8.6): We believe this can be derived from the standard theory of ordinary differential equations; for completeness, we provide a proof in this section. Our notation follows that of Section 8.1. We will use the following lemma.
Lemma B.1 (proof in Section B.1). Consider a graph G and two finite networks N_i = (G, µ^{(i)}, ω^{(i)}, θ^{(i)}(0)) ∈ N, i = 1, 2. Suppose that all marks are ε-close: Then there exists a constant C depending on ‖φ‖_BL, ‖ψ‖_Lip, T and N such that, almost surely with respect to the law of the Brownian motions, In this way, we use that the networks coincide at any radius together with the bound in (B.1) to conclude that, almost surely (cf. (5.1)), and the RHS of the last bound goes to zero as n → ∞. This is enough to finish.

B.1 Proof of Lemma B.1
In this section, we prove Lemma B.1. The proof follows ideas similar to those in the proof of Lemma 7.1.
For ease of notation we adopt the following conventions.
1. For the objects related to N_2, we omit the superscripts and just write µ, ω, θ(0) and θ.
2. For the objects related to N_1, we omit the superscript and add an over-line, writing µ̄, ω̄, θ̄(0) and θ̄.
3. We also write P_uv = µ_uv/µ_v and P̄_uv = µ̄_uv/µ̄_v.
4. We write I for the identity matrix.
5. G has vertex set V and edge set E.
From Definition 6.1, for each v ∈ V we have that and the analogous formula holds for θ̄_v(·), using the objects of N_1 but the same Brownian motions.
We suppose that the Brownian motions are coupled to be equal in the two systems.
Proposition B.3. With our conventions, there exists a constant C depending on ‖φ‖_BL, ‖ψ‖_Lip, T and 1/µ_* such that
Proof. From (B.2), Now we estimate S_v := Σ_{u∈V} |P_uv − P̄_uv|. Remember that sup_{vu∈E} |µ_vu − µ̄_vu| ≤ ε. It is clear that S_v = 0 if v is an isolated vertex. If v is not isolated, then The assumption that the network is nice says that µ_* ≤ µ_uv whenever uv ∈ E; that is, d_v µ_* ≤ µ_v.
Combining these bounds we obtain the result.
We now use Corollary C.2 with 2. a(t) := (Cε)_{v∈V}, and each entry of this vector is non-negative.
3. M (t) = M := C(P + I), and this matrix does not depend on time t, it is entrywise non-negative, and it is finite dimensional since G is finite.
Therefore, for any v ∈ V, |θ_v(t) − θ̄_v(t)| ≤ Cε Σ_{u∈V} exp(tC(P + I))_{vu}. To relate this bound to the random walk on (G, µ), observe that exp(tC(P + I)) = e^{Ct} exp(tCP) = e^{2Ct} e^{−Ct} exp(tCP).
Notice that this bound is uniform in t ∈ [0, T ]. This is enough to finish the proof.

D Propagation of chaos
The goal of this section is to prove Corollary 6.12. As above, we will assume that the vertex set of N_n is [n]. Let f_1, . . . , f_k : C* → R be bounded Lipschitz functions with ‖f_i‖_BL ≤ 1. Our goal is to show that: νΘ(f_i) as n → +∞. This is true for k = 1, as shown in (9.1). Therefore, it suffices to show that: E U(N^θ_n)(f_i) → 0 as n → +∞. (D.1) The strategy for proving this is to use an argument similar to that of Lemma 9.2 to exploit independence. For this reason, our first task is to compare
We write the difference of products as a telescoping sum, switching the terms in the product one by one. Applying the bound in (9.8) to each term and recalling that ‖f_i‖_BL ≤ 1, we deduce that, when r ≥ s ≥ r_0 for some constant r_0, As before, we can let r → ∞, s → ∞ and a → ∞, in this order, and it is clear that we can restate our goal as: (D.2) To achieve this goal, we write: Π_{i=1}^k E[f_i([N_n(u_i), u_i]^θ_r)].
We deduce that the difference in (D.2) is the average of: over (u_1, . . . , u_k) ∈ [n]^k. Now fix a radius r ≥ 1. We split the set [n]^k into two parts. Since the functions f_i are also bounded by 1, we obtain: Recall from (D.3) that the difference we need to consider to achieve our goal (D.2) is an average over u_1, . . . , u_k ∈ [n] of the terms in the LHS of (D.4). This implies: The RHS of the display may be interpreted as the probability that, out of k uniformly random vertices u_1, . . . , u_k ∈ N_n, at least one pair is at distance ≤ 2r from one another. This is the same as saying that u_j ∈ (N_n, u_i)_{2r} for some pair 1 ≤ i < j ≤ k. So: |(N_n, u)_{2r}|/n.
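The underlying fact — that uniformly chosen vertices of a large sparse graph are typically far apart — is transparent on a toy example. The sketch below (our own illustration, not the paper's setting) computes exactly, on the n-cycle, the probability that two independent uniform vertices are within graph distance 2r; it equals (4r + 1)/n, which vanishes as n → ∞ for fixed r:

```python
def close_pair_prob(n, r):
    """Probability that two independent uniform vertices of the n-cycle
    are within graph distance 2r of one another (computed exactly)."""
    close = 0
    for u in range(n):
        for v in range(n):
            dist = min((u - v) % n, (v - u) % n)  # cycle distance
            if dist <= 2 * r:
                close += 1
    return close / (n * n)

p = close_pair_prob(1000, 3)   # (4*3 + 1)/1000 = 0.013
```

For k vertices, a union bound over the k(k−1)/2 pairs gives the factor appearing in the display above, and the same vanishing behavior for fixed k and r.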
We prove the Claim at the end of this section. We deduce: for any fixed k, r ≥ 1, and the result follows.
We now prove the Claim. In this way, taking the limit n → ∞ in the RHS of (D.5), we obtain that When we take M → ∞, the RHS of the previous display goes to 0, since ν is supported on networks whose underlying graph is locally finite. This is enough to finish.