Long-term stability of interacting Hawkes processes on random graphs

We consider a population of Hawkes processes modeling the activity of $N$ interacting neurons. The neurons are regularly positioned on the segment $[0,1]$, and the connectivity between neurons is given by a random, possibly diluted and inhomogeneous graph, where the probability of presence of each edge depends on the spatial positions of its vertices through a spatial kernel. The main result of the paper concerns the long-time stability of the synaptic current of the population, as $N\to\infty$, in the subcritical regime, when the synaptic memory kernel is exponential, up to time horizons that are polynomial in $N$.


Hawkes processes in neuroscience.
In the present paper we study the large time behavior of a population of interacting and spiking neurons, as the size of the population $N$ tends to infinity. We model the activity of a neuron by a point process where each point represents the time of a spike: $Z_{N,i}(t)$ counts the number of spikes of the $i$th neuron of the population during the time interval $[0,t]$. Its intensity at time $t$ conditioned on the past $[0,t)$ is given by $\lambda_{N,i}(t)$, in the sense that
$$\mathbf{P}\left(Z_{N,i} \text{ jumps in } (t, t+dt) \,\middle|\, \mathcal{F}_t\right) = \lambda_{N,i}(t)\,dt, \quad \text{where } \mathcal{F}_t := \sigma\left(Z_{N,i}(s),\, s\le t,\, 1\le i\le N\right).$$
For the choice of $\lambda_{N,i}$, we want to account for the dependence of the activity of a neuron on the past of the whole population: the spike of one neuron can trigger other spikes. Hawkes processes are then a natural choice to capture this interdependency. A generic choice is
$$\lambda_{N,i}(t) = \mu(t, x_i) + f\Big( v(t, x_i) + \frac{1}{N}\sum_{j=1}^N w^{(N)}_{ij} \int_0^{t-} h(t-s)\, dZ_{N,j}(s) \Big). \qquad (1.1)$$
Here, with the $i$th neuron at position $x_i = \frac{i}{N} \in I := [0,1]$, $f : \mathbb{R} \to \mathbb{R}_+$ represents the synaptic integration, $\mu(t, \cdot) : I \to \mathbb{R}_+$ a spontaneous activity of the neuron at time $t$, $v(t, \cdot) : I \to \mathbb{R}$ a past activity, and $h : \mathbb{R}_+ \to \mathbb{R}$ a memory function which models how a past jump of the system affects the present intensity. The term $w^{(N)}_{ij}$ represents the random inhomogeneous interaction between neurons $i$ and $j$, modeled here in terms of the realization of a random graph.
Date: 2022/07/29.
Since the seminal works of [29,30], there has been a renewed interest in the use of Hawkes processes, especially in neuroscience. A common simplified framework is to consider an interaction on the complete graph, that is, taking $w^{(N)}_{ij} = 1$ in (1.1), as done in [22]. In this case, a very simple instance of (1.1) concerns the so-called linear case, when $f(x) = x$, $\mu(t,x) = \mu$ and $v = 0$, that is $\lambda_{N,i}(t) = \lambda_N(t) = \mu + \frac{1}{N}\sum_{j=1}^N \int_0^{t-} h(t-s)\, dZ_{N,j}(s)$, with $h \ge 0$ (see [22]). The biological evidence [11,36] of a spatial organisation of neurons in the brain has led to more elaborate Hawkes models with spatial interaction (see [43,25,14]), possibly including inhibition (see [40]). This would correspond in (1.1) to taking $w^{(N)}_{ij} = W(x_i, x_j)$ for a spatial kernel $W$ depending on parameters $A \in \mathbb{R}$, $\sigma > 0$. The macroscopic limit of the multivariate Hawkes process (1.1) is then given by a family of spatially extended inhomogeneous Poisson processes whose intensities $(\lambda_t(x))_{x\in I}$ solve the convolution equation
$$\lambda_t(x) = \mu(t,x) + f\Big( v(t,x) + \int_I W(x,y) \int_0^t h(t-s)\, \lambda_s(y)\, ds\, dy \Big). \qquad (1.2)$$
A crucial example is the exponential case, that is, when $h(t) = e^{-\alpha t}$ for some $\alpha > 0$. In this case, the Hawkes process with intensity (1.1) is Markovian (see [25]). Denoting in (1.2) by $u_t(x) := v_t(x) + \int_I W(x,y) \int_0^t h(t-s)\lambda_s(y)\,ds\,dy$ the potential of a neuron (the synaptic current) localised in $x$ at time $t$ (so that (1.2) becomes $\lambda_t(x) = f(u_t(x))$, up to the spontaneous activity), an easy computation (see [14]) gives that, when $v_t(x) = e^{-\alpha t} v_0(x)$ for some $v_0$, $u$ solves the Neural Field Equation (NFE)
$$\frac{\partial u_t(x)}{\partial t} = -\alpha u_t(x) + \int_I W(x,y) f(u_t(y))\, dy + I_t(x), \qquad (1.3)$$
with source term $I_t(x) := \int_I W(x,y)\mu(t,y)\,dy$. Equation (1.3) has been extensively studied in the literature, mostly from a phenomenological perspective [44,2], and is an important example of macroscopic neural dynamics with non-local interactions (we refer to [12] for an extensive review on the subject). In a previous work [1], we gave a microscopic interpretation of the macroscopic kernel $W$ in terms of an inhomogeneous graph of interaction. We consider $w^{(N)}_{ij} = \kappa_i^{-1}\xi^{(N)}_{ij}$, where $(\xi^{(N)}_{ij})_{1\le i,j\le N}$ is a collection of independent Bernoulli variables with individual parameter $W(x_i, x_j)$: the probability that two neurons are connected depends on their spatial positions. The term $\kappa_i$ is a suitable local renormalisation parameter, ensuring that the interaction remains of order 1. This modeling introduces a further difficulty in the analysis, as we are no longer in a mean-field framework: contrary to the case $w^{(N)}_{ij} = 1$, the interaction (1.1) is no longer a functional of the empirical measure of the particles $(Z_{N,1}, \cdots, Z_{N,N})$. Similar issues have recently attracted interest in the case of diffusions interacting on random graphs (first in the homogeneous Erdős–Rényi case [23,20,21,19], and secondly for inhomogeneous random graphs [34,3,5]).
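As a concrete illustration (not taken from the paper), the stationary behavior of the NFE (1.3) can be probed with a basic forward Euler scheme; the sigmoid $f$, the kernel $W$, the leakage rate $\alpha$ and the constant source term below are all assumed, illustrative choices.

```python
import math

# Forward Euler scheme for the neural field equation (1.3):
#   du/dt = -alpha*u + int_I W(x,y) f(u(y)) dy + I_t(x).
# All modeling choices (sigmoid f, kernel W, constant source) are
# illustrative assumptions, not the paper's.
def nfe_step(u, xs, W, f, alpha, source, dt):
    n = len(xs)
    dx = 1.0 / n
    return [u[i] + dt * (-alpha * u[i]
                         + sum(W(xs[i], xs[j]) * f(u[j]) for j in range(n)) * dx
                         + source)
            for i in range(n)]

n = 20
xs = [(i + 0.5) / n for i in range(n)]          # grid midpoints on I = [0,1]
f = lambda v: 1.0 / (1.0 + math.exp(-v))        # bounded "synaptic integration"
W = lambda x, y: 1.0 - x * y                    # continuous kernel with values in [0,1]
alpha, source, dt = 2.0, 0.5, 0.01

u = [0.0] * n
for _ in range(1200):                            # integrate up to t = 12
    u = nfe_step(u, xs, W, f, alpha, source, dt)
# u is now (numerically) stationary: -alpha*u + T_W f(u) + I = 0 on the grid
```

With these choices $\sup |f'| \le 1/4$ and $\alpha = 2$, so the dynamics contracts and the scheme settles on the unique stationary solution.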
A common motivation of [1] in the case of Hawkes processes and of [34,3,5] in the case of diffusions is to understand how the inhomogeneity of the underlying graph may or may not influence the long time dynamics of the system. An issue common to all mean-field models (and their perturbations) is that there is, in general, no possibility to interchange the limits $N \to \infty$ and $t \to \infty$. More precisely, restricting to Hawkes processes, a usual propagation of chaos result (see [22, Theorem 8], [14, Theorem 1], [1, Theorem 3.10]) may be stated as follows: for fixed $T > 0$, there exists some $C(T) > 0$ such that
$$\sup_{1\le i\le N} \mathbf{E}\left[ \sup_{t\in[0,T]} \left| Z_{N,i}(t) - \overline{Z}_i(t) \right| \right] \le \frac{C(T)}{\sqrt{N}}, \qquad (1.4)$$
where $\overline{Z}_i$ is a Poisson process with intensity $(\lambda_t(x_i))_{t\ge 0}$ defined in (1.2), suitably coupled to $Z_{N,i}$; see the above references for details. Generically, $C(T)$ is of the form $\exp(CT)$, so that (1.4) remains relevant only up to $T \sim c\log N$ with $c$ sufficiently small. In the pure mean-field linear case ($w^{(N)}_{ij} = 1$), there is a well-known phase transition [22, Theorems 10, 11]: when $\|h\|_1 < 1$ (subcritical case), $\lambda_t$ converges to a finite limit as $t\to\infty$, whereas when $\|h\|_1 > 1$ (supercritical case), $\lambda_t \xrightarrow[t\to\infty]{} \infty$. This phase transition has been extended to the inhomogeneous case in [1]. In the subcritical case, one can actually improve (1.4) in the sense that $C(T)$ is now linear in $T$, so that (1.4) remains relevant up to $T = o(\sqrt{N})$. A natural question is whether this approximation remains valid beyond this time scale. The purpose of the present work is to address this question: we show that, in the whole generality of (1.1), in the subcritical regime and exponential case (see details below), the macroscopic intensity (1.2) converges to a finite limit as $t \to \infty$ and that the microscopic system remains close to this limit up to times polynomial in $N$.
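The subcritical phase transition in the linear mean-field case can be checked numerically: the limiting convolution equation $\lambda_t = \mu + \int_0^t h(t-s)\lambda_s\,ds$ can be discretized directly, and when $\|h\|_1 < 1$ the intensity stabilizes near $\mu/(1-\|h\|_1)$. The discretization below is a sketch under these (standard) assumptions.

```python
import math

# Discretize the linear mean-field limit lambda_t = mu + int_0^t h(t-s) lambda_s ds
# with a left-endpoint rule. Subcritical when ||h||_1 < 1.
def solve_renewal(mu, h, T, dt):
    n = int(T / dt)
    lam = [mu]                              # lambda_0 = mu
    for k in range(1, n + 1):
        conv = dt * sum(h((k - j) * dt) * lam[j] for j in range(k))
        lam.append(mu + conv)
    return lam

mu, alpha = 1.0, 2.0
h = lambda t: math.exp(-alpha * t)          # ||h||_1 = 1/alpha = 0.5 < 1: subcritical
lam = solve_renewal(mu, h, T=10.0, dt=0.01)
# lam[-1] is close to mu / (1 - ||h||_1) = 2.0
```

Taking instead $\alpha < 1$ (so $\|h\|_1 > 1$) makes the computed intensity blow up, in line with the supercritical regime.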

Notation.
We denote by $C_{\mathrm{params}}$ a constant $C > 0$ which depends only on the parameters appearing in the lower index. These constants may change from line to line or within the same equation; we choose only to highlight the dependencies they contain. When it is not relevant, we simply write $C$. For any $d \ge 1$, we denote by $|x|$ and $x \cdot y$ the Euclidean norm and scalar product of elements $x, y \in \mathbb{R}^d$. For $(E, \mathcal{A}, \mu)$ a measure space and a function $g$ in $L^p(E, \mu)$ with $p \ge 1$, we write $\|g\|_{E,\mu,p} := \left( \int_E |g|^p\, d\mu \right)^{1/p}$. When $p = 2$, we denote by $\langle \cdot, \cdot \rangle$ the Hermitian scalar product in $L^2(E)$. When there is no ambiguity, we may omit the subscript $(E,\mu)$ or $\mu$. For a real-valued bounded function $g$ on a space $E$, we write $\|g\|_\infty := \|g\|_{E,\infty} = \sup_{x\in E} |g(x)|$.
For $(E,d)$ a metric space, we denote by $\|g\|_L = \sup_{x\neq y} |g(x)-g(y)|/d(x,y)$ the Lipschitz seminorm of a real-valued function $g$ on $E$. We denote by $C(E,\mathbb{R})$ the space of continuous functions from $E$ to $\mathbb{R}$, and by $C_b(E,\mathbb{R})$ the space of continuous bounded ones. For any $T > 0$, we denote by $D([0,T], E)$ the space of càdlàg (right continuous with left limits) functions defined on $[0,T]$ and taking values in $E$. For any integer $N \ge 1$, we denote by $\llbracket 1, N \rrbracket$ the set $\{1, \cdots, N\}$. For any $p \in [0,1]$, $\mathcal{B}(p)$ denotes the Bernoulli distribution with parameter $p$.
1.3. The model. First, let us focus on the interaction between the particles. The graph of interaction for (1.1) is constructed as follows:
Definition 1.1. On a common probability space $(\Omega, \mathcal{F}, \mathbb{P})$, we consider a family of random variables $\left(\xi^{(N)}_{ij}\right)_{N\ge1,\,1\le i,j\le N}$ on $\Omega$ such that under $\mathbb{P}$, for any $N \ge 1$, $\xi^{(N)}$ is a collection of mutually independent Bernoulli random variables such that for $1 \le i,j \le N$,
$$\xi^{(N)}_{ij} \sim \mathcal{B}\left( \rho_N W(x_i, x_j) \right), \qquad (1.5)$$
with $\rho_N$ some dilution parameter and $W : I^2 \to [0,1]$ a macroscopic interaction kernel. We assume that the particles in (1.1) are connected according to the oriented graph $\mathcal{G}_N = \left( \{1, \cdots, N\}, \xi^{(N)} \right)$. For any $i$ and $j$, $\xi^{(N)}_{ij}$ is renormalised by $\rho_N$ in the interaction, so that the interaction term remains of order 1 as $N \to \infty$.
The class (1.5) of inhomogeneous graphs falls into the framework of W-random graphs, see [33,9,10]. One distinguishes the dense case, when $\lim_{N\to\infty} \rho_N = \rho > 0$, from the diluted case, when $\rho_N \to 0$.
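For concreteness, the graph of Definition 1.1 can be sampled directly from (1.5); the kernel and the dilution level below are illustrative assumptions.

```python
import random

# Sample the inhomogeneous random graph of Definition 1.1: each directed edge
# (i,j) is present independently with probability rho_N * W(x_i, x_j).
def sample_graph(N, W, rho_N, seed=0):
    rng = random.Random(seed)
    xs = [i / N for i in range(1, N + 1)]       # regular positions on I = [0,1]
    xi = [[1 if rng.random() < rho_N * W(xs[i], xs[j]) else 0
           for j in range(N)] for i in range(N)]
    return xs, xi

N, rho_N = 200, 0.5
xs, xi = sample_graph(N, lambda x, y: 1.0 - x * y, rho_N)
mean_degree = sum(map(sum, xi)) / N
# mean in-degree is about rho_N * N * (average of W over the grid)
```

In the dense case one would take $\rho_N \equiv \rho > 0$ as here; letting $\rho_N \to 0$ with $N$ gives the diluted regime.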
We now fix these sequences, and work on a filtered probability space $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge0}, \mathbf{P})$ rich enough that all the following processes can be defined. We denote by $\mathbf{E}$ the expectation under $\mathbf{P}$ and by $\mathbb{E}$ the expectation w.r.t. $\mathbb{P}$. In the following definitions, $N$ is fixed and the particles are regularly located on the segment $I = [0,1]$. We denote by $x_i = \frac{i}{N}$ the position of the $i$-th neuron in the population of size $N$. We also divide $I$ into the $N$ segments $I_i := \left( \frac{i-1}{N}, \frac{i}{N} \right]$, $1 \le i \le N$. We can now formally define our process of interest.
Definition 1.2. Let $(\pi_i(ds,dz))_{1\le i\le N}$ be a sequence of i.i.d. Poisson random measures on $\mathbb{R}_+ \times \mathbb{R}_+$ with intensity measure $ds\,dz$. A $(\mathcal{F}_t)$-adapted multivariate counting process $(Z_{N,1}(t), \dots, Z_{N,N}(t))_{t\ge0}$ defined on $(\Omega, \mathcal{F}, (\mathcal{F}_t)_{t\ge0}, \mathbf{P})$ is called a multivariate Hawkes process with the set of parameters $\left( N, F, \xi^{(N)}, W_N, \eta, h \right)$ if $\mathbf{P}$-almost surely, for all $t \ge 0$ and $i \in \llbracket 1, N \rrbracket$:
$$Z_{N,i}(t) = \int_0^t \int_0^\infty \mathbf{1}_{\left\{ z \le \lambda_{N,i}(s) \right\}} \pi_i(ds, dz), \qquad (1.7)$$
where
$$\lambda_{N,i}(t) = F\left( X_{N,i}(t-), \eta_t(x_i) \right), \qquad (1.8)$$
and
$$X_{N,i}(t) = \frac{1}{N\rho_N} \sum_{j=1}^N \xi^{(N)}_{ij} \int_0^t h(t-s)\, dZ_{N,j}(s). \qquad (1.9)$$
Our main focus is to study the quantity $(X_{N,i})_{1\le i\le N}$ defined in (1.9) as $N\to\infty$, and more precisely the random profile defined for all $x \in I$ by
$$X_N(t,x) := \sum_{i=1}^N X_{N,i}(t)\, \mathbf{1}_{x \in I_i}. \qquad (1.10)$$
As $N \to \infty$, an informal Law of Large Numbers (LLN) argument shows that the empirical mean in (1.8) becomes an expectation w.r.t. the candidate limit for $Z_{N,i}$: we can replace the sum in (1.9) by an integral, the microscopic interaction term $w^{(N)}_{ij}$ in (1.8) by the macroscopic term $W(x,y)$ (where $y$ describes the macroscopic distribution of the positions), and the past activity of the neuron $dZ_{N,j}(s)$ by its intensity in large population. In other words, the macroscopic spatial profile will be described by
$$X_t(x) := \int_I W(x,y) \int_0^t h(t-s)\, \lambda_s(y)\, ds\, dy, \qquad (1.11)$$
where the macroscopic intensity of a neuron at position $x$ is
$$\lambda_t(x) := F\left( X_t(x), \eta_t(x) \right). \qquad (1.12)$$
Such an informal law of large numbers on a bounded time interval has been made rigorous under various settings; we refer for further references to [22,14] and especially to [1], which exactly covers the present hypotheses.
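The thinning representation (1.7) suggests a direct simulation algorithm in the exponential case $h(t) = e^{-\alpha t}$, where each $X_{N,i}$ decays at rate $\alpha$ between spikes and jumps by $\xi^{(N)}_{ij}/(N\rho_N)$ at each spike of neuron $j$. The sketch below uses an assumed bounded sigmoid intensity $F$ and a complete graph; it is an illustration of the mechanism, not the paper's construction.

```python
import math, random

# Ogata-style thinning simulation of the N-neuron Hawkes system with
# exponential kernel h(t) = exp(-alpha*t). Since F = sigmoid is bounded by 1,
# the total spike rate is at most N, which serves as the thinning bound.
def simulate(N, T, xi, rho_N, alpha=2.0, seed=1):
    F = lambda x: 1.0 / (1.0 + math.exp(-x))   # illustrative bounded intensity
    rng = random.Random(seed)
    X = [0.0] * N                              # synaptic currents X_{N,i}(t)
    Z = [0] * N                                # spike counts Z_{N,i}(T)
    t = 0.0
    while True:
        dt = rng.expovariate(N)                # candidate events at rate N
        t += dt
        if t > T:
            break
        decay = math.exp(-alpha * dt)
        X = [x * decay for x in X]             # exponential decay between events
        i = rng.randrange(N)                   # candidate neuron, uniform
        if rng.random() < F(X[i]):             # accept with prob lambda_i / 1
            Z[i] += 1
            w = 1.0 / (N * rho_N)
            for k in range(N):                 # the spike of i feeds every neighbor
                X[k] += xi[k][i] * w
    return Z, X

N = 50
xi = [[1] * N for _ in range(N)]               # complete graph, rho_N = 1
Z, X = simulate(N, T=10.0, xi=xi, rho_N=1.0)
```

With these choices $\|\partial_x F\|_\infty \|h\|_1 = \frac{1}{4}\cdot\frac{1}{2} < 1$, so the simulated rates settle around the subcritical limit intensity.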
Remark 1.3. In the expression (1.8) of the intensity $\lambda_{N,i}$, $X_{N,i}$ given in (1.9) accounts for the stochastic influence of the other interacting neurons, whereas $\eta_t$ represents the deterministic part of the intensity $\lambda_{N,i}$. Having in mind the generic example given in (1.1), a typical choice would correspond to taking $d = 2$ with $\eta := (\mu, v)$ and
$$F(x, \eta) = F(x, (\mu, v)) := \mu + f(x + v). \qquad (1.13)$$
Once again, $\mu$ here corresponds to the spontaneous Poisson activity of the neuron, and one may see $v$ as a deterministic part in the evolution of the membrane potential of the neuron. Note that we slightly generalize here the framework considered in [14], in the sense that [14] considered (1.13) for $\mu \equiv 0$ and $v_t(x) = e^{-\alpha t} v_0(x)$ for some initial membrane potential $v_0(x)$. In the case of (1.13), one retrieves the expression of the macroscopic intensity $\lambda_t(x)$ given in (1.2). Typical choices of $f$ in (1.13) are $f(x) = x$ (the so-called linear model) or some sigmoid function. Note that there will be an intrinsic mathematical difficulty in dealing with the linear case in this paper, as $f$ is not bounded in this case. As already mentioned in the introduction, for the choice of $h(t) = e^{-\alpha t}$ and $v_t(x) = e^{-\alpha t} v_0(x)$, a straightforward calculation shows that $u_t(x) := v_t(x) + X_t(x)$ solves the scalar neural field equation (1.3) with source term $I_t(x) = \int_I W(x,y)\mu(t,y)\,dy$.
We choose here to work with the generic expression (1.8) instead of (1.1) not only for conciseness of notation, but also to emphasize that the result does not intrinsically depend on the specific form of the function F .
Acknowledgements. This is a part of my PhD thesis. I would like to thank my PhD supervisors Eric Luçon and Ellen Saada for introducing me to this subject, for their useful advice and for their encouragement. This research has been conducted within the FP2M federation (CNRS FR 2036), and is supported by ANR-19-CE40-0024 (CHAllenges in MAthematical NEuroscience) and ANR-19-CE40-0023 (Project PERISTOCH).

2.1. Hypotheses.
Hypothesis 2.1. We assume that:
• $F$ is Lipschitz continuous: there exists $\|F\|_L > 0$ such that for any $(x, \eta)$ and $(x', \eta')$ in $\mathbb{R} \times \mathbb{R}^d$,
$$|F(x,\eta) - F(x',\eta')| \le \|F\|_L \left( |x - x'| + |\eta - \eta'| \right). \qquad (2.1)$$
• $F$ is non-decreasing in the first variable, that is, for any $\eta \in \mathbb{R}^d$ and any $x, x' \in \mathbb{R}$ such that $x \le x'$, one has $F(x,\eta) \le F(x',\eta)$. Moreover, we assume that $F$ is $\mathcal{C}^2$ on $\mathbb{R}^{d+1}$ with bounded derivatives. We denote by $\partial_x F$ and $\partial^2_x F$ the partial derivatives of $F$ w.r.t. $x$, and (with some slight abuse of notation) by $\partial_\eta F = (\partial_{\eta_k} F)_{k=1,\dots,d}$ the gradient of $F$ w.r.t. the variable $\eta \in \mathbb{R}^d$, as well as by $\partial^2_\eta F = (\partial^2_{\eta_k\eta_l} F)_{k,l=1,\dots,d}$ the Hessian of $F$ w.r.t. the variable $\eta$.
• $(\eta_t(x))_{t\ge0, x\in I}$ is uniformly bounded in $(t,x)$. We also assume that there exists $\eta_\infty$ Lipschitz continuous on $I$ such that $\eta_t \to \eta_\infty$ uniformly on $I$ as $t\to\infty$.
• The memory kernel $h$ is nonnegative and integrable on $[0, +\infty)$.
• We assume that $W : I^2 \to [0,1]$ is continuous. We refer nonetheless to Section 2.3.4, where we show that the results of the paper remain true under weaker hypotheses on $W$.
It has been shown in [1] that the process defined in (1.7) is well-posed, and that the large population limit intensity (1.12) is well defined in the following sense.
The same proofs work for our general case $F$. Proposition 2.3 also implies that the limiting spatial profile $X_t$ solving (1.11) is well defined.
Before writing our next hypothesis, we need to introduce the following integral operator.

Proposition 2.4. Under Hypothesis 2.1, the integral operator
$$T_W g(x) := \int_I W(x,y)\, g(y)\, dy \qquad (2.2)$$
is continuous in both cases $H = L^\infty(I)$ and $H = L^2(I)$. When $H = L^2(I)$, $T_W$ is compact, and its spectrum is the union of $\{0\}$ and a discrete sequence of eigenvalues $(\mu_n)_{n\ge1}$ such that $\mu_n \to 0$ as $n \to \infty$. Denote by $r_\infty = r_\infty(T_W)$, respectively $r_2 = r_2(T_W)$, the spectral radii of $T_W$ in $L^\infty(I)$ and $L^2(I)$ respectively. Moreover, we have that $r_\infty = r_2$. The proof can be found in Section 3.
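Numerically, $r_\infty$ can be estimated by discretizing $T_W$ on a grid and running a power iteration (legitimate here since the kernels considered are nonnegative, so the dominant eigenvalue is real and positive by Perron–Frobenius). For the kernel $W(x,y) = 1 - \max(x,y)$ appearing later among the examples, the spectral radius can also be computed by hand (solving $\mu u'' = -u$ with $u'(0) = 0$, $u(1) = 0$) and equals $4/\pi^2$; the sketch below checks this.

```python
import math

# Discretize T_W g(x) = int_I W(x,y) g(y) dy on n midpoints and estimate its
# spectral radius by power iteration (valid for nonnegative kernels).
def spectral_radius(W, n=100, iters=60):
    xs = [(i + 0.5) / n for i in range(n)]
    A = [[W(x, y) / n for y in xs] for x in xs]
    v = [1.0] * n
    r = 1.0
    for _ in range(iters):
        w = [sum(Ai[j] * v[j] for j in range(n)) for Ai in A]
        r = max(w)                      # dominant eigenvalue is positive
        v = [c / r for c in w]
    return r

r = spectral_radius(lambda x, y: 1.0 - max(x, y))
# exact value for this kernel: 4 / pi^2 = 0.4052...
```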
Hypothesis 2.5. In the whole article, we are in the subcritical case defined by
$$\|\partial_x F\|_\infty \, \|h\|_1 \, r_\infty < 1. \qquad (2.3)$$
Note that in the complete mean-field case, $W \equiv 1$ and $r_\infty = 1$, so that one retrieves the usual subcritical condition as in [22]. In the linear case $\eta = \mu$ and $F(x,\eta) = \mu + x$, (2.3) is exactly the subcritical condition stated in [1].
The aim of the paper is twofold: firstly, we state a general convergence result as $t\to\infty$ of $X_t$ defined in (1.11) (or equivalently $\lambda_t$ in (1.12)), see Theorem 2.7. This result is valid for any general kernel $h$ satisfying Hypothesis 2.1. Secondly, we address the long-term stability of the microscopic profile $X_N$ defined in (1.10), see Theorem 2.12. Contrary to the first one, this second result is stated for the particular choice of the exponential kernel
$$h(t) = e^{-\alpha t}, \quad \alpha > 0. \qquad (2.4)$$
The parameter $\alpha > 0$ is often referred to as the leakage rate. The main advantage of this choice is that the process $X_N$ then becomes Markovian (see e.g. [25, Section 5]). This will turn out to be particularly helpful for the proof of Theorem 2.12. As already mentioned in the introduction, (2.4) is the natural framework in which to observe the NFE (1.3) as a macroscopic limit, recall Remark 1.3. Note that in the exponential case (2.4), the subcritical condition (2.3) reads
$$\|\partial_x F\|_\infty \, r_\infty < \alpha. \qquad (2.5)$$
For our second result (Theorem 2.12), we also need some hypotheses on the dilution of the graph. Recall the definition of $\rho_N$ in Definition 1.1.
Hypothesis 2.6. The dilution parameter $\rho_N \in [0,1]$ satisfies the following dilution condition: there exists $\tau \in (0, \frac{1}{2})$ such that (2.6) holds. If one supposes further that $F$ is bounded, we assume the weaker condition (2.7).
Remark. Hypothesis 2.6 is stronger than $\frac{N\rho_N}{\log N} \to \infty$, which is a dilution condition commonly met in the literature concerning LLN results on bounded time intervals for interacting particles on random graphs: it is the same as in [23,20] (and slightly stronger than the optimal $N\rho_N \to +\infty$ obtained in [21] in the case of diffusions and in [1] in the case of Hawkes processes).

2.2. Main results. Our first result, Theorem 2.7, studies the limit as $t\to\infty$ of the macroscopic profile $X_t$ (as an element of $C(I)$) defined in (1.11).
Theorem 2.7. Under Hypotheses 2.1 and 2.5, there exists a unique $X_\infty \in C(I)$ solution of
$$X_\infty = \|h\|_1 \, T_W F\left( X_\infty, \eta_\infty \right), \qquad (2.8)$$
and $X_t$ converges uniformly on $I$, as $t\to\infty$, towards $X_\infty$.
Remark 2.8. Translating the result of Theorem 2.7 in terms of the macroscopic intensity $\lambda_t$ defined in (1.12) gives immediately that $\lambda_t$ converges uniformly to $\ell$, the unique solution to
$$\ell(x) = F\left( \|h\|_1 \int_I W(x,y)\,\ell(y)\,dy, \; \eta_\infty(x) \right), \quad x \in I. \qquad (2.9)$$
The correspondence between $X_\infty$ and $\ell$ (recall (1.11)) is simply given by $X_\infty = \|h\|_1 T_W \ell$.
Remark 2.9. In the particular case of an exponential memory kernel (2.4), as a straightforward consequence of the expression of $X_t$ in (1.11) and $X_\infty$ in (2.8), we have the differential equation
$$\frac{d}{dt} X_t = -\alpha X_t + T_W F\left( X_t, \eta_t \right). \qquad (2.10)$$
A simple Taylor expansion of $X_t$ around $X_\infty$ shows that the linearised system associated to the nonlinear (2.10) is
$$\frac{d}{dt} y_t = -\alpha y_t + T_W \left( G\, y_t \right), \qquad (2.11)$$
where
$$G(x) := \partial_x F\left( X_\infty(x), \eta_\infty(x) \right). \qquad (2.12)$$
The subcritical condition (2.5) translates into the existence of a spectral gap for the linear dynamics (2.11), which makes the stationary point $X_\infty$ linearly stable. More precisely:
Proposition 2.10. Assume that the memory kernel $h$ is exponential (2.4). Define the linear operator
$$L g := -\alpha g + T_W \left( G\, g \right). \qquad (2.13)$$
Then, under Hypotheses 2.1 and 2.5, $L$ generates a contraction semi-group $\left( e^{tL} \right)_{t\ge0}$ on $L^2(I)$ such that for any $g \in L^2(I)$,
$$\left\| e^{tL} g \right\|_2 \le e^{-t\gamma} \|g\|_2, \qquad (2.14)$$
where
$$\gamma := \alpha - \|\partial_x F\|_\infty \, r_\infty > 0. \qquad (2.15)$$
A law of large numbers on bounded time intervals, in the spirit of [1] (Proposition 2.11), gives the proximity of $X_N$ to $X$ on $[0,T]$ for any $T > 0$. Here, we are more precise, as we show uniform convergence of $X_N(t)$ in $L^2(I)$ instead of $L^1(I)$.
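Proposition 2.10 can be illustrated numerically: discretizing $L = -\alpha\,\mathrm{Id} + T_W(G\,\cdot)$ for an assumed bounded nonnegative $G$, the spectral bound of $L$ is $-\alpha + r(T_W(G\,\cdot)) \le -(\alpha - \|G\|_\infty r_\infty) = -\gamma$. All choices below ($W$, $G$, $\alpha$) are illustrative, with $\|G\|_\infty$ playing the role of $\|\partial_x F\|_\infty$.

```python
import math

# Power iteration for the spectral radius of a nonnegative kernel operator.
def spectral_radius(K, n=100, iters=60):
    xs = [(i + 0.5) / n for i in range(n)]
    A = [[K(x, y) / n for y in xs] for x in xs]
    v = [1.0] * n
    r = 1.0
    for _ in range(iters):
        w = [sum(Ai[j] * v[j] for j in range(n)) for Ai in A]
        r = max(w)
        v = [c / r for c in w]
    return r

W = lambda x, y: 1.0 - max(x, y)
G = lambda y: 0.25 * y            # stands in for dF(X_inf, eta_inf); ||G||_inf = 0.25
alpha = 0.5                       # subcritical: 0.25 * r_inf < alpha

r_W = spectral_radius(W)                              # ~ 4/pi^2 ~ 0.405
r_U = spectral_radius(lambda x, y: W(x, y) * G(y))    # radius of U = T_W(G .)
spectral_bound_L = -alpha + r_U   # top of the spectrum of L = -alpha*Id + U
gamma = alpha - 0.25 * r_W        # the guaranteed gap of (2.15)
# spectral_bound_L <= -gamma < 0: the linear dynamics (2.11) is contracting
```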
We are now in position to state the main result of the paper: the proximity stated in Proposition 2.11 is not only valid on a bounded time interval, but propagates to arbitrary polynomial times in $N\rho_N$.
Theorem 2.12. Choose some $t_f > 0$ and $m \ge 1$. Then, under Hypotheses 2.1, 2.6 and 2.5, $\mathbb{P}$-a.s., for $\varepsilon > 0$ small enough,
$$\mathbf{P}\left( \sup_{t \in \left[ t_\varepsilon,\, t_f (N\rho_N)^m \right]} \left\| X_N(t) - X_\infty \right\|_2 \le \varepsilon \right) \xrightarrow[N\to\infty]{} 1,$$
for some $t_\varepsilon > 0$ independent of $N$.
Since $F$ is Lipschitz and $\lambda_{N,i}(t) = F(X_{N,i}(t-), \eta_t(x_i))$ by (1.8), it is straightforward to derive from Theorem 2.12 a similar result for the profile of densities
$$\lambda_N(t,x) := \sum_{i=1}^N \lambda_{N,i}(t)\, \mathbf{1}_{x \in I_i}. \qquad (2.18)$$
Corollary 2.13. Recall the definition of $\ell$ in (2.9). Under the same set of hypotheses as Theorem 2.12 and with the same notation, the analogous concentration holds for $\lambda_N$ around $\ell$.
2.3. Examples and extensions. We give here some illustrating examples of our main results.

2.3.1. Mean-field framework.
To the best of the author's knowledge, even in the simple homogeneous case of mean-field interaction, there exists no long-term stability result such as Theorem 2.12. We stress that our result may have an interest of its own in this case. Let us be more specific. When $\rho_N = W_N = 1$ and $\mu_t(x) = \mu \ge 0$, the process introduced in Definition 1.2 reduces to the usual mean-field framework [22]. In this simple case, the spatial framework is no longer useful (in particular the spatial profile defined in (1.10) is constant in $x$, so that the $L^2$ framework is not relevant: one has only to work in $\mathbb{R}$). The macroscopic intensity and synaptic current (respectively (1.12) and (1.11)) become
$$\lambda_t = F\left( X_t, \eta_t \right), \qquad X_t = \int_0^t h(t-s)\, \lambda_s\, ds.$$
The main results of the paper then translate into:
Theorem 2.14. Under Hypothesis 2.1 and when $\|\partial_x F\|_\infty \|h\|_1 < 1$, $\lambda_t$ converges, as $t\to\infty$, towards $\ell$, the unique solution of $\ell = F\left( \|h\|_1 \ell, \eta_\infty \right)$. Moreover, under the same hypotheses, in the exponential case (2.4), for any $t_f > 0$, the stability statement of Theorem 2.12 holds around $X_\infty = \|h\|_1 \ell$.
Remark 2.15. The previous result applies in particular to the linear case where $\eta = \mu$ and $F(x,\eta) = \mu + x$. We then have $\ell = \frac{\mu}{1 - \|h\|_1}$ in this case, as in [22].
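The fixed-point characterization $\ell = F(\|h\|_1 \ell, \eta_\infty)$ in Theorem 2.14 is easy to compute by iteration, which converges since $\|\partial_x F\|_\infty \|h\|_1 < 1$ makes the map a contraction. The linear choice of $F$ below is the one of Remark 2.15; the tanh choice is an assumed sigmoid-type nonlinearity for illustration.

```python
import math

# Fixed-point iteration for the limit intensity l = F(||h||_1 * l, mu) in the
# mean-field case; a contraction when ||dF/dx||_inf * ||h||_1 < 1.
def limit_intensity(F, h1, mu, iters=200):
    l = 0.0
    for _ in range(iters):
        l = F(h1 * l, mu)
    return l

h1, mu = 0.5, 1.0
lin = limit_intensity(lambda x, m: m + x, h1, mu)
# linear case: l = mu / (1 - ||h||_1) = 2.0, as in Remark 2.15
nonlin = limit_intensity(lambda x, m: m + 0.5 * math.tanh(x), h1, mu)
```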

2.3.2. Erdős–Rényi graphs.
An immediate extension of the previous mean-field case concerns homogeneous Erdős–Rényi graphs: choose $W_N(x,y) = \rho_N$ for all $x, y \in I$. The results of our paper are valid under the dilution Hypothesis 2.6. It is however likely that these dilution conditions are not optimal (compare with the result of [19], with the condition $N\rho_N \to \infty$, in the diffusion case; a difficulty here is that we deal with a multiplicative noise, whereas it is essentially additive in [19]).

2.3.3. Examples in the inhomogeneous case.
As already mentioned in Hypothesis 2.1, the results are valid for any continuous $W$; interesting examples include $W(x,y) = 1 - \max(x,y)$ and $W(x,y) = 1 - xy$, see [7,8]. Note also that we do not suppose any symmetry on $W$. Another rich class of examples concerns the Expected Degree Distribution model [17,38], where $W(x,y) = f(x)g(y)$ for continuous functions $f$ and $g$ on $I$.
The specificity of this class is that we have an explicit formula for $r_\infty$, namely $r_\infty = \int_I f(y)g(y)\,dy$. In the linear case, we obtain an explicit formula for $\lambda_t$ in [1, Example 4.3].
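This formula can be checked directly: for $W(x,y) = f(x)g(y)$ the operator $T_W$ is rank one, $T_W u = f \cdot \langle g, u\rangle$, whose only nonzero eigenvalue is $\int_I fg$. A discretized sanity check, with illustrative $f$ and $g$:

```python
# Rank-one (EDD) kernel: T_W u(x) = f(x) * int_I g(y) u(y) dy, so the unique
# nonzero eigenvalue is int_I f(y) g(y) dy = r_inf. Illustrative f, g below.
f = lambda x: x
g = lambda y: 1.0 - y

n = 1000
xs = [(i + 0.5) / n for i in range(n)]
r_formula = sum(f(x) * g(x) for x in xs) / n          # ~ int_0^1 x(1-x) dx = 1/6

# apply the discretized operator twice; the resulting ratio recovers r_inf
Av = [f(x) * sum(g(y) for y in xs) / n for x in xs]            # A applied to u = 1
A2v = [f(x) * sum(g(xs[j]) * Av[j] for j in range(n)) / n for x in xs]
r_op = max(A2v) / max(Av)
```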

2.3.4. Extensions. It is apparent from the proofs below that one can weaken the hypothesis of continuity of $W$. Under the hypothesis that $W$ is bounded, Proposition 2.3 remains valid in $D([0,T], L^\infty(I))$ (continuity of $\lambda_t$ and $X_t$ in $x$ may not be satisfied). Supposing further that there exists a partition of $I$ into $p$ intervals $I = \sqcup_{k=1,\cdots,p}\, C_k$ on which $W$ is piecewise uniformly continuous (for all $\varepsilon > 0$, there exists $\eta > 0$ such that the oscillation of $W$ at scale $\eta$ on each $C_k \times C_l$ is at most $\varepsilon$), the remaining results hold as well; see Lemmas 6.4, 6.5 and 6.6.
These particular conditions are met in the following cases (details of the computation are left to the reader):
• P-nearest neighbor model [37]: $W(x,y) = \mathbf{1}_{d_{S^1}(x,y) < r}$ for any $(x,y) \in I^2$, for some fixed $r \in (0, \frac{1}{2})$, with $d_{S^1}(x,y) = \min(|x-y|, 1-|x-y|)$.
• Stochastic block model [32,25]: it corresponds to considering $p$ communities $(C_k)_{1\le k\le p}$. An element of the community $C_l$ communicates with an element of the community $C_k$ with probability $p_{kl}$. This corresponds to the choice of interaction kernel $W(x,y) = \sum_{k,l} p_{kl}\, \mathbf{1}_{x\in C_k,\, y\in C_l}$.
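For the stochastic block model, $T_W$ is finite-rank: it has the same nonzero spectrum as the $p \times p$ matrix $M_{kl} = p_{kl}\,|C_l|$, so $r_\infty$ is the spectral radius of $M$. A sketch with $p = 2$ equal communities (the connection probabilities are illustrative):

```python
import math

# SBM kernel W(x,y) = sum_{k,l} p_kl 1_{x in C_k, y in C_l}: the operator T_W
# has the same nonzero spectrum as the p x p matrix M_kl = p_kl * |C_l|.
p = [[0.6, 0.2], [0.2, 0.4]]
sizes = [0.5, 0.5]                       # C_1 = [0, 1/2), C_2 = [1/2, 1]
M = [[p[k][l] * sizes[l] for l in range(2)] for k in range(2)]

# spectral radius of the 2x2 matrix M, in closed form
tr = M[0][0] + M[1][1]
det = M[0][0] * M[1][1] - M[0][1] * M[1][0]
r_matrix = (tr + math.sqrt(tr * tr - 4.0 * det)) / 2.0

# compare with power iteration on the discretized kernel
def W(x, y):
    k, l = (0 if x < 0.5 else 1), (0 if y < 0.5 else 1)
    return p[k][l]

n = 100
xs = [(i + 0.5) / n for i in range(n)]
v = [1.0] * n
r_kernel = 1.0
for _ in range(60):
    w = [sum(W(x, xs[j]) * v[j] for j in range(n)) / n for x in xs]
    r_kernel = max(w)
    v = [c / r_kernel for c in w]
```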

2.4. Link with the literature. Several previous works have complemented the propagation of chaos result mentioned in (1.4) in various situations: Central Limit Theorems (CLT) have been obtained in [22,25] for homogeneous mean-field Hawkes processes (when both time and $N$ go to infinity) or with age-dependence in [13]. One should also mention the functional fluctuation result recently obtained in [31], also in a pure mean-field setting. A result closer to our case with spatial extension is [16], where a functional CLT is obtained for the spatial profile $X_N$ around its limit. Some insights on the necessity of considering stochastic versions of the NFE (1.3) as second order approximations of the spatial profile are in particular given in [16]. Note here that all of these works provide approximation results of quantities such as $\lambda_N$ or $X_N$ that are either valid on a bounded time interval $[0,T]$ or under a strict growth condition on $T$ (see in particular the growth condition relating $T$ and $N$ for the CLT in [25]), whereas we are here concerned with time-scales that grow polynomially with $N$.
The analysis of mean-field interacting processes on long time scales has a significant history in the case of interacting diffusions. The important issue of uniform propagation of chaos has been especially studied, mostly in reversible situations (see e.g. the case of the granular media equation [6]) but also more recently in some irreversible situations (see [18]). There has in particular been a growing interest in the long-time analysis of phase oscillators (see [28] and references therein for a comprehensive review on the subject). We do not aim here to be exhaustive, but as the techniques used in this work present some formal similarities, let us nonetheless comment on the analysis of the simplest situation, i.e. the Kuramoto model. One is here interested in the long-time behavior of the empirical measure $\mu_{N,t} := \frac{1}{N}\sum_{i=1}^N \delta_{\theta_{i,t}}$ of the phases, whose limit as $N\to\infty$ solves a nonlinear Fokker–Planck (NFP) equation. The simplicity of the Kuramoto model lies in the fact that one can easily prove the existence of a phase transition for this model: when $K \le 1$, $\mu \equiv \frac{1}{2\pi}$ is the only (stable) stationary point of the NFP equation (subcritical case), whereas it coexists with a stable circle of synchronised profiles when $K > 1$ (supercritical case). A series of papers have analysed the long-time behavior of the empirical measure $\mu_N$ of the Kuramoto model (and extensions) in both the subcritical and supercritical cases (see in particular [4,35,27,19] and references therein). The main arguments of the mentioned papers lie in a careful analysis of two contradictory phenomena that arise on a long-time scale: the stability of the deterministic dynamics around stationary points (which forces $\mu_N$ to remain in a small neighborhood of these points) and the presence of noise in the microscopic system (which makes $\mu_N$ diffuse around these points). In particular, the work that is formally closest to the present article is [19], where the long-time stability of $\mu_N$ is analysed in both sub- and supercritical cases for Kuramoto oscillators interacting on an Erdős–Rényi graph.
We are here (at least formally) in a situation similar to the subcritical case of [19]: the deterministic dynamics of the spatial profile $X_N$ (given by (1.10)) has a unique stationary point which possesses sufficient stability properties. The analysis then relies on a time discretization and a careful control on the diffusive influence of the noise that competes with the deterministic dynamics. The main difference (and the main difficulty of the analysis) with the diffusion case in [19] is that our noise (Poissonian rather than Brownian) is multiplicative (whereas it is essentially additive in [19]). This explains in particular the stronger dilution conditions that we require in Hypothesis 2.6 (compared to the optimal $N\rho_N \to \infty$ in [19]) and also the fact that we only reach polynomial time scales (compared to the sub-exponential scale in [19]). There is however every reason to believe that the stability result of Theorem 2.12 remains valid up to this sub-exponential time scale.
Note here that we deal directly with the control of the Poisson noise. Another possibility would have been to use some Brownian approximation of the dynamics of $X_N$. Some results in this direction have been initiated in [25] for spatially-extended Hawkes processes exhibiting oscillatory behaviors: a diffusive approximation of the dynamics of the (equivalent of the) spatial profile is provided (see [25, Section 5]). Note however that this approximation is based on the comparison of the corresponding semigroups and is not uniform in time. Hence, it is unclear how one could exploit these techniques in our case. Some stronger (pathwise) approximations between Hawkes and Brownian dynamics have been further proposed in [15], based on Komlós, Major and Tusnády (KMT) coupling techniques ([26], see also [41] for similar techniques applied to finite dimensional Markov chains). However, this approximation is again not uniform in time, so that applying this coupling to our present case is unclear. Our proof is more direct and does not rely on such a Brownian coupling. To the author's knowledge, this is the first result on large time stability of Hawkes processes (even setting aside the issue of the random graph of interaction: we believe that our results remain relevant in the pure mean-field case, see Theorem 2.14).
2.5. Strategy of proof and organization of the paper. Section 3 is devoted to the proof of the convergence result as $t\to\infty$ of Theorem 2.7. This in particular requires some spectral estimates on the operator $L$ defined in Proposition 2.10, which are gathered in Section 3.1.
The main lines of the proof of Theorem 2.12 are given in Section 4. The strategy is sketched here:
(1) The starting point of the analysis is a semimartingale decomposition of $Y_N := X_N - X$, detailed in Section 4.1. The point is to decompose the dynamics of $Y_N$ in terms of, at first order, the linear dynamics (2.11) governing the behavior of the deterministic profile $X$, modulo some drift terms coming from the graph and its mean-field approximation, some noise term, and finally some quadratic remaining error coming from the nonlinearity of $F$.
(2) Careful controls on each of these terms in the semimartingale expansion on a bounded time interval are given in the rest of Section 4.1. The proofs of these estimates are given respectively in Section 5 (for the noise term) and Section 6 (for the drift term).
(3) The rest of Section 4 is devoted to the proof of Theorem 2.12, see Section 4.2. The first point is that for any given $\varepsilon > 0$, one has to wait a deterministic time $t_\varepsilon > 0$ for the deterministic profile $X_t$ to reach an $\varepsilon$-neighborhood of $X_\infty$. It is easy to see from the spectral gap estimate (2.14) that this $t_\varepsilon$ is actually of order $-\frac{\log \varepsilon}{\gamma}$. Then, using Proposition 2.11, the microscopic process $X_N$ is itself $\varepsilon$-close to $X_\infty$ with high probability.
(4) The previous argument is the starting point of an iterative procedure that works as follows: the point is to see that, provided $X_N$ is initially close to $X_\infty$, it will remain close to $X_\infty$ on some $[0,T]$ for some sufficiently large deterministic $T > 0$. The key argument is that on a bounded time interval the deterministic linear dynamics dominates the contribution of the noise, so that one has only to wait some sufficiently large $T$ for the deterministic dynamics to prevail upon the other contributions.
(5) The rest of the proof consists in iterating the previous argument, taking advantage of the Markovian structure of the dynamics of $X_N$. The time horizon up to which one can pursue this recursion is controlled by moment estimates on the noise, proven in Section 5.
The rest of the paper is organised as follows: Section 7 collects the proofs for the finite time behavior of Proposition 2.11 whereas some technical estimates are gathered in the appendix.
3. Asymptotic behavior of $(X_t)$
This section is related to the proof of Theorem 2.7.
3.1. Estimates on the operator L.
Proof of Proposition 2.4. The continuity and compactness of T W come from the boundedness of W . The structure of the spectrum of T W is a consequence of the spectral theorem for compact operators. The equality between the spectral radii is postponed to Lemma A.8 where a more general result is stated (see also Proposition 4.7 of [1] for a similar result).
Proof of Proposition 2.10. Let us introduce the operator $U g := T_W(G\, g)$; we then have $L = -\alpha\,\mathrm{Id} + U$. By Hypothesis 2.1, $G$ is bounded. Then, for any $g \in L^2(I)$, using the Cauchy–Schwarz inequality, $\|Ug\|_2 \le \|G\|_\infty \|W\|_{L^2(I^2)} \|g\|_2$. The operator $U$ is thus compact and has a discrete spectrum. Then $L$ also has a discrete spectrum, which is that of $U$ shifted by $-\alpha$. Since, by Hypotheses 2.1 and 2.5, the spectral radius of $U$ is strictly smaller than $\alpha$, every eigenvalue of $L$ has real part at most $-\gamma$, with $\gamma$ defined in (2.15). The estimate (2.14) then follows from functional analysis (see e.g. Theorem 3.1 of [39]).

3.2. About the large time behavior of $X_t$.
Proof of Theorem 2.7. We prove that:
• there exists a unique function $\ell : I \to \mathbb{R}_+$ solution of (2.9), continuous and bounded on $I$; and
• $(\lambda_t)_{t\ge0}$ converges uniformly, as $t \to \infty$, towards $\ell$.
This then gives that $X_\infty := \|h\|_1 T_W \ell$ is the unique solution of (2.8). Indeed, as $X_t(x) = \int_I W(x,y)\left( \int_0^t h(t-s)\lambda_s(y)\,ds \right) dy$, as $(\lambda_t)$ is uniformly bounded, and as $h$ is integrable and $\lambda_t \to \ell$ uniformly, we conclude by dominated convergence that, uniformly in $y$, $\int_0^t h(t-s)\lambda_s(y)\,ds \xrightarrow[t\to\infty]{} \|h\|_1\, \ell(y)$. As $T_W$ is continuous, the result follows: $X_t$ converges uniformly towards $X_\infty$. We show first that $(\lambda_t)$ is uniformly bounded. Let $\bar{\lambda}_t(x) := \sup_{s\in[0,t]} \lambda_s(x)$. With (1.12) and the Lipschitz continuity of $F$, we obtain, for $s \in [0,t]$,
$$\lambda_s(x) \le C_{F,\eta} + \|\partial_x F\|_\infty \|h\|_1\, T_W \bar{\lambda}_t(x),$$
so that the same bound holds for $\bar{\lambda}_t(x)$. Iterating this inequality, and choosing $n_0$ sufficiently large such that $\|\partial_x F\|_\infty^{n_0} \|h\|_1^{n_0} \left\| T_W^{n_0} \right\|_{L^\infty(I)} < 1$ (which is possible by (2.3)), we obtain that $\|\bar{\lambda}_t\|_\infty \le C$, where $C$ is independent of $t$. This implies that $(\lambda_t)_{t\ge0}$ is uniformly bounded, i.e. $\sup_{t\ge0}\sup_{x\in I} |\lambda_t(x)| =: \bar{\lambda}_\infty < \infty$.
We show next that $(\lambda_t)$ converges pointwise. We start by studying the limit superior of $\lambda_t$, denoted by $\bar{\ell}(x) := \limsup_{t\to\infty} \lambda_t(x) = \inf_{r>0} \sup_{t>r} \lambda_t(x) =: \inf_{r>0} \Lambda(r,x)$. Note that $\|\bar{\ell}\|_\infty \le \bar{\lambda}_\infty < \infty$ by the first step of this proof. Denote in the same way $\underline{\ell}(x) := \liminf_{t\to\infty} \lambda_t(x) = \sup_{r>0} \inf_{t>r} \lambda_t(x) =: \sup_{r>0} v(r,x)$. Taking $\limsup$ and $\liminf$ as $t\to\infty$ in (1.12), by monotonicity of $F$ in the first variable and monotone convergence, $\bar{\ell}$ and $\underline{\ell}$ are respectively sub- and super-solutions of the fixed-point equation $\ell = H\ell$, where
$$H l(x) := F\left( \|h\|_1 \int_I W(x,y)\, l(y)\, dy, \; \eta_\infty(x) \right).$$
For any $l$ and $l'$ in $L^\infty(I)$ and any $x \in I$, we have
$$|Hl(x) - Hl'(x)| \le \|\partial_x F\|_\infty \|h\|_1\, T_W |l - l'|(x).$$
By iteration, we show that
$$|H^n l(x) - H^n l'(x)| \le \left( \|\partial_x F\|_\infty \|h\|_1 \right)^n T_W^n |l - l'|(x),$$
so that, choosing again $n_0$ sufficiently large, $H^{n_0}$ is a contraction mapping by (2.3). Hence one necessarily has, for all $x \in I$, $\underline{\ell}(x) = \bar{\ell}(x) < +\infty$: $(\lambda_t)$ converges pointwise towards $\ell = \underline{\ell} = \bar{\ell}$, the unique fixed point of $H$, which satisfies (2.9).
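The contraction argument above can be visualized on a grid: iterating the discretized map $Hl = F(\|h\|_1 T_W l, \eta_\infty)$ converges geometrically to $\ell$, with rate controlled by $\|\partial_x F\|_\infty \|h\|_1 r_\infty$. The choices of $F$, $W$ and $\eta_\infty$ below are illustrative data satisfying (2.3).

```python
import math

# Discretized fixed-point iteration for l = H l := F(||h||_1 * T_W l, eta_inf).
# Illustrative subcritical data: ||dF/dx||_inf = 0.5, ||h||_1 = 0.5,
# r_inf ~ 0.405, so the product in (2.3) is ~ 0.10 < 1.
n = 100
xs = [(i + 0.5) / n for i in range(n)]
F = lambda x, eta: eta + 0.5 * math.tanh(x)
W = lambda x, y: 1.0 - max(x, y)
h1 = 0.5
eta_inf = 1.0

def H(l):
    return [F(h1 * sum(W(x, xs[j]) * l[j] for j in range(n)) / n, eta_inf)
            for x in xs]

l = [0.0] * n
gaps = []                      # sup-norm increments; should decay geometrically
for _ in range(30):
    nl = H(l)
    gaps.append(max(abs(nl[i] - l[i]) for i in range(n)))
    l = nl
# l is now (a discretization of) the limit intensity profile solving (2.9)
```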
We now show that the family (λ t ) t≥0 is equicontinuous, so that pointwise convergence implies uniform convergence on the compact set I. For any (x, y) ∈ I 2 and t ≥ 0, we have With (2.1), we have and, as λ is bounded, we have (3.5) for any t ≥ T . As W is uniformly continuous on I 2 , one can find η > 0 such that C F,λ,h,W |x − y| + ∫ I |W (x, z) − W (y, z)| dz ≤ ε/2 when |x − y| ≤ η. We can divide [0, 1] into intervals [z k , z k+1 ] with z k+1 − z k ≤ η for every k. Then, for any x ∈ [0, 1], one can find z k such that |z k − x| ≤ η, and which gives, as W is uniformly continuous, the continuity of X ∞ .

Large time behavior of U N (t)
The aim of the present section is to prove Theorem 2.12. To study the behavior of U N (t), we use the decomposition where φ N is the drift term and ζ N is the noise term coming from the jumps of the process X N .
Proof of Proposition 4.1. From (1.9) and (1.10), we obtain that X N satisfies The centered noise M N defined in (4.7) satisfies and is an L 2 (I)-valued martingale. Thus, recalling the definition of X ∞ in (2.8) and inserting the corresponding double sum over 1 ≤ i, j ≤ N, (4.9) A Taylor expansion gives with g N given in (4.5). Hence we have, with G defined in (2.12), hence, coming back to (4.9) and recognizing the operator L of (2.13), We recognize r N defined in (4.4), and obtain exactly dY N (t) = LY N (t)dt + r N (t)dt + dM N (t). (4.10) The mild formulation (4.2) is then a direct consequence of Lemma 3.2 of [45]: the unique strong solution to (4.10) is indeed given by (4.2).
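For reference, the passage from the strong form (4.10) to the mild form (4.2) is the standard variation-of-constants (Duhamel) formula; schematically (our notation, not a quotation of (4.2)):

```latex
\mathrm{d}Y_N(t) = L\,Y_N(t)\,\mathrm{d}t + r_N(t)\,\mathrm{d}t + \mathrm{d}M_N(t)
\;\Longrightarrow\;
Y_N(t) = \mathrm{e}^{tL}\,Y_N(0)
  + \int_0^t \mathrm{e}^{(t-s)L}\, r_N(s)\,\mathrm{d}s
  + \int_0^t \mathrm{e}^{(t-s)L}\,\mathrm{d}M_N(s).
```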
The proof is postponed to Section 5.

Proposition 4.3 (Drift term). Under Hypothesis 2.1, for any
where G N is an explicit quantity, given in the proof, that tends to 0 as N → ∞.
The proof is postponed to Section 6.

4.2. Proof of the large time behaviour. We prove here Theorem 2.12, based on the results of Section 4.1. The approach is formally similar to the strategy of proof developed in [19] for the diffusion case.
Proof of Theorem 2.12. Fix m ≥ 1 and t f > 0. Let where γ is defined in (2.15) and the constant C drift comes from Proposition 4.3 above.
We consider ε small enough that ε < ε 0 . As (X t ) converges uniformly towards X ∞ (Theorem 2.7), there exists t 1 ε < ∞ such that Moreover, with (2.1), we also have (4.14) We now set t ε = max(t 1 ε , t 2 ε ). Let T be such that The strategy of proof relies on the following time discretisation. The point is to control which implies the result (2.17), as [t ε , (N ρ N ) m t f ] ⊂ [t ε , T N ] since T > t f . We decompose below the interval [t ε , T N ] into a N intervals of length T . We define the following events: By (4.13), and as Proposition 2.11 gives that P sup (4.20) Step 1. We have from the definition (4. (4.21) Moreover, Recall that we are in the exponential case (2.4), so that (X N (t)) t is a Markov process. By the Markov property, ) means that, starting from an initial condition at t ε + (a N − 1)T , we look at the probability that Y N stays below ε on the interval [t ε + (a N − 1)T, t ε + a N T ] of length T and comes back below ε/2 at the final time t ε + a N T . By the Markov property, this is exactly P E(t ε , t ε + T )|A N 1 (ε) . An immediate iteration then gives By (4.20), from now on we place ourselves on the event A N 1 (ε) and omit this conditioning for simplicity.
Step 2. We show that Let us place ourselves in A N 2 (ε). As we are also under for the first condition of E(t ε , t ε + T ). As Y N satisfies (4.1), it can be written for where the first inequality comes from Proposition 4.3, and the second holds for N large enough, using G N → 0 and (4.14). Coming back to (4.24), using that by Proposition 2.10 and using (4.25), we have on Let δ > 0 be such that δ ≤ min(ε/6, γ/(9C drift )). Recall that ‖Y N (·)‖ 2 is not a continuous function: it jumps whenever a spike of the process (Z N,1 , · · · , Z N,N ) occurs, but the jump size never exceeds 1/N, and for N large enough 1/N ≤ δ/2. Then one can apply Lemma A.9 and obtain that, for all N large enough, It remains to prove that ‖Y N (t ε + T )‖ 2 ≤ ε/2. We obtain from (4.24), (4.25) and (4.26) for Using the a priori bound (4.27), where we recall the particular choices of T and ε < ε 0 in (4.15) and (4.12). This concludes the proof of (4.23).
Step 3. We obtain, with (4.22) and Markov's inequality, where we have taken m ′ > m. With Proposition 4.2, this gives By definition (4.16), a N = o((N ρ N ) m ′ ), and the right-hand term tends to 1 as N → ∞ under Hypothesis 2.6. By (4.21), we conclude that This concludes the proof of Theorem 2.12.
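The arithmetic behind Steps 1-3 can be summarised as follows: with a N ≈ (N ρ N ) m blocks of length T and a per-block failure probability of order (N ρ N ) −m′ , the union bound over blocks is of order (N ρ N ) m−m′ and vanishes as soon as m′ > m. A minimal sketch, with placeholder constants:

```python
# Union-bound bookkeeping for the block decomposition (placeholder
# constants C, m, m'; n_rho plays the role of N * rho_N).

def failure_bound(n_rho, m, m_prime, C=1.0):
    a_n = n_rho ** m              # number of blocks of length T
    q_n = C / n_rho ** m_prime    # per-block failure probability
    return a_n * q_n              # ~ C * n_rho^(m - m')

# With m' = m + 1 the bound decays like 1 / n_rho:
bounds = [failure_bound(10.0 ** k, m=2, m_prime=3) for k in range(1, 5)]
```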

Proofs - Noise perturbation
In this section, we prove Proposition 4.2 concerning the control of the noise perturbation ζ N (t 0 , t) defined in (4.6). For simplicity of notation, we assume that t 0 = 0. Recall the expression of (Z N,j ) 1≤j≤N in (1.7). Introduce the compensated measure π̃ j (ds, dz) := π j (ds, dz) − λ N,j ds dz, so that, with the linearity of (e tL ) t≥0 , ζ N can be written as where we used Jensen's inequality and the Burkholder-Davis-Gundy inequality on the martingale part. We now focus on the term E Let S i := ∑ N j=1 w (N ) ij . By (A.2), we have that, P-almost surely, lim sup N →∞ sup 1≤i≤N S i ≤ 2. With the discrete Jensen inequality, we obtain that for any t ≥ 0 We then obtain With (A.2), it gives that, P-almost surely, for N large enough As for any t ≥ 0 Grönwall's lemma gives that sup and similarly an immediate iteration gives that for any k ≥ 0, sup which concludes the proof.

Proof of Proposition 4.2.
Proof. We divide the proof into several steps. Fix m ≥ 1. We prove Proposition 4.2 for the choice t 0 = 0; the argument is identical for a general initial time t 0 ≥ 0.
Step 1 - The functional φ : L 2 (I) → R given by φ(v) = ‖v‖ 2m 2 , the 2m-th power of the L 2 (I)-norm, is of class C 2 (recall that ζ N ∈ L 2 (I)), so that by Itô's formula applied to the expression (5.1) we obtain We then have, for any v, h, k ∈ L 2 (I), φ ′′ (v)(h, k) = 2m(2m − 2) ‖v‖ 2m−4 2 Re⟨v, k⟩ Re⟨v, h⟩ + 2m ‖v‖ 2m−2 2 Re⟨h, k⟩.
Let ε > 0, to be chosen later. From Young's inequality, for any a, b ≥ 0, we can write This gives, for the choice a = (E sup 0≤s≤t ‖ζ N (s)‖ 2m 2 ) (2m−1)/(2m) , We have thus shown that, for the constant C given by the Burkholder-Davis-Gundy inequality, Let us now focus on I 2 in (5.3): For any jump (s, z) of the Poisson measure π j , by Taylor's formula with Lagrange remainder there exists τ s ∈ (0, 1) such that (s, z), χ j (s, z)).
As φ ′′ (v)(h, k) = 2m(2m − 2) ‖v‖ 2m−4 2 Re⟨v, k⟩ Re⟨v, h⟩ + 2m ‖v‖ 2m−2 2 Re⟨h, k⟩ for any v, h, k ∈ L 2 (I), one has with the Cauchy-Schwarz inequality that But as ‖x + τ y‖ 2 2 ≤ max(‖x‖ 2 2 , ‖x + y‖ 2 2 ) for any x, y ∈ L 2 (I) and τ ∈ (0, 1) (by convexity of τ ↦ ‖x + τ y‖ 2 2 on [0, 1]), we have here We now proceed as for I 1 . From Hölder's inequality, as (2m − 2)/(2m) + 1/m = 1, we know that for any non-negative random variables A and B, χ j (s, z) 2 2 π j (ds, dz) to E sup s≤t |I 2 (s)| With the same ε introduced for I 1 , from Young's inequality, for any a, b ≥ 0, we can write it gives that E sup s≤t |I 2 (s)| is upper bounded by Taking the expectation in (5.3) and combining (5.4) and (5.5), we obtain that Step 4 - We can now fix ε such that ε(C(2m − 1) + m(2m − 2)) ≤ 1/2, so that (5.6) leads to where C > 0 depends only on m.
Step 5 - We have which leads to, with the definition of Z N,j in (1.7), With (A.2), the discrete Jensen inequality and (5.7), this leads to hence the result with Proposition 5.1.

Proofs - Drift term
In this section, we prove Proposition 4.3 concerning the control of the drift term perturbation φ N (t 0 , t) defined in (4.3). 6.1. Notation. We introduce the following auxiliary functions in L 2 (I): From the expression of r N in (4.4), we then have and we can decompose Proof. A direct computation gives, for any s ≥ 0, By definition of X N (s) in (1.10), X N = X N,j on B N,j ; hence, using Theorem 2.7, Then (6.7) is a straightforward consequence of the uniform continuity of X ∞ on the compact set I (see Theorem 2.7). It still holds under the hypotheses of Section 2.3.4, by decomposing the sum on each interval C k .
We will often use where R W N,k and S W N are defined in (2.24) and (2.25) respectively. Proof. Fix ε > 0. As W is uniformly continuous on I 2 , there exists η > 0 such that |W (x, y) − W (x, z)| ≤ ε for any (x, y, z) ∈ I 3 with |y − z| ≤ η. Then, for N large enough (such that 1/N ≤ η), we directly have R W N,1 ≤ ε and R W N,2 ≤ ε, hence the result. The same argument applies to S W N . Lemma 6.3. Under Hypothesis 2.1, for any t > t 0 ≥ 0, Proof of Lemma 6.3. By Proposition 2.10 we have ‖φ N,0 (t)‖ 2 ≤ ∫ t t 0 e −γ(t−s) ‖T W g N (s)‖ 2 ds. As |T W g N (s)(x)| ≤ ∫ I W (x, y) |g N (s)(y)| dy for any x ∈ I, and as W is bounded, we then obtain Then (6.10) follows, as ‖Y N (s)‖ 2 ≤ (1 + ‖Y N (s)‖ 2 2 )/2 and sup s δ s < ∞.
Lemma 6.4. Under Hypotheses 2.1 and 2.6, P-almost surely for N large enough and for any t > t 0 ≥ 0, (6.11) where G N,1 = G N,1 (ξ) is explicit in N and tends to 0 as N → ∞. Moreover, if we suppose F bounded, we have the better bound Proof of Lemma 6.4. Proposition 2.10 gives that where we have used the notation Forgetting for a moment the term F (X N,j (s), η s (x j )) in (6.14), γ N is essentially an empirical mean of the independent centered variables ξ ij and should thus be small as N → ∞. One difficulty here is that concentration bounds (e.g. Bernstein's inequality) for weighted sums such as ∑ j ξ ij u i,j (for some deterministic fixed weights u i,j ) are not directly applicable, as u i,j = F (X N,j (s), η s (x j ))1 B N,i depends in a highly nontrivial way on the variables ξ ij themselves. A strategy is to use the Grothendieck inequality (see Theorem A.1).
We refer here to [19, 21], where such Grothendieck-type inequalities (and extensions) have been implemented in a similar context of interacting diffusions on random graphs. A supplementary difficulty here lies in the fact that F need not be bounded (recall that a particular example considered here is the linear case F (x, η) = x + µ). Hence the application of the Grothendieck inequality is not straightforward when F is unbounded. For this reason, we give below two different controls on γ N : a general one, without assuming F bounded, and a second, sharper one when F is bounded (using the Grothendieck inequality). In the first case, we get around the unboundedness of F by introducing F (X ∞ (x j ), η ∞ (x j )), which is bounded since X ∞ is. We begin with the general control on γ N : we can write Denoting ∆F j := F (X N,j (s), η s (x j )) − F (X ∞ (x j ), η ∞ (x j )), we have, as As |∆F j | ≤ F L (|Y N (s)(x j )| + δ s ), we obtain, as s ↦ δ s is bounded, hence, as ‖Y N (s)‖ 2 ≤ (1 + ‖Y N (s)‖ 2 2 )/2 and using (6.8), (6.17) For the second term of (6.16), we have and F k = σ(ξ ij , 1 ≤ i, j ≤ k) (with respect to P, i.e. the realisation of the graphs). We show next that (R N ) is a martingale: for any k ≥ 1 (note that R 1 = 0), as E[ξ ij ξ ij ′ |F k ] = 0 if j ≠ j ′ and at least one of the indices i, j, j ′ equals k + 1, by independence of the family of random variables (ξ ij ) i,j . Similarly, we have As each |ξ i,j | ≤ 1 and |α i,j,k | ≤ 1, this gives |∆R k | ≤ 3k 2 + k. Theorem A.2 then gives, with P (N ) = 2N (N + 1), Coming back to (6.16), combining (6.17) and (6.18) and a control of S max N from Lemma A.5, we have P-a.s. for N large enough hence, taking the square root and using (6.13), → 0 under Hypothesis 2.6.
Let us now turn to the sharper control on γ N defined in (6.14) when F is bounded. Coming back to (6.16), the Grothendieck inequality (see Theorem A.1) gives that there exists K > 0 such that Fix some vectors of signs s = (s i ) 1≤i≤N and t = (t j ) 1≤j≤N . Let A = (ξ ij ) 1≤i,j≤N ; then ∑ N i,j,k=1 ξ ij ξ ik s j t k = ⟨t, A * As⟩, where ⟨·, ·⟩ denotes the scalar product in R N and A * the transpose of A. As ‖t‖ 2 = ∑ N k=1 t 2 k = N for any sign vector t, and ‖A * A‖ = ‖A‖ 2 op , we obtain, since |⟨t, A * As⟩| ≤ ‖t‖ ‖A * As‖ ≤ N ‖A‖ 2 op : From Theorem A.3, there exist positive constants C a and C b such that for any x ≥ C a , (6.19) where G N,2 is explicit in N and tends to 0 as N → ∞. Moreover, if we suppose F bounded, we have We upper-bound each term. As F is Lipschitz continuous, we have which is upper bounded by We then have where R W N,1 is defined in (2.24). For the second term, we have, as where R W N,2 is defined in (2.24). For the third term, as F is Lipschitz continuous, We then obtain with (6.8) With (6.6) and Proposition 2.10, φ N, (6.21) where G N,3 is explicit in N and tends to 0 as N → ∞. Moreover, if we suppose F bounded, we have Proof of Lemma 6.6. We have, with the Cauchy-Schwarz inequality. We recognize S W N defined in (2.25), and we have, as F is Lipschitz continuous and y ↦ F (X ∞ (y), η ∞ (y)) is bounded, ∫ I F (X N (s, y), η s (y)) 2 dy ≤ 2 ∫ I (F (X N (s, y), η s (y)) − F (X ∞ (y), η s (y))) 2 dy + 2 ∫ I F (X ∞ (y), η s (y)) 2 dy As before, (6.6) and Proposition 2.10 give the corresponding bound, hence the result with G N,3 = C F,W R N,3 and Lemma 6.2. When F is bounded, we directly have 6.3. Proof of Proposition 4.3. Proposition 4.3 is then a direct consequence of (6.5) and (6.6), of the controls given by Lemmas 6.3, 6.4, 6.5 and 6.6, with G N = G N,1 + G N,2 + G N,3 , and of Lemma 6.2 giving G N → 0.
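The key quantitative input in the bounded case, that ‖A‖ op is of order √N for a matrix of centered bounded independent entries, can be illustrated numerically. The Rademacher entries below stand in for the centered edge variables ξ ij; this is a sanity check of the order of magnitude, not the paper's argument.

```python
# Power iteration on A^T A estimating ||A||_op for an N x N matrix of
# i.i.d. centered +-1 entries; the estimate concentrates near 2*sqrt(N),
# so N * ||A||_op^2 is of order N^2, far below the trivial bound N^3 on
# the bilinear sums over sign vectors.
import random

def op_norm(A, iters=100):
    n = len(A)
    v = [1.0] * n
    for _ in range(iters):
        Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
        w = [sum(A[i][j] * Av[i] for i in range(n)) for j in range(n)]  # A^T A v
        scale = max(abs(c) for c in w)
        v = [c / scale for c in w]
    Av = [sum(A[i][j] * v[j] for j in range(n)) for i in range(n)]
    num = sum(c * c for c in Av) ** 0.5
    den = sum(c * c for c in v) ** 0.5
    return num / den  # Rayleigh-type estimate of the top singular value

random.seed(0)
N = 100
A = [[random.choice((-1.0, 1.0)) for _ in range(N)] for _ in range(N)]
ratio = op_norm(A) / N ** 0.5  # close to 2 for large N
```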

About the finite time behavior
In this section, we prove Proposition 2.11. 7.1. Main technical results. In the following, we denote by Y N (t) := X N (t) − X t .
Proof of Proposition 2.11. Let t ≤ T . Recall the definition of X N (t) in (1.10) and of X t in (1.11). Proceeding exactly as in the proof of Proposition 4.1, and recalling the definition of M N (t) in (4.7), we have, with the notations introduced in (6.1)-(6.3), Since Y N (0) = 0, this gives Note that we obtain an expression similar to that for Y N in Proposition 4.1, but with e −αt instead of the semigroup e tL . We then use the two following results, analogous to Propositions 4.2 and 4.3.
where G N is an explicit quantity to be found in the proof that tends to 0 as N → ∞.
Their proofs are postponed to the following subsection. Hence we obtain ds, which is still non-positive. As for I 1 (t) and I 2 (t), the proof remains the same, except that we now consider ζ̃ N instead of ζ N .
To prove (7.2), we introduce an auxiliary quantity as in Lemma 6.1.
Proof. It plays the role of Ỹ N (s) introduced in Lemma 6.1. Similarly to what has been done before, we have Proof of Proposition 7.2. We decompose φ N as in (6.6) and study each contribution. About φ N,0 (t) := ∫ t 0 e −α(t−s) T W (F (X N (s), η s ) − F (X s , η s )) ds, we have For Θ s,i,1 N 1 B N,i ds, we proceed as in Lemma 6.4, but instead of inserting the terms F (X ∞ (x j ), η ∞ (x j )) in (6.16) we insert the terms F (X s (x j ), η s (x j )); that is, γ N (s) ≤ as sup t∈[0,T ],x∈I F (X t (x), η t (x)) < ∞, and obtain that, P-almost surely, for N large enough, We then have that, P-almost surely, for N large enough, For Θ s,i,k N 1 B N,i ds with k ∈ {2, 3}, we proceed similarly, as in Lemmas 6.5 and 6.6, but inserting the terms F (X s (x j ), η s (x j )) instead of F (X ∞ (x j ), η ∞ (x j )): then there are no δ s terms. We obtain where both G N,2 and G N,3 tend to 0. Note that better bounds can be obtained when F is bounded. Putting all the terms φ N,k together, we get (7.1).

Appendix A. Auxiliary results
A.1. Concentration results.
Then there exists a constant K R > 0 such that, for every Hilbert space (H, ⟨·, ·⟩ H ) and all S i and T j in the unit ball of H, for any x ≥ C.
Proof. It is a direct consequence of Corollary 8.2 of [1], in the case w N = ρ N , where τ ∈ (0, 1/2) comes from Hypothesis 2.6. Proof. When j and j ′ are fixed with j ≠ j ′ , (X i := ξ ij ξ ij ′ ) 1≤i≤N is a family of independent random variables with |X i | ≤ 1, E[X i ] = 0 and E[X 2 i ] ≤ 1. Bernstein's inequality then gives, for any t > 0, We then apply the Borel-Cantelli lemma and obtain (A.3).
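The Bernstein bound invoked here reads, in a standard form (an assumption on our part; the paper's exact constants may differ): for independent centered X i with |X i | ≤ 1 and E[X i 2 ] ≤ 1, P(|∑ i X i | ≥ t) ≤ 2 exp(−t 2 /(2N + 2t/3)). A quick Monte Carlo sanity check, with X i = ξ ij ξ ij′ replaced by products of independent signs:

```python
import math, random

def bernstein_bound(N, t):
    # Standard Bernstein tail for N independent centered variables with
    # |X_i| <= 1 and variance proxy E[X_i^2] <= 1 (assumed constants).
    return 2.0 * math.exp(-t * t / (2.0 * N + 2.0 * t / 3.0))

random.seed(1)
N, trials = 100, 2000
t = 3.0 * math.sqrt(N)
hits = 0
for _ in range(trials):
    # X_i = xi_ij * xi_ij' modelled as a product of two independent signs
    s = sum(random.choice((-1, 1)) * random.choice((-1, 1)) for _ in range(N))
    if abs(s) >= t:
        hits += 1
freq = hits / trials  # empirical tail probability, below the bound
```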
A.2. Other technical results.
Lemma A.8. Let K be a kernel from I 2 to R + such that sup x∈I ∫ I K(x, y) 2 dy < ∞. Let T K : g ↦ T K g := (x ↦ ∫ I K(x, y)g(y) dy) be the operator associated with K, which can be defined both from L 2 (I) to L 2 (I) and from L ∞ (I) to L ∞ (I). We assume that T K 2 : L 2 (I) → L 2 (I) is compact. Then r 2 (T K ) = r ∞ (T K ).
Lemma A.9 (Quadratic Grönwall lemma). Let f be a non-negative, piecewise continuous function with a finite number of distinct jumps of size at most θ on [t 0 , T ], and let g be a