Large deviations for interacting particle systems: joint mean-field and small-noise limit

We consider a system of stochastic interacting particles in $\mathbb{R}^d$ and describe the large deviations asymptotics in a joint mean-field and small-noise limit. Precisely, a large deviations principle (LDP) is established for the empirical measure and the stochastic current, as the number of particles tends to infinity and the noise simultaneously vanishes. We give a direct proof of the LDP using tilting and subsequently exploiting the link between entropy and large deviations. To this aim, we employ the consistency of suitable deterministic control problems associated with the stochastic dynamics.


Introduction
In statistical mechanics, macroscopic properties of a physical system are usually derived from a probabilistic description of complicated interactions at a microscopic level. Generally, the macroscopic behaviour is captured by a deterministic partial differential equation, also known as the hydrodynamic equation. At a microscopic scale, instead, the dynamics can be described via a stochastic interacting particle model, whose choice is fundamental for a rigorous derivation of the above macroscopic equation. A step further in the study of (out-of-equilibrium) systems consists in understanding whether it is possible, and how likely it is, to observe a macroscopic behaviour different from the one predicted by hydrodynamics. To answer this question it is natural to look for a large deviations principle (LDP for short), which also captures fluctuations of the relevant quantities around their equilibria.
Within this framework, a simple yet rich enough example to investigate is the one proposed by McKean in the context of propagation of chaos, see e.g. [29]. Given a collection of particles moving randomly in the whole space $\mathbb{R}^d$, we prescribe their evolution through a system of Itô-type SDEs driven by independent Brownian noises. The interaction between the (exchangeable) particles is required to be of mean-field type, i.e. the dynamics of each particle depends on the current empirical distribution of the system, and the coefficients of all the equations have the same functional form. Here, the relevant physical quantity is the particle density, and it has been proved in several situations that the associated empirical measure gives rise, after a proper rescaling, to a macroscopic density solving a Vlasov-type equation. The mean-field character of the interaction is fundamental in this procedure: it guarantees that the contribution of any given particle to the empirical distribution is small when a sufficiently large number of particles is considered. From a different perspective, the limit PDE can also be thought of as a model simplification of the N-particle system and can be used to investigate properties of the microscopic system when the number of particles is very large.
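The structure just described can be illustrated with a minimal Euler–Maruyama simulation sketch; the drift F(x, µ) = −(x − mean(µ)) and all numerical parameters below are illustrative assumptions, not the paper's model:

```python
import numpy as np

# Toy mean-field SDE  dx_i = F(x_i, mu^N) dt + sqrt(eps) dW_i,  with the
# hypothetical drift F(x, mu) = -(x - mean(mu)): each particle is attracted
# to the empirical mean. This only illustrates the structure of the system.
def simulate(N=200, eps=0.05, T=1.0, steps=200, seed=0):
    rng = np.random.default_rng(seed)
    dt = T / steps
    x = rng.normal(size=(N, 1))          # initial positions in R^1 (d = 1)
    for _ in range(steps):
        drift = -(x - x.mean(axis=0))    # mean-field interaction term
        noise = rng.normal(size=x.shape) * np.sqrt(eps * dt)
        x = x + drift * dt + noise       # Euler-Maruyama step
    return x

x_final = simulate()
# The drift contracts deviations from the empirical mean, so with small
# noise the empirical spread shrinks over time.
spread = float(x_final.std())
```

Note how the interaction enters only through the empirical distribution (here, its mean), which is the hallmark of the mean-field structure.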
The present paper is an attempt to clarify the relations among the various descriptions of mean-field systems in $\mathbb{R}^d$, focusing on the micro/macro and deterministic/stochastic dualities. A rough picture of the problem is given in the following diagram: in one corner, from a microscopic-stochastic point of view, the system is modelled through N stochastic equations with interaction (SDE N ), as briefly outlined above. A counterpart of this description is given on the opposite side in a microscopic-deterministic fashion, where N deterministic differential equations (ODE N ) govern the dynamics. The relation between the two pictures is a well-studied topic in the Freidlin–Wentzell theory of random perturbations of dynamical systems, and it is represented in the above diagram by the arrow with vanishing noise (ε ↓ 0).
As the number of particles increases (N ↑ +∞), in the lower left corner we deal with a macroscopic limit process (McKean–Vlasov), also referred to as a nonlinear diffusion. Here we have to take into account both the limit behaviour of a typical particle and the limit of the empirical distribution. In fact, the nonlinear character of the diffusion originates from the fact that the dynamics of a typical particle depends on the particle distribution itself.
For what concerns the macroscopic characterisation on the right-hand side (Vlasov PDE), we have at our disposal at least two different approaches. On the one hand, starting from the microscopic-deterministic model (ODE N ) and sending N ↑ +∞, we obtain a continuity equation whose velocity field depends on the solution itself. This is in line with the usual mean-field limit for finite-dimensional interacting systems, to which a large literature is devoted. On the other hand, a Vlasov-type PDE can also be obtained by a vanishing viscosity procedure (ε ↓ 0) starting from the nonlinear diffusion for the law of the McKean–Vlasov process.
It seems natural to wonder whether the above diagram possesses some form of commutativity: is it possible to freely interchange the two limit operations N ↑ +∞ and ε ↓ 0, and to what extent? With this question in mind, in the present manuscript we go a step further in the analysis by considering large deviations asymptotics for the empirical measure and the associated stochastic current, trying to capture fluctuations around the equilibria as N ↑ +∞ and ε ↓ 0.
Partial answers to this question are already present in the literature. Concerning the limit N ↑ +∞, the main reference for large deviations of stochastic mean-field particle systems is [9], see also [21,26]. In [9] the authors deal with uniformly nondegenerate diffusions, interacting through the drift term, and derive an LDP for the empirical measure via a careful discretization procedure. A subsequent result in this direction has been obtained in [5], where the authors adopted a weak convergence approach combined with a variational representation for moments of nonnegative functionals of Brownian motion [4]. This strategy bypasses the above-mentioned discretization procedure as well as exponential probability estimates, and can cover some models with interaction in the diffusion coefficient. Many other generalizations and directions have been explored in the literature: we refer e.g. to [8] for multilevel LDPs, [10] for discrete-time systems, [7] for random environments, [23] for jump processes, [12] for the rough path setting and [22] for applications to control theory. The deterministic counterpart of the mean-field theory (N ↑ +∞) is by now an active area of research, with motivations ranging from physics to biology, from social sciences to control theory. In the last decade there has been a significant effort in providing rigorous derivations of the PDE models starting from finite-dimensional systems. For a general result in this direction we refer to [6], where a well-posedness theory for some kinetic models is developed. See also [15] for further references and an application to optimal control problems. The convergence ε ↓ 0 at the level of the particle system fits into the framework of Freidlin–Wentzell theory [16].
As for the nonlinear diffusion, an LDP for McKean–Vlasov equations in the small-noise limit was first established in [17] and then generalized in many directions, see e.g. the recent [27] and the references therein. Recall that, at a purely PDE level, the limit ε ↓ 0 coincides with a vanishing viscosity limit for the nonlinear diffusion towards the solution to the Vlasov PDE.
Finally, in [18] the authors addressed the problem of interchanging the mean-field limit with the small-noise one. They proved that the rate functional associated with the first particle in a mean-field system converges to the rate functional of the hydrodynamical equation as N becomes large.
In this paper we further study the combination of the mean-field and small-noise limits by establishing an LDP for the empirical measure and the stochastic current as ε ↓ 0 and N ↑ +∞ simultaneously. A general motivation for studying LDPs for the pair measure–current comes from non-equilibrium statistical physics, in which the current is an important observable of the system. No less importantly, within this framework an explicit formula for the rate functional is often available, and the corresponding LDP for the empirical measure alone can then be obtained by contraction.
More specifically, in the present setting we consider $N$ particles $(x^1, \ldots, x^N)$ in $\mathbb{R}^d$ solving the system of SDEs (1.1), driven by independent Brownian motions. We associate to the system the empirical measure $\mu^N$ and we define the stochastic current $J^{N,\varepsilon}$. If we denote by $P^{N,\varepsilon}$ the law of the solution to (1.1) and by $X$ the state space, we aim at showing that the probability measures $P^{N,\varepsilon} \circ (\mu^N, J^{N,\varepsilon})^{-1} \in \mathcal{P}(X)$ satisfy an LDP in $X$ with speed $\varepsilon/N$ and (good) rate functional $I$ given in a variational form. This is to say that for any Borel set $B \subset X$,
$$- \inf_{(\mu,J) \in \mathring{B}} I(\mu,J) \;\le\; \liminf \tfrac{\varepsilon}{N} \log P^{N,\varepsilon}\big((\mu^N, J^{N,\varepsilon}) \in B\big) \;\le\; \limsup \tfrac{\varepsilon}{N} \log P^{N,\varepsilon}\big((\mu^N, J^{N,\varepsilon}) \in B\big) \;\le\; - \inf_{(\mu,J) \in \bar{B}} I(\mu,J),$$
independently of the order of the limits in $\varepsilon$ and $N$ (see Theorem 3.7 for a precise statement with minimal assumptions). The proof of the above result is carried out in a direct way, by finding a microscopic perturbation of the system, macroscopically non-trivial, from which it is possible to extract the correct form of the large deviations functional. This is more evident when the rate functional is written in an explicit fashion. In fact, denoting $\bar\mu := \int_0^T \delta_t \otimes \mu_t \, dt$, when $I(\mu, J) < +\infty$ there exists a vector field $h \in L^2(Q, \bar\mu; \mathbb{R}^d)$ for which $dJ = h \, d\bar\mu$, for all pairs $(\mu, h)$ satisfying the controlled continuity equation in a distributional sense. Within this setting, the formulation of the LDP for the pair measure–current is fundamental to obtain the explicit form of the rate functional above. Also notice that in the limit $\varepsilon \downarrow 0$, given a measure $\mu$ there exists more than one current $J$ for which the continuity equation is satisfied, so the application of a contraction principle is not trivial. In the Appendix (see Theorem 6.1) we finally give sufficient conditions on the velocity field $v := F + h$ and on the initial datum $\mu_0$ under which well-posedness of the Vlasov PDE (1.7) is guaranteed.
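To make the pair (empirical measure, current) concrete, the following toy discretisation computes both pairings against test functions; the pairing $J^{N,\varepsilon}(\eta) = \frac1N \sum_i \int_0^T \eta(t, x_i(t)) \cdot dx_i(t)$ is our reading of (3.4), and the trajectories and test field below are toy choices, not the paper's setup:

```python
import numpy as np

# Toy trajectories: unit drift plus small Brownian noise (eps = 0.01),
# sampled on a uniform time grid on [0, T].
rng = np.random.default_rng(1)
N, steps, T, eps = 50, 400, 1.0, 0.01
dt = T / steps
t = np.linspace(0.0, T, steps + 1)
incr = np.full((steps, N), 1.0) * dt + np.sqrt(eps * dt) * rng.normal(size=(steps, N))
paths = np.vstack([np.zeros((1, N)), np.cumsum(incr, axis=0)])   # x_i(0) = 0

def eta(s, y):                       # smooth test field vanishing at t = 0, T
    return np.sin(y) * (s * (T - s))

# Midpoint (Stratonovich-type) discretisation of the current pairing
mid_t = 0.5 * (t[1:] + t[:-1])
mid_x = 0.5 * (paths[1:] + paths[:-1])
dx = paths[1:] - paths[:-1]
J_eta = float(np.mean(np.sum(eta(mid_t[:, None], mid_x) * dx, axis=0)))

# Pairing of the space-time empirical measure with the same test function
mu_eta = float(np.mean(eta(mid_t[:, None], mid_x)) * T)
```

With drift 1 and small noise, both pairings approximate $\int_0^T \sin(t)\,t(T-t)\,dt \approx 0.08$, since $dx_i \approx dt$ along these toy paths.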
The proof of the large deviations upper bound is constructed by a specific tilting of the measures, providing the right estimates on compact sets. To get the required expression for closed sets a careful exponential tightness argument for both empirical measure and stochastic current is needed. The proof of the lower bound is more delicate. Firstly, in Theorem 5.1 we exploit the relation between large deviations and Γ-convergence developed in [25]: this theorem is fundamental as it translates the lower bound estimate in a Γ-lim sup inequality. Then, we take advantage of the result obtained in [15] to construct a suitable recovery sequence. More precisely, in [15] the authors studied the interplay between finite and infinite dimensional control problems for multi-agents systems and they obtained a Γ-convergence result (as the number of agents goes to infinity) under weak assumptions on the interaction kernel as well as on the cost functional. This is crucial because the rate function I(µ, h) corresponds to a particular choice of the cost treated in [15] and it turns out that the recovery sequence actually provides a good perturbation of the system of SDEs for which the associated entropy remains controlled, see Theorem 5.2 and Theorem 5.5.
The paper is organised as follows. Some preliminary material concerning measure theory and topological issues, basic large deviations definitions, properties of stochastic currents and solutions to the continuity equation is collected in Section 2. Section 3 is devoted to the setting of the problem, the hypotheses and the main results. The large deviations upper bound is discussed in Section 4, along with exponential tightness estimates and goodness of the rate functional. In Section 5 the strategy to obtain the large deviations lower bound is presented, and the proofs of the main theorems are given. Finally, some sufficient conditions for well-posedness of the Vlasov PDE are presented in the Appendix.

Notation and preliminaries
The following notation will be used throughout the paper.
We fix a probability space (Ω, F, P) endowed with a filtration (F t ) t∈[0,T ] satisfying the usual conditions, as well as a family {W i , i ∈ N} of independent d-dimensional Brownian motions. Given a topological space X, we write P(X) for the space of Borel probability measures on X. We endow P(X) with the topology of weak (equivalently, narrow) convergence, in duality with bounded continuous functions C b (X). In the special case X = R d , we also use the notation P p (R d ) for the probability measures with finite p-th order moment. The spaces of Borel measures and of vector-valued Borel measures are denoted by M (R d ) and M (R d ; R d ), respectively. Given two topological spaces X, Y , P ∈ P(X) and a map f : X → Y , we alternately use the notation P • f −1 or f ♯ P to denote the push-forward, or image law, of the probability measure P under the map f . We furthermore refer to the compact-open topology on C(X, Y ) as the topology whose subbase is given by the sets W (K, U ) = {f ∈ C(X, Y ) : f (K) ⊂ U }, as K and U range over all compact subsets of X and open subsets of Y , respectively. We indicate by Q the cylinder Q := (0, T ) × R d and we write a · b for the scalar product of a, b ∈ R d . C ∞ c (Q) stands for the set of smooth compactly supported functions in Q, and the notation C 1,2 c (Q) is used for the set of compactly supported functions in Q which are C 1 in time and C 2 in space. Given a smooth domain D in R d , the fractional Sobolev spaces W s,2 (D), s ∈ R, are abbreviated H s (D). If µ ∈ D ′ (X) is a distribution (respectively, µ ∈ P(X)), we denote by ⟨µ, φ⟩ = µ(φ) the duality pairing with a smooth function φ ∈ C ∞ c (X) (respectively, φ ∈ C b (X)). Lastly, when we write $a \lesssim b$ we mean that there exists a positive constant C for which a ≤ Cb.
Let us now recall a version of the Gronwall lemma which will be used in the sequel: if a ∈ L 1 (t 0 , t 1 ),
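The statement of the lemma is truncated above; a standard integral form (presumably the intended variant) reads:

```latex
% Standard integral Gronwall lemma (our reconstruction; the precise
% variant used in the paper is truncated in the text).
\[
  u(t) \;\le\; \alpha \;+\; \int_{t_0}^{t} a(s)\, u(s)\, ds
  \quad \text{for a.e. } t \in (t_0, t_1)
  \;\Longrightarrow\;
  u(t) \;\le\; \alpha \exp\!\Big( \int_{t_0}^{t} a(s)\, ds \Big),
\]
% with a \in L^1(t_0, t_1) nonnegative, u bounded and measurable, \alpha \ge 0.
```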

Some useful results in measure theory
A completely regular space E is a topological space such that for every closed set C ⊂ E and every point x ∈ E \ C there exists a continuous function f : E → [0, 1] for which f (x) = 1 and f (y) = 0, for every y ∈ C.
Roughly speaking, it is possible to separate x and C with a continuous function. A completely regular space which satisfies the Hausdorff condition is called a Tychonoff space. Notice that, given a normed space X, the weak* topology of X ′ is Tychonoff. Moreover, if X is separable, bounded sets in X ′ are metrizable. We say that a map f : E → F between two topological spaces is Borel measurable if f −1 (A) is a Borel set for any open set A. We denote by M(E) the collection of real-valued Borel measurable maps. If E is a topological space and F ⊂ M(E), we say that F separates points of E if for x ≠ y ∈ E there exists h ∈ F such that h(x) ≠ h(y). Again, we say that a set G ⊂ M(E) is separating for E if, given P, Q ∈ P(E) with P(h) = Q(h) for every h ∈ G, it follows that P = Q. A classical result [13, Prop. 3.4.4] assures that if (E, d) is a complete, separable and locally compact metric space, then C c (E) is separating (actually convergence determining). Notice that, C c (E) being separating for E, if P, Q ∈ P(E) with P ≠ Q then there exists a function h ∈ C c (E) such that P(h) ≠ Q(h). This means that the family separates points of P(E) (endowed with the topology of narrow convergence).
Given p ≥ 1, we define the L p -Wasserstein distance W p (µ 0 , µ 1 ), where Π(µ 0 , µ 1 ) is the set of admissible transport plans (couplings) between µ 0 and µ 1 . The infimum in (2.5) is always attained (and finite) if µ 0 , µ 1 belong to the space P p (R d ) of Borel probability measures with finite p-th moment. Let us notice that P p (R d ) endowed with the Wasserstein distance W p is a complete and separable metric space. A sequence (µ n ) n∈N ⊂ P p (R d ) converges to a limit µ ∈ P p (R d ) with respect to the Wasserstein distance W p , i.e. W p (µ n , µ) → 0, iff it converges narrowly and the p-th moments converge. Notice that the class C c (R d ) also separates points of P p (R d ) endowed with the p-Wasserstein distance. If X is a Polish space, the Prokhorov theorem guarantees that a subset K ⊂ P(X) is tight if and only if it is (relatively) compact. Moreover, this is equivalent to the existence of a function ϕ : X → R + with compact sublevels such that sup µ∈K µ(ϕ) < +∞. Consider now the subset of discrete measures P N (R d ) ⊂ P p (R d ). Starting from a vector x ∈ (R d ) N , we refer to the map µ N : (R d ) N → P N (R d ) as the empirical measure. In the following we say that a map G is symmetric if G(x, y) = G(x, σ(y)) for every permutation σ. Given a symmetric and continuous map G N we can associate a function G defined on measures. If X is a Polish space and P, Q ∈ P(X) are two probability measures, the relative entropy of Q with respect to P is defined as H(Q|P) := ∫ log(dQ/dP) dQ if Q ≪ P, and H(Q|P) := +∞ otherwise. Equivalently, H(Q|P) = sup{Q(φ) − log(P(e φ )), φ ∈ C b (X)}, from which the convexity of the map H(·|P) easily follows. It is useful to recall the following basic inequality. Moreover, if Y is a Polish space and f : X → Y is a measurable function, then H(f ♯ Q | f ♯ P) ≤ H(Q|P).
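The objects just recalled can be illustrated with a short numerical sketch; all choices below (one-dimensional Gaussian samples, two-point discrete laws) are illustrative and not taken from the paper:

```python
import numpy as np

# In d = 1, the W_1 distance between two empirical measures with the same
# number N of equally weighted atoms equals the mean absolute difference of
# the sorted samples; the distance between independent samples of a fixed
# law shrinks as N grows.
def w1_empirical(a, b):
    return float(np.mean(np.abs(np.sort(a) - np.sort(b))))

rng = np.random.default_rng(0)
w_small = w1_empirical(rng.normal(size=10), rng.normal(size=10))
w_large = w1_empirical(rng.normal(size=10_000), rng.normal(size=10_000))

# Discrete relative entropy H(Q|P) = sum_i q_i log(q_i / p_i): the finite
# counterpart of the definition above; H(Q|P) >= 0, with equality iff Q = P.
def relative_entropy(q, p):
    q, p = np.asarray(q, float), np.asarray(p, float)
    mask = q > 0
    return float(np.sum(q[mask] * np.log(q[mask] / p[mask])))
```

The sorted-sample formula is exact for equal-count, equal-weight empirical measures on the real line, which makes it a cheap sanity check for Wasserstein computations.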

Large deviations principle
Large deviations estimates describe the limiting behaviour of a family of probability measures through a rate functional. We refer to [11] for a general treatment of the subject. Let us recall here the definition of a large deviations principle (LDP for short).
Definition 2.1 Let X be a Hausdorff topological space and (P ε ) ⊂ P(X) a family of probability measures on X. We say that P ε satisfies a good large deviations principle with speed β ε ↓ 0 and rate function J : X → [0, +∞] if the following conditions are satisfied: (i) (Goodness) For every a ≥ 0, the set {x ∈ X : J(x) ≤ a} is compact. Establishing an LDP for the family P ε gives a precise formulation of (logarithmic) asymptotic bounds of the form P ε (B) ≍ exp(−β ε −1 inf B J). The validity of such a result implies that inf x∈X J(x) = 0, where the zero level set is not empty thanks to the goodness of the rate functional. Notice that, in the special case where {x̄} = {x : J(x) = 0} is a singleton, an LDP implies the law of large numbers P ε → δ x̄ .
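The bounds (ii)–(iii) of Definition 2.1 are omitted in the text above; in their standard form they read:

```latex
% (ii) Upper bound: for every closed set C \subset X,
\[
  \limsup_{\varepsilon \to 0} \beta_\varepsilon \log P_\varepsilon(C)
  \;\le\; - \inf_{x \in C} J(x);
\]
% (iii) Lower bound: for every open set A \subset X,
\[
  \liminf_{\varepsilon \to 0} \beta_\varepsilon \log P_\varepsilon(A)
  \;\ge\; - \inf_{x \in A} J(x).
\]
```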
A fairly classical strategy to obtain (2.13) consists in proving it first for compact sets and subsequently showing that most of the probability mass is concentrated on compact sets. To this aim, the notion of exponential tightness comes into play. Definition 2.2 Let X be a Hausdorff topological space and β ε ↓ 0 a sequence. We say that a sequence of probability measures P ε ∈ P(X) is exponentially tight with speed β ε if there exists a sequence of compact sets (K L ) L>0 ⊂ X such that lim sup ε β ε log P ε (X \ K L ) ≤ −L for every L > 0. Notice that exponential tightness of the family P ε is not a priori necessary for the formulation of an LDP with good rate functional. Nonetheless, if the family P ε is exponentially tight and satisfies an LDP lower bound for every open set, then the rate functional is automatically good.
Let us now recall a characterization of exponential tightness in the space of continuous functions C([0, T ]; R), inspired by [3], which will be useful in the sequel. Given a function u ∈ C([0, T ]; R), we denote by ω(u; δ) its modulus of continuity. A family (P ε ) of probability measures on C([0, T ]; R) is exponentially tight with speed β ε iff the following two conditions hold:

Stochastic currents and continuity equation
The notion of (Stratonovich) stochastic current was discussed in [14] with the aim of investigating the links between deterministic currents and the theory of rough paths. One of the main interests of that paper is the pathwise regularity of stochastic integrals of the form J(η) := ∫ 0 T η(t, X t ) ∘ dX t , where η : Q → R d is a compactly supported smooth vector field and X t is a semimartingale. When η does not depend on time, the authors of [14] showed that the map η ↦ J(η) defines, with probability one, a linear functional on H s+1 (R d ; R d ) for s > d/2 + 1. The extension of this result to the time-dependent case is the content of the following theorem. Theorem 2.4 Let X t = V t + M t be a semimartingale with values in R d and η : Q → R d be a smooth function with compact support. Then, given s 1 ∈ (1/2, 1) and Proof. See Appendix.
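Since the currents here are of Stratonovich type, it may help to recall numerically how the midpoint (Stratonovich) rule differs from the left-point (Itô) rule; the following self-contained sketch (a toy example, not from the paper) checks the classical correction for η(x) = x and X a Brownian motion:

```python
import numpy as np

# For eta(x) = x and X = W a Brownian motion:
#   Stratonovich:  \int W o dW   = W_T^2 / 2          (midpoint rule),
#   Ito:           \int W dW     = W_T^2 / 2 - T / 2  (left-point rule).
rng = np.random.default_rng(42)
steps, T = 200_000, 1.0
dt = T / steps
dW = np.sqrt(dt) * rng.normal(size=steps)
W = np.concatenate([[0.0], np.cumsum(dW)])

ito = np.sum(W[:-1] * dW)                      # left-point (Ito) sum
strat = np.sum(0.5 * (W[:-1] + W[1:]) * dW)    # midpoint (Stratonovich) sum

err_strat = abs(strat - W[-1] ** 2 / 2)        # telescopes exactly
err_ito = abs(ito - (W[-1] ** 2 / 2 - T / 2))  # correction -T/2 appears
```

The midpoint sum telescopes to $W_T^2/2$ pathwise, while the left-point sum picks up the Itô correction; this mirrors the Itô correction term appearing later in the proofs of Section 4.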
Let now (µ t ) t∈[0,T ] be a family of probability measures satisfying the continuity equation where the term J represents the current. Here we collect some properties of solutions to the above equation, emphasizing the link with the regularity of J. Let us start with a general definition.
We will show in Lemma 4.1 that the empirical measure and the stochastic current associated to the particle system (1.1) exactly fit into this framework, thanks to the pathwise regularity shown in Theorem 2.4.

Remark 2.6
To include an initial or final constraint, say µ 0 , µ T , in the definition of solution, we can use test functions not vanishing at times 0 and T . Let us now concentrate on a more specific situation. Specifically, suppose there exists a Borel vector field v satisfying (2.22), so that J can be written in the form J = vμ̃. In this case the distributional formulation in Definition 2.5 reads as (2.24), and there is a by now classical connection between solutions to (2.24) and absolutely continuous curves of measures. More precisely, given a curve t ↦ µ t ∈ AC 2 ([0, T ]; P 2 (R d )), it is convenient to define a space-time measure μ̃. Then we can find a (minimal) Borel vector field v ∈ L 2 (Q, μ̃; R d ) (i.e. ∫ Q |v(t, x)| 2 dμ̃(t, x) < +∞) such that J = vμ̃ ≪ μ̃ is a vector measure and solves in a distributional sense an equation equivalent to (2.24). On the other hand, when µ is a solution to (2.24) with a velocity field satisfying (2.22), then there exists a representative t ↦ µ t ∈ P 2 (R d ), still denoted by µ t , belonging to AC 2 ([0, T ]; P 2 (R d )).
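For orientation, the weak formulation alluded to above can be written in the following standard form (our transcription; the displays (2.22)–(2.24) are omitted in the text):

```latex
% Distributional form of the continuity equation with velocity field v,
% tested against compactly supported functions on Q = (0,T) x R^d:
\[
  \int_0^T\!\!\int_{\mathbb{R}^d}
  \big( \partial_t \varphi(t,x) + v(t,x)\cdot\nabla\varphi(t,x) \big)\,
  d\mu_t(x)\,dt \;=\; 0
  \qquad \text{for all } \varphi \in C^\infty_c(Q).
\]
```

Since the test functions vanish near t = 0 and t = T, no boundary terms appear; initial and final constraints are handled as in Remark 2.6.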

Statement of the problem and main result
Consider N particles (x 1 , . . . , x N ) whose dynamics is described by the system of SDEs (3.1), driven by independent Brownian motions. Given a time horizon T > 0, with a slight abuse of notation we refer to the empirical measure as in (3.2), and we denote by µ N,ε t the image measure associated to a solution of (3.1), which is a random measure for every t ∈ [0, T ]. We denote by P N,ε the law of the N-dimensional system, P N,ε := Law(x N,ε ) = (x N,ε ) ♯ P, and by P N,ε ∈ P(C([0, T ]; P(R d ))) the measure P N,ε := (µ N ) ♯ P N,ε induced by the empirical measure. The probability spaces we are dealing with are the following. To complement the information contained in the random measures (µ N,ε t ) t∈[0,T ] we introduce the stochastic current (3.4). From the general theory on stochastic currents (see also Theorem 2.4), the stochastic integral defined in (3.4) has a pathwise realization J N,ε for every N ∈ N and ε > 0. Moreover J N,ε ∈ H −s , for every s = (s 1 , s 2 ) ∈ (1/2, 1) × ((d + 2)/2, +∞). The objective of the paper is to investigate the behaviour of the system (3.1) as the number of particles tends to infinity and, simultaneously, in the small-noise regime. More precisely, denoting by X the following space, we are interested in large deviations properties in the joint limit ε ↓ 0, N ↑ +∞ for the probability measures P N,ε • (µ N , J N,ε ) −1 . We endow C([0, T ]; P 1 (R d )) with the uniform 1-Wasserstein topology and H −s with the weak* topology. Notice that X equipped with the above topology is not metrizable, hence not a Polish space. Nonetheless, to state a large deviations principle it is enough to work in a Hausdorff topological space. In our setting X is actually a Tychonoff space with metrizable compacts (H s is indeed separable and reflexive). In the following, to emphasize the differences in obtaining the lower and upper bounds of the LDP, we prefer to keep the hypotheses separated. The entire LDP holds a fortiori under the stronger assumptions of the lower bound.
The first set of assumptions, on the interaction field F and on the set of initial conditions, is the following; the quantity in (3.5) is well defined thanks to the identification (2.9).
Hypothesis 3.2 amounts to a law of large numbers for the deterministic initial conditions, which is necessary for the convergence of the empirical measure associated with the system.
The second set of assumptions is as follows. Hypothesis 3.4 (Initial data (lower bound)) The initial distribution The Lipschitz character of the interaction term in Hypothesis 3.3 provides uniqueness of solutions to (3.1) (see also Theorem 6.1 in the Appendix for what concerns the continuity equation) and is crucial in the proofs of Propositions 5.3 and 5.4. On the other hand, the compact support of the initial conditions is not used directly, but it is needed to profitably apply the convergence result of [15].

Remark 3.5 (An example of interaction): Given a continuous function H
More specifically, if H = −∇W for an even function W ∈ C 1 (R d ), the system (3.1) can be viewed as a stochastic perturbation of the gradient flow of the interaction energy W. A first result on the convergence of the finite-dimensional stochastic system (3.1) towards a purely deterministic evolution of measures is contained in the following theorem. Theorem 3.6 If Hypotheses 3.3 and 3.4 hold, there exists a unique strong solution x N,ε to the system (3.1). The associated empirical measure and stochastic current (µ N,ε , J N,ε ) admit a limit (µ, J) as N ↑ +∞ and ε ↓ 0 in the following sense

where the pair (µ, J) is the unique distributional solution to the Vlasov PDE. Here we are interested in a more precise asymptotic analysis of the behaviour of the pair (µ N,ε , J N,ε ). The main result of the paper describes the rate at which rare events occur, and it is formulated as an LDP for the probability measures P N,ε • (µ N , J N,ε ) −1 ∈ P(X) as N ↑ +∞ and ε ↓ 0. Recall that we use the notation μ̄ for the space-time measure μ̄ := ∫ 0 T δ t ⊗ µ t dt. Theorem 3.7 (LDP) Let I : X → [0, +∞] be the functional (given in a variational formulation) and define X l as follows: (ii) Under Hypotheses 3.1 and 3.2, the sequence of probability measures P N,ε • (µ N , J N,ε ) −1 ∈ P(X) satisfies an LD upper bound on X with speed ε/N and rate functional I : X → [0, +∞]. (iii) If Hypotheses 3.3 and 3.4 hold, the sequence of probability measures satisfies an LD lower bound on X with speed ε/N and rate functional I. The proofs of the above theorems are postponed to the end of Section 5. Notice that in the regime ε > 0, given a measure µ ε there exists a unique current J ε for which the McKean–Vlasov PDE is satisfied (thanks to the parabolic character of the equation). In the limit ε ↓ 0 this is no longer true, and the application of the contraction principle is not trivial. Once more, this naturally requires the formulation of the above LDP for the pair measure–current.

Large deviations upper bound
This section is devoted to the analysis of the large deviations upper bound for the family of probability measures P N,ε • (µ N , J N,ε ) −1 ∈ P(X). Hereafter, given (µ, J) ∈ X we denote by I : X → [0, +∞] the rate functional, where the constraint has to be understood in a distributional sense (see Definition 2.5). Let us first investigate the relation between the empirical measure (3.2) and the stochastic current (3.4). Exploiting the independence of the Brownian motions in the dynamics (3.1) and applying Theorem 2.4 to the stochastic current J N,ε defined in (3.4), we get a pathwise realization J N,ε for every N ∈ N and ε > 0, with J N,ε ∈ H −s , s = (s 1 , s 2 ) ∈ (1/2, 1) × ((d + 2)/2, +∞). Furthermore, the pair (µ N,ε , J N,ε ) satisfies a continuity equation, as shown in the next lemma, for every N ∈ N and ε > 0.
Let us now give a direct proof of the LD upper bound for compact sets, via a specific exponential tilt of the measures P N,ε . The general case (closed sets) will be obtained by exploiting the exponential tightness of the measures. Proof. Given a Borel set B ⊆ X, we want to estimate the quantity P N,ε ((µ N , J N,ε ) ∈ B). Fix η ∈ C ∞ c (Q; R d ) and define the corresponding martingale with its quadratic variation, and consider the associated stochastic exponential. Moreover, using the definition of the stochastic current given in (3.4), we can write the pairing explicitly, where we denote by R N,ε := ∫ 0 T ⟨µ N,ε s , div(η)(s, ·)⟩ ds the Itô correction term.
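The stochastic exponential truncated in the display above presumably has the standard form (our reconstruction):

```latex
% Stochastic exponential of a continuous martingale M:
\[
  \mathcal{E}(M)_t \;=\; \exp\Big( M_t - \tfrac{1}{2}\,\langle M \rangle_t \Big),
\]
% a positive local martingale, and a true martingale under Novikov's
% condition; it is used here to tilt the law P^{N,\varepsilon}.
```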
where for any N ∈ N, ε > 0, it is worth noticing that I φ 2 (µ N,ε , J N,ε ) = 0 (see Lemma 4.1). Replacing η with (N/ε) η we get the desired bound thanks to relations (4.8) and (4.9). Taking the lim sup as ε ↓ 0 and N ↑ +∞, the term ε|R N,ε | vanishes and we can optimize in η, φ to get (4.11). The lower semicontinuity of the map (µ, J) ↦ [I η 1 (µ, J) + I φ 2 (µ, J)] (seen as a map from X to R) allows for the application of the min-max lemma, see [20, App. 2, Lemmata 3.2 and 3.3], whence the bound follows for every compact set K ⊂ X, since the sup in φ takes the value 0 if the constraint is satisfied and +∞ otherwise. This implies that (4.5) holds with rate function I, which is the required result.

Exponential tightness
This section is devoted to the exponential tightness of the family P N,ε • (µ N , J N,ε ) −1 ∈ P(X), for which we investigate separately the two components P N,ε • (µ N ) −1 ∈ P(C([0, T ]; P(R d ))) and P N,ε • (J N,ε ) −1 ∈ P(H −s ). For what concerns the family P N,ε • (µ N ) −1 , a tightness criterion was first established by Jakubowski in [19]. Here we need a finer result, taking into account the exponential decay outside compact sets, which can be stated as follows. β ε log P ε (∃ t : x t ∉ K l ) = −∞; (4.14) (ii) there is an additive family F ⊂ C(E; R) which separates points in E such that the associated sequence (f ♯ P ε ) ∈ P(C([0, T ]; R)) is exponentially tight with speed β ε , for every f ∈ F. (c) Condition (ii) of the above theorem can be weakened: it is enough to choose elements f ∈ F such that the restriction f | K l is continuous.
In order to apply Theorem 4.3, we estimate the (quadratic) energy of the particle system. The crucial ingredient is a suitable generalization of the Bernstein inequality for martingales, whose proof can be found in [24, Lem. 2]: Lemma 4.5 Let (M t ) t≥0 be a continuous martingale such that M(0) = 0 and E(M t 2 ) < ∞ for every t ≥ 0. If β ≥ 0 and C ∈ (0, +∞), then for any bounded stopping time τ it holds
$$P\Big( \sup_{t \in [0,\tau]} M_t \ge \beta, \; \langle M \rangle_\tau \le C \Big) \;\le\; \exp\Big( - \frac{\beta^2}{2C} \Big).$$
The following proposition shows that the energy associated to the dynamics can be arbitrarily large only with probability exponentially small.
Proof. Throughout the proof we shall denote by C > 0 a generic constant, whose value may change from line to line. From the Itô formula for φ(x) = |x| 2 applied to (3.1) we get the energy identity, where we used a trivial inequality; the classical theory for SDEs with Lipschitz coefficients guarantees that M N,ε t is a P-martingale with E|M N,ε t | 2 < +∞ for every t > 0. Employing the Gronwall-type inequality (2.2) we end up with (4.20). Due to the independence of {W i } i=1,...,N , the quadratic variation of M N,ε can be estimated as in (4.21), where in the last inequality we used estimate (4.20). Summing up and employing Lemma 4.5 we obtain (4.22), for m > 0, where the first equality follows from (4.21). Therefore, employing again (4.20), we get the claim for every m > 0. Proof. Throughout the proof we maintain the notation of Proposition 4.6 for what concerns the constants. First of all, notice that it is enough to prove the following limit equality; a justification for this argument can be found in [3, Thm. 7.4]. This limit formulation is more convenient for the application of the Itô formula: let ϕ ∈ C ∞ c (R d ); then for each s ∈ [0, T − δ] and t ∈ [s, s + δ] the decomposition (4.28) holds. The first term can be estimated exploiting the growth condition on F given in Hypothesis 3.1, as in (4.29). Whence, for any ζ > 0, from Proposition 4.6 the corresponding exponential bound follows. Notice that in the last passage we used the elementary inequality log P(⋃ i=1 n U i ) ≤ log n + max 1≤i≤n log P(U i ), (4.35) valid for any probability measure and any measurable sets U 1 , . . . , U n . Summing up the estimates for A ϕ,s t and M ϕ,s t and taking the limit as ε → 0, N → ∞, we obtain the lim sup bound. Finally, letting δ → 0 we get the required result.
Proof. From Theorem 2.4 we know that J N,ε admits a pathwise realization. Thanks to (6.30), (6.31) we can bound ‖J N,ε ‖ 2 , where Z m n (k) is given as in (6.27) and where we used the finiteness of the integral against Γ(da). Thanks to Proposition 4.6 we can proceed as follows. We start by estimating the bracket between Z̄(a) and Z̄(b). Concerning the bracket between Ȳ(a) and Ȳ(b) we have (4.48). Let us now observe that the process X̄ t := ∫ Ȳ t (a) Γ(da) is itself a martingale with quadratic variation given by (4.49). Hence, from Lemma 4.5 and the computations above, there exists C > 0 such that the desired exponential bound holds. For what concerns the empirical measure, we show that conditions (i)–(ii) of Theorem 4.3 are satisfied. (i) For every l > 0 introduce the set K l ⊂ P 1 (R d ) defined in (4.52). Thanks to the Prokhorov theorem (see also (2.7)), K l is relatively compact in P 1 (R d ) endowed with the narrow topology, for every l > 0. The application of Proposition 4.6 readily implies the required decay, where we tacitly assume that f (µ)(t) = f̄(µ t ) whenever µ ∈ C([0, T ]; P 1 (R d )). Let us also notice that the same argument goes through considering only smooth functions f ∈ C ∞ c (R d ), whose linear span is uniformly dense in C c (R d ).

Goodness of the rate functional
Here we present a direct proof of the goodness of I : X → [0, +∞] under less restrictive assumptions than Hypothesis 3.3, under which the lower bound estimate of Section 5 holds. Recall indeed that when a family of probability measures is exponentially tight and satisfies a large deviations lower bound, the associated rate functional is automatically good. Moreover, the pair (µ, J) satisfies the continuity equation in its distributional formulation.
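The standard argument behind the recalled fact can be spelled out as follows (with β_ε the large deviations speed and P_ε the family of laws under consideration):

```latex
% Exponential tightness yields compact sets K_l \subset X with
\limsup_{\varepsilon \downarrow 0} \beta_\varepsilon \log P_\varepsilon(K_l^c) \;\le\; -l .
% Applying the large deviations lower bound to the open set K_l^c gives
-\inf_{K_l^c} I \;\le\; \limsup_{\varepsilon \downarrow 0} \beta_\varepsilon \log P_\varepsilon(K_l^c) \;\le\; -l ,
% so that I > a on K_l^c whenever l > a, i.e. \{I \le a\} \subset K_l.
% Being a closed subset of a compact set (closedness follows from the lower
% semicontinuity of I), the sublevel set \{I \le a\} is compact for every a \ge 0.
```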
Proof. Let A := {(µ, J) ∈ X : I(µ, J) ≤ a < +∞}. If (µ, J) ∈ A we know from Lemma 4.10 that there exists h ∈ L^2(Q, μ̄; R^d) such that ∂_t µ_t + div((F(·, µ_t) + h(t, ·)) µ_t) = 0 and

Fix l ∈ R_+ and take a sequence (µ^n, J^n = h^n µ^n) ∈ A ∩ X_l. From [15, Prop. 5.3] there exists a constant C̄, depending on C, T, a and ∫_{R^d} |x| dµ^n_0(x), such that

To get the equicontinuity property, let us follow the strategy of the proof of Theorem 6.1 in the Appendix (part (i), step 3). In particular, for every n ∈ N we define (µ^{n,k})_{k∈N} as in (6.11) (using the characteristic equations associated to v^n) and from (4.63), (6.13) and (6.15) we get

Now, for the empirical measures (µ^{n,k})_{k∈N} the equicontinuity follows from the same computation as in (6.14):

where l ∈ R_+ depends neither on k nor on n, thanks to (4.64). Employing [1, Prop. 7.1.3] we finally get

W_1(µ^n_s, µ^n_t) ≤ lim inf_{k→+∞} W_1(µ^{n,k}_s, µ^{n,k}_t) ≤ l|t − s|.

For what concerns the sequence J^n, let us denote v^n(t, x) := F(x, µ^n_t) + h^n(t, x). Using the same strategy as in the proof of [1, Thm 5.4.4] we can deduce the existence of a map v such that, for every φ ∈ C^∞_c(Q; R^d),

Moreover, the continuity of the map F with respect to the Wasserstein distance implies that v(t, x) = F(x, µ_t) + h(t, x). This guarantees the weak* convergence (against test functions φ ∈ C^∞_c(Q; R^d)) of the R^d-valued measures J^n := v^n µ^n towards the limit J := vµ.
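The W_1-equicontinuity estimates above are naturally handled through the Kantorovich-Rubinstein duality, which we recall:

```latex
W_1(\mu, \nu) \;=\; \sup\Big\{ \int_{\mathbb{R}^d} f \, d(\mu - \nu) \;:\; f\colon \mathbb{R}^d \to \mathbb{R}, \ \mathrm{Lip}(f) \le 1 \Big\},
\qquad \mu, \nu \in \mathcal{P}_1(\mathbb{R}^d).
```

In particular, testing the continuity equation against 1-Lipschitz functions converts bounds on the velocity field into time-equicontinuity of t ↦ µ_t in W_1.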
Let us now fix φ ∈ H^s. By density there exists a sequence φ_k ∈ C^∞_c(Q; R^d) such that φ_k → φ in H^s. To pass to the limit as k ↑ +∞, observe that H^s embeds continuously into C^0(Q), so that

where in the last inequality we used the uniform bound (4.68). As a consequence we have J^n(φ) → J(φ) for every φ ∈ H^s, and we conclude.

Large deviations lower bound
This section is devoted to the proof of the large deviations lower bound. The proof is divided into two main parts: a first, analytical step exploiting the deterministic recovery sequence obtained in [15], and a second, probabilistic part in which a suitable tilting of the law associated with (3.1) is considered.
Notice that the construction of the recovery sequence in [15] requires the initial data to have uniformly compact support. This motivates the introduction of Hypothesis 3.4 below. As for the velocity field F, we will assume Hypothesis 3.3, which guarantees uniqueness of solutions both to (3.1) and to the Vlasov equation, as stated in Theorem 6.1 in the Appendix.
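The tilting alluded to above is typically implemented via Girsanov's theorem. As an illustration (a single-particle sketch; the drift perturbation h and the notation P^h are ours, not taken from the text), adding the control h to the drift of a diffusion with noise intensity √ε produces the density and relative entropy:

```latex
\frac{dP^{h}}{dP}
= \exp\!\Big( \tfrac{1}{\sqrt{\varepsilon}} \int_0^T h(t)\cdot dW_t
            \;-\; \tfrac{1}{2\varepsilon} \int_0^T |h(t)|^2 \, dt \Big),
\qquad
\mathrm{Ent}\big(P^{h} \,\big|\, P\big)
= \frac{1}{2\varepsilon}\, \mathbb{E}^{P^{h}}\!\int_0^T |h(t)|^2 \, dt .
```

With N particles, independence of the driving noises makes the entropies of the individual tilts additive.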
To derive the large deviations lower bound we exploit the link with the Γ-convergence of the relative entropy functional (we refer e.g. to [25] for a detailed analysis of this topic). The following general theorem has been proved in [25, Thm. 3.4] in a Polish space setting, but the proof also applies to the setting of a completely regular topological space.
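The mechanism behind the implication (b) ⇒ (a) in Theorem 5.1 can be sketched via the classical change-of-measure estimate: for probability measures Q ≪ P and any measurable set A with Q(A) > 0,

```latex
\log P(A) \;\ge\; \log Q(A) \;-\; \frac{\mathrm{Ent}(Q \,|\, P) + e^{-1}}{Q(A)} .
```

Indeed, splitting Ent(Q|P) over A and A^c and applying Jensen's inequality on each part gives Ent(Q|P) ≥ Q(A) log(Q(A)/P(A)) − e^{−1}, using x log x ≥ −e^{−1}. Taking A an open neighbourhood of x and Q = Q^x_ε (so that Q(A) → 1), and multiplying by β_ε, yields the lower bound.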
Theorem 5.1 Let P_ε be a family of probability measures on a completely regular topological space X and let {β_ε}_ε be such that lim_{ε↓0} β_ε = 0. Let also I : X → [0, +∞] be a lower semicontinuous functional. Then the following are equivalent:

(a) P_ε satisfies a large deviations lower bound with speed β_ε and rate functional I;

(b) for any point x ∈ X there exists a sequence Q^x_ε ∈ P(X), weakly converging to the Dirac measure δ_x, such that

lim sup_{ε↓0} β_ε Ent(Q^x_ε | P_ε) ≤ I(x).

The application of the above result relies on the construction of a suitable recovery sequence along which the entropy remains bounded. Such a sequence can be obtained starting from the deterministic one provided in [15, Thm. 3.2], which we briefly report here: with J = hμ̄ ≪ μ̄, let µ^N_0 ∈ P_N(R^d) be a sequence of initial measures with uniformly compact support such that W_1(µ^N_0, µ_0) → 0 as N ↑ +∞. Then there exists a sequence (y^N, . . . ) such that

(c) the equation

Proof. Take the difference between equations (5.7) and (5.3):

From the Itô formula for φ(x) = |x|^2 we get

with

dW_i(s) for the martingale part. Using the Gronwall inequality (2.2) we first get, for some constant C ≥ 0,

Employing now Lemma 4.5 to control the martingale term M^{N,ε}_h(t), and proceeding as in Proposition 4.6 (notice that the initial data cancel out), we easily get

which is the required estimate, due to the arbitrariness of δ > 0.
The corresponding result for the associated currents is contained in the following.

Proof. Let us start by writing the two currents:

where we can assume η ∈ C^∞_c(Q; R^d) (henceforth we also employ the Lipschitz character of the map x → η(t, x)). Hence

For the sake of clarity we study the three terms separately.
where we used the inequality above and subsequently the uniform control (5.4) given in Theorem 5.2. The second part can be estimated in a similar way, from the uniform bound on ∫_0^T |h(t, y^N_i(t))|^2 dt given by Theorem 5.2(e). Hence, for some constant C > 0,

From inequality (4.35) we easily get

Collecting the estimates (5.22) and (5.24) we easily get the required result.

Proof. Assume that I(µ, J) < +∞, otherwise the result is trivially true. Lemma 4.10 guarantees that dJ = h dµ, for some h ∈ L^2(Q, μ̄; R^d), and

Introduce now the martingale

which is uniformly bounded as N ↑ +∞ thanks to Theorem 5.2(e). In order to apply Theorem 5.1 we introduce the probability measures

Then we can compute the rescaled entropy

which is the required bound. To conclude the proof it is enough to show that P^{N,ε}_h ∘ (µ^{N,ε,h}, J^{N,ε,h})^{−1} ⇀* δ_{(µ,h)} as N ↑ +∞, ε ↓ 0. With the notation δ_{(µ,h)} we mean the probability measure concentrated on the solution (µ, h) to ∂_t µ_t + div((F(x, µ_t) + h(t, x))µ_t) = 0. But this is a consequence of the estimates given in Proposition 5.3 and Proposition 5.4. Indeed, take a function Ψ ∈ C_b(X) and define the sets

where we took advantage of the continuity and boundedness of Ψ. Thanks to Proposition 5.3 and Proposition 5.4 there exist N̄, ε̄ such that P^{N,ε}_h(B^{N,ε}_δ) ≤ l/2, for every N > N̄, ε < ε̄, hence

Employing the continuity of Ψ and the convergence of the sequence (σ^N, ν^N) ⇀ (µ, hµ) in X we easily get the required convergence.

Proofs of the main results
Now we can conclude the proofs of the main theorems by collecting the results obtained above.
Proof of Theorem 3.6. Thanks to Hypothesis 3.3, existence and uniqueness for the system (3.1) are fairly standard. Let us denote by σ^N ∈ C([0, T]; P(R^d)) the empirical measure associated to the deterministic system

d/dt

Employing Propositions 5.3 and 5.4 in the simpler case h^N = 0 we get

for every η ∈ C^∞_c(Q; R^d). On the other hand, from a compactness argument analogous to the one in the proof of Theorem 6.1 (part (i), steps 3-4), there exists (µ, ν) ∈ X such that

where dν = F(·, µ)dµ and ∂_t µ_t + div(F(x, µ_t)µ_t) = 0, and the solution is unique thanks to Theorem 6.1 in the Appendix. The combination of (5.33) and (5.34) finally guarantees the result.
Proof. In the proof, assume for simplicity that the function c ∈ L^1(0, T) appearing in (6.1) is actually constant; in the more general case the argument is analogous.

where the constant C depends only on c, T, ∫_{R^d} |x| dµ_0(x) and the doubling constant K of θ.

STEP 2: (Approximation of initial data) Starting from µ_0 ∈ P(R^d) we define a sequence of compact sets A_k with the property

We introduce the normalized measures µ^k_0 ∈ P(R^d):

Moreover, ψ is bounded and continuous on A_k, so that

This implies that for every k ∈ N there exists m̄(k) for which

where we used (6.6) and (6.7). This easily implies that W_1(μ^k_0, µ_0) → 0 as k ↑ +∞, thanks to the superlinearity of ψ. Moreover,

sup_{k∈N} ∫_{R^d} ψ(x) dμ^k_0(x) < +∞. (6.9)

STEP 3: (Compactness) Given k ∈ N, from the previous step we get a finite set of initial data x^k_{0,1}, . . . , x^k_{0,k} ∈ R^d, hence we can introduce the system of characteristics

ẋ^k_i(t) = v(t, x^k_i(t), x(t)), t ∈ [0, T],
x^k_i(0) = x^k_{0,i}.

(6.10)
Existence of a solution to (6.10) is guaranteed by the regularity of v. Indeed, defining V(t, x) := (v(t, x_1, x); . . . ; v(t, x_k, x)), the system can be written as ẋ(t) = V(t, x(t)), where V is a Carathéodory function thanks to the continuity of v w.r.t. x and condition (6.2). Now we can associate to x(t) the empirical measure

where the constant C̄ depends only on C, T. Thanks to the a priori estimate given in step 1 and the uniform control on the initial data in (6.9), we actually get a stronger estimate

Furthermore, if s ≤ t ∈ [0, T] we compute

for some positive constant l ∈ R_+ independent of k. From the application of the Ascoli-Arzelà theorem we get the existence of a limit curve µ ∈ C([0, T]; P_1(R^d)) such that

lim_{k→+∞} sup_{t∈[0,T]} W_1(µ^k_t, µ_t) = 0. (6.15)

STEP 4: (Identification of the limit) To show that the candidate limit is a solution to equation (6.3), let us check that for every ε > 0 there exists k̄ such that for every k ≥ k̄

Hence (6.1) assures that

. . . dµ_t(x) dt < 2ε. (6.17)

It remains to estimate

∫_0^t ∫ ∇φ(x) · (v(t, x, µ^k_t) − v(t, x, µ_t)) dµ^k_t(x) dt ≤ C ∫_0^t ∫ |v(t, x, µ^k_t) − v(t, x, µ_t)| dµ^k_t(x) dt + ∫_0^t C ∫ (1 + |x|) |dµ^k_t(x) − dµ_t(x)| dt. (6.18)

and use the Fourier inversion formula along with the stochastic Fubini theorem. Precisely, define the functions e^m_{n,k} : Q → C^d:

e^m_{n,k}(t, x) := √((2 − δ_{n,0})/T) cos(nπt/T) · e^{ik·x}/(2π)^{d/2} · e_m, (6.24)

where e_1, . . . , e_d is the canonical basis of R^d and n ∈ Z_+, k ∈ R^d. Given η as above we denote by η̄^m_n(k) its Fourier coefficients:

so that B is a linear and bounded operator, hence extendable by density to L^2(Ω; H^s). By the Riesz representation theorem there exists Ψ ∈ L^2(Ω; H^{−s}) such that

E(⟨Ψ, η⟩ 1_{Ω′}) = B(η 1_{Ω′}), ∀ Ω′ ⊂ Ω, η ∈ H^s. (6.35)

From the arbitrariness of Ω′ we get that ⟨Ψ, η⟩ = J(η) P-a.s., and choosing any representative J : Ω → H^{−s} of Ψ we get the required result.
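The time components in (6.24) form the standard cosine basis on (0, T); assuming the normalizing prefactor is √((2 − δ_{n,0})/T), their orthonormality in L^2(0, T) reduces to the elementary computation:

```latex
\int_0^T \cos\Big(\frac{n\pi t}{T}\Big) \cos\Big(\frac{m\pi t}{T}\Big)\, dt
\;=\; \frac{T}{2 - \delta_{n,0}}\;\delta_{n,m},
\qquad n, m \in \mathbb{Z}_+ ,
```

so that multiplying by the squared prefactor gives norm 1 for n = m and 0 otherwise (the case n = m = 0 gives ∫_0^T 1 dt = T, matching the δ_{n,0} correction).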