Central limit theorem over non-linear functionals of empirical measures with applications to the mean-field fluctuation of interacting particle systems

In this work, a generalised version of the central limit theorem is proposed for nonlinear functionals of the empirical measure of i.i.d. random variables, provided that the functional satisfies some regularity assumptions for the associated linear functional derivative. This generalisation can be applied to Monte-Carlo methods, even when there is a nonlinear dependence on the measure component. We use this result to deal with the contribution of the initialisation to the convergence of the fluctuations between the empirical measure of interacting diffusions and their mean-field limiting measure (as the number of particles goes to infinity), when the dependence on the measure is nonlinear. A complementary contribution related to the time evolution is treated using the master equation, a parabolic PDE involving L-derivatives with respect to the measure component, a stronger notion of derivative that is nonetheless related to the linear functional derivative.


Introduction and notations
Central limit theorems (CLTs) and their generalisations have been studied extensively since the beginning of the last century. The first notable generalisation of the CLT was proposed by Lyapunov in 1901: it only requires the random variables to be independent, but not necessarily identically distributed, under a growth condition on moments of some order 2 + δ. The moment condition was further weakened to the Lindeberg condition (proposed in 1922), which is used in most cases where weak convergence to a normal distribution is considered for non-identically distributed variables. See [23] for more details regarding the history of the different versions of the CLT. Since then, the literature on CLTs has grown enormously, with corresponding versions for dependent processes, martingales and time series. In the mathematical statistics literature, particular attention has been paid to CLTs that are uniform over a class of test functions (see for instance Sections 2.5 and 2.8 in [33]), extending the one-dimensional case of the indicator functions of the intervals ((−∞, x])_{x∈R}, which is covered by the Kolmogorov-Smirnov theorem. Von Mises [24,32] was the first to address the case of nonlinear functionals of the empirical measure (1/N) ∑_{i=1}^N δ_{ζ_i} of independent and identically distributed R^d-valued random vectors (ζ_i)_{i≥1} through the use of Taylor expansions, and we refer to Chapter 6 in [25] for a book presentation of the theory that he initiated. He explored the possibility that the first order term in the expansion has a vanishing limit, in which case the lowest order term with nonzero limit converges to some non-Gaussian distribution.
While the limiting behaviour of the various terms in the expansion with derivatives computed at the common distribution m_0 of the random vectors may follow from standard limit theorems of probability theory (in particular, the usual central limit theorem applies to the first order contribution), the main challenge is to prove that the remainder, which mixes the empirical measure with m_0 in the derivatives, goes to zero. (This research benefited from the support of the "Chaire Risques Financiers", Fondation du Risque.) Let us now discuss this issue in the case, treated in the present paper, of first order expansions, where the difficulty is no less. In dimension d = 1, Boos and Serfling [2] assume the existence of a Gateaux differential d/dε|_{ε=0+} U(m_0 + ε(ν − m_0)) = dU(m_0, ν − m_0) (for ν any probability measure on the real line), linear in ν − m_0, of U at m_0. From the boundedness in probability of the corresponding normalised error, as a consequence of [13], they deduce the weak convergence of the first order term to a centered Gaussian random variable with asymptotic variance equal to the common variance of the independent and identically distributed random variables dU(m_0, δ_{ζ_i} − m_0), when these are square integrable and centered. For more flexibility, they remark that the conclusion remains valid when the third term in the left-hand side is multiplied by a random variable which converges in probability to 1 as N → ∞. In addition to the limitation of their approach to dimension d = 1, it relies on the uniformity of the approximation with respect to the Kolmogorov-Smirnov distance, which is a strong assumption almost amounting to Fréchet differentiability of U at m_0 for the Kolmogorov norm ‖ν − m_0‖ = ‖ν((−∞, ·]) − m_0((−∞, ·])‖_∞, where the class F of measurable functions is such that a central limit theorem for empirical measures holds with respect to uniform convergence over F.
Clearly the requirements on F impose some balance: the larger the class F, the easier Fréchet differentiability becomes, but the harder it becomes to establish uniform convergence over F. Dudley [12] mentions on p.76 that the Gateaux derivative has been considered too weak (see also p.110 in [14], p.216 in Serfling [25] and p.40 in Huber [19]) unless there is some uniformity; along different lines, such uniformity is all the more needed in this paper.
The linear functional derivative of U (see [6], [7], [9] and [11]), which we recall and further investigate in the second section of the present paper and subsequently apply in the third section to study the asymptotic behaviour of √N (U(m_N) − U(m_0)), is also a Gateaux derivative, but with the additional weak requirement that dU(m_0, ν − m_0) = ∫_{R^d} (δU/δm)(m_0, y)(ν − m_0)(dy), for some measurable real valued function R^d ∋ y → (δU/δm)(m_0, y) with some polynomial growth assumption in y. Therefore, the linearity, square integrability and centered property mentioned above (when summarizing [2] and what we will also need) are automatically satisfied when the growth assumption is related to the index of the Wasserstein space that contains all the probability measures under consideration. To avoid the uniformity leading to Fréchet differentiability required in the statistical literature, we suppose that the linear functional derivative exists not only at m_0 but on a Wasserstein ball with positive radius containing m_0. This is a quite mild restriction since, when a central limit theorem holds for some statistical functional, it is in general not limited to a single value of the common distribution m_0 of the samples. Then we linearise U(m_N) − U(m_0) into a sum of increments and a remainder. This decomposition is different from the one only involving m_0 as the measure argument in the Gateaux derivative considered in the previously discussed literature on Von Mises differentiable statistical functions, or in the recent papers [11] and [29] also using the linear functional derivative.
It is aimed at enabling the analysis of the limiting behaviour of the sum by the central limit theorem for arrays of martingale increments, while permitting to exploit that, in the very strong total variation distance, the successive measures m^{N,i} are close, in order to ensure that the remainder vanishes in probability as N → ∞, as soon as (δU/δm)(ν, y) satisfies some Hölder continuity with exponent α > 1/2 in total variation with respect to its first variable. In our CLT for nonlinear functionals U, we add some further regularity assumptions on δU/δm to check the Lindeberg condition and the convergence of the bracket of (1.1).
The second main result of this work is a CLT on mean-field fluctuations. Large systems of interacting individuals/agents occur in many different areas of science; the individuals/agents may be people, computers, flocks of animals, or particles in a moving fluid. Mean-field theory was developed to study particle systems by considering the asymptotic behaviour of the agents or particles as their number goes to infinity. Instead of considering a system with a huge dimension, one can effectively approximate macroscopic and statistical features of the system as well as the average behaviour of the particles. In a probabilistic setting, the limiting behaviour can be described by a class of SDEs, called McKean-Vlasov SDEs, whose coefficients depend on the probability distribution of the process itself. We consider the fluctuation between a standard particle system (Y^{i,N})_{1≤i≤N} (see (4.7) for its model) and its standard McKean-Vlasov limiting process X (see (4.8) for its equation). When the interaction only takes place in the drift coefficient and the diffusion coefficient is bounded from below (which, in particular, holds when the diffusion coefficient is constant), it is possible to express the density of the law of the interacting particle system with respect to that of independent copies of the McKean-Vlasov limiting process by Girsanov's theorem. A central limit theorem may then be derived by studying the limiting behaviour of this density using symmetric statistics and multiple Wiener integrals, as in [27] and [26].
When the interaction also takes place in the diffusion coefficient, this is no longer possible and the standard approach in the literature involves an approximation, for a smooth test function φ : R^d → R, of the average position of the particles by the corresponding quantity for (4.8), and its limiting fluctuation. More precisely, denoting by µ^N the empirical measure of all the particles and by µ^∞ the law of X, one considers the decomposition in which the fluctuation measure S^N := √N (µ^N − µ^∞) appears, with ⟨m, φ⟩ := ∫_{R^d} φ dm for any signed measure m. The classical approach is to show that the sequence of random measures (S^N)_{N≥1} converges in law as random processes taking values in some Sobolev space. This is done via a classical tightness argument, which implies the existence of a weak limit (through a subsequence) by Prokhorov's theorem. The limit is shown to be an Ornstein-Uhlenbeck process in an appropriate space. In [17], the space being considered is Φ′_p, the dual of Φ_p, with Φ_p being the completion of the Schwartz space of rapidly decreasing infinitely differentiable functions under a suitable class of seminorms ‖·‖_p. This result was generalised in [22]. A similar result was proven in [11] to include mean-field equations with additive common noise. We remark that, in all these approaches, by considering measures to be in the dual of a Sobolev space, a linear dependence on the measure component is imposed implicitly. Unlike the approach in [11], [17] and [22], we analyse the fluctuation under nonlinear functionals Φ : P_2(R^d) → R, i.e. we consider the limiting distribution of the process in the space C(R_+, R), where P_2(R^d) denotes the space of probability measures with finite second moments. This gives us a limiting CLT for mean-field fluctuations in the space C(R_+, R). The development of the theory in this paper relies on calculus on the Wasserstein space. We use two notions of derivatives in measure in this paper.
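For orientation, the action of the fluctuation measure on a test function can be sketched numerically in the simplest setting, where the interacting particles are replaced by i.i.d. samples (our illustrative simplification; the interacting case requires the machinery developed later in the paper). The test function φ = tanh is an arbitrary choice:

```python
import numpy as np

rng = np.random.default_rng(0)

def fluctuation(phi, sample, mean_phi):
    # <S_N, phi> = sqrt(N) * (<mu_N, phi> - <mu_infty, phi>)
    N = len(sample)
    return np.sqrt(N) * (phi(sample).mean() - mean_phi)

# i.i.d. N(0,1) "particles": mu_infty = N(0,1) and E[tanh(Z)] = 0 by symmetry.
phi, N, reps = np.tanh, 2000, 4000
vals = np.array([fluctuation(phi, rng.standard_normal(N), 0.0)
                 for _ in range(reps)])
# The classical CLT predicts <S_N, phi> ~ N(0, Var(phi(Z))) for large N.
var_phi = np.tanh(rng.standard_normal(10**6)).var()
print(vals.mean(), vals.var(), var_phi)
```

The empirical variance of ⟨S_N, φ⟩ over the replications matches Var(φ(Z)), in line with the linear CLT that the nonlinear theory of Section 3 generalises.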
The first notion, the linear functional derivative, is an analogue of the variational derivative over a manifold (see [6]). Linear functional derivatives are used to prove the different versions of CLTs for i.i.d. random variables. The second notion, the L-derivative (see the notes by Cardaliaguet [5]), was introduced by Lions in his lectures at the Collège de France by defining a derivative on the W_2 space based on the 'lift' to the L^2 space of square-integrable random variables (see (4.1)). According to [15], the L-derivative coincides with the geometric derivative introduced formerly in [1]. L-derivatives are used to prove the CLT for mean-field fluctuations.
The paper is organized as follows. Section 2 focuses on the notion of linear functional derivatives as well as their properties. Section 3 exhibits three versions of CLTs (with different sufficient conditions) through the properties of linear functional derivatives developed in Section 2. Finally, Section 4 develops the notion of L-derivatives followed by a version of CLT on mean-field fluctuations.
1.1. Notations. R_+ denotes the set of non-negative real numbers. For real numbers a and b, a ∧ b and a ∨ b denote respectively the minimum and the maximum of a and b. For c, d ∈ R^d, c · d denotes the dot product of c and d. We denote the Hilbert-Schmidt norm of any matrix by ‖·‖. For any a, b ∈ R that depend on N, the notation a ≲ b means a ≤ Cb, for some constant C > 0 that does not depend on N.
For any function g : R → R, we adopt the notations g′_+(s) or d/dε|_{ε=s+} g(ε) to denote the right-hand derivative of g at s ∈ R. In the final section, we consider the space C(R_+, R) of continuous functions from R_+ to R, equipped with the metric of uniform convergence on compact sets. For ℓ ≥ 0, we denote by P_ℓ(R^d) the set of probability measures m on R^d such that ∫_{R^d} |x|^ℓ m(dx) < ∞. For ℓ > 0, we consider the ℓ-Wasserstein metric W_ℓ (see (1.2)). For ℓ ≥ 1, it is well known that W_ℓ is a metric on P_ℓ(R^d) and that, if µ ∈ P_ℓ(R^d) and (µ_n)_{n∈N} is a sequence in this space, then lim_{n→∞} W_ℓ(µ_n, µ) = 0 iff µ_n converges weakly to µ as n → ∞ and lim_{n→∞} ∫_{R^d} |x|^ℓ µ_n(dx) = ∫_{R^d} |x|^ℓ µ(dx) (see for instance Definition 6.4 and Theorem 6.9 in [34]). For ℓ ∈ (0, 1), the definition of W_ℓ is not so standard and we check in Lemma 5.1 in the Appendix that these properties remain true. We also consider the total variation metric W_0 on the set P_0(R^d) of all probability measures on R^d, where B(R^d) denotes the Borel σ-algebra of R^d and |µ_1 − µ_2| the absolute value (or total variation) of the signed measure µ_1 − µ_2. We have inf_{ℓ≥0} W_ℓ ≥ W, where W, defined like W_1 but with |x − y| ∧ 1 replacing the integrand |x − y| in (1.2), metricises the topology of weak convergence according to Corollary 6.13 [34].
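In dimension one and for ℓ ≥ 1, W_ℓ between two empirical measures with the same number of atoms is attained by the comonotone coupling that pairs order statistics, which gives a quick way to compute it. The sketch below relies on that standard fact (for ℓ ∈ (0, 1), the modified definition studied in Lemma 5.1 would be needed instead):

```python
import numpy as np

def wasserstein_1d(x, y, ell=1.0):
    """W_ell(mu_N, nu_N) for the empirical measures of the equal-length
    arrays x and y.  For ell >= 1, the optimal coupling in dimension one
    pairs the sorted samples (order statistics)."""
    xs, ys = np.sort(np.asarray(x, float)), np.sort(np.asarray(y, float))
    return float(np.mean(np.abs(xs - ys) ** ell) ** (1.0 / ell))

# Shifting every atom by c moves the measure by exactly c in any W_ell.
x = np.array([0.0, 1.0, 4.0])
print(wasserstein_1d(x, x + 0.5, ell=2.0))  # 0.5
```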
For any random variable ξ, L(ξ) denotes the law of ξ. Finally, L 2 (Ω, F, P; R d ) denotes the Hilbert space of L 2 random variables taking values in R d , equipped with the inner product ξ, η = E[ξ · η].
Acknowledgements. We thank the referee for mentioning the literature on Von Mises differentiable statistical functions to us and Laëtitia Della Maestra for numerous relevant remarks on the first version of the manuscript.

Linear functional derivatives and their properties
The notion of linear functional derivatives appears in quite a few papers in the literature. It is defined as a functional derivative in [6], through a limit of perturbations by linear interpolation of measures (see (2.1)). It can also be defined via an explicit formula for the difference between the values of the function evaluated at two probability measures (see (2.4)), as is more often done in the literature on mean-field games and McKean-Vlasov equations, such as [7], [9], [11] and [31]. Corollary 2.4 shows that (2.1) implies (2.4) under some growth assumption. Conversely, if we assume that the linear functional derivative is continuous in the product topology of P_ℓ(R^d) × R^d, then one can easily check that (2.4) implies (2.1).
Inductively, for j ≥ 2, supposing that U admits a (j − 1)-th order linear functional derivative, we say that U admits a j-th order linear functional derivative at µ if, for each y ∈ (R^d)^{j−1}, m → (δ^{j−1}U/δm^{j−1})(m, y) admits a linear functional derivative at µ, i.e. there exists a real valued measurable function R^d ∋ y′ → (δ^j U/δm^j)(µ, y, y′) such that sup_{y′∈R^d} |(δ^j U/δm^j)(µ, y, y′)|/(1 + |y′|^ℓ) < ∞, and δ^j U/δm^j is defined up to an additive constant via (2.1). Iteratively, we normalise the higher order derivatives via the convention that (δ^j U/δm^j)(m, y_1, . . . , y_j) = 0 if y_i = 0 for some i ∈ {1, . . . , j}. (2.3) The following class S_{j,k}(P_ℓ(R^d)) is used in the hypotheses of the central limit theorems in the subsequent section.
Definition 2.2 (Class S j,k (P ℓ (R d ))). For j ∈ N and k, ℓ ≥ 0, the class S j,k (P ℓ (R d )) is defined by The next theorem expresses a finite difference of the (j − 1)-th order functional derivative as an integral of the j-th order functional derivative.
is Lipschitz continuous and One easily deduces the following corollary.
Proof of Theorem 2.3. For simplicity of notations, the proof is presented for j = 1. The argument for other values of j is identical. For s ∈ (0, 1) and 0 < h < s∧(1−s), by the definition of linear derivatives, .
The function [0, 1] ∋ s → U(m_s) admits the right-hand derivative g(0) at 0 and the left-hand derivative g(1) at 1, and is therefore continuous on [0, 1]. Since m, m′ ∈ P_ℓ(R^d) and sup_{(s,y)∈[0,1]×R^d} |(δU/δm)(m_s, y)|/(1 + |y|^ℓ) < ∞, the function g is bounded on [0, 1]. Therefore [0, 1] ∋ s → U(m_s) is Lipschitz continuous. We finally apply the (only) theorem in [35] to deduce (2.5). We now state a chain rule for the computation of linear functional derivatives. It is an easy consequence of the classical chain rule and of the fact that the normalisation convention (2.3) clearly holds.
be a function such that each of its coordinates admits a linear functional derivative at µ ∈ P ℓ (R d ). We denote by δϕ δm (µ, y) the vector in R q with coordinates given by these linear functional derivatives. Let F : R q → R be a function differentiable at ϕ(µ). Then the function U : The following example is an easy but important consequence of the chain rule and will be used in subsequent parts of the paper.
Then, by Theorem 2.5, for i ∈ {1, . . . , j}, the i-th order linear functional derivative is given by Suppose that there exist constants C > 0 and k_i ≥ 0, i ∈ {1, . . . , j}, such that Then it can be checked by Young's inequality that Example 2.7 (U-statistics (see [18] or [21]) and polynomials on the Wasserstein space). Let k ≥ 0, n ∈ N, ϕ : (R^d)^n → R be measurable and such that For ℓ ≥ k, we consider the function on P_ℓ(R^d) defined by Since replacing ϕ by its symmetrisation does not change the above integral, we suppose without loss of generality that (x_1, . . . , x_n) → ϕ(x_1, . . . , x_n) is symmetric, i.e. invariant under permutation of the coordinates x_i. For µ, ν ∈ P_ℓ(R^d) and ε ∈ (0, 1], we have, denoting by |N| the cardinality of a subset N of {1, . . . , n}, where we used the symmetry of ϕ for the last equality. Therefore For j ∈ {1, . . . , n}, let where y_J denotes the vector in (R^d)^j with all coordinates with indices in J equal to those of (y_1, . . . , y_j) and all coordinates with indices in {1, . . . , j} \ J equal to 0. Notice that, for i ∈ {1, . . . , j}, and when y_i = 0 then for each when j ≤ n and 0 when j > n.
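As a concrete check of the definition (2.1) in the polynomial case, take n = 2 and φ(x_1, x_2) = x_1 x_2, so that U(µ) = (∫ x µ(dx))². Its linear functional derivative is (δU/δm)(µ, y) = 2y ∫ x µ(dx) up to an additive constant, so the Gateaux derivative in (2.1) equals 2M(y − M) with M = ∫ x µ(dx). A finite-difference sketch on a discrete measure (our own toy example) confirms this:

```python
import numpy as np

def U(weights, atoms):
    # U(mu) = (int x mu(dx))^2 for mu = sum_i weights[i] * delta_{atoms[i]}
    return float(np.dot(weights, atoms)) ** 2

def gateaux(weights, atoms, y, eps=1e-6):
    # Finite difference for d/de U((1-e) mu + e delta_y) at e = 0+, cf. (2.1)
    w = np.append((1.0 - eps) * weights, eps)
    a = np.append(atoms, y)
    return (U(w, a) - U(weights, atoms)) / eps

atoms = np.array([0.0, 1.0, 3.0])
weights = np.array([0.2, 0.5, 0.3])
M, y = np.dot(weights, atoms), 2.0
print(gateaux(weights, atoms, y), 2 * M * (y - M))  # both close to 1.68
```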
Let us suppose conversely that for some ℓ ≥ 0 and n ≥ 0, U ∈ S n+1,ℓ (P ℓ (R d )) with vanishing δ n+1 U δm n+1 . Then by Lemma 2.2 in [9], for µ, m ∈ P ℓ (R d ), The assumption and the normalisation condition then give, for the choice m = δ 0 , The following theorem generalizes Example 2.7 by enabling a differentiable dependence of the integrand on the measure.
The following theorem is similar to Theorem 2.8, but the measure in the integral is not necessarily the same as the measure in the argument of the function U .
admits a linear functional derivative in N_µ and there exists a nonnegative Borel-measurable function C : Then U admits a linear functional derivative at µ given by Proof. We have Since, by Theorem 2.3, for ε > 0, (2.9) permits us to apply the dominated convergence theorem and obtain Let us finally consider, in dimension d = 1, the example of the quantile function of m.
Example 2.10 (quantile function). Let, for w ∈ (0, 1) and m ∈ P_0(R), Let v ∈ (0, 1) and m_0 ∈ P_0(R) be such that the restriction of m_0 to a neighbourhood of U(v, m_0) admits a positive and continuous density p_0 with respect to the Lebesgue measure. Let us (2.10) On the neighbourhood of x_0 = U(v, m_0) on which m_0 admits a positive and continuous density, The image of the neighbourhood by this function is a neighbourhood of v, on which its inverse w → U(w, m_0) is also continuously differentiable with derivative 1/p_0(U(w, m_0)). By (2.10) and the definition of where, by convention, the first factor is equal to ∂U ) and the right-hand side of (2.11) converges to as ε → 0+. We conclude by remarking that, where, by the same arguments, the right-hand side also converges to (
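The classical asymptotic variance v(1 − v)/p_0(U(v, m_0))² for the empirical v-quantile can be observed numerically. Taking m_0 = Uniform(0, 1) (our own illustrative choice), the density is p_0 ≡ 1 and U(v, m_0) = v, so the variance reduces to v(1 − v). A Monte-Carlo sketch:

```python
import numpy as np

rng = np.random.default_rng(1)
v, N, reps = 0.3, 1000, 4000

# sqrt(N) * (empirical v-quantile - true v-quantile) over many replications
samples = rng.random((reps, N))                     # i.i.d. Uniform(0,1)
vals = np.sqrt(N) * (np.quantile(samples, v, axis=1) - v)

print(vals.mean(), vals.var())  # approximately 0 and v*(1-v) = 0.21
```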

Central limit theorem over nonlinear functionals of empirical measures
In this section, we investigate conditions under which √N (U(m_N) − U(m_0)) converges in law to some centered Gaussian random variable, generalising the result of the classical CLT which addresses linear functionals U(µ) = ∫ ϕ(x) µ(dx) with ϕ : R^d → R measurable and such that sup_{x∈R^d} |ϕ(x)|/(1 + |x|^{ℓ/2}) < ∞. Note that, by this growth assumption and Example 2.7, this linear functional admits linear functional derivatives. One could consider the same remainder as in the literature on Von Mises differentiable statistical functions and check, using a linearisation in measure by Theorem 2.3, that, under extra regularity assumptions on U, √N R_N goes to 0 in probability as N → ∞. For instance, when U ∈ S_{4,4}(P_2(R^d)) and m_0 ∈ P_8(R^d), Theorem 2.5 in [29], which is inspired by Lemma 5.10 in [11], ensures that E[R²_N] ≲ 1/N². In Theorem 3.1 below, we will rather find weaker regularity assumptions under which the martingale term converges in distribution to N(0, Var((δU/δm)(m_0, ζ_1))) by the central limit theorem for martingale increments, and the difference between this term and √N (U(m_N) − U(m_0)) goes to 0 in probability as N → ∞.
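The claimed limit N(0, Var((δU/δm)(m_0, ζ_1))) is easy to observe on a toy nonlinear functional. For U(µ) = exp(∫ x µ(dx)) and m_0 = N(0, 1) (our own illustrative choice), the chain rule of Theorem 2.5 gives (δU/δm)(m_0, y) = e^0 · y = y up to an additive constant, hence asymptotic variance 1:

```python
import numpy as np

rng = np.random.default_rng(3)
N, reps = 2000, 4000

# U(mu) = exp(int x mu(dx)); U(m_0) = exp(0) = 1 for m_0 = N(0,1)
means = rng.standard_normal((reps, N)).mean(axis=1)
vals = np.sqrt(N) * (np.exp(means) - 1.0)

# The CLT of this section predicts vals ~ N(0, Var(dU/dm(m_0, zeta_1))) = N(0, 1)
print(vals.mean(), vals.var())
```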
Since the asymptotic variance is expressed in terms of δU/δm, one can easily compute its value via Theorems 2.5, 2.8 and 2.9. For functionals U which do not satisfy the regularity assumptions in Theorem 3.1, the asymptotic variance in the central limit theorem can still be given by Var((δU/δm)(m_0, ζ_1)). Indeed, for the example of the quantile function in dimension d = 1, it is shown that, under the assumptions of Example 2.10, Suppose that there exists r > 0 such that • U admits a linear functional derivative on the ball B(m_0, r) centered at m_0 with radius r for the metric D, • sup_{x∈R^d} |(δU/δm)(µ, x) − (δU/δm)(m_0, x)|/(1 + |x|^{ℓ/2}) converges to 0 when D(µ, m_0) goes to 0.

(3.4)
Then the following convergence in distribution holds: Remark 3.2. Using a W_0-optimal coupling between µ_1 and µ_2, one easily checks (see for instance Theorem 6.15 [34] when ℓ ≥ 1) that Hence any ball with positive radius for the metric 1_{ℓ>0} W_ℓ(µ_2, µ_1) + 1_{ℓ=0} W(µ_2, µ_1) contains a ball with positive radius for the metric ∫_{R^d} (1 + |y|^ℓ)|µ_2 − µ_1|(dy). Moreover, (3.4) is weaker for the latter choice of D(µ_1, µ_2) than for the former, so that the assumptions of the theorem are satisfied for the latter when they are satisfied for the former. Unfortunately, when m_0 is not discrete, then does not go to 0 as N → ∞. This explains why we restrict the choice Notice that, since m_0 ∈ P_ℓ(R^d), the random measure m^{N,i}_s also belongs to P_ℓ(R^d).
Dealing in the same way with W , we deduce that Since a.s. m N converges weakly to m 0 and R d |x| ℓ m N (dx) goes to R d |x| ℓ m 0 (dx) as N → ∞, for ℓ > 0, the sequence W ℓ (m N , m 0 ) converges a.s. to 0 as N → ∞ and is therefore a.s. bounded. Moreover, . For α ∈ (0, 1), by considering the two cases i/N ≤ α and i/N > α, we deduce that Choosing small values of α followed by large values of N , we conclude that max 1≤i≤N W ℓ (m N,i 1 , m 0 ) goes to 0 a.s. as N → ∞.
By Corollary 6.13 [34], W metricises the topology of weak convergence on P_0(R^d) and therefore By adapting to W, as well as the above reasoning for W_ℓ, we deduce that max_{1≤i≤N} W(m^{N,i}_1, m_0) goes to 0 a.s. as N → ∞. Let us now assume that m_0 is discrete, i.e. there is a sequence (y_k)_{1≤k≤K}, with K ∈ N* ∪ {+∞}, of distinct elements of R^d such that ∑_{k=1}^K m_0({y_k}) = 1. Then, by the strong law of large numbers, a.s. for each 1 ≤ k ≤ K, The third term of the right-hand side is arbitrarily small for k̄ large enough, whereas for fixed k̄, by the strong law of large numbers, the sum of the first two terms converges a.s. to 0 as N → ∞. Hence, converges a.s. to 0 as N → ∞. Second step: introduction of the linear functional derivative. Under the convention min ∅ := N + 1, we deduce that, for the radius r > 0 of the ball introduced in the hypotheses of the theorem, is almost surely N + 1 for each N ≥ N*, for some random variable N* taking integer values. For N ≥ N*, we have, using (3.2) and Theorem 2.3 for the second equality, Setting

By (3.3) and Young's inequality, we obtain that, for
Using that m_0 ∈ P_ℓ(R^d), we easily deduce that E|R_N| ≲ N^{−α} and, since α > 1/2, lim_{N→∞} E[√N |R_N|] = 0. Fourth step: application of the central limit theorem for martingales. Let us introduce the filtration (F_i := σ(ζ_1, . . . , ζ_i))_{i≥1}, for which I_N defined in (3.8) is a stopping time. By (3.2), for 1 ≤ i ≤ N, the random variable The convergence of sup_{x∈R^d} |(δU/δm)(µ, x) − (δU/δm)(m_0, x)|/(1 + |x|^{ℓ/2}) to 0 when D(µ, m_0) goes to 0, together with the a.s. convergence of max_{1≤i≤N} D(m^{N,i}_0, m_0) to 0, imply the existence of a sequence of random variables (ε_N)_{N≥0} converging a.s. to 0 as N → ∞ such that converges a.s. to 0. By continuity of the square function, so do On the other hand, using (3.11) and (3.2), we obtain that the relevant quantities converge a.s. to 0 and to Var((δU/δm)(m_0, ζ_1)) respectively as N → ∞. By Corollary 3.1, p.58 [16], to conclude, it is enough to check the Lindeberg condition: for each ε > 0, ∑_{i=1}^N E[X²_{N,i} 1_{X²_{N,i}>ε} | F_{i−1}] goes to 0 in probability as N → ∞. When ℓ = 0, δU/δm is bounded on B(m_0, r) × R^d and this condition is clearly satisfied. Let us suppose that ℓ > 0 and check that it is also satisfied.
By (3.2), As, for a, b, c, d ∈ R + , where the right-hand side goes a.s. to 0 as N → ∞, since by the strong law of large numbers, 1 N N j=1 |ζ j | ℓ converges a.s. to R d |y| ℓ m 0 (dy) < ∞.
In the next two corollaries, we give sufficient conditions in terms of second order linear functional derivatives for the assumptions (3.3) and (3.4) to hold. The assumption (3.3) is directly implied by the existence of a second order linear functional derivative with appropriate growth. In the case of m_0 discrete, treated in Corollary 3.4 below, so is assumption (3.4), which is of a similar nature. In the general case, we also suppose regularity of (δ²U/δm²)(µ, x, y) with respect to y to get (3.4).
For the second statement, we may choose r = +∞ in the proof of Theorem 3.1, so that I_N = N + 1 in (3.9) and N* = 0 in (3.10) define the two terms in the decomposition U(m_N) − U(m_0) = Q_N + R_N. From the third step of that proof, we have E|R_N| ≲ N^{−α} with α > 1/2, while the martingale property and the estimate (3.12) control Q_N. Notice that the assumptions on U are satisfied as soon as U ∈ S_{2,ℓ/2}(P_ℓ(R^d)).
The following example illustrates the usefulness of Corollary 3.3 when a function behaves badly w.r.t. the measure component but is very regular w.r.t. the spatial components. In this case, the conditions in Corollary 3.3 are easier to verify than those of Theorem 3.1.
where sign : R → R is the function defined by and Clearly, U ∈ S 1,0 (P 0 (R)) ∩ S 2,0 (P 0 (R)). Moreover, is Lipschitz continuous, uniformly in µ and x 1 . Therefore, the CLT holds for U by Corollary 3.3 applied with ℓ = 0. Of course, it can also be deduced from the classical delta method.
By Young's inequality, U ∈ S 3,6 (P 12 (R)). Therefore, U ∈ S 1,6 (P 12 (R)) ∩ S 2,12 (P 12 (R)). However, the condition on Hölder continuity in Corollary 3.3 does not hold, since y → y 2 is not uniformly continuous on R, therefore it cannot be Hölder continuous. We now show that the conditions of Theorem 3.1 hold for ℓ = 12 by showing that (3.3) and (3.4) are satisfied. Pick r > 0 and consider the ball B(m 0 , r) in the W 12 metric for m 0 ∈ P 12 (R). Since W 2 ≤ W 12 , there exists a constant C > 0 (depending on r and m 0 ) such that This implies that, for every µ 1 , µ 2 ∈ B(m 0 , r), which proves (3.3) for α = 1. Finally, we recall from Theorem 5.5 of Consequently, as W 12 (µ, m 0 ) and therefore W 2 (µ, m 0 ) go to 0, Suppose that the probability space (Ω, F, P) is atomless (i.e. there does not exist a measurable set which has positive measure and contains no set of smaller positive measure). Then for any µ ∈ P 0 (R d ), we can always construct an R d -valued random variable on Ω with law µ (see page 376 from [7]).
Recall that U is said to be Fréchet differentiable at θ_0 if there exists a linear continuous map DU(θ_0) : L²(Ω, F, P; R^d) → R such that as ‖η‖_{L²} → 0. By the Riesz representation theorem, there exists a (P-a.s.) unique random variable The following theorem follows from Theorems 6.2 and 6.5 of [5] (or equivalently, Propositions 5.24 and 5.25 of [7]) combined with Corollary 3.22 of [15].
For the time-dependent case, we extend the previous definition as follows. Example 4.7. The following functions F : R d × P 2 (R d ) → R belong to M k (R d × P 2 (R d )). 2 We do not consider 'zeroth' order derivatives in our definition, i.e. at least one of n, β1, . . . , βn and ℓ must be non-zero, for every multi-index n, ℓ, (β1, . . . , βn) .
(i) pth-degree interaction: where ϕ : (R d ) p+1 → R is bounded and C k with bounded and Lipschitz partial derivatives up to and including order k.
(ii) pth-degree polynomial on the Wasserstein space: where, for each i ∈ {1, . . . , p}, ϕ i : (R d ) 2 → R is bounded and C k with bounded and Lipschitz partial derivatives up to and including order k.
The following results establish links between linear functional derivatives and L-derivatives.
The next lemma gives sufficient conditions in terms of L-derivatives for the hypotheses of Corollary 3.3 to be satisfied. Lemma 4.10. Let U ∈ M_2(P_2(R^d)). Then the second order linear functional derivative of U satisfies Moreover, U satisfies the hypotheses of Corollary 3.3 for each ℓ ≥ 4.

4.2.
Mean-field fluctuation. We define Lipschitz-continuous (w.r.t. the product topology of R^d × P_2(R^d)) drift and diffusion coefficients b and σ respectively. Let (Ω, F, P) be an atomless, complete probability space, on which we consider i.i.d. random variables with law ν ∈ P_2(R^d) that are also independent of the Brownian motions W^1, . . . , W^N. This type of equation provides a probabilistic representation of many high-dimensional PDEs arising from kinetic theory and mean-field games. A standard approximation of this particle system is through the mean-field limit of µ^N_t (by the theory of propagation of chaos), which leads to the consideration of a corresponding McKean-Vlasov SDE, where W is a d′-dimensional Brownian motion and ξ ∼ ν is independent of W. Analyses of the approximation of (4.7) by the mean-field limiting equation (4.8) are widely considered in the literature, such as [3], [22] and [28]. In particular, by [28], the Lipschitz continuity of b and σ ensures existence and uniqueness of the solutions to (4.7) and (4.8) respectively. We consider the nonlinear fluctuation between the standard particle system (4.7) and its standard McKean-Vlasov limiting equation (4.8) under nonlinear functionals Φ ∈ M_k(P_2(R^d)), i.e. we consider the limiting distribution of the process The main analysis depends on the function V defined in (4.9), where, for θ an R^d-valued random vector independent of W, It is proven in Theorem 7.2 of [4] that, if ν ∈ P_2(R^d), Φ ∈ M_2(P_2(R^d)) and b_i, σ_{i,j} ∈ M_2(R^d × P_2(R^d)), for i ∈ {1, . . . , d} and j ∈ {1, . . . , d′}, then V satisfies the master equation (4.10). By the initial condition of (4.9), along with the definition of V, we have the decomposition To treat the first term, we define a finite dimensional projection V : Then we can apply Itô's formula to this equality. Proposition 3.1 of [8] allows us to conclude that V is differentiable in the time component and twice differentiable in the space components.
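The particle system (4.7) can be simulated by an Euler-Maruyama scheme. The sketch below uses our own illustrative coefficients b(x, µ) = ∫ y µ(dy) − x (attraction towards the empirical mean) and a constant σ, which are Lipschitz as required:

```python
import numpy as np

rng = np.random.default_rng(4)

def particle_system(N=5000, T=1.0, dt=0.01, sigma=0.5, mean0=2.0):
    """Euler-Maruyama for an interacting particle system of the form (4.7)
    with the illustrative choices b(x, mu) = <mu, y> - x and constant sigma.
    Initial condition: i.i.d. xi^i with law nu = N(mean0, 1)."""
    X = mean0 + rng.standard_normal(N)
    for _ in range(int(round(T / dt))):
        drift = X.mean() - X                       # b evaluated at (X_i, mu^N)
        X = X + drift * dt + sigma * np.sqrt(dt) * rng.standard_normal(N)
    return X

X = particle_system()
# With this drift the empirical mean is (almost) conserved, while the
# empirical variance relaxes towards the stationary value sigma^2 / 2
# of the limiting Ornstein-Uhlenbeck-type dynamics.
print(X.mean(), X.var())
```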
Moreover, Proposition 3.1 of [8] expresses the first and second order partial derivatives of V in terms of the L-derivatives of V. This allows us to use (4.10) to obtain a cancellation in the L-derivatives (except the second order term). We now present the details of the above discussion (found in the proof of Theorem B.2 in [29]) as follows. Setting Y N = (Y 1,N , Y 2,N , . . . , Y N,N ), we have (4.14) By (4.12), (4.14) and PDE (4.10) evaluated at (s, µ N s ) s∈[0,t] , the expression simplifies to The following proposition states this result rigorously.
Proposition 4.11. Suppose that Φ ∈ M_k(P_2(R^d)) and that b_i, σ_{i,j} ∈ M_k(R^d × P_2(R^d)), for i ∈ {1, . . . , d} and j ∈ {1, . . . , d′}. Then, for each T > 0, V ∈ M_k([0, T] × P_2(R^d)) and the marginal fluctuation at time t ∈ [0, T] can be expressed as (4.15). Proof. The statement concerning the regularity of V comes from Theorem 2.15 of [9] (see also Theorem 7.2 in [4] for the special case k = 2 and [31] for a related proof from the perspective of PDE analysis). Equation (4.15) comes from (B.7) of [29].
Lemma 4.12. Suppose that Φ ∈ M_5(P_2(R^d)) and that b_i, σ_{i,j} ∈ M_5(R^d × P_2(R^d)), for i ∈ {1, . . . , d} and j ∈ {1, . . . , d'}. Suppose that b and σ are uniformly bounded. Let ν ∈ P_12(R^d). Then the function V (defined by (4.9)) satisfies

Proof. For simplicity of notation, the proof is presented in dimension 1. By (4.10), ∂_t V exists and is given by

By Proposition 4.11, V ∈ M_5([0, T] × P_2(R^d)). By part (ii) of Theorem 3.2.3 of [30],

By the hypotheses of Lemma 4.12, we can apply Example 3 of Section 5.2.2 of [7] to yield

One can easily check that

for some finite constant C, where the domination of the integral with respect to µ_1 − µ_2 of the function of x (with the other arguments frozen in µ_1 and y_1) comes from Lipschitz continuity, the Kantorovich-Rubinstein duality and the inequality W_1 ≤ W_2. Iterating this argument for the higher order derivatives of ∂_t V up to order 3, we deduce that ∂_t V ∈ M_3([0, T] × P_2(R^d)).

Lemma 3.2 in [29] states that, for any function f ∈ M_3(P_2(R^d)), measure m_0 ∈ P_12(R^d) and m_N = (1/N) ∑_{i=1}^N δ_{ζ_i}, where ζ_1, . . . , ζ_N are i.i.d. samples with law m_0, there exists an absolute constant C > 0 (which does not depend on f, ζ_1, . . . , ζ_N and m_0) such that

Take any t_1, t_2 ∈ [0, T] such that t_1 < t_2. By (4.16) and Hölder's inequality, there exists an absolute constant C > 0 (which does not depend on V, µ^N_0 and ν) such that

The following theorem concerns the limiting distribution of F^N.
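The 1/N rate appearing in the Lemma 3.2-type bound can be made explicit on a toy functional. For the hypothetical choice Φ(µ) = (∫ x µ(dx))^2 (not taken from the text) and m_N the empirical measure of N i.i.d. samples with law m_0, a direct computation gives E[Φ(m_N)] − Φ(m_0) = Var(m_0)/N exactly, so the bias of the nonlinear functional decays at rate 1/N with an explicit constant. The following Monte-Carlo sketch confirms this numerically.

```python
import numpy as np

# Toy functional Phi(mu) = (mean of mu)^2, sampled at m_0 = N(2, 1).
# Exact bias: E[Phi(m_N)] - Phi(m_0) = Var(m_0) / N = 1 / 50 = 0.02.
rng = np.random.default_rng(1)
m0_mean, m0_var, N, n_rep = 2.0, 1.0, 50, 100_000

samples = rng.normal(m0_mean, np.sqrt(m0_var), size=(n_rep, N))
phi_emp = samples.mean(axis=1) ** 2        # Phi(m_N) over n_rep experiments
bias = phi_emp.mean() - m0_mean ** 2       # Monte-Carlo estimate of the bias

print(bias)                                # should be close to Var/N = 0.02
```

Doubling N halves the measured bias, in line with the 1/N rate of the lemma.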
Then, in C(R_+, R), the process converges weakly to a Gaussian process L whose finite dimensional distributions (L_{t_1}, . . . , L_{t_K}), 0 ≤ t_1 ≤ . . . ≤ t_K, have a zero expectation vector and covariance matrix Σ given by

Proof. Firstly, by (4.15), we decompose F^N as

By Lemma 4.9 and Corollary 3.3 applied with ℓ = 12, for any t > 0,

Consequently, by (4.19) and the fact that V ∈ M_4([0, T] × P_2(R^d)) (which implies boundedness of ∂²_µ V by definition) for any T > 0, we deduce that, for any t > 0,

It follows by a similar argument that, for any t_1, t_2 ∈ [0, T], there exists a constant C_T > 0 such that

By Lemma 6.1 in [4],

for some constant C_T > 0 that only depends on T. Take any t_1, t_2 ∈ [0, T] with t_1 < t_2. Then, by (4.22) and Hölder's inequality,

By the Burkholder-Davis-Gundy, Jensen's and Hölder's inequalities, the second term of (4.23) can be bounded by C^{(2)}_T |t_2 − t_1|², for some constants C^{(1)}_T, C^{(2)}_T that only depend on T. Repeating the same argument for the first term in (4.23), we observe by (4.22) that

This estimate, alone when ν is a Dirac mass (so that µ^N_0 = ν and Λ^N_t = I^N_t), and combined with Lemma 4.12 otherwise, yields tightness of the laws of (Λ^N)_{N≥1} on C(R_+, R) (see [20]). Next, we compute the weak limit of the finite dimensional distributions of F^N. We first define the coupling of (4.8) given by

Let θ_k, k ∈ {1, . . . , K}, be arbitrary real numbers. Then

where Z_1 and Z_2 are independent normal random variables given by

By (4.26),

By Lévy's continuity theorem, this shows that the random vector (Λ^N_{t_1}, . . . , Λ^N_{t_K}) converges weakly to a normal random vector (L_{t_1}, . . . , L_{t_K}), whose expectation vector is zero and covariance matrix Σ is given by
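The Gaussian limit of the initial-condition contribution can likewise be checked numerically on the same hypothetical toy functional Φ(µ) = (∫ x µ(dx))^2 (again not taken from the text). Its linear functional derivative at m_0 is δΦ/δm(m_0)(x) = 2 m̄_0 x (up to an additive constant, with m̄_0 the mean of m_0), so the first-order expansion predicts that √N(Φ(m_N) − Φ(m_0)) converges to a centred normal with variance Var(2 m̄_0 ζ) = 4 m̄_0² Var(m_0), where ζ ∼ m_0.

```python
import numpy as np

rng = np.random.default_rng(3)
mu0, var0, N, n_rep = 2.0, 1.0, 100, 50_000

# n_rep independent copies of the empirical measure of N samples of N(2, 1).
samples = rng.normal(mu0, np.sqrt(var0), size=(n_rep, N))
fluct = np.sqrt(N) * (samples.mean(axis=1) ** 2 - mu0 ** 2)

# Predicted limiting variance via the linear functional derivative:
# Var( dPhi/dm(m_0)(zeta) ) = 4 * mu0^2 * var0 = 16.
print(fluct.mean(), fluct.var())
```

The empirical mean of the fluctuations is close to zero (it carries a residual O(1/√N) bias) and the empirical variance is close to the predicted value 16, consistent with a Gaussian limit for the initial term.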
Let W be defined as W_1 but with |x − y| ∧ 1 replacing the integrand |x − y| in (1.2). By Corollary 6.13 of [34], W metricises the topology of weak convergence on P_0(R^d). Since |x − y| ∧ 1 ≤ |x − y|^ℓ for all x, y ∈ R^d, we have W ≤ W_ℓ. Moreover, the second inequality in (5.1) and the existence of an optimal coupling ρ between µ and ν imply that

Hence, if (µ_n)_{n∈N} is a sequence in P_ℓ(R^d) such that lim_{n→∞} W_ℓ(µ_n, µ) = 0, then µ_n converges weakly to µ as n → ∞ and lim_{n→∞} ∫_{R^d} |x|^ℓ µ_n(dx) = ∫_{R^d} |x|^ℓ µ(dx). The converse implication can be checked by repeating the proof of the same statement for ℓ ≥ 1 on pp. 101-103 of [34].
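The ordering between these metrics can be visualised numerically in dimension 1, where W_p between two empirical measures with the same number of atoms is computed exactly by pairing sorted samples (valid for p ≥ 1). For the bounded cost |x − y| ∧ 1, the same coupling only gives an upper bound on the infimum defining W, which is all that is needed to illustrate W ≤ W_1 ≤ W_2. This sketch uses the exponents 1 and 2 for concreteness.

```python
import numpy as np

def wasserstein_p(x, y, p):
    """Exact W_p between two empirical measures with equally many atoms in
    dimension 1: the optimal coupling pairs the sorted samples (p >= 1)."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.abs(xs - ys) ** p) ** (1.0 / p)

def bounded_cost_upper_bound(x, y):
    """Upper bound on W (cost |x - y| ∧ 1): any coupling bounds the infimum
    from above; here we evaluate the sorted (comonotone) coupling."""
    xs, ys = np.sort(x), np.sort(y)
    return np.mean(np.minimum(np.abs(xs - ys), 1.0))

rng = np.random.default_rng(2)
x = rng.normal(0.0, 1.0, size=1000)
y = rng.normal(0.5, 2.0, size=1000)

wb = bounded_cost_upper_bound(x, y)
w1 = wasserstein_p(x, y, 1)
w2 = wasserstein_p(x, y, 2)
print(wb, w1, w2)   # the chain W <= W_1 <= W_2 shows up numerically
```

Here wb ≤ w1 holds because the bounded cost is dominated pointwise by |x − y| under the same coupling, and w1 ≤ w2 is the power-mean inequality, mirroring the comparisons used in the proof above.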