Extended mean field control problem: a propagation of chaos result

In this paper, we study the extended mean field control problem, a class of McKean-Vlasov stochastic control problems where the state dynamics and the reward functions depend upon the joint (conditional) distribution of the controlled state and the control process. By considering an appropriate controlled Fokker-Planck equation, we can formulate an optimization problem over a space of measure-valued processes and, under suitable assumptions, prove the equivalence between this optimization problem and the extended mean field control problem. Moreover, with the help of this new optimization problem, we establish the associated limit theory, i.e. the extended mean field control problem is the limit of a large population control problem where the interactions are achieved via the empirical distribution of the state and control processes.


Introduction
The aim of this paper is to provide a rigorous connection between two stochastic control problems: on the one hand, the stochastic control problem of a large population (of particles) interacting through the empirical distribution of their states and controls, and on the other hand the problem of controlling stochastic dynamics depending upon the joint (conditional) distribution of the controlled state and the control, also called the extended mean field control problem.
To fix the ideas, let us briefly describe the problems. The large population stochastic control problem can be formulated as follows (see Section 2.1 for more details). Consider N interacting controlled state processes X := (X^1, ..., X^N) governed by the following system of stochastic differential equations: for t ∈ [0, T] and i ∈ {1, ..., N},

dX^i_t = b(t, X^i_t, (ϕ^{N,X}_s)_{s∈[0,t]}, ϕ^N_t, α^i_t) dt + σ(t, X^i_t, (ϕ^{N,X}_s)_{s∈[0,t]}, ϕ^N_t, α^i_t) dW^i_t + σ_0 dB_t,

where ϕ^{N,X}_t := (1/N) Σ_{i=1}^N δ_{X^i_t} and ϕ^N_t := (1/N) Σ_{i=1}^N δ_{(X^i_t, α^i_t)} denote the empirical distributions of the states and of the state-control pairs. Here T > 0 is a fixed time horizon, (B, W^1, ..., W^N) are independent Brownian motions, B is called the common noise, and (α^1, ..., α^N) are some admissible controls chosen by a global planner. In this stochastic control problem, the global planner aims to maximise the average reward

E[ (1/N) Σ_{i=1}^N ( ∫_0^T L(t, X^i_t, (ϕ^{N,X}_s)_{s∈[0,t]}, ϕ^N_t, α^i_t) dt + g(X^i_T, (ϕ^{N,X}_s)_{s∈[0,T]}) ) ].

When N goes to infinity, the expectation is that this problem "converges" towards the extended mean field control problem. Loosely speaking (see Section 2.2 for more details), in the extended mean field control problem the objective is to control, via α, the state process X which follows the stochastic differential equation of McKean-Vlasov type

dX_t = b(t, X_t, (L(X_s|B))_{s∈[0,t]}, L(X_t, α_t|B), α_t) dt + σ(t, X_t, (L(X_s|B))_{s∈[0,t]}, L(X_t, α_t|B), α_t) dW_t + σ_0 dB_t,

in order to maximise the quantity

E[ ∫_0^T L(t, X_t, (L(X_s|B))_{s∈[0,t]}, L(X_t, α_t|B), α_t) dt + g(X_T, (L(X_s|B))_{s∈[0,T]}) ],

where L(X_t, α_t|B) (resp. L(X_t|B)) denotes the conditional distribution of the couple (X_t, α_t) (resp. of the state X_t) given the common noise B.
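The N-particle dynamics above can be illustrated with a minimal Euler-Maruyama simulation. The coefficients b, σ, L, g and the feedback control below are ad hoc illustrative choices (a drift pulling each particle toward the empirical means of states and controls, quadratic running and terminal rewards), not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, n_steps = 200, 1.0, 100
dt = T / n_steps
sig, sig0 = 0.3, 0.1

# Illustrative coefficients (not from the paper): the drift pulls each
# particle toward the empirical means of the states and of the controls.
def b(x, mean_state, mean_ctrl, a):
    return (mean_state - x) + 0.5 * (mean_ctrl - a)

X = rng.normal(0.0, 1.0, N)              # X^i_0 ~ nu, i.i.d.
running_reward = 0.0
for _ in range(n_steps):
    alpha = -0.5 * X                     # ad hoc feedback control alpha^i_t
    mS, mC = X.mean(), alpha.mean()      # summaries of phi^{N,X}_t and phi^N_t
    dW = rng.normal(0.0, np.sqrt(dt), N) # idiosyncratic noises W^i
    dB = rng.normal(0.0, np.sqrt(dt))    # common noise B, shared by all particles
    running_reward += dt * np.mean(-X**2 - alpha**2)  # L = -x^2 - a^2 (illustrative)
    X = X + b(X, mS, mC, alpha) * dt + sig * dW + sig0 * dB

J_N = running_reward + np.mean(-X**2)    # terminal reward g(x) = -x^2 (illustrative)
print(J_N)
```

Note the two distinct noise scales: each particle draws its own dW, while the single dB increment enters every particle, which is exactly what makes the limiting marginal laws conditional on B.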
The connection we are investigating, i.e. that the stochastic control problem of a large population converges towards the mean field control problem, is often called limit theory or (controlled) propagation of chaos. In contrast with the classical framework of McKean-Vlasov stochastic control problems, which only considers the conditional distribution of X_t, here there is in addition the presence of the conditional distribution of (X_t, α_t). Indeed, when there is no law of control, i.e. no L(X_t, α_t|B) but only L(X_t|B) in (b, σ, L, g), these problems have been studied in the literature. Let us mention the work of Sznitman [33], which shows, for particular coefficients (b, σ), in the absence of control (and of the law of control), via some compactness arguments, a connection of this type. See also the papers of Oelschläger [31] and Gärtner [17], with no control and no law of control as well, which use the martingale problem in the sense of Stroock and Varadhan [34], adapted to the context of McKean-Vlasov equations, to prove similar results under minimal assumptions.
In the controlled but non-extended case, that is to say when the dynamics depend on the control but not on its law, Fischer and Livieri [15] obtain a connection between the large population stochastic control problem and the (extended) mean field control problem for the study of a mean-variance problem arising in finance. Another interesting work is that of Budhiraja, Dupuis, and Fischer [4], where they study the behavior of empirical measures of controlled interacting diffusions in order to prove a large deviation principle in a McKean-Vlasov framework. Still without touching the case with law of control, the first papers that deal with the controlled case under general assumptions are Lacker [24] and Djete, Possamaï, and Tan [11]. Thanks to an (extension of the) martingale problem of [34], as well as relaxed controls, initiated by Fleming and Nisio [16] and developed by El Karoui, Huu Nguyen, and Jeanblanc-Picqué [12], combined with compactness arguments adapted to the McKean-Vlasov setting, [24] proves the connection between the two problems under general conditions on (b, σ, L, g) without common noise. Indeed, the idea of relaxed controls, i.e. a control seen as a probability measure of the type δ_{α_t}(du)dt, helps to obtain the compactness properties necessary for proving these types of results. Following upon these ideas, [11] develops a general overview of the McKean-Vlasov or mean field control problem and treats the case with common noise, which turns out to be a non-trivial extension.
In the presence of the law of control, this propagation of chaos result is a natural expectation. In spite of appearances, it is not an easy extension: the aforementioned techniques do not work in this context. Two main reasons explain why. Firstly, the continuity of the map t → L(X_t|B) (or t → ϕ^{N,X}_t) plays a crucial role. Indeed, the classical idea is to embed this map into a canonical space, here the space C([0, T]; P(R^n)) of continuous functions from [0, T] into the space of probability measures on R^n, and to obtain the connection via compactness arguments and the martingale problem (see [24], and [11] for the non-Markovian case with common noise). In our situation, this type of continuity is lost, because we must take into account the map t → L(X_t, α_t|B) (or t → ϕ^N_t), which does not have this property since the presence of the control α can generate discontinuities. Secondly, as highlighted in [11], proving a propagation of chaos result is intimately related to identifying the closure of the set of all probabilities that are the image measures of the controlled state process, the control, and the conditional distribution of the controlled state process and control, i.e. L(X, δ_{α_t}(du)dt, L(X, δ_{α_t}(du)dt|B)). Unfortunately, the natural space that one might consider to answer this question is not a closed set, due to another continuity problem (see Remark 2.5 for a more thorough discussion).
There are not many papers in the literature which study the mean field control problem with law of control and its connection with a large population stochastic control problem. To the best of our knowledge, only the recent papers of Laurière and Tangpi [28] (with strong assumptions) and Motte and Pham [30] (for mean field Markov decision processes) treat the limit theory question. Most papers focus on the questions of existence and uniqueness of an optimal control. Acciaio, Backhoff Veraguas, and Carmona [1], with the help of Pontryagin's maximum principle, obtain necessary and sufficient conditions characterizing the optimum, under strong assumptions on the coefficients and in a framework without common noise. Pham and Wei [32] (without common noise, with closed-loop controls) and Djete, Possamaï, and Tan [10] establish the Dynamic Programming Principle (DPP for short) and give a Hamilton-Jacobi equation, on a space of probability measures, verified by the value function (heuristically proved in [10]). Let us also mention Carmona and Lacker [6], Elie, Mastrolia, and Possamaï [13], Cardaliaguet and Lehalle [5], Alasseur, Taher, and Matoussi [2], Casgrain and Jaimungal [8], Lacker and Soret [26], Féron, Tankov, and Tinsi [14] and [28], who study a similar problem in the mean field game framework, called mean field game of controls or extended mean field game, as well as our companion paper Djete [9], which adapts the arguments of this paper to the context of mean field games of controls.
In this article, our goal is to establish some properties of the extended mean field control problem and to show its connection with the large population stochastic control problem under general assumptions on (b, σ, L, g) (see Theorem 3.3 and Theorem 3.1). To bypass the difficulties highlighted above, we follow the idea mentioned in [11], which is to introduce a new optimization problem over a suitable set of controls. This set must be the closure of some set of probability measures. In this framework, the appropriate space is the closure of the set of all probabilities that are the distributions of the conditional distribution of the controlled state process and of the conditional distribution of the controlled state process and the control, i.e. L((L(X_t|B))_{t∈[0,T]}, δ_{L(X_t,α_t|B)}(dm)dt) (for more details see Section 2.3). Taking into account this type of probability turns out to be the key to solving the main difficulties. The characterization of its closure is possible through the appropriate use of a (controlled) Fokker-Planck equation. Inspired by the techniques developed in the proofs of Gyöngy [18], especially [18, Lemma 2.1] (an adaptation of Krylov [22]) and [18, Proposition 4.3], which are regularization results, we can determine the desired set thanks to a Fokker-Planck equation. The conditions imposed on the coefficients are general, except for the non-degeneracy of the volatility σ. This assumption is essential for proving our main results. Apart from this assumption, our result appears to be one of the first to establish general properties of the extended mean field control problem and to show its connection with the large population stochastic control problem. Lacker [25] used similar techniques in the context of convergence of closed-loop Nash equilibria, but his analysis focuses mainly on an adequate manipulation of [18, Theorem 4.6], while ours focuses on the techniques used in the proofs.
Also, let us mention Lacker, Shkolnikov, and Zhang [27], which establishes a correspondence between Fokker-Planck equations and solutions of SDEs in a McKean-Vlasov framework with common noise.
The rest of the paper is structured as follows. After introducing the notations and the probabilistic structure needed to give an adequate definition of the tools that are used throughout the paper, Section 2 states all the main assumptions and carefully formulates first the large population stochastic control problem, then the strong formulation of the extended mean field control problem, and finally the stochastic control of measure-valued processes. Next, in Section 3, we present the main results of this paper: the equivalence between the strong formulation of the extended mean field control problem and the stochastic control of measure-valued processes, and the propagation of chaos result, i.e. that the extended mean field control problem is, as N goes to infinity, the limit of the large population stochastic control problem in the presence of interactions through the empirical distribution of the state and control processes. Finally, Section 4 is devoted to the proof of our main results and Section 5 provides some approximation results related to the Fokker-Planck equation needed in our proofs.
Notations. (i) Given a Polish space (E, ∆) and p ≥ 1, we denote by P(E) the collection of all Borel probability measures on E, and by P_p(E) the subset of Borel probability measures µ such that ∫_E ∆(e, e_0)^p µ(de) < ∞ for some e_0 ∈ E. We equip P_p(E) with the Wasserstein metric W_p defined by

W_p(µ, µ') := ( inf_{λ ∈ Λ(µ,µ')} ∫_{E×E} ∆(e, e')^p λ(de, de') )^{1/p},

where Λ(µ, µ') denotes the collection of all probability measures λ on E × E such that λ(de, E) = µ(de) and λ(E, de') = µ'(de'). Equipped with W_p, P_p(E) is a Polish space (see [35, Theorem 6.18]). For any µ ∈ P(E) and any µ-integrable function f : E → R, we write

⟨f, µ⟩ := ∫_E f(e) µ(de). (1.1)

For another metric space (E', ∆'), we denote by µ ⊗ µ' ∈ P(E × E') the product probability of any (µ, µ') ∈ P(E) × P(E'). Given a probability space (Ω, F, P) supporting a sub-σ-algebra G ⊂ F, then for a Polish space E and any random variable ξ : Ω → E, both the notations L^P(ξ|G)(ω) and P^G_ω ∘ (ξ)^{-1} are used to denote the conditional distribution of ξ knowing G under P.
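On the real line, the infimum over couplings Λ(µ, µ') in the definition of W_p is attained by the monotone coupling, so for two empirical measures with equally many atoms W_p reduces to pairing sorted samples. A short numerical illustration (not from the paper):

```python
import numpy as np

def wasserstein_p(x, y, p=2):
    """W_p between two empirical measures (1/N) sum_i delta_{x_i} and
    (1/N) sum_i delta_{y_i} on R: on the real line the infimum over
    couplings in Lambda(mu, mu') is attained by pairing sorted samples."""
    x, y = np.sort(x), np.sort(y)
    return np.mean(np.abs(x - y) ** p) ** (1.0 / p)

rng = np.random.default_rng(1)
mu = rng.normal(0.0, 1.0, 5000)   # samples from N(0, 1)
nu = rng.normal(2.0, 1.0, 5000)   # samples from N(2, 1)
print(wasserstein_p(mu, nu))      # W_2(N(0,1), N(2,1)) = 2, up to sampling error
```

For Gaussians with the same variance, W_2 equals the distance between the means, which gives a convenient sanity check for the sorted-sample formula.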
(ii) For any two Polish spaces (E, ∆) and (E', ∆'), we shall write C_b(E, E') to designate the set of continuous and bounded functions from E to E'. Let N be the set of non-negative integers and N* := N \ {0} the set of positive integers. Given non-negative integers m and n, we denote by S^{m×n} the collection of all m × n-dimensional matrices with real entries, equipped with the standard Euclidean norm, which we denote by |·| regardless of the dimensions, for notational simplicity. We also denote S^n := S^{n×n}, by 0_{m×n} the element of S^{m×n} whose entries are all 0, and by I_n the identity matrix in S^n. For any symmetric positive semi-definite matrix a ∈ S^n, we write a^{1/2} for the unique symmetric positive semi-definite square root of a. For a positive integer k, we denote by C^k_b(R^n; R) the set of bounded functions whose derivatives up to order k are bounded and continuous, equipped with the uniform norm ∥·∥. For a Polish space Σ, we write C(Σ) := C([0, T], Σ); when Σ = R^k for some k ∈ N, we simply write C^k := C([0, T], R^k), and we shall also denote C^k_W := C([0, T], P(R^k)) and, for p ≥ 1, C^{k,p}_W := C([0, T], P_p(R^k)). With a Polish space E, we denote by M(E) the space of all Borel measures q(dt, de) on [0, T] × E whose marginal distribution on [0, T] is the Lebesgue measure dt, that is to say q(dt, de) = q(t, de)dt for a family (q(t, de))_{t∈[0,T]} of Borel probability measures on E. We also consider the subset M_0(E) ⊂ M(E) of all q ∈ M(E) such that q(dt, de) = δ_{ψ(t)}(de)dt for some Borel measurable function ψ : [0, T] → E. For any q ∈ M(E), we define

q_{t∧·}(ds, de) := q(ds, de)|_{[0,t]×E} + δ_{e_0}(de)ds|_{(t,T]×E}, for some fixed e_0 ∈ E. (1.2)

Extended mean field control problem
Let (ℓ, n) ∈ N × N*, let (U, ρ) be a nonempty Polish space, and let P^n_U denote the space of all Borel probability measures on R^n × U, i.e. P^n_U := P(R^n × U). We give ourselves the following Borel measurable functions:

b : [0, T] × R^n × C^n_W × P^n_U × U → R^n, σ : [0, T] × R^n × C^n_W × P^n_U × U → S^n, L : [0, T] × R^n × C^n_W × P^n_U × U → R, g : R^n × C^n_W → R.

Moreover, there exist positive constants C and p, with p ≥ 2, such that:

(i) U is a compact space;

(ii) b and σ are continuous bounded functions, and σ_0 ∈ S^{n×ℓ} is constant;

(iv) for some constant θ > 0, one has θ I_n ≤ σσ^⊤(t, x, π, m, u) for all (t, x, π, m, u);

(v) the reward functions L and g are continuous and, for all (t, x, π, m, u) ∈ [0, T] × R^n × C^n_W × P^n_U × U, satisfy a growth condition of order p in (x, π, m), uniformly in (t, u), with constant C.

Remark 2.2. These assumptions are standard and in the same spirit as those used in [24] and [11], but with some specific modifications adapted to the context of this article. They ensure the well-posedness of the objects used throughout this paper. Due to the technical nature of our paper, point (i) is considered essentially to simplify (the presentation of) the proofs. However, using the classical uniform integrability condition as in [24] and [11], it is possible to work with U an unbounded subset of R^n, for instance. Point (iv) is the least classical assumption in the study of this problem. It is an essential assumption for the proofs of our results, in particular to deal with the Fokker-Planck equations and the different SDEs considered in the proofs (see Section 5).

The large population stochastic control problem
In this section, we present the N-agent stochastic control problem, or large population control problem. The study of this control problem when N goes to infinity is one of the main objectives of this paper.
For a fixed (ν^1, ..., ν^N) ∈ P_p(R^n)^N, let Ω_N := (R^n)^N × (C^n)^N × C^ℓ be the canonical space, with canonical variable X_0 = (X^1_0, ..., X^N_0), canonical processes W = (W^1_s, ..., W^N_s)_{0≤s≤T} and B = (B_s)_{0≤s≤T}, and probability measure P^N_ν under which X_0 ∼ ν^N := ν^1 ⊗ ··· ⊗ ν^N and (W, B) are standard Brownian motions independent of X_0. Let F^N = (F^N_s)_{0≤s≤T} be the natural filtration generated by (X_0, W, B). Let us denote by A^N(ν^N) the collection of all U-valued F^N-predictable processes. Then, given α := (α^1, ..., α^N) ∈ (A^N(ν^N))^N, let X^α := (X^{1,α}, ..., X^{N,α}) be the unique strong solution of the following system of SDEs: for each i ∈ {1, ..., N} and t ∈ [0, T],

dX^{i,α}_t = b(t, X^{i,α}_t, (ϕ^{N,X}_s)_{s∈[0,t]}, ϕ^N_t, α^i_t) dt + σ(t, X^{i,α}_t, (ϕ^{N,X}_s)_{s∈[0,t]}, ϕ^N_t, α^i_t) dW^i_t + σ_0 dB_t,

where ϕ^{N,X}_t := (1/N) Σ_{i=1}^N δ_{X^{i,α}_t} and ϕ^N_t := (1/N) Σ_{i=1}^N δ_{(X^{i,α}_t, α^i_t)}. This system is well-posed under Assumption 2.1.

Remark 2.3. (i)
Our formulation allows for coefficients depending on the path of the empirical distribution of X^α, but can only accommodate a Markovian dependence with respect to X^α itself. In some sense, we work in a non-Markovian framework w.r.t. the empirical distribution of X^α. Indeed, as we will see in Section 2.3, our point of view is to write the entire problem as an optimization involving mainly the empirical distribution of X^α, i.e. ϕ^{N,X}. Therefore our key variable is ϕ^{N,X} (not X^α), and we can deal with its path, hence the non-Markovian aspect.
(ii) Sometimes, a probability measure on C^n will be used to refer to a choice of controls (α^1, ..., α^N) ∈ (A^N(ν^N))^N. The notation P^N_S(ν^1, ..., ν^N) will designate the set of all probabilities of this type. The need for this space will become clearer in what follows.

The extended mean field control problem
On a fixed probability space, we formulate the classical McKean-Vlasov control problem with common noise including the (conditional) law of control.
For a fixed ν ∈ P_p(R^n), let Ω := R^n × C^n × C^ℓ be the canonical space, with canonical variable ξ, canonical processes W = (W_t)_{0≤t≤T} and B = (B_t)_{0≤t≤T}, and probability measure P_ν under which ξ ∼ ν and (W, B) are standard Brownian motions independent of ξ. Let F = (F_s)_{0≤s≤T} and G = (G_s)_{0≤s≤T} be defined by: for all s ∈ [0, T], F_s := σ{ξ, W_r, B_r : r ≤ s} and G_s := σ{B_r : r ≤ s}. Let us denote by A(ν) the collection of all U-valued processes α = (α_s)_{0≤s≤T} which are F-predictable. Then, given α ∈ A(ν), let X^α be the unique strong solution of the following SDE (see [10, Theorem A.3]): E^{P_ν}[∥X^α∥^p] < ∞, X^α_0 = ξ, and for t ∈ [0, T],

dX^α_t = b(t, X^α_t, (µ^α_s)_{s∈[0,t]}, µ̄^α_t, α_t) dt + σ(t, X^α_t, (µ^α_s)_{s∈[0,t]}, µ̄^α_t, α_t) dW_t + σ_0 dB_t, (2.4)

with µ^α_r := L^{P_ν}(X^α_r | G_r) and µ̄^α_r := L^{P_ν}((X^α_r, α_r) | G_r) for all r ∈ [0, T]. Let us now introduce the following McKean-Vlasov control problem:

V_S(ν) := sup_{α ∈ A(ν)} E^{P_ν}[ ∫_0^T L(t, X^α_t, (µ^α_s)_{s∈[0,t]}, µ̄^α_t, α_t) dt + g(X^α_T, (µ^α_s)_{s∈[0,T]}) ].

Remark 2.4. Similarly to [11], notice that this formulation takes into account the case without common noise. Indeed, when ℓ = 0, the spaces C^ℓ and S^{n×ℓ} degenerate and become {0}. Then B = 0 and the filtration G is constant, equal to the trivial σ-algebra {∅, Ω}. Therefore, there is no conditional distribution anymore.
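The conditional laws µ^α_t = L(X^α_t | G_t) can be visualized with a particle approximation: freeze one common-noise path B and average over many i.i.d. idiosyncratic noises W^i, so that the empirical measure of the particles approximates L(X_t | B). The drift b(x, µ) = ⟨id, µ⟩ − x below is an illustrative choice, not from the paper; with it, the conditional mean m_t := ⟨id, L(X_t|B)⟩ solves dm_t = σ_0 dB_t, m_0 = 0, which the sketch checks numerically:

```python
import numpy as np

rng = np.random.default_rng(2)
M, n_steps, dt = 5000, 50, 0.02
sig, sig0 = 0.5, 1.0

# One frozen common-noise path B, M particles with i.i.d. noises W^i:
# the empirical measure of (X^i_t)_i approximates the conditional law L(X_t | B).
dB = rng.normal(0.0, np.sqrt(dt), n_steps)
X = rng.normal(0.0, 1.0, M)               # X_0 ~ N(0, 1), so m_0 = 0
for k in range(n_steps):
    cond_mean = X.mean()                  # proxy for <id, L(X_t|B)>
    dW = rng.normal(0.0, np.sqrt(dt), M)  # idiosyncratic noises, fresh for each particle
    X = X + (cond_mean - X) * dt + sig * dW + sig0 * dB[k]

# With this drift the conditional mean solves dm_t = sig0 dB_t, m_0 = 0,
# so the empirical mean at time T should be close to sig0 * B_T.
print(X.mean(), sig0 * dB.sum())
```

Averaging over the W^i (but not over B) is the numerical counterpart of conditioning on the common noise: the randomness of B survives in the limit, which is why µ^α is a random measure-valued process.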
Remark 2.5 (Discussion on a possible relaxed extended mean field control problem). An adequate way to study the properties of V_S and/or to give a limit theory is to find the closure of some particular space S(ν) for the Wasserstein topology. To simplify, let us take ℓ = 0 (without common noise); then, according to the classical ideas of relaxed controls, S(ν) := {P_ν ∘ (X^α, δ_{α_t}(du)dt)^{-1} : α ∈ A(ν)} (see the discussion in Djete, Possamaï, and Tan [11] and also Lacker [24]).
Following [24] and [11], let us give an example showing why the "natural" set of relaxed controls is not a "good" one.

But P_R(ν) defined in this way is not a closed set. Indeed, the map q ∈ M(U) → q_t ∈ P(U) is not continuous for the Wasserstein topology. Therefore P_R(ν) cannot be the closure of S(ν). More generally, as long as the coefficients (b, σ) are non-linear w.r.t. m, this kind of discontinuity will appear. Due to this lack of continuity, this approach cannot work. We then need to change the framework.
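The discontinuity of q → q_t can be seen numerically. Take U = {−1, +1} and the fast-oscillating controls a_k(t) = sign(sin(2πkt)): the relaxed controls q^k(dt, du) = δ_{a_k(t)}(du)dt converge weakly to q(dt, du) = ½(δ_{−1} + δ_{+1})(du)dt, yet every time marginal q^k_t remains a Dirac mass and never approaches the mixture ½(δ_{−1} + δ_{+1}). The sketch below (illustrative, with an arbitrary smooth test function) checks the weak convergence:

```python
import numpy as np

# Relaxed controls q^k(dt, du) = delta_{a_k(t)}(du) dt with the fast-oscillating
# feedback a_k(t) = sign(sin(2 pi k t)) on [0, 1], U = {-1, +1}.
f = lambda t, u: np.cos(t) * (1.0 + u + u**2)   # a smooth test function on [0,1] x U

ts = (np.arange(200_000) + 0.5) / 200_000       # midpoint grid on [0, 1]
limit = np.mean(0.5 * f(ts, -1.0) + 0.5 * f(ts, 1.0))  # integral against q = (1/2)(delta_{-1}+delta_{+1}) dt

gaps = []
for k in (1, 4, 16, 64):
    a_k = np.sign(np.sin(2 * np.pi * k * ts))
    gaps.append(abs(np.mean(f(ts, a_k)) - limit))
print(gaps)  # shrinks with k: q^k -> q weakly, yet every marginal q^k_t stays a Dirac mass
```

So a limit point of P_R(ν) can involve genuinely mixed time marginals even though every element of the approximating sequence has Dirac marginals, which is exactly the closedness failure described above.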

Stochastic control of measure-valued processes
As previously mentioned, the classical approach of relaxed controls is not appropriate. To bypass the difficulty generated by the (conditional) distribution of the control in this study, especially to prove the limit theory result or (controlled) propagation of chaos, we introduce a new stochastic control problem. Motivated by the Fokker-Planck equation verified by the couple (µ^α, µ̄^α) of conditional laws of the state and of the state-control pair from (2.4), we give in this part an equivalent formulation of the extended mean field control problem which is less "rigid".

Measure-valued rules
Recall that M := M(P^n_U) denotes the collection of all finite (Borel) measures q(dt, dm) on [0, T] × P^n_U whose marginal distribution on [0, T] is the Lebesgue measure dt, i.e. q(dt, dm) = q(t, dm)dt for a measurable family (q(t, dm))_{t∈[0,T]} of Borel probability measures on P^n_U. Let Λ be the canonical element on M. We then introduce the canonical filtration F^Λ = (F^Λ_t)_{t∈[0,T]} generated by Λ. For each q ∈ M, one has the disintegration property q(dt, dm) = q(t, dm)dt, and there is a version of the disintegration such that (t, q) → q(t, dm) is F^Λ-predictable.
Let us consider the following generator L: for all (t, x, π, m, u) ∈ [0, T] × R^n × C^n_W × P^n_U × U and any ϕ ∈ C^2(R^n),

L_t ϕ(x, π, m, u) := b(t, x, π, m, u) · ∇ϕ(x) + (1/2) Tr[(σσ^⊤(t, x, π, m, u) + σ_0 σ_0^⊤) ∇^2 ϕ(x)].

Also, for every f ∈ C^2(R^n), we introduce the process N_t(f); recall that ⟨·, ·⟩ is defined in (1.1). Notice that, under Assumption 2.1, the integral in the definition of N(f) is well defined. For each π ∈ P(R^n), one considers the Borel set Z_π, the set of probability measures m on R^n × U with marginal on R^n equal to π, i.e. Z_π := {m ∈ P^n_U : m(dx, U) = π(dx)}.
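The weak (Fokker-Planck) formulation behind N_t(f) can be made concrete numerically: along the flow of marginal laws (µ_t), the quantity ⟨f, µ_t⟩ − ⟨f, µ_0⟩ − ∫_0^t ⟨L_r f, µ_r⟩ dr should vanish. The sketch below (illustrative, not from the paper) checks this with particles for a one-dimensional SDE with drift b(x) = −x, unit volatility, no common noise, and f(x) = x²:

```python
import numpy as np

rng = np.random.default_rng(3)
M, n_steps, dt = 100_000, 200, 0.005
sig = 1.0

f  = lambda x: x**2
Lf = lambda x: (-x) * (2 * x) + 0.5 * sig**2 * 2   # Lf = b f' + (1/2) sigma^2 f'' with b(x) = -x

X = rng.normal(0.0, 1.0, M)                        # particles approximating mu_0
f_start = np.mean(f(X))                            # <f, mu_0>
gen_integral = 0.0
for _ in range(n_steps):
    gen_integral += dt * np.mean(Lf(X))            # accumulates int <Lf, mu_r> dr
    X = X - X * dt + sig * rng.normal(0.0, np.sqrt(dt), M)

residual = np.mean(f(X)) - f_start - gen_integral  # analogue of N_T(f): should vanish
print(residual)
```

The residual is zero up to Monte Carlo and Euler discretization error; requiring the analogous quantity to vanish for all test functions f is what the measure-valued formulation encodes.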
Definition 2.6. For every ν ∈ P(R^n), P ∈ P(Ω) is a measure-valued rule if: (i) B is a (P, F) Wiener process starting at zero; (ii) µ_0 = ν and Λ_t(Z_{µ_t}) = 1, dP ⊗ dt-a.e.; (iii) for P-almost every ω ∈ Ω, N_t(f)(ω) = 0 for all f ∈ C^2_b(R^n) and all t ∈ [0, T]. We shall denote by P_V(ν) the set of all measure-valued rules with initial value ν.

Optimization problem
Let us define, for all (π, q) ∈ C^n_W × M(P^n_U),

J(π, q) := ∫_0^T ∫_{P^n_U} ∫_{R^n × U} L(t, x, π_{t∧·}, m, u) m(dx, du) q(t, dm) dt + ∫_{R^n} g(x, π) π_T(dx).

Notice that, under Assumption 2.1, the map J : C^{n,p}_W × M_p(P^n_U) → R is continuous (see for instance Proposition A.4). We can now define the measure-valued control problem: for each ν ∈ P(R^n),

V_V(ν) := sup_{P ∈ P_V(ν)} E^P[J(µ, Λ)],

where, for each m ∈ P^n_U, the Borel measurable function R^n ∋ x → m^x ∈ P(U) verifies m^x(du)m(dx, U) = m(dx, du). This kind of control turns out to be less "rigid". In particular, P_V(ν) is a compact set for the Wasserstein topology (see Theorem 3.1).
(ii) Working with these variables seems to be the key to better understanding the problem and solving the principal difficulties. Mainly, to prove a limit theory result in this context, we approximate the distribution of (µ, Λ) by the distributions of variables of the type (µ^α, δ_{µ̄^α_t}(dm)dt), and not by approximating the law of X. This approximation is achieved by using Fokker-Planck equations. To the best of our knowledge, looking at this kind of variable or "control" has never been studied in the literature (except in [11], only for technical reasons).

SDE formulation of measure-valued rules
Instead of presenting what we call measure-valued rules as solutions of a Fokker-Planck equation, it is possible to formulate them through solutions of SDEs. Indeed, using an equivalence between Fokker-Planck equations and SDEs, there is an alternative way to formulate the measure-valued rules. In order to give more insight about them, let us describe this SDE formulation. For this purpose, we introduce the notion of extended relaxed control rules. We say that the tuple (Ω, F, F, P, W, B, X, µ, Λ) is an extended relaxed control rule if: (i) (X, µ) is an R^n × P(R^n)-valued F-adapted continuous process and Λ is a P(P^n_U)-valued F-predictable process; (ii) X_0, W and (µ, Λ, B) are independent.
(iii) Λ_t(Z_{µ_t}) = 1, dP ⊗ dt-a.e., and the process X is a solution of the associated SDE with L^P(X_0) = ν, where, for each m ∈ P^n_U, the Borel measurable function R^n ∋ x → m^x ∈ P(U) verifies m^x(du)m(dx, U) = m(dx, du). Using [27, Theorem 1.3] or an easy adaptation of Proposition 5.8 or Proposition 5.9, we have the following equivalence result: (i) any extended relaxed control rule induces a measure-valued rule; (ii) conversely, for any measure-valued rule P ∈ P_V(ν), there exists an extended relaxed control rule (Ω, F, F, P, W, B, X, µ, Λ) s.t. P coincides with the induced law.
As stated in the preamble of this part, the measure-valued control problem is motivated by the Fokker-Planck equation verified by the couple (µ^α, µ̄^α) of the strong formulation. Therefore, the strong controls, i.e. (µ^α, µ̄^α)_{α∈A(ν)}, can be seen as a special case of measure-valued rules. By taking into account the previous equivalence proposition, or by applying Itô's formula, it is straightforward to deduce the following proposition. Proposition 2.9. For each ν ∈ P_p(R^n), let us introduce P_S(ν), the set of probabilities induced by the strong controls; one has P_S(ν) ⊂ P_V(ν). Indeed, after applying Itô's formula to the process X^α_· − σ_0 B_· and taking the conditional expectation w.r.t. the common noise filtration G, one checks that the couple (µ^α, µ̄^α) satisfies the required Fokker-Planck equation.
In addition, notice that

Main results
Now, we formulate the main results of this paper.
is Borel measurable and one gets the corresponding limit. Remark 3.2. (i) As in [11] (see also [23] and [9] for the mean field game context), there are some specificities when ℓ = 0. Indeed, when ℓ = 0, (µ^α, µ̄^α) are deterministic, but (µ, Λ) can still be random; therefore, except in particular situations, it is not possible to approximate the non-atomic measure P by a sequence of atomic measures of the type δ_{(µ^α, δ_{µ̄^α_s}(dm)ds)}. However, a randomisation is possible, as mentioned in (ii) of Theorem 3.1.
(ii) Theorem 3.1 and the following Theorem 3.3 are in the same spirit as Theorem 3.1 and Theorem 3.6 of [11]. The main difference is the presence of the distribution of the controlled state and the control, and this particularity turns out to be a non-trivial extension (see the discussion in Section 2.2).

Theorem 3.3 (Propagation of chaos). Let
Finally, we provide some properties of the optimal controls of our problem. For any ν ∈ P(R^n), denote by P^⋆_V(ν) the set of optimal controls, i.e. the measure-valued rules P^⋆ ∈ P_V(ν) attaining V_V(ν). (ii) To the best of our knowledge, Theorem 3.3 and Proposition 3.4 seem to be the first results under these general assumptions to provide these types of convergence results. As mentioned in the introduction, other authors treat these questions, but in a particular framework. For instance, while dealing with the convergence of Nash equilibria, [28] gives a limit theory result for the extended mean field control problem. The framework of [28] is less general than ours; in particular, they consider a situation without common noise (σ_0 = 0) and with constant volatility σ. Besides, they need assumptions on (b, g, L), via the Hamiltonian, which lead to the uniqueness of the optimum, and these assumptions are sometimes quite difficult to verify in practice. However, it should be mentioned that the results of [28] include a rate of convergence that we do not provide. Let us also mention [30], which treats these questions of convergence but for Markov decision processes in discrete time.
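The convergence mechanism behind these results ultimately rests on empirical measures approaching the underlying law in Wasserstein distance as the population grows. The following sketch (illustrative, and uncontrolled, unlike the setting of Theorem 3.3) estimates the mean W_1 distance between the empirical measure of N i.i.d. standard Gaussian samples and their common law, using the one-dimensional quantile formula W_1(µ, ν) = ∫_0^1 |F_µ^{-1}(u) − F_ν^{-1}(u)| du:

```python
import numpy as np

rng = np.random.default_rng(4)
u = (np.arange(1000) + 0.5) / 1000                       # quantile grid for the W_1 formula
ref_q = np.quantile(rng.normal(0.0, 1.0, 2_000_000), u)  # fine stand-in for N(0,1) quantiles

def mean_w1(N, reps=100):
    """Average W_1 between the empirical measure of N i.i.d. N(0,1) draws
    and the reference law, via W_1 = int_0^1 |F^{-1} - G^{-1}| du."""
    total = 0.0
    for _ in range(reps):
        emp = rng.normal(0.0, 1.0, N)
        total += np.mean(np.abs(np.quantile(emp, u) - ref_q))
    return total / reps

dists = [mean_w1(N) for N in (10, 100, 1000)]
print(dists)  # decreases roughly like N^(-1/2)
```

The monotone decay of these distances is the uncontrolled face of propagation of chaos; the controlled statements of Theorem 3.3 and Proposition 3.4 quantify how this carries over to optimally controlled interacting systems.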

Proposition 3.4. Suppose that the conditions of Theorem 3.3 hold. Let lim
The next corollary is just a combination of Theorem 3.3 and [11, Proposition 4.15]. It states that if a strong control is close enough to the optimum of the mean field control problem, then from this control we can construct N agents which are close to the optimum of the large population stochastic control problem.

Proofs of the main results
In this part, we present the proofs of the main results of this paper, namely Theorem 3.1 and Theorem 3.3. Some proofs use results from Section 5, which will be proven just after.

Equivalence result
This section is devoted to the proof of Theorem 3.1. To achieve this proof, we provide an approximation of measure-valued rules by McKean-Vlasov processes. Before starting the proofs, by shifting some probabilities, let us give a reformulation of measure-valued rules, valid for any q ∈ M. In the same way, let us consider the "shifted" generator L. The next result follows immediately, so we omit the proof.
Next, let us provide some estimates for the different controls. The first result is standard; the second is just an application of Proposition 5.2 (see also Remark 5.4) combined with Lemma 4.1.

Lemma 4.2 (Estimates). Under Assumption 2.1, for any
In addition where ϑ is the process given in equation (4.4).

Technical lemmas
In this part, from a measure-valued rule, we build a sequence of processes that approximates the measure-valued rule and is close enough to strong control rules. This is the fundamental step in the proof of Theorem 3.1.
Let ν ∈ P_{p'}(R^n), P ∈ P_V(ν), and let ( Ω, F, F, P) be a filtered probability space supporting an R^n-valued F-Brownian motion W, and let ξ be an F_0-random variable s.t. L^P(ξ) = ν. We define the filtered probability space ( Ω, F, F, P), which is an extension of the canonical space (Ω, F, P): Ω := Ω × Ω, F := ( F_t ⊗ F_t)_{t∈[0,T]} and P := P ⊗ P. The variables (ξ, W) of Ω and (B, µ, Λ) of Ω are naturally extended to the space Ω while keeping the same notation (ξ, W, B, µ, Λ) for simplicity. Also, let us consider the filtration ( G_t)_{t∈[0,T]} defined accordingly. Proof. As P ∈ P_V(ν), by definition, for P-a.e. ω ∈ Ω, N_t(f) = 0 for all f ∈ C^2_b(R^n) and t ∈ [0, T]. By Lemma 4.1, taking into account the extension of all variables to Ω, and in the same spirit as notations (4.3), we introduce coefficients which verify Assumption 2.1 with constants C and θ independent of b (see Assumption 2.1). Now, let us apply Proposition 5.8 (see also Proposition 5.6), as (ϑ, δ_{h_k(t, Θ_{t∧·})}(dm)dt, B)_{k∈N*} is P-independent of (ξ, W). Using equation (4.6), we rewrite X^k; denoting X^k := X^k + σ_0 B and using the notation introduced in (4.1) and (4.2), it is straightforward to check the required convergence of the corresponding maps. After extracting a further subsequence from (X^{k_j}, α^{k_j})_{j∈N*}, one also has the P-a.e. convergence (4.5).

Proof of Theorem 3.1
First, for ν ∈ P_{p'}(R^n), under Assumption 2.1, let us prove that P_V(ν) is a compact set for the Wasserstein topology W_p. Let (P_k)_{k∈N*} ⊂ P_V(ν); by Proposition 4.4, (P_k)_{k∈N*} is relatively compact for the Wasserstein topology W_p and any limit P^∞ of any sub-sequence belongs to P_V(ν). Therefore P_V(ν) is compact. By techniques similar to those used in [11, Theorem 3.1], it is straightforward to show that P_V(ν) is convex.
Next, we prove items (i) and (ii) of Theorem 3.1. By applying Proposition 4.3, with the same notations, for any [0, 1]-uniformly distributed random variable Z, P-independent of (ξ, W, B, µ, Λ), there exists a sequence of F-predictable processes (α^k)_{k∈N*} satisfying, for each k ∈ N*, that µ^k and (B, µ^k) are P-independent of (ξ, W). For all k ∈ N*, denote the associated probability by Q^k; then Q^k is a weak control according to [11, Definition 2.9]. Then, by (a slight extension of) [11, Proposition 4.5]: (1) when ℓ ≠ 0, there exists α^{j,k} ∈ A(ν), and X^{α^{j,k}} the strong solution of (2.4) with control α^{j,k}, realizing the desired approximation; (2) when ℓ = 0, there exists a family of Borel measurable randomisations with the analogous property. All these results are enough to deduce items (i) and (ii) of Theorem 3.1, and to conclude.

Propagation of chaos
With the help of Theorem 3.1, in this section we achieve one of the main objectives of this paper: proving the limit theory result, or (controlled) propagation of chaos.

Technical results: study of the behavior of processes when N goes to infinity
In this part, we give the properties of some sequences of probability measures on the canonical space Ω. Mainly, we study the behavior, when N goes to infinity, of sequences of the type (P(α^1, ..., α^N))_{N∈N*} constructed from the formulation of the large population stochastic control problem (see Section 2.1 and Remark 2.3).
for the metric W_p and, for every limit P^∞ ∈ P(Ω) of any sub-sequence, P^∞ is a measure-valued rule. (ii) Let us consider a sequence (P_k)_{k∈N*} of probability measures such that P_k ∈ P_V(ν_k) for each k ∈ N*. If (ν_k)_{k∈N*} converges in (P_p(R^n), W_p), then (P_k)_{k∈N*} is relatively compact for the metric W_p and, for every limit P^∞ ∈ P(Ω) of any sub-sequence (P_{k_j})_{j∈N*}, one has P^∞ ∈ P_V(lim_{j→∞} ν_{k_j}).
Proof. (i) Thanks to Proposition A.2 and/or Proposition B.1 of [7], as U is compact, it is easy to check that (P^N)_{N∈N*} is pre-compact in P_p(Ω) for the metric W_p. Let P^∞ be the limit of a sub-sequence (P^{N_j})_{j∈N*}. For the sake of simplicity, we denote (P^{N_j})_{j∈N*} by (P^N)_{N∈N*} and set ν := lim_j ν^{N_j} (see (2.6)). Notice that the function (t, b, π, q) is continuous (see for instance Proposition A.4); one finds, by taking h in a countable subset of C_b(R^n), that Λ_t(Z_{µ_t}) = 1, P^∞ ⊗ dt-a.e. It is obvious that (B_t)_{t∈[0,T]} is a (P^∞, F) Wiener process. Let Q ∈ N* and let (h_q)_{q∈{1,...,Q}} : R^n → R^Q be bounded functions. Let us show this result for Q = 2; for general Q ∈ N*, the proof is similar.
(ii) For the second part of this proposition, notice that, thanks to Lemma 4.2, where τ is a [0, T ]-valued F-stopping time, and recall that (ϑ t ) t∈[0,T ] is the P(R n )-valued F-adapted continuous process defined in equation (4.4). Then, by Aldous' criterion [20, Lemma 16.12] (see also the proof of [7, Proposition- k∈N * is relatively compact for the metric W p . Then, using the fact that P k ∈ P V (ν k ) for each k ∈ N * and the relation between (ϑ, Θ) and the canonical processes (µ, Λ) (see equation (4.4)), we deduce that The rest of the proof is similar to the previous one.
In particular, the map V S : Proof. By Theorem 3.1, one has V S (ν) = V V (ν); thanks to this result, the proof is similar to that of [11, Proposition 3.7]. Let (δ k ) k∈N * with lim k→∞ δ k = 0 and let (P k ) k∈N * be a sequence such that P k ∈ P V (ν k ) and . By Proposition 4.4, (P k ) k∈N * is relatively compact in (P p (Ω), W p ), and if P ∈ P(Ω) is the limit of a sub-sequence (P k j ) j∈N *, then P ∈ P V (ν). Using Assumption 2.1, by convergence of (P k j ) j∈N *, one has By [11, Proposition 4.15], where (ε N ) N ∈N * is a sequence with lim N →∞ ε N = 0; then (P N ) N ∈N * is relatively compact in (P p (Ω), W p ), and for every P ∞ ∈ P(Ω) that is the limit of a sub-sequence (P N j ) j∈N *, P ∞ ∈ P V (lim j→∞ ). Then, as lim j→∞ (ii) Let (N j ) j∈N be the sequence corresponding to : and converges in (P p (R n ), W p ); by Proposition 4.5, this is enough to conclude the proof.

Proof of Proposition 3.4
Notice that, for ν ∈ P p ′ (R n ), by Theorem 3.1, P ⋆ V (ν) is nonempty. Let us define the distance function to the set P ⋆ V (ν): for each Q ∈ P(Ω), It is well known that, as P ⋆ V (ν) is nonempty, the function Ψ ⋆ : Q ∈ P p (Ω) → R is continuous. Then, by Proposition 4.4, (P N ) N ∈N * is pre-compact in P p (Ω) for the metric W p , and if P ∈ P(Ω) is the limit of a sub-sequence (P N j ) j∈N *, one has P ∈ P V (ν). Under Assumption 2.1, lim j→∞ E P N j [J(µ, Λ)] = E P [J(µ, Λ)]. Combining Theorem 3.3 and Proposition 4.5, one obtains that P ∈ P ⋆ V (ν). Hence every limit of any sub-sequence of (P N ) N ∈N * belongs to P ⋆ V (ν).

Approximation of Fokker-Planck equations
In this section, we give an approximation of a particular Fokker-Planck equation via a sequence of measure-valued processes constructed from classical SDE processes interacting through the empirical distribution of their states and controls. This result is a crucial ingredient in the proofs of Theorem 3.1 and Theorem 3.3.
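For orientation, in the simplest case (no common noise, no control), the Fokker-Planck equation associated with a McKean-Vlasov SDE takes the following standard weak form; this is a generic sketch, the paper's equation carrying additional measure and control arguments:

```latex
% Generic SDE / Fokker-Planck duality (no common noise, no control):
% if  dX_t = b(t, X_t, \mu_t)\,dt + \sigma(t, X_t, \mu_t)\,dW_t ,
% with \mu_t = \mathcal{L}(X_t) and a := \sigma\sigma^{\top}, then for
% every test function f \in C_b^{1,2}([0,T] \times \mathbb{R}^n),
\langle \mu_t, f(t,\cdot) \rangle
  = \langle \nu, f(0,\cdot) \rangle
  + \int_0^t \Big\langle \mu_s,\;
      \partial_s f
      + b(s,\cdot,\mu_s)\cdot\nabla f
      + \tfrac{1}{2}\,\mathrm{Tr}\!\big[a(s,\cdot,\mu_s)\,\nabla^2 f\big]
    \Big\rangle\, ds .
```

The controlled version studied in this section replaces µ_s by the pair of (conditional) laws of states and controls, which is why the equation is posed over measure-valued processes.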

Main ideas leading the proof
Because of the technical nature of this part, before going into details, let us first explain, in a simple situation, its main goal and the ideas of the proof. As said earlier, from a Fokker-Planck equation satisfied by a measure-valued solution P (see Definition 2.6), we want to construct a sequence of "weak" McKean-Vlasov processes such that the limit, in a certain sense, of this sequence is P. Let us be more precise. For simplicity, we assume that Using the SDE formulation, on an extension ( Ω, F, P) of (Ω, F, P), we can find X satisfying where W is an F-Brownian motion, ξ an F 0 -random variable s.t. L(ξ) = ν, and (W, ξ) is independent of G T . The process (Λ t ) t∈[0,T ] can be seen as a control of the process X or µ. The goal is to construct a sequence of F-predictable processes If Equation (5.1) or Equation (5.2) satisfied an appropriate uniqueness result (in law), this kind of approximation would be much simpler to perform. Unfortunately, for a general Λ, such a uniqueness result cannot be expected for this type of equation. Therefore, finding the sequence (α k ) k∈N * becomes a challenging problem.
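A minimal numerical sketch of a McKean-Vlasov dynamics of this kind (purely illustrative assumptions: a mean-reverting drift through the empirical mean, no common noise, no control; none of the coefficients below come from the paper):

```python
import math
import random

random.seed(1)

def simulate_mkv(N=2000, steps=100, T=1.0):
    """Particle/Euler approximation of the toy McKean-Vlasov SDE
        dX_t = -(X_t - E[X_t]) dt + dW_t,   X_0 = 2,
    where E[X_t] is replaced by the N-particle empirical mean."""
    dt = T / steps
    xs = [2.0] * N
    for _ in range(steps):
        mean = sum(xs) / N                 # empirical proxy for L(X_t)
        xs = [x - (x - mean) * dt + math.sqrt(dt) * random.gauss(0.0, 1.0)
              for x in xs]
    return xs

xs = simulate_mkv()
# For this drift, d/dt E[X_t] = 0, so the empirical mean should stay
# close to its initial value 2.0 (up to O(1/sqrt(N)) fluctuations).
```

The interesting feature, mirrored in the text, is that the coefficients depend on the law of the solution itself, so the particle system is the natural simulation device.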

Strategy of proof: 1-regularization
This part is carried out in Section 5.2. The main idea is to regularize Equation (5.1) or Equation (5.2) in order to recover a uniqueness result. Indeed, in Section 5.2, we show that: X ε solution of Notice that now, when Λ is given, Equation ( one has, for fixed ε > 0, by passing to the limit in Equation (5.4) and using the uniqueness of Equation (5.3), that lim k→∞ µ ε,k = µ ε a.e. Consequently, we can fix k and ε, and focus on the approximation of Equation (5.4), or equivalently of

Strategy of proof: 2-construction of control and discretization
Let us assume that it is possible to construct a Borel function α ε,k : [0, T ] × U × R n → U, an R n -valued F-adapted continuous process X ε,k and a [0, 1]-valued F-predictable process F satisfying: F t and X ε,k t are conditionally independent given G t , Notice that, by uniqueness of Equation (5.4), L( X ε,k t |G t ) = µ ε,k t a.e. for all t ∈ [0, T ]. Given (α ε,k , X ε,k , F ), our last sequence is then given by: By using some technical results, proved in Proposition A.2 and Corollary A.3, we deduce that The fact is that we are not able to construct the tuple (α ε,k , X ε,k , F ) directly as presented above. This construction will be carried out through approximation by time discretization in Section 5.3. Moreover, the framework considered in the next part is more general than the presentation chosen for the main results. The reason is that the techniques we use apply to both the mean field game and the mean field control problem (see our companion paper [9]). We have therefore chosen a presentation that allows the results to be used in both contexts.
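The role of the auxiliary uniform variable F above can be illustrated by the classical inverse-CDF randomization: any target conditional law of the control given the state can be realized as a Borel map of (state, uniform noise). A toy sketch (the map `alpha` and the kernel p(x) below are hypothetical choices of ours, not the paper's α ε,k ):

```python
import random

random.seed(2)

def alpha(x, f):
    # Hypothetical Borel map realizing the conditional law
    #   alpha | X = x  ~  Bernoulli(p(x))   on U = {0, 1},
    # via the inverse-CDF transform of the uniform variable f.
    p = 1.0 / (1.0 + abs(x))     # illustrative choice of p(x)
    return 1 if f < p else 0

# Empirical check: with F ~ U[0,1] independent of the state,
# alpha(x, F) has the prescribed conditional mean p(x).
x = 3.0
draws = [alpha(x, random.random()) for _ in range(200_000)]
freq = sum(draws) / len(draws)   # should be close to 1/(1+3) = 0.25
```

This is the same device as in the measurable-selection argument of Blackwell and Dubins invoked later in the proof of Proposition 5.6.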

Regularization of the Fokker-Planck equation
In this part, with the help of a regularization by convolution, we show that it is possible to approximate a particular solution of a Fokker-Planck equation with "non-smooth" coefficients by a sequence of solutions of Fokker-Planck equations with "smooth" coefficients. This part is largely inspired by the proof of [18, Lemma 2.1].
Let b ∈ C ℓ , let (n t ) t∈[0,T ] and (z t ) t∈[0,T ] belong to C n W , and let q̄ t (dm, dm ′ )dt ∈ M((P n U ) 2 ). Moreover, (n, z, q̄, b) satisfy the following equation: n 0 = ν and with (b̄, σ̄) : [0, T ] × R n × C ℓ × (C n W ) 2 × (P n U ) 2 × U → R n × S n a bounded function, continuous in all its arguments, such that, for each ν̄ ∈ P n U , the map (b̄, σ̄)(·, ·, b, ·, z, ·, ν̄, ·) satisfies Assumption 2.1 with a constant θ independent of ν̄. Remark 5.1. As said at the end of Section 5.1, we consider this type of general Fokker-Planck equation because we want a formulation useful both in mean field games and in mean field control. Here, the mean field game aspect appears in the integration over dν̄ in q̄ and z: this integration plays the role of fixed measures, as can happen in mean field games.

Proposition 5.2 (Regularization of Fokker-Planck equation).
Let ν ∈ P p (R n ). For each ε > 0, there exists a unique solution (n ε t ) t∈[0,T ] ∈ C n,p W of: n ε 0 = ν and for all f ∈ C 1,2 b ([0, T ] × R n ) and one has, by uniqueness of (5.8), L P (Y ε t ) = n ε t for all t ∈ [0, T ], where n ε is the solution of (5.8). (ii) We will sometimes use the previous lemma together with Proposition A.2, in which n ε must be obtainable through a diffusion process whose volatility term verifies â ε [b, n, z, q̄ r ](t, Y ε t ) ≥ θI n×n . The SDE (5.10) allows us to say that n ε satisfies these conditions. Also, from Proposition 5.2 and the SDE representation (5.10), it is straightforward to see that the measure n t (dx)dt is equivalent to the Lebesgue measure on R n × [0, T ] (see for instance Proposition A.1).

Approximation by N-agents
Now, let us formulate the approximation result for the Fokker-Planck equation by N interacting SDEs. To this end, we first describe the associated framework.

(ii) This type of Fokker-Planck equation appears especially in the study of the optimal control of McKean-Vlasov equations
(see Section 4 above) and of mean field games (see [9]). One of the most important variables is Λ. It can play the role of a control in the optimal control of McKean-Vlasov equations, but also of an external parameter, as is the case in mean field games.
Let ( Ω, F , F, P) be another filtered probability space supporting:
• (W i ) i∈N * , a sequence of R n -valued independent F-Brownian motions, and (ξ i ) i∈N * , a sequence of independent F 0 -random variables s.t. L P (ξ i ) = ν i ∈ P p ′ (R n ),
• (µ N ) N ∈N * and (ζ N ) N ∈N * , two sequences of P(R n )-valued F-adapted continuous processes, and (B N ) N ∈N * , a sequence of R ℓ -valued F-adapted continuous processes,
• (m N ) N ∈N * and (ν N ) N ∈N * , two sequences of P n U -valued F-predictable processes,
satisfying (5.14). The next proposition describes an approximation of the Fokker-Planck equation (5.11) by a sequence of N interacting processes. (iv) The presence of the map φ, notably in (5.13), specifies the condition needed on µ for the result. In particular, if φ is null, no assumption of convergence towards µ is necessary to find a sequence of SDE processes converging to µ.

,T ] be the continuous processes, unique strong solutions of: for each
Proof of Proposition 5.6. The proof is divided into three steps for better readability.
for all x ∈ R n . By Blackwell and Dubins [3], there exists a Borel map for all (x, m) ∈ R n × P n U and any [0, 1]-valued uniform random variable F, Step 2.1: Construction of the discretization scheme. Let us consider the partition (t N k ) 1≤k≤2 N with t N k = kT /2 N , and take a sequence of R n -valued independent Brownian motions (Z i ) i∈N *, independent of all other variables. Let , and, given ε > 0, we define on ( Ω, F, F , P), by an Euler scheme, X ε,i,N := X i as follows: X i 0 := ξ i and Step 2.2: Compactness and identification of the limit. At this stage, we want to show a compactness result and identify the limit of a certain sequence of probability measures constructed from the SDE processes (X 1 , ..., X N ). Using the assumptions imposed on the coefficients (b̄, σ̄) (see the definition of the generator A in (5.12)), especially the fact that σ̄σ̄ ⊤ ≥ θI n and that (b̄, σ̄) are bounded, one has that [ B, Σ] are bounded and there exists a constant D > 0 such that, for all ε and N , sup i∈{1,...,N } Moreover, using the fact that sup N ≥1 Let us identify the limit of any convergent sub-sequence of (P N ) N ∈N *. For the sake of clarity, we write X i instead of X ε,i,N . Recall that, for the time being, ε > 0 is fixed.
Indeed, using the fact that: for all (x, m, e) ∈ R n × P n and (V i s , V j s ) are independent and independent of the other variables, one has In a similar way, if we denote by Σ i,N s By simple calculations, there consequently exists a constant C > 0 (independent of N ) such that By successively applying the results (5.23) and (5.24) and inequality (5.22), one gets a constant M > 0 depending on (f, b, σ) (which changes from line to line) s.t.
Remark that, as ∇f and Σ are bounded, Thanks to inequality (5.22), it is straightforward to verify that Let P ∞ ∈ P(C n W × C n W × C n W × M((P n U ) 2 ) × C ℓ ) be the limit of a sub-sequence (P N k ) k∈N * of (P N ) N ∈N *, and denote by (β ϑ , β µ , β ζ , β, B) the canonical process on C n W × C n W × C n W × M((P n U ) 2 ) × C ℓ . By combining inequalities (5.25) and (5.26) with the result (5.27), passing to the limit and using the continuity of the coefficients, one gets, for given ε > 0: for all Therefore, after taking a countable family of (f, t), one gets: for all (t, f ) From this equality, we can show that the previous equality holds true for all f ∈ C 2 b (R n ). For each ε > 0, by uniqueness, β ϑ := Φ ε (B, β µ , β ζ , β) with Φ ε : C ℓ × C n W × C n W × M((P n U ) 2 ) → C n W the Borel function used in (5.18). Notice that, by assumptions (5.13), This result is enough to deduce that P ∞ = Q • (µ ε , φ(µ), ζ, Λ, B) −1 . This is true for any limit P ∞ of any sub-sequence of (P N ) N ∈N *; therefore, then by Gronwall's lemma Firstly, thanks to the results (5.28) and the approximation realized in (5.17), one gets Secondly, after some calculations, it is straightforward to deduce that By regularity of the coefficients (Assumption 2.1 and (b̄, σ̄) bounded), the results (5.28) and (5.27) allow to get Next, let us define the variable It is easy to check that the sequence (Υ N ) N ∈N * is relatively compact for the Wasserstein metric W p . Denote by Υ ∞ the limit of a sub-sequence ( We prove this equality when Q = 2; the case of general Q ∈ N * follows immediately. Indeed, where the fourth equality is true because of the same argument used in (5.23) and (5.24), i.e. for all (s, v), and for i ≠ j, (V i s , V j s ) are independent and independent of the other variables, and the last equality follows from (5.28) and (5.27), and the terms starting with this is true for any limit Υ ∞ of any sub-sequence. Therefore, the sequence (Υ N ) N ∈N * converges towards Υ for the Wasserstein metric W p .
Then, to finish, by Corollary A.3, All these results allow us to deduce that lim All the previous results, combined with the measurability property (5.21), allow us to say that (α 1 , ..., α N ) and ( X 1 , ..., X N ) are the controls and the processes we are looking for.
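The Euler construction of Step 2.1 above can be sketched in a simplified one-dimensional setting, with illustrative coefficients of our own choosing (interaction through the empirical mean of states and feedback controls, rather than the paper's general (b̄, σ̄)):

```python
import math
import random

random.seed(3)

def euler_interacting(N=500, steps=64, T=1.0):
    """Euler scheme for N particles whose drift depends on the empirical
    distribution of states and controls:
        dX^i = b(X^i, m_N) dt + dW^i,
    where m_N is the empirical mean of (X^j, a(X^j)) over j, with the
    toy choices  a(x) = tanh(x)  and  b(x, m) = ma - 0.5 (x - mx)."""
    dt = T / steps
    xs = [random.gauss(0.0, 1.0) for _ in range(N)]
    a = lambda x: math.tanh(x)            # toy feedback control
    for _ in range(steps):
        mx = sum(xs) / N                  # empirical mean of states
        ma = sum(a(x) for x in xs) / N    # empirical mean of controls
        xs = [x + (ma - 0.5 * (x - mx)) * dt
              + math.sqrt(dt) * random.gauss(0.0, 1.0)
              for x in xs]
    return xs

xs = euler_interacting()
```

In the proof, the same scheme is run on the dyadic grid t N k = kT/2 N , and compactness of the induced laws is then established exactly as in Step 2.2.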
In fact, in Proposition 5.6, instead of interacting processes of type (5.15), it is possible to use a sequence of weak McKean-Vlasov processes and obtain a similar result. Let us assume that the conditions and inputs previously mentioned for Proposition 5.6 are satisfied. Let W be a ( P, F)-Brownian motion, ξ an F 0 -random variable with L P (ξ) = ν, and Z a uniform random variable independent of (ξ, W ). In addition, G will play the role of the common noise filtration. We now provide approximations by weak McKean-Vlasov processes. The proofs of Proposition 5.8 and Proposition 5.9 are deferred to Appendix A.1.
Another useful approximation. Using roughly the same arguments as those used in the proof of Proposition 5.6, another approximation result can be provided. It can be seen as another version of Proposition 5.8, where the sequence (Λ N ) N ∈N * is not necessarily a subset of M 0 ((P n U ) 2 ) and the controls achieving the approximation are probability measures. where In other words, when the variables (m, ν̄) of (b̄, σ̄σ̄ ⊤ ) are "separated", we just need a separation condition on (Λ N ) N ∈N * of type (5.35), i.e. Λ N "separated".

By equation (A.2), one has
where n ε,δ t := L P (X ε,δ,0,ξ ε t ) for t ∈ [0, T ], with L P (ξ ε )(dy) = ν (ε) (y)dy. Combining the previous equality, Consequently, for each ε > 0, Proof of Proposition 5.8. Before starting, let us mention that many parts of this proof use Proposition 5.6 and its proof. Let us take the sequence of processes (α i,N ) (i,N )∈N * ×N * given in Proposition 5.6, with L P (ξ i ) = ν i = ν for each i, and define the unique strong solution X i,N of: Let X N be the unique strong solution of equation (5.32) (associated to α N ). By the independence condition in Assumption (5.31), and recalling that m N is given in equation (5.32), P-a.e., given the σ-field G N t , for i ≠ j, (X i,N t∧· , α i,N t ) is independent of (X j,N t∧· , α j,N t ) (A.6) and Let us introduce, for each N ∈ N *, the measure on [0, T ] × P(C n × U ) × P(C n × U ) As (b̄, σ̄) are bounded and ν ∈ P p ′ (R n ), it is straightforward to check that sup N ≥1 sup i∈{1,...,N } E P sup t∈[0,T ] X i,N t p ′ < ∞, and hence (Γ N ) N ∈N * is relatively compact for the Wasserstein metric W p . Denote by Γ ∞ the limit of a sub-sequence of (Γ N ) N ∈N *. For simplicity, we use the same notation for the sequence and the sub-sequence. One gets It is enough to show that: for all Q ∈ N *, any bounded functions (f q ) q∈{1,...,Q} : C n × U → R Q and g : f q , e g(t, e)Γ ∞ t de, P(C n × U ) dt.
Let us prove this result when Q = 2; the case of general Q ∈ N * follows in a similar way.
where we used the result (A.6) and the fact that the terms starting with 1 Next, for all t ∈ [0, T ], using the Lipschitz property, there exists a constant C > 0 (which changes from line to line) Recall that ( X 1 , ..., X N ) are defined in equation (5.15) (in Proposition 5.6), and m N t Then, by Gronwall's lemma, As, therefore, by taking the sub-sequence corresponding to the lim sup, by the result (A.7), From all the previous results, it is straightforward to check that . Consequently, by Proposition 5.6 Proof of Proposition 5.9. The proof of this proposition is exactly the same as that of Proposition 5.6; we only recall the main steps.

A.2 Regularization by convolution and consequence
This part presents results about the approximation of Borel measurable functions by a sequence of "smooth" functions. The main point is that this approximation is achieved via a convolution, realized with a probability measure constructed from an SDE process. Before presenting the main results, we start by recalling an equivalence result coming from [18, Proposition 4.2].
Let (Ω, F, F , P) be a filtered probability space supporting W , an R n -valued F-Brownian motion, ξ, an F 0 -random variable verifying E P [|ξ| p ] < ∞, and (b t , σ t ) t∈[0,T ] , an R n × S n -valued bounded predictable process such that there exists θ > 0 satisfying [σ t ][σ t ] ⊤ ≥ θI n×n . For all t ∈ [0, T ], denote by Approximation by convolution. We set G ∈ C ∞ (R n ; R) with compact support satisfying G ≥ 0, G(x) = G(−x) for x ∈ R n , and ∫ R n G(y)dy = 1. Let us introduce G ε (x) := ε −n G(ε −1 x) for all x ∈ R n . Let X k be the process defined by where there exists D > 0 s.t., for all k and t, |σ k t | + |b k t | ≤ D, P-a.e., [σ k t ][σ k t ] ⊤ ≥ θI n×n , P-a.e. In addition, E P [|ξ| p ] < ∞ where p ≥ 1. Also, we take (n t ) t∈[0,T ] ∈ C n W such that n t (dx)dt is equivalent to the Lebesgue measure on [0, T ] × R n and, for the weak topology, lim k→∞ L P (X k t ) = n t for each t ∈ [0, T ].
Proof. Note that, as n t (dx)dt is equivalent to the Lebesgue measure on [0, T ] × R n , there exists a Borel measurable function c : [0, T ] × R n → R such that c(s, z) > 0, dt ⊗ dx-a.e. (s, z) ∈ [0, T ] × R n , and n t (dx)dt = c(t, x)dxdt. First, let us prove the result (A.9). If x, x) n t (dx)dt, one finds that where the first inequality is true because ϕ is bounded, and the last result is obtained by the classical result of approximation by convolution.
For the second point, let k 0 ∈ N * , one has that n t (dy) − R n ϕ(t, X k t , y) G k0 (t, X k t − y) (n t ) (k0) (X k t ) n t (dy) dt n t (dy) − R n ϕ(t, x, x)n t (dx) dt.
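The mollification G ε (x) = ε −n G(ε −1 x) used above can be checked numerically in dimension n = 1. Below is a sketch with the classical compactly supported bump function (our own normalization, handled inside the quadrature, not the paper's kernel):

```python
import math

def G(x):
    # Smooth bump supported in (-1, 1); normalized inside convolve().
    return math.exp(-1.0 / (1.0 - x * x)) if abs(x) < 1 else 0.0

def convolve(phi, eps, x, grid=2000):
    """(phi * G_eps)(x) with G_eps(y) = G(y/eps)/eps, by Riemann sum
    over the support [-eps, eps] of G_eps."""
    h = 2.0 * eps / grid
    num = den = 0.0
    for i in range(grid + 1):
        y = -eps + i * h
        w = G(y / eps)        # unnormalized kernel weight
        num += phi(x - y) * w * h
        den += w * h          # normalization: integral of the kernel
    return num / den

phi = math.sin
# For continuous phi, the mollified value converges to phi(x) as eps -> 0:
err = abs(convolve(phi, 0.01, 0.7) - math.sin(0.7))
```

Since G is symmetric and supported in an ε-window, the error for a C² function is of order ε², which is the quantitative content of the "classical result of approximation by convolution" invoked in the proof.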
for any (t, π, q̄), the map x ∈ R n → a(t, x, π t∧· , q̄ t∧· ) 1/2 ∈ S n×n is Lipschitz, with a Lipschitz constant depending only on a. Therefore, there exists a unique strong solution X, an R n -valued F-adapted process, of X s = ξ + ∫ s 0 b(r, X r , µ, Λ̄)dr + ∫ s 0 a(r, X r , µ, Λ̄) 1/2 dW r for all s ∈ [0, T ].