Measurability of optimal transportation and strong coupling of martingale measures

We consider the optimal mass transportation problem in $\RR^d$ with measurably parameterized marginals, for general cost functions and under conditions ensuring the existence of a unique optimal transport map. We prove a joint measurability result for this map, with respect to the space variable and to the parameter. The proof relies on establishing the measurability of certain set-valued mappings, related to the support of the optimal transference plans, which we then use to perform a suitable discrete approximation procedure. A motivation is the construction of a strong coupling between orthogonal martingale measures. By this we mean that, given a martingale measure, we construct in the same probability space a second one with a specified covariance measure. This is done by pushing forward the first martingale measure through a predictable version of the optimal transport map between the covariance measures. This coupling allows us to obtain quantitative estimates in terms of the Wasserstein distance between those covariance measures.


Introduction
We consider the optimal mass transportation problem in $\RR^d$ with measurably parameterized marginals, for general cost functions and under conditions ensuring the existence of a unique optimal transport map. The aim of this note is to prove a joint measurability result for this map, with respect to the space variable and to the parameter. One of our motivations, developed at the end, is the construction of a strong coupling between martingale measures. That is, given a martingale measure, we shall construct in the same probability space a second one with a specified covariance measure process. This will be done by pushing forward the given martingale measure through the optimal transport map between the covariance measures. To make this construction rigorous, we need the existence of a predictable version of this transport map, which will be a consequence of our main result.
We denote the space of Borel probability measures on $\RR^d$ by $P(\RR^d)$, and by $P_p(\RR^d)$ the subspace of probability measures having finite $p$-order moment. Given $\pi \in P(\RR^{2d})$, we write $\pi <^{\mu}_{\nu}$ if $\mu, \nu \in P(\RR^d)$ are respectively its first and second marginals. Such $\pi$ is referred to as a "transference plan" between $\mu$ and $\nu$. Let $c : \RR^d \to \RR_+$ be a continuous function. The mapping
$$\pi \mapsto I(\pi) := \int_{\RR^{2d}} c(x - y)\, \pi(dx, dy)$$
is then lower semicontinuous. The Monge-Kantorovich or optimal mass transportation problem with cost $c$ and marginals $\mu, \nu$ consists in finding
$$\inf \left\{ I(\pi) : \pi <^{\mu}_{\nu} \right\}.$$
It is well known that the infimum is attained as soon as it is finite, see [13], Ch. 1. In this case, we denote by $\Pi^*_c(\mu, \nu)$ the subset of $P(\RR^{2d})$ of minimizers. If otherwise $I(\pi) = +\infty$ for all $\pi <^{\mu}_{\nu}$, then by convention we set $\Pi^*_c(\mu, \nu) = \emptyset$. We shall say that Assumption $H(\mu, \nu, c)$ holds if
a) $\mu$ does not give mass to sets with Hausdorff dimension smaller than or equal to $d - 1$;
b) there exists a unique optimal transference plan $\pi \in \Pi^*_c(\mu, \nu)$, and it has the form
$$\pi(dx, dy) = \mu(dx)\, \delta_{T(x)}(dy)$$
for a $\mu(dx)$-almost surely unique mapping $T : \RR^d \to \RR^d$.
Such $T$ is called an optimal transport map between $\mu$ and $\nu$ for the cost function $c$. Hypothesis a) in $H(\mu, \nu, c)$ is optimal both for existence and uniqueness of an optimal transport map, see Remark 9.5 in [14]. We recall that if $\Pi^*_c(\mu, \nu) \neq \emptyset$, a) implies b) in the situations i)-iii) recalled at the end of this note (see Gangbo and McCann [4]).
Our main result is
Theorem 1.1. Let $(E, \Sigma, m)$ be a $\sigma$-finite measure space and consider a measurable function $\lambda \in E \mapsto (\mu_\lambda, \nu_\lambda) \in (P(\RR^d))^2$ such that for $m$-almost every $\lambda$, $H(\mu_\lambda, \nu_\lambda, c)$ holds, with optimal transport map $T_\lambda : \RR^d \to \RR^d$. Then, there exists a function $(\lambda, x) \mapsto T(\lambda, x)$ which is measurable with respect to $\Sigma \otimes B(\RR^d)$ and such that $m(d\lambda)$-almost everywhere,
$$T(\lambda, x) = T_\lambda(x) \quad \text{for } \mu_\lambda(dx)\text{-almost every } x.$$
In particular, $T_\lambda(x)$ is measurable with respect to the completion of $\Sigma \otimes B(\RR^d)$ with respect to $m(d\lambda)\mu_\lambda(dx)$.
Theorem 1.1 generalizes Theorem 1.2 in [3], where we constructed a predictable version of a quadratic transport map, between a time-varying law and empirical samples of it.
To our knowledge, other measurability results on the mass transportation problem require a topological structure on the space of parameters, or concern transference plans but not transport maps (see e.g. [10], or Corollaries 5.22 and 5.23 in [14]). The proof of Theorem 1.1 is developed in the following section. We first establish a type of measurable dependence on $\lambda$ of the support of the optimizers. From this result, we can define measurable partitions of $E \times \RR^d$ induced by a dyadic partition of $\RR^d$, and construct bi-measurable discrete approximations of $T(\lambda, x)$. This approximation procedure was not needed in the simpler case studied in [3], where one of the marginals was an empirical measure (thus with finite support).
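As a concrete illustration (not part of the argument of this note), the Monge-Kantorovich problem between two finitely supported measures reduces to a linear program over couplings. A minimal sketch in Python, where the points, weights and quadratic cost are arbitrary toy choices of ours:

```python
import numpy as np
from scipy.optimize import linprog

# Toy discrete Monge-Kantorovich problem on the real line, quadratic cost.
x = np.array([0.0, 1.0, 2.0])        # support of mu
y = np.array([0.5, 1.5])             # support of nu
mu = np.array([0.2, 0.5, 0.3])       # weights of mu
nu = np.array([0.6, 0.4])            # weights of nu

C = (x[:, None] - y[None, :]) ** 2   # cost matrix c(x_i, y_j) = |x_i - y_j|^2
m, n = C.shape

# Kantorovich LP: minimize <C, pi> over pi >= 0 with marginals mu and nu.
A_eq, b_eq = [], []
for i in range(m):                   # row sums of pi equal mu
    row = np.zeros(m * n)
    row[i * n:(i + 1) * n] = 1.0
    A_eq.append(row)
    b_eq.append(mu[i])
for j in range(n):                   # column sums of pi equal nu
    col = np.zeros(m * n)
    col[j::n] = 1.0
    A_eq.append(col)
    b_eq.append(nu[j])

res = linprog(C.ravel(), A_eq=np.array(A_eq), b_eq=np.array(b_eq),
              bounds=(0, None))
pi = res.x.reshape(m, n)             # an optimal transference plan
print(round(res.fun, 6))             # optimal transport cost: 0.25
```

In this toy instance every coupling avoiding the two "far" cells costs exactly $0.25$, so the LP value is $0.25$; note that the optimal plan need not be unique here, in contrast with the uniqueness granted by Assumption $H(\mu,\nu,c)$ in the continuous setting.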
Proof of Theorem 1.1

Let us first state an intermediate result concerning measurability properties of minimizers in the general framework. Its formulation and proof require some notions of set-valued analysis, see e.g. Appendix A of [9].
Theorem 2.1. The function $\Psi$ assigning to $(\mu, \nu)$ the closed subset of $\RR^{2d}$
$$\Psi(\mu, \nu) := \overline{\bigcup_{\pi \in \Pi^*_c(\mu, \nu)} \mathrm{supp}(\pi)} \qquad (1)$$
is measurable in the sense of set-valued mappings. That is, for any open set $\theta$ in $\RR^{2d}$, its inverse image $\Psi^{-1}(\theta) := \{(\mu, \nu) : \Psi(\mu, \nu) \cap \theta \neq \emptyset\}$ is a measurable subset of $(P(\RR^d))^2$.
Remark 2.2. In the case of a set-valued mapping taking closed-set values, measurability is equivalent to the fact that inverse images of closed sets are measurable (see [9]).
Proof. The idea of the proof is similar to that of Theorem 1.3 in [3], where we considered the quadratic cost and the measurable structure induced by the Wasserstein topology. In the present case, the spaces $P(\RR^d)$ and $P(\RR^{2d})$ are endowed with the usual weak topology. We observe that $\Psi$ writes as the adherence of a set-valued composition, $\Psi = \overline{U \circ S}$, where $S$ and $U$ are the set-valued mappings respectively defined by
$$S(\mu, \nu) := \Pi^*_c(\mu, \nu) \quad \text{and} \quad U(\pi) := \mathrm{supp}(\pi).$$
Measurability of $\Psi$ is equivalent to $U \circ S$ being measurable. The latter will be true as soon as $S$ is measurable and $U^{-1}(\theta)$ is open for every open set $\theta$ (see [9]). The stability theorem for optimal transference plans of Schachermayer and Teichmann (Theorem 3 in [11]) states exactly that inverse images through $S$ of closed sets in $P(\RR^{2d})$ are closed sets in $(P(\RR^d))^2$. This, together with the fact that the mapping $S$ takes closed-set values (by lower semicontinuity of $I(\pi)$), implies that $S$ is a measurable multi-application.
On the other hand, the inverse image by $U$ of an open set $\theta$ of $\RR^{2d}$ is
$$U^{-1}(\theta) = \{\pi \in P(\RR^{2d}) : \pi(\theta) > 0\}.$$
It then follows by the Portmanteau Theorem that $U^{-1}(\theta)$ is an open set in $P(\RR^{2d})$, and this concludes the proof.
Corollary 2.3. Let $(E, \Sigma)$ be a measurable space, and $\lambda \in E \mapsto (\mu_\lambda, \nu_\lambda) \in (P_2(\RR^d))^2$ a measurable function. We consider the function $\Psi$ defined by (1) and let $F$ be a closed set of $\RR^d$. Then, the set
$$\{(\lambda, x) \in E \times \RR^d : (\{x\} \times F) \cap \Psi(\mu_\lambda, \nu_\lambda) \neq \emptyset\}$$
belongs to $\Sigma \otimes B(\RR^d)$.
Proof. Without loss of generality, we assume that $F$ is nonempty. Let us first show that for any open set $\theta$ of $\RR^{2d}$, the set
$$G := \{x \in \RR^d : (\{x\} \times F) \cap \theta \neq \emptyset\}$$
is open. Indeed, for $x \in G$ there exist $y \in F$ and $\varepsilon > 0$ such that $B(x, \varepsilon) \times B(y, \varepsilon) \subset \theta$, so that $B(x, \varepsilon) \subset G$. The conclusion then follows by combining this fact with Theorem 2.1 and Remark 2.2.
Let us now focus on the proof of Theorem 1.1.

Proof of Theorem 1.1. Since any $\sigma$-finite measure is equivalent to a finite one, we can assume without loss of generality that $m$ is finite. For a fixed $k \geq 1$, we denote by $(A_{n,k})_{n \in \mathbb{Z}^d}$ the partition of $\RR^d$ in dyadic half-open rectangles of size $2^{-dk}$, that is,
$$A_{n,k} := \prod_{i=1}^d \left[ n_i 2^{-k}, (n_i + 1) 2^{-k} \right), \quad n \in \mathbb{Z}^d,$$
and we set
$$B_{n,k} := \{(\lambda, x) \in E \times \RR^d : (\{x\} \times \overline{A_{n,k}}) \cap \Psi(\mu_\lambda, \nu_\lambda) \neq \emptyset\},$$
and so $B_{n,k}$ is measurable thanks to Corollary 2.3. Denote now by $a_{n,k} \in A_{n,k}$ the "center" of the set, and define a $\Sigma \otimes B(\RR^d)$-measurable function by setting $T^k(\lambda, x) := a_{n,k}$ on $B_{n,k}$ (assigning, on points belonging to several of these sets, the value corresponding to the smallest index $n$ in some fixed enumeration of $\mathbb{Z}^d$). For each $\lambda \in E$, let $\nu^k_\lambda$ be the discrete measure defined by pushing forward $\mu_\lambda$ through $T^k$, that is, $\nu^k_\lambda := T^k(\lambda, \cdot) \# \mu_\lambda$. Denote also by $\tilde{E} \in \Sigma$ a measurable set with $m(\tilde{E}^c) = 0$ and such that for all $\lambda \in \tilde{E}$, $H(\mu_\lambda, \nu_\lambda, c)$ holds. By hypothesis, for each $\lambda \in \tilde{E}$ we have that $\mu_\lambda(dx)$ almost surely:
$$(x, T_\lambda(x)) \in \mathrm{supp}(\pi_\lambda) = \Psi(\mu_\lambda, \nu_\lambda),$$
where $T_\lambda$ has been defined in the statement of Theorem 1.1. This implies that
$$T^k(\lambda, x) = \sum_{n \in \mathbb{Z}^d} a_{n,k}\, 1_{A_{n,k}}(T_\lambda(x)) \quad \mu_\lambda(dx)\text{-almost surely}, \qquad (3)$$
by definition of $T_\lambda$. We now check that $(T^k)_{k \in \mathbb{N}}$ is a Cauchy sequence in $L^1(E \times \RR^d, m(d\lambda)\mu_\lambda(dx))$. Fix $k \leq k'$, and for each $n \in \mathbb{Z}^d$ denote by $\{A_{n',k'}\}_{n'}$ the unique partition of $A_{n,k}$ in dyadic rectangles of size $2^{-dk'}$. We then have that
$$\int_E \int_{\RR^d} |T^k(\lambda, x) - T^{k'}(\lambda, x)|\, \mu_\lambda(dx)\, m(d\lambda) \leq \sqrt{d}\, 2^{-k}\, m(E),$$
and the Cauchy property follows since $m(E) < \infty$.
Let us denote by $T$ the limit in $L^1(E \times \RR^d, m(d\lambda)\mu_\lambda(dx))$ of the sequence $T^k$. Theorem 1.1 will be proved by verifying that for all $\lambda$ in a set of $\Sigma$ of full $m$-measure, one has $\pi_\lambda(dx, dy) = \mu_\lambda(dx)\delta_{T(\lambda,x)}(dy)$. Hence, it is enough to check that
$$\pi_\lambda(C \times A_{n,k}) = \int_C 1_{A_{n,k}}(T(\lambda, x))\, \mu_\lambda(dx)$$
for any semi-open rectangle $C$ with dyadic extremes and all $n \in \mathbb{Z}^d$, $k \in \mathbb{N}$. We have for $\lambda \in \tilde{E}$ and any $j \in \mathbb{N}$ that
$$\left| \pi_\lambda(C \times A_{n,k}) - \int_C 1_{A_{n,k}}(T(\lambda, x))\, \mu_\lambda(dx) \right| \leq \Delta_j + \Delta'_j, \qquad (4)$$
where
$$\Delta_j := \int_C \left| 1_{A_{n,k}}(T_\lambda(x)) - 1_{A_{n,k}}(T^j(\lambda, x)) \right| \mu_\lambda(dx), \qquad \Delta'_j := \int_C \left| 1_{A_{n,k}}(T^j(\lambda, x)) - 1_{A_{n,k}}(T(\lambda, x)) \right| \mu_\lambda(dx).$$
We approximate $1_{A_{n,k}}$ by a Lipschitz continuous function $f_{\lambda,\varepsilon}$ such that $\|f_{\lambda,\varepsilon}\|_\infty \leq 1$ and $\nu_\lambda(\{y : f_{\lambda,\varepsilon}(y) \neq 1_{A_{n,k}}(y)\}) \leq \varepsilon$. Hence, the second term $\Delta'_j$ on the r.h.s. of (4) is bounded by
$$L_{\lambda,\varepsilon} \int_{\RR^d} |T^j(\lambda, x) - T(\lambda, x)|\, \mu_\lambda(dx) + 2\varepsilon,$$
where $L_{\lambda,\varepsilon}$ is the Lipschitz constant of $f_{\lambda,\varepsilon}$. Since $\int |T^j(\lambda, x) - T(\lambda, x)|\, \mu_\lambda(dx)$ converges in $L^1(m(d\lambda))$ to $0$, there is a subsequence $T^{j_i}$ and a set $\hat{E} \in \Sigma$ of full measure such that the convergence holds for all $\lambda \in \hat{E}$. Consequently, for all $\lambda \in \bar{E} := \tilde{E} \cap \hat{E}$ we get that $\limsup_{i \to \infty} \Delta'_{j_i} \leq 2\varepsilon$, and since the l.h.s. does not depend on $\varepsilon$, this means that $\lim_{i \to \infty} \Delta'_{j_i} = 0$. The proof will be achieved by verifying that for fixed $\lambda \in \bar{E}$, one has $\Delta_j = 0$ for all large enough $j$. For such $\lambda$, fix a Borel set $D_\lambda$ of $\RR^d$ of full $\mu_\lambda$ measure where (3) is everywhere true. Then,
$$T^j(\lambda, x) = \sum_{m \in \mathbb{Z}^d} a_{m,j}\, 1_{A_{m,j}}(T_\lambda(x)) \quad \text{for all } x \in D_\lambda.$$
Remark now that for all $j \geq k$,
$$y \in A_{n,k} \iff \sum_{m : a_{m,j} \in A_{n,k}} a_{m,j}\, 1_{A_{m,j}}(y) \in A_{n,k}.$$
Then, for all $j \geq k$, $1_{A_{n,k}}(T^j(\lambda, x)) = 1_{A_{n,k}}(T_\lambda(x))$ for every $x \in D_\lambda$, whence $\Delta_j = 0$. Letting $i \to \infty$ in (4) along the subsequence $(j_i)$ yields the desired identity, which concludes the proof.
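The dyadic discretization used in the proof above is easy to mimic numerically: a point $y \in \RR^d$ is replaced by the center $a_{n,k}$ of the level-$k$ half-open dyadic cube containing it, which moves it by at most $\sqrt{d}\,2^{-(k+1)}$, and the replacement is compatible with the nesting of dyadic cubes. A small Python sketch of this bookkeeping (the sample points stand in for values $T_\lambda(x)$; all names are ours):

```python
import numpy as np

def dyadic_center(y, k):
    """Center a_{n,k} of the half-open dyadic cube of side 2**-k containing y.

    Works coordinatewise, so `y` may be an array of points in R^d.
    """
    h = 2.0 ** (-k)
    return (np.floor(y / h) + 0.5) * h

rng = np.random.default_rng(0)
d, k = 3, 6
y = rng.normal(size=(1000, d))           # stand-ins for the values T_lambda(x)
centers = dyadic_center(y, k)

# Replacing y by the center of its cube moves it by at most sqrt(d) * 2**-(k+1).
err = np.linalg.norm(y - centers, axis=1).max()
print(err <= np.sqrt(d) * 2.0 ** (-(k + 1)))   # True

# Nesting: the level-k' center (k' >= k) stays in the same level-k cube as y,
# which is the mechanism behind the Cauchy estimate in the proof.
same_cube = np.floor(y * 2.0 ** k) == np.floor(dyadic_center(y, k + 3) * 2.0 ** k)
print(bool(same_cube.all()))                   # True
```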

Application: strong coupling for orthogonal martingale measures
We now develop an application of Theorem 1.1. Let $(\Omega, \mathcal{F}, \mathcal{F}_t, \mathbb{P})$ be a filtered probability space and consider $M$ an adapted orthogonal martingale measure on $\RR_+ \times \RR^d$ (in the sense of Walsh [15]). Assume that its covariance measure has the form $q_t(da)dk_t$, where $q_t(\omega, da)$ is a predictable random probability measure on $\RR^d$ with finite second moment and $k_t$ a predictable increasing process. Let us also consider another predictable random probability measure $\tilde{q}_t(\omega, da)$ on $\RR^d$ with finite second moment. We want to construct in the same probability space a second martingale measure with covariance measure $\tilde{q}_t(\omega, da)dk_t$, in such a way that, in some sense, the distance between the martingale measures is controlled by the Wasserstein distance between their covariance measures. Recall that this distance is defined for $\mu, \nu \in P_2(\RR^d)$ by
$$W_2(\mu, \nu) := \left( \inf_{\pi <^{\mu}_{\nu}} \int_{\RR^{2d}} |x - y|^2\, \pi(dx, dy) \right)^{1/2}.$$
This distance makes the set $P_2(\RR^d)$ a Polish space, and strengthens the weak topology with the convergence of second moments (see [8]).
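For two empirical measures on the real line with the same number of equally weighted atoms, the infimum defining $W_2$ is attained by the monotone (sorted) matching, which gives a quick way to compute the distance. A Python sketch under these assumptions, with toy atoms of our choosing:

```python
import numpy as np

# W_2 between two empirical measures with n equally weighted atoms on R:
# the optimal coupling for the quadratic cost is the monotone (sorted)
# matching, so the distance has a closed form.
def w2_empirical(xs, ys):
    xs, ys = np.sort(xs), np.sort(ys)
    return np.sqrt(np.mean((xs - ys) ** 2))

mu_atoms = np.array([2.0, 0.0, 1.0])
nu_atoms = np.array([1.5, 0.5, 2.5])
print(w2_empirical(mu_atoms, nu_atoms))   # 0.5 (nu is mu shifted by 0.5)
```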
Theorem 3.1. In the previous setting, assume moreover that $\mathbb{P}(d\omega)dk_t(\omega)$ a.e., $q_t$ has a density with respect to Lebesgue measure in $\RR^d$. Then, there exists in $(\Omega, \mathcal{F}, \mathcal{F}_t, \mathbb{P})$ a martingale measure $\tilde{M}$ on $\RR_+ \times \RR^d$ with covariance measure $\tilde{q}_t(da)dk_t$, such that for all $S > 0$ and for every predictable function $\phi : \Omega \times \RR_+ \times \RR^d \to \RR$ that is Lipschitz continuous in the last variable with $\mathbb{E} \int_0^S \int_{\RR^d} \phi^2(s, a)\, (q_s(da) + \tilde{q}_s(da))\, dk_s < \infty$, one has
$$\mathbb{E} \left( \sup_{t \leq S} \left[ \int_0^t \int_{\RR^d} \phi(s, a)\, M(ds, da) - \int_0^t \int_{\RR^d} \phi(s, a)\, \tilde{M}(ds, da) \right]^2 \right) \leq 4\, \mathbb{E} \int_0^S L_s^2\, W_2^2(q_s, \tilde{q}_s)\, dk_s, \qquad (5)$$
where $L_s(\omega)$ is a measurable version of a Lipschitz constant of $\phi(s, \omega, \cdot)$ and $W_2^2$ is the squared quadratic Wasserstein distance in $P_2(\RR^d)$.
Proof. By Theorem 1.1, there exists a predictable version $T_s(\omega, a)$ of the optimal transport map between $q_s(\omega, da)$ and $\tilde{q}_s(\omega, da)$ for the quadratic cost. One can thus define a martingale measure $\tilde{M}$ by the stochastic integrals
$$\int_0^t \int_{\RR^d} \psi(s, a)\, \tilde{M}(ds, da) := \int_0^t \int_{\RR^d} \psi(s, T_s(a))\, M(ds, da)$$
for predictable simple functions $\psi$. Its covariance measure is by construction $\tilde{q}_t(da)dk_t$, and by Doob's inequality, the left hand side of (5) is less than
$$4\, \mathbb{E} \int_0^S \int_{\RR^d} |\phi(s, a) - \phi(s, T_s(a))|^2\, q_s(da)\, dk_s \leq 4\, \mathbb{E} \int_0^S L_s^2\, W_2^2(q_s, \tilde{q}_s)\, dk_s,$$
by definition of $T$ and of $W_2^2$.
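The key pointwise inequality in this argument, namely that $\int |\phi(T(a)) - \phi(a)|^2 q(da) \leq L^2 \int |T(a) - a|^2 q(da) = L^2 W_2^2(q, \tilde{q})$ when $T$ is the optimal quadratic-cost map pushing $q$ to $\tilde{q}$, can be checked numerically in the simplest discrete one-dimensional case, where the optimal map is the monotone rearrangement. A Python sketch with arbitrary toy data (all names ours):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 500
a = np.sort(rng.normal(size=n))        # atoms of q   (equal weights 1/n)
b = np.sort(rng.exponential(size=n))   # atoms of q~  (equal weights 1/n)

# With both samples sorted, mapping the i-th atom of q to the i-th atom of
# q~ is the optimal quadratic-cost transport map T (monotone rearrangement),
# so the transport cost below is exactly W_2^2(q, q~).
W2_sq = np.mean((a - b) ** 2)

phi = np.tanh                          # a 1-Lipschitz test function
L = 1.0
lhs = np.mean((phi(b) - phi(a)) ** 2)  # integral of |phi(T(a)) - phi(a)|^2 q(da)
print(lhs <= L ** 2 * W2_sq)           # True
```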
i) The construction of strong couplings between orthogonal martingale measures arises classically in the literature, especially in cases where the martingale measure $M$ is a compensated Poisson point measure or a space-time white noise, for which the covariance measures are deterministic (cf. Grigelionis [5], El Karoui-Lepeltier [1], Tanaka [12], El Karoui-Méléard [2], Méléard-Roelly [7], Guérin [6]). A classical approach is to use the Skorokhod representation theorem. This however prevents any hope of obtaining quantitative estimates related to the associated covariance measures, which we have been able to do here thanks to the optimal transport maps.
ii) If the probability space and the martingale measure $M$ are not fixed in advance, a coupling satisfying the estimate (5) can be constructed from an orthogonal martingale measure $\bar{M}(dt, da, da')$ on $\RR_+ \times \RR^d \times \RR^d$ with covariance measure $\pi_t(da, da')dk_t$, where $\pi_t$ is an optimal transference plan between $q_t$ and $\tilde{q}_t$. Then, $\bar{M}(dt, da, \RR^d)$ and $\bar{M}(dt, \RR^d, da')$ are indeed two orthogonal martingale measures with the required covariances, satisfying the estimate of Theorem 3.1. The question in this situation is, however, how to construct such an $\bar{M}$.
Finally, let us recall the situations, invoked in the Introduction, in which a) implies b) (see Gangbo and McCann [4]):
i) $c(x, y) = c(|x - y|)$ with $c : \RR_+ \to \RR_+$ strictly convex, superlinear and differentiable with locally Lipschitz gradient;
ii) $c(x, y) = c(|x - y|)$ with $c$ strictly concave, and $\mu$ and $\nu$ mutually singular.
Condition b) also holds if
iii) $c(x, y) = c(|x - y|)$ with $c$ strictly convex and superlinear, and moreover $\mu$ is absolutely continuous with respect to Lebesgue measure.
When $\mu, \nu \in P_p(\RR^d)$, fundamental examples are the cost functions $c(x, y) = |x - y|^p$, with $p \geq 2$ for case i), $p > 1$ for case iii), and $p \in (0, 1)$ for case ii).