An extension of martingale transport and stability in robust finance

While many questions in robust finance can be posed in the martingale optimal transport framework or its weak extension, others, like the subreplication price of VIX futures, the robust pricing of American options or the construction of shadow couplings, require information beyond that of the underlying asset to be incorporated into the optimization problem. In the present paper, we take this extra information into account by introducing an additional parameter to the weak martingale optimal transport problem. We prove the stability of the resulting problem with respect to the risk-neutral marginal distributions of the underlying asset, thus extending the results in [8]. A key step is the generalization of the main result in [7] to a setting that includes the extra parameter. This result establishes that any martingale coupling can be approximated by a sequence of martingale couplings with specified marginals, provided that the marginals of this sequence converge to those of the original coupling. Finally, we deduce stability for the three motivating examples mentioned above.


Introduction
In mathematical finance, the evolution of an asset price on a financial market is modeled by an adapted stochastic process $(X_t)$ on a filtered probability space $(\Omega, \mathcal F, \mathbb P, (\mathcal F_t))$. To rule out arbitrage opportunities, one considers risk-neutral measures (also known as equivalent martingale measures) $\mathbb Q$ under which, assuming zero interest rates, the asset price process $(X_t)$ is a martingale. A transport type problem arises in robust finance because the marginals of $(X_t)$ can be derived from market information, based on the celebrated observation of Breeden and Litzenberger [11]: the prices of traded vanilla options determine the marginals $(\mu_t)$ of $(X_t)$ at their respective maturity times under the risk-neutral measure $\mathbb Q$. Instead of considering one specific financial model, a robust approach is to consider all martingale measures that are compatible with this observation, that is, all filtered probability spaces $(\Omega, \mathcal F, \mathbb Q, (\mathcal F_t))$ and stochastic processes $(X_t)$ such that

$$X \text{ is a } (\mathbb Q, (\mathcal F_t))\text{-martingale and } X_t \sim \mu_t \text{ at all maturity times } t. \tag{1.1}$$

The robust price bounds for an option with payoff $\Phi$ are then obtained by solving a transport type problem [6,14], where the optimization takes place over the set of all risk-neutral measures that are compatible with the observed prices of vanilla options, that is, over all martingale measures $\mathbb Q$ under which $(X_t)$ has the correct marginal distributions:

$$\inf / \sup \big\{ \mathbb E_{\mathbb Q}[\Phi] : (\Omega, \mathcal F, \mathbb Q, (\mathcal F_t), (X_t)) \text{ satisfying (1.1)} \big\}. \tag{1.2}$$

However, as we can only observe the prices of a finite number of derivatives (up to a bid-ask spread), the marginals $(\mu_t)$ are only approximately known. It is therefore crucial to establish the stability of the transport type problem (1.2) with respect to the marginals. This article is concerned with the one-period setting, that is, $t \in \{1, 2\}$. When $\Phi$ is written on the underlying asset $X$, (1.2) boils down to a martingale optimal transport (MOT) problem

$$\inf / \sup_{\pi \in \Pi_M(\mu_1, \mu_2)} \int \Phi(x, y)\, \pi(dx, dy), \tag{1.3}$$

where $\Pi_M(\mu_1, \mu_2)$ denotes the set of martingale couplings with marginals $\mu_1$ and $\mu_2$, i.e., the set of laws of one-step martingales $(X_1, X_2)$ with $X_t \sim \mu_t$. Continuity of the value of (1.3) with respect to the marginals, which is called stability, has been proved in [4,25]. Weak martingale optimal transport (WMOT) is a nonlinear generalization of MOT, analogous to weak optimal transport, the nonlinear generalization of classical optimal transport proposed by Gozlan, Roberto, Samson and Tetali [15]; WMOT was considered in [4,8]. In WMOT one allows for more general payoffs $\Phi$ which may depend on the conditional law of $X_2$ given $X_1$ in addition to $X_1$ itself, and the corresponding WMOT problem reads

$$\inf / \sup_{\pi \in \Pi_M(\mu_1, \mu_2)} \int \Phi(x, \pi_x)\, \mu_1(dx), \tag{1.4}$$

where $\pi_x$ comes from the disintegration $\pi(dx, dy) = \mu_1(dx)\,\pi_x(dy)$. Stability of WMOT has been studied in [8] and was used there to establish stability of the superreplication price of VIX futures and of the stretched Brownian motion.
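To make the weak payoff in (1.4) concrete, here is a minimal Python sketch for finitely supported marginals (an illustrative assumption, since (1.3) and (1.4) are posed for general marginals). The helper names `first_marginal`, `kernel`, `is_martingale_coupling`, `weak_value` and `conditional_variance` are hypothetical. The sketch checks the martingale constraint $\mathrm{mean}(\pi_x) = x$ and evaluates $\int \Phi(x, \pi_x)\,\mu_1(dx)$ for a payoff that depends on the conditional law.

```python
from collections import defaultdict

# A finitely supported coupling is stored as {(x, y): weight} (illustrative assumption).

def first_marginal(pi):
    mu = defaultdict(float)
    for (x, _), w in pi.items():
        mu[x] += w
    return dict(mu)

def kernel(pi, x):
    """Disintegration pi_x as {y: conditional probability}."""
    row = {y: w for (xx, y), w in pi.items() if xx == x}
    total = sum(row.values())
    return {y: w / total for y, w in row.items()}

def is_martingale_coupling(pi, tol=1e-12):
    """Check mean(pi_x) = x for every atom x of the first marginal."""
    for x in first_marginal(pi):
        rho = kernel(pi, x)
        if abs(sum(p * y for y, p in rho.items()) - x) > tol:
            return False
    return True

def weak_value(pi, phi):
    """Evaluate int Phi(x, pi_x) mu_1(dx), the objective in (1.4)."""
    return sum(m * phi(x, kernel(pi, x)) for x, m in first_marginal(pi).items())

def conditional_variance(x, rho):
    """Phi(x, rho) = int (y - x)^2 rho(dy), a payoff depending on the conditional law."""
    return sum(p * (y - x) ** 2 for y, p in rho.items())

# mu_1 = (delta_{-1} + delta_1)/2 and mu_2 = delta_{-2}/4 + delta_0/2 + delta_2/4.
pi = {(-1, -2): 0.25, (-1, 0): 0.25, (1, 0): 0.25, (1, 2): 0.25}
print(is_martingale_coupling(pi), weak_value(pi, conditional_variance))  # True 1.0
```

This particular payoff is a sanity check rather than an interesting WMOT problem. For every martingale coupling, $\mathbb E[(X_2 - X_1)^2] = \mathbb E[X_2^2] - \mathbb E[X_1^2]$, so the value equals $\int y^2\,d\mu_2 - \int x^2\,d\mu_1 = 2 - 1 = 1$ here.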
Even though many problems in robust finance are covered by WMOT, some important examples require information beyond that of the underlying asset to be included in the optimization problem. Accordingly, these problems cannot be properly treated in the WMOT framework. For us, guiding examples of such problems are the subreplication price of VIX futures, the robust pricing of American options and the construction of shadow couplings. By augmenting WMOT with an additional parameter, we demonstrate how this extra information can be taken into account, prove stability of the resulting problem, and consequently deduce stability of the three guiding examples. A key step is the generalization of the main result in [7] to our current setting. This result states that any martingale coupling can be approximated by a sequence of martingale couplings with specified marginals, provided that the marginals of this sequence converge to those of the original coupling. As a by-product of our approach, we establish the same result at the level of stochastic processes with general filtrations (cf. [5]): any one-step martingale on some filtered probability space can be approximated with respect to the adapted Wasserstein distance by martingales on (possibly different) filtered probability spaces, provided that the marginals of this sequence converge to those of the original martingale.

Notation
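Let $(\mathcal X, d_{\mathcal X})$ and $(\mathcal Y, d_{\mathcal Y})$ be Polish metric spaces and $p \ge 1$. We equip the product $\mathcal X \times \mathcal Y$ with the product metric $d_{\mathcal X \times \mathcal Y}((x, y), (\tilde x, \tilde y)) := (d_{\mathcal X}(x, \tilde x)^p + d_{\mathcal Y}(y, \tilde y)^p)^{1/p}$, which turns $\mathcal X \times \mathcal Y$ into a Polish metric space. The set of Borel probability measures on $\mathcal X$ is denoted by $\mathcal P(\mathcal X)$. For $\mu \in \mathcal P(\mathcal X)$ and $\nu \in \mathcal P(\mathcal Y)$, we write $\Pi(\mu, \nu)$ for the set of all probability measures on $\mathcal X \times \mathcal Y$ with marginals $\mu$ and $\nu$. We denote by $\mathcal P_p(\mathcal X)$ the subset of $\mathcal P(\mathcal X)$ of measures that finitely integrate $x \mapsto d_{\mathcal X}^p(x, x_0)$ for some (thus any) $x_0 \in \mathcal X$, and we endow $\mathcal P_p(\mathcal X)$ with the $p$-Wasserstein distance $\mathcal W_p$, so that $(\mathcal P_p(\mathcal X), \mathcal W_p)$ is a Polish metric space, where, for $\mu, \nu \in \mathcal P_p(\mathcal X)$,

$$\mathcal W_p(\mu, \nu) := \Big( \inf_{\pi \in \Pi(\mu, \nu)} \int d_{\mathcal X}(x, y)^p\, \pi(dx, dy) \Big)^{1/p}.$$

The set of continuous and bounded functions on $\mathcal X$ is denoted by $C_b(\mathcal X)$, and we use the shorthand notation $\mu(f)$ for the integral of a $\mu$-integrable function $f : \mathcal X \to \mathbb R \cup \{\pm\infty\}$ with respect to a Borel measure $\mu$ on $\mathcal X$. Given a measurable map $f : \mathcal X \to \mathcal Y$, we denote by $f_\# \mu$ the push-forward of $\mu$ under $f$. For Polish spaces $\mathcal X_1, \mathcal X_2, \mathcal X_3$, $\pi \in \mathcal P(\mathcal X_1 \times \mathcal X_2 \times \mathcal X_3)$ and a non-empty subset $I$ of $\{1, 2, 3\}$, $\mathrm{proj}_I \pi$ denotes the image of $\pi$ under the projection onto the coordinates in $I$; for example, $\mathrm{proj}_1 \pi$ is the $\mathcal X_1$-marginal of $\pi$. Further, we write $\pi_{x_1, x_2}$ for the kernel in the disintegration $\pi(dx_1, dx_2, dx_3) = \mathrm{proj}_{1,2}\pi(dx_1, dx_2)\,\pi_{x_1, x_2}(dx_3)$. Frequently, we use the injection (cf. [3, Section 2])

$$J : \mathcal P(\mathcal X \times \mathcal Y) \to \mathcal P(\mathcal X \times \mathcal P(\mathcal Y)), \qquad J(\pi) := (x, \pi_x)_\# \pi.$$

Unless stated otherwise, $\mathbb R$ is equipped with the Euclidean distance and $\mathrm{Leb}$ denotes the Lebesgue measure on $[0, 1]$. Two measures $\mu, \nu \in \mathcal P_1(\mathbb R)$ are said to be in the convex order, and we write $\mu \le_{cx} \nu$, if $\int f\, d\mu \le \int f\, d\nu$ for every convex function $f : \mathbb R \to \mathbb R$. We write $\mathrm{mean} : \mathcal P_1(\mathbb R) \to \mathbb R$ for $\mathrm{mean}(\rho) = \int y\, \rho(dy)$ and denote by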
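The convex order and $\mathcal W_p$ on $\mathbb R$ can be checked numerically for finitely supported measures. The following Python sketch is illustrative only; the function names are hypothetical. It uses the standard characterization of $\mu \le_{cx} \nu$ through equal means and the potential functions $u_\mu(x) = \int |x - y|\,\mu(dy)$ (written $u_\rho$ in the proofs below), together with the fact that on the real line the sorted matching is optimal for the cost $|x - y|^p$, $p \ge 1$.

```python
def potential(measure, x):
    """u_mu(x) = int |x - y| mu(dy) for mu given as [(weight, atom), ...]."""
    return sum(w * abs(x - y) for w, y in measure)

def in_convex_order(mu, nu, tol=1e-12):
    """Decide mu <=_cx nu for finitely supported probability measures on R.

    Standard characterization: equal means and u_mu <= u_nu on R. Both potentials
    are affine between consecutive atoms and coincide beyond the extreme atoms once
    masses and means agree, so comparing them at the atoms suffices."""
    if abs(sum(w for w, _ in mu) - sum(w for w, _ in nu)) > tol:
        return False
    if abs(sum(w * y for w, y in mu) - sum(w * y for w, y in nu)) > tol:
        return False
    atoms = {y for _, y in mu} | {y for _, y in nu}
    return all(potential(mu, x) <= potential(nu, x) + tol for x in atoms)

def wasserstein_p_empirical(xs, ys, p=1.0):
    """W_p between (1/n) sum delta_{xs[i]} and (1/n) sum delta_{ys[i]} on R.

    On the real line, the monotone (sorted) matching is optimal for |x - y|^p, p >= 1."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("this sketch assumes two non-empty samples of equal size")
    cost = sum(abs(a - b) ** p for a, b in zip(sorted(xs), sorted(ys))) / len(xs)
    return cost ** (1.0 / p)

mu = [(1.0, 0.0)]                 # delta_0
nu = [(0.5, -1.0), (0.5, 1.0)]    # (delta_{-1} + delta_1) / 2
print(in_convex_order(mu, nu), in_convex_order(nu, mu))  # True False
```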

Organization of the paper
Section 2 presents the main results of this paper. First, in Subsection 2.1, we introduce the setup with the additional parameter and state the corresponding stability results in Theorem 2.1 and Theorem 2.2. In Subsection 2.3, we present consequences of these results in the filtered process setting, namely Corollary 2.7. We then state stability for the three guiding examples, namely the subreplication price of VIX futures (Subsection 2.2), the robust pricing of American options (Subsection 2.4), and shadow couplings (Subsection 2.5). Section 3 is concerned with the proofs.

An extension of martingale transport
We now introduce a framework that is sufficiently general to address the stability of our guiding examples. From now on, let $(\mathcal U, d_{\mathcal U})$ be a Polish metric space that models an extra information parameter $u \in \mathcal U$. Given $\bar\mu \in \mathcal P_1(\mathbb R \times \mathcal U)$ and $\nu \in \mathcal P_1(\mathbb R)$ with $\mathrm{proj}_1 \bar\mu \le_{cx} \nu$, we denote by $\Pi_M(\bar\mu, \nu)$ the set of couplings $\pi \in \Pi(\bar\mu, \nu)$ such that $\mathrm{mean}(\pi_{x,u}) = x$ for $\bar\mu(dx, du)$-a.e. $(x, u)$. Central to establishing the upper (resp. lower) semicontinuity property in our stability results for minimization (resp. maximization) problems is Theorem 3.5, which is a reinforced version of the result below: In view of the counterexample by Brückerhoff and Juillet [12], this result does not generalize to higher dimensions, i.e., when $\mathbb R$ is replaced by $\mathbb R^d$ with $d \ge 2$. This generalization of the main result of [7] to the present framework is also key to establishing stability with respect to the marginals of the following variant of WMOT:

$$V_C(\bar\mu, \nu) := \inf_{\pi \in \Pi_M(\bar\mu, \nu)} \int C(x, u, \pi_{x,u})\, \bar\mu(dx, du). \tag{2.1}$$

As usual, regularity of the cost $C$ is needed for the optimal value of (2.1) to depend continuously on the marginals. Thus, we will suppose the following continuity assumption on the cost function: Theorem 2.2. Let $C$ satisfy Assumption A and let $C(x, u, \cdot)$ be convex for all $(x, u) \in \mathbb R \times \mathcal U$. Then the value function $V_C$ is attained and continuous on $\{(\bar\mu, \nu) : \mathrm{proj}_1 \bar\mu \le_{cx} \nu\} \subseteq \mathcal P_p(\mathbb R \times \mathcal U) \times \mathcal P_p(\mathbb R)$. Furthermore, when $(\bar\mu^k, \nu^k)_{k \in \mathbb N}$ with $\mathrm{proj}_1 \bar\mu^k \le_{cx} \nu^k$ converges to $(\bar\mu, \nu)$, we have:
(i) if, for each $k$, $\pi^k \in \Pi_M(\bar\mu^k, \nu^k)$ is optimal for (2.1), then so are the accumulation points of $(\pi^k)_{k \in \mathbb N}$;
(ii) if additionally $C(x, u, \cdot)$ is strictly convex, then optimizers of (2.1) are unique. Furthermore, $(\pi^k)_{k \in \mathbb N}$ and $(J(\pi^k))_{k \in \mathbb N}$ converge weakly to the optimizer of (2.1) with marginals $(\bar\mu, \nu)$ and to its image under $J$, respectively.
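The difference between $\Pi_M(\bar\mu, \nu)$ and classical martingale couplings is that the barycenter condition is imposed on $\pi_{x,u}$, that is, conditionally on both $x$ and the extra parameter $u$. The following Python sketch illustrates this for finitely supported measures, with string labels playing the role of $u$ (an illustrative assumption; the helper names are hypothetical). The coupling `bad` below becomes a martingale coupling once $u$ is forgotten, but it is not an extended martingale coupling.

```python
from collections import defaultdict

# Finitely supported measures on R x U x R stored as {(x, u, y): weight};
# the labels u play the role of the extra information parameter (illustrative assumption).

def is_extended_martingale_coupling(pi, tol=1e-12):
    """Check mean(pi_{x,u}) = x for every atom (x, u) of the (R x U)-marginal."""
    mass, first = defaultdict(float), defaultdict(float)
    for (x, u, y), w in pi.items():
        mass[(x, u)] += w
        first[(x, u)] += w * y
    return all(abs(first[k] / mass[k] - k[0]) <= tol for k in mass if mass[k] > 0)

def forget_parameter(pi):
    """Image of pi under (x, u, y) -> (x, y), i.e. proj_{1,3} pi."""
    out = defaultdict(float)
    for (x, _, y), w in pi.items():
        out[(x, y)] += w
    return dict(out)

good = {(0.0, "a", 0.0): 0.5, (0.0, "b", -1.0): 0.25, (0.0, "b", 1.0): 0.25}
bad = {(0.0, "a", -1.0): 0.5, (0.0, "b", 1.0): 0.5}
print(is_extended_martingale_coupling(good), is_extended_martingale_coupling(bad))  # True False
```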

VIX futures
The VIX is the implied volatility of the 30-day variance swap on the S&P 500. According to Guyon, Menegaux and Nutz [16], the subreplication price at time 0 of the VIX future expiring at $T_1$ is given by where $\mu$ and $\nu$ denote the risk-neutral distributions of the S&P 500 at dates $T_1$ and $T_2$ (equal to $T_1$ plus 30 days), both inferred from the market prices of liquid options. Moreover, the supremum is taken over all $(\varphi, \psi) \in L^1(\mu) \times L^1(\nu)$ and measurable maps $\Delta_S, \Delta_L$ such that, for all $(x, u, y)$, with $\ell_x(y) := \frac{2}{T_2 - T_1} \ln(x/y)$. Assuming zero interest rates, the S&P 500 is a martingale under the risk-neutral measure, so that both $\mu$ and $\nu$ have finite first moments and $\mu$ is smaller than $\nu$ in the convex order. To state the dual problem, we define the set $\Pi_{VIX}(\mu, \nu)$ of admissible martingale couplings as with $\mathrm{Id}$ the identity function on $\mathbb R$. Note that each $\pi \in \Pi_{VIX}(\mu, \nu)$ satisfies $\pi \in \Pi_M(\mathrm{proj}_{1,2}\pi, \mathrm{proj}_3 \pi)$. Hence, for $\mathrm{proj}_{1,2}\pi$-a.e. $(x, u)$, we have $\mathrm{mean}(\pi_{x,u}) = x$, and by concavity of the logarithm and Jensen's inequality, $\int \ln y\, \pi_{x,u}(dy) \le \ln x$, that is, $\pi_{x,u}(\ell_x) \ge 0$. Given probability measures $\mu, \nu$ on $(0, \infty)$ that are in the convex order and finitely integrate $|\ln(x)| + |x|$, the dual problem $D_{sub}$ consists of (2.5) According to [16, Theorem 4.1], the values of $P_{sub}(\mu, \nu)$ and $D_{sub}(\mu, \nu)$ coincide. In the present paper, we establish the following stability result with respect to the risk-neutral marginal distributions $\mu$ and $\nu$ of the S&P 500 at dates $T_1$ and $T_2$.
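The inequality $\pi_{x,u}(\ell_x) \ge 0$ is Jensen's inequality for the concave logarithm under the barycenter constraint $\mathrm{mean}(\pi_{x,u}) = x$. A small Python check for finitely supported kernels on $(0, \infty)$ follows. The function names and the choice of measuring dates in years are illustrative assumptions.

```python
import math

def ell(x, y, t1, t2):
    """ell_x(y) = 2 / (T2 - T1) * ln(x / y) for x, y > 0."""
    return 2.0 / (t2 - t1) * math.log(x / y)

def kernel_integral_of_ell(x, kernel, t1, t2):
    """pi_{x,u}(ell_x) for a finitely supported kernel [(weight, y), ...] on (0, inf)."""
    return sum(w * ell(x, y, t1, t2) for w, y in kernel)

t1, t2 = 0.0, 30 / 365                  # dates in years (illustrative choice)
kernel = [(0.5, 80.0), (0.5, 120.0)]    # mean 100, i.e. a martingale kernel at x = 100
print(kernel_integral_of_ell(100.0, kernel, t1, t2) > 0)  # True, by Jensen's inequality
```

Equality holds exactly when the kernel is the Dirac mass at $x$, because the logarithm is strictly concave.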
The analogous stability result for the VIX future superreplication price is stated in [8,Theorem 1.3] and relies on the reduction of its dual formulation to the value function of a WMOT problem, see [16,Proposition 4.10].Such a reduction step is, in general, not possible for the dual formulation of the subreplication price and we remark that with the approach in this paper, one can recover [8,Theorem 1.3] without recasting the problem as a WMOT problem.

Filtered processes
As explained in the introduction, in the robust approach it is natural to consider all martingales that are compatible with market observations. For this reason, we follow the approach in [5] and, in our setting, call a 5-tuple $\mathbb X = (\Omega, \mathcal F, \mathbb P, (\mathcal F_t)_{t=1}^2, X = (X_t)_{t=1}^2)$, consisting of a filtered probability space $(\Omega, \mathcal F, \mathbb P, (\mathcal F_t)_{t=1}^2)$ and an $(\mathcal F_t)$-adapted process $X$, a filtered process. We say that a filtered process $\mathbb X$ is a martingale if $X$ is an $(\mathcal F_t)$-martingale under $\mathbb P$. When $\mathcal F_1$ is larger than the σ-field generated by $X_1$, the conditional distributions $\mathrm{law}(X_2 \mid \mathcal F_1)$ and $\mathrm{law}(X_2 \mid X_1)$ may differ, and then $\mathrm{law}(X_2 \mid \mathcal F_1)$ is not determined by the law of $X$. For $\mu, \nu \in \mathcal P_p(\mathbb R)$ with $\mu \le_{cx} \nu$, we write $\mathcal M(\mu, \nu)$ for the set of all martingales $\mathbb X$ with $X_1 \sim \mu$ and $X_2 \sim \nu$.
In the current setting, we derive the following analogue to Theorem 2.1.
Then, every P ∈ Λ M (µ, ν) is the W p -limit of a sequence (P k ) k∈N with P k ∈ Λ M (µ k , ν k ).
Remark 2.5.The adapted Wasserstein distance between two filtered processes X and Y is, by [5, Theorem 3.10], given by Therefore, we may rephrase Corollary 2.4 using AW p , and obtain under the same assumptions that every process X ∈ M(µ, ν) is the AW p -limit of a sequence of processes (X k ) k∈N with X k ∈ M(µ k , ν k ).
Similar to Theorem 2.2 we get stability of (2.7).Proposition 2.6.Let C : R × P p (R) → R be continuous and assume that there is a constant K > 0 such that, for all (x, ρ) ∈ R × P p (R), Then the value VC is attained and continuous on {(µ, ν) ) is a sequence of optimizers of (2.7), then so are its accumulation points.
As in Remark 2.5, it is possible to phrase Proposition 2.6 in the language of filtered processes.Since the map R × P p (R) ∋ (x, ρ) → δ x ⊗ ρ ⊗ δ ρ ∈ P p (R × R × P p (R)) is continuous, adequate continuity and growth assumptions on Φ will imply that C(x, ρ) := δ x ⊗ ρ ⊗ δ ρ (Φ) satisfies the assumptions of Proposition 2.6.Hence, we can deduce the following stability result for (2.6).
Corollary 2.7.Let Φ : R × R × P p (R) → R be continuous and assume that there is a constant K > 0 such that, for all Then the value V Φ is attained and continuous on {(µ, ν) ∈ P p (R) × P p (R) : µ ≤ cx ν}.

American options
The robust pricing problem for American options considered by Hobson and Norgilas [17] can be cast in the setting of Subsection 2.3. Given a filtered process $\mathbb X$, the filtration $(\mathcal F_t)$ models the information available to the buyer, who may exercise at only two possible dates, $t \in \{1, 2\}$. For $t \in \{1, 2\}$, let $\Phi_t : \mathbb R^t \to \mathbb R$ be a path-dependent payoff that the buyer receives when exercising at time $t$. The model-independent price of this American option is given by

$$\mathrm{Am}(\mu, \nu) = \sup_{\mathbb X \in \mathcal M(\mu, \nu)} \mathrm{price}(\Phi; \mathbb X). \tag{2.8}$$

As the buyer can exercise the option at any stopping time, the price crucially depends on the information available to the buyer:

$$\mathrm{price}(\Phi; \mathbb X) := \sup_{\tau\ (\mathcal F_t)\text{-stopping time}} \mathbb E_{\mathbb P}\big[\Phi_\tau(X_1, \dots, X_\tau)\big]. \tag{2.9}$$

In the case of a Put, that is (Φ Hobson and Norgilas [17] relate the above suprema to the left-curtain martingale coupling [9] when $\mu$ has no atoms. By the Snell envelope theorem, we have

$$\mathrm{price}(\Phi; \mathbb X) = \mathbb E_{\mathbb P}\Big[ \max\Big( \Phi_1(X_1), \int \Phi_2(X_1, y)\, \mathrm{law}(X_2 \mid \mathcal F_1)(dy) \Big) \Big],$$

which allows us to apply Proposition 2.6 with $C(x, \rho) := \max\big(\Phi_1(x), \int \Phi_2(x, y)\, \rho(dy)\big)$ and deduce the following stability result: Corollary 2.8. Let Φ 1 and Φ 2 be continuous and sup (x,y)∈R 2 Φ1(x)
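The Snell envelope representation shows why the price depends on $\mathrm{law}(X_2 \mid \mathcal F_1)$ and not only on the law of $(X_1, X_2)$. The Python sketch below compares two martingales with the same joint law of $(X_1, X_2)$ but different filtrations, under the assumption of finitely many $\mathcal F_1$-atoms. The strikes, the put payoffs and the function names are hypothetical choices for illustration and are not taken from [17].

```python
def american_price(scenarios, phi1, phi2):
    """Two-date American option value via the Snell envelope at t = 1:

        E[ max(Phi_1(X_1), int Phi_2(X_1, y) law(X_2 | F_1)(dy)) ].

    `scenarios` lists the atoms of F_1 as (probability, x1, rho), where
    rho = [(weight, x2), ...] is the conditional law of X_2 on that atom."""
    total = 0.0
    for prob, x1, rho in scenarios:
        continuation = sum(w * phi2(x1, x2) for w, x2 in rho)
        total += prob * max(phi1(x1), continuation)
    return total

K1, K2 = 0.625, 0.5  # hypothetical strikes for exercise at t = 1 and t = 2

def put1(x1):
    return max(K1 - x1, 0.0)

def put2(x1, x2):
    return max(K2 - x2, 0.0)

# Both filtered processes are martingales with X_1 = 0 and
# X_2 ~ delta_{-1}/4 + delta_0/2 + delta_1/4.
natural = [(1.0, 0.0, [(0.25, -1.0), (0.5, 0.0), (0.25, 1.0)])]          # F_1 = sigma(X_1)
informed = [(0.5, 0.0, [(0.5, -1.0), (0.5, 1.0)]), (0.5, 0.0, [(1.0, 0.0)])]  # richer F_1
print(american_price(natural, put1, put2), american_price(informed, put1, put2))  # 0.625 0.6875
```

On each $\mathcal F_1$-atom the conditional law still has mean $X_1 = 0$, so both processes belong to $\mathcal M(\mu, \nu)$. The extra information allows the buyer to stop early on the atom where continuing is worth less.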

Topological refinements
In order to prove Proposition 2.3, we introduce refinements of the weak topology as detailed below, which we use to establish stronger versions of the results given in the introduction. For the rest of the paper, let $\mathcal X$ and $\mathcal Y$ be (non-empty) Polish subsets of $\mathbb R$ and consider two growth functions $f : \mathcal X \times \mathcal U \to [1, +\infty)$ and $g : \mathcal Y \to [1, +\infty)$ that are both continuous and lim inf We define the sets Similarly, we define Again, these spaces are endowed with the topology induced by Note that when $\mathcal X = \mathbb R = \mathcal Y$, $f(x, u) = 1 + |x|^p + d_{\mathcal U}^p(u_0, u)$ for some $u_0 \in \mathcal U$ and $g(y) = 1 + |y|^p$, we have $\mathcal P_f(\mathcal X \times \mathcal U) = \mathcal P_p(\mathcal X \times \mathcal U)$, $\mathcal P_g(\mathcal Y) = \mathcal P_p(\mathcal Y)$, and the topologies on the spaces introduced above coincide with the corresponding $p$-Wasserstein topologies. Moreover, when $d_{\mathcal U}$ is bounded, the growth condition (3.1) ensures that these topologies are finer than the corresponding 1-Wasserstein topology. The reader may ignore these refinements of the weak topology and substitute a $p$-Wasserstein topology for them in every statement.
Next, we define the injection and observe that In our specific setting, we treat the $\mathcal X$- and $\mathcal U$-coordinates similarly, as we interpret the $\mathcal X$-coordinate as the spatial state (at time 1) and the $\mathcal U$-coordinate as the information state (at time 1), whereas we think of the $\mathcal Y$-coordinate as the state at time 2. For this reason, we say that a sequence $(\pi^k)_{k \in \mathbb N}$ in $\mathcal P(\mathcal X \times \mathcal U \times \mathcal Y)$ converges in the adapted weak topology to $\pi$ if The associated adapted $p$-Wasserstein distance of $\pi^1, \pi^2 \in \mathcal P_p(\mathcal X \times \mathcal U \times \mathcal Y)$ is given by where $\mathcal W_p$ is the $p$-Wasserstein distance on $\mathcal P_p(\mathcal X \times \mathcal U \times \mathcal P_p(\mathcal Y))$. The following reformulation of [13, Lemma 2.7] proves very useful for checking convergence in the adapted Wasserstein topology. Lemma 3.1. Let $(\mathcal V, d_{\mathcal V})$ and $(\mathcal Z, d_{\mathcal Z})$ be Polish metric spaces, $\mu \in \mathcal P_p(\mathcal V)$ and $\varphi : \mathcal V \to \mathcal Z$ be a measurable function such that $\varphi_\# \mu \in \mathcal P_p(\mathcal Z)$.
For more details on the adapted weak topology and the adapted Wasserstein distance, we refer to [2,5].
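A classical example shows why the adapted topology is strictly finer than the weak topology. Let $\pi^\varepsilon = \tfrac12 \delta_{(\varepsilon, 1)} + \tfrac12 \delta_{(-\varepsilon, -1)}$ and $\pi = \delta_0 \otimes \tfrac12(\delta_{-1} + \delta_1)$. Then $\mathcal W_1(\pi^\varepsilon, \pi) \le \varepsilon$, while the kernels of $\pi^\varepsilon$ are Dirac masses and stay far from $\pi_0$. The Python sketch below computes $J$ for finitely supported couplings on $\mathbb R \times \mathbb R$ (the $\mathcal U$-coordinate is omitted) and evaluates $\mathcal{AW}_1(\pi^\varepsilon, \pi)$ with the product metric $|x - x'| + \mathcal W_1$. The function names are hypothetical. The computation relies on the fact that $J(\pi)$ is a single atom here, so only one coupling is available.

```python
from collections import defaultdict

def disintegrate(pi):
    """Finite-support version of J(pi): {x: (mass of x, normalized kernel [(w, y), ...])}."""
    rows = defaultdict(list)
    for (x, y), w in pi.items():
        rows[x].append((w, y))
    out = {}
    for x, row in rows.items():
        m = sum(w for w, _ in row)
        out[x] = (m, [(w / m, y) for w, y in row])
    return out

def cdf(measure, t):
    return sum(w for w, y in measure if y <= t)

def w1_line(a, b):
    """W_1 between finitely supported probability measures on R, as int |F_a - F_b|."""
    pts = sorted({y for _, y in a} | {y for _, y in b})
    return sum(abs(cdf(a, s) - cdf(b, s)) * (t - s) for s, t in zip(pts, pts[1:]))

def aw1_to_product(pi, x0, rho):
    """AW_1(pi, delta_{x0} (x) rho) for the product metric |x - x'| + W_1 on R x P_1(R).
    J(delta_{x0} (x) rho) is the single atom (x0, rho), so the coupling is unique."""
    return sum(m * (abs(x - x0) + w1_line(k, rho)) for x, (m, k) in disintegrate(pi).items())

eps = 1e-3
pi_eps = {(eps, 1.0): 0.5, (-eps, -1.0): 0.5}  # within W_1-distance eps of the limit
limit_kernel = [(0.5, -1.0), (0.5, 1.0)]       # pi = delta_0 (x) (delta_{-1} + delta_1)/2
print(aw1_to_product(pi_eps, 0.0, limit_kernel))  # about 1 + eps, bounded away from 0
```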

Convergence of subprobability measures
Occasionally it will be advantageous to work with subprobability measures.Therefore, we denote by M p (X ) the set of finite non-negative Borel measures on X that have finite p-th moments and by M * p (X ) the subset of measures with positive mass.We say that a sequence (ρ k ) k∈N converges in M p (X ) to ρ if one of the following equivalent conditions holds: (a) (ρ k ) k∈N converges weakly to ρ and, for some have equal mass, we can consider their p-Wasserstein distance given by and similarly define the p-adapted Wasserstein distance AW p between measures π, , and (ρ k ) k∈N be a sequence in M * p (X ) with lim k→∞ ρ k (X ) = ρ(X ).Then the following are equivalent: Proof.Since lim k→∞ ρ k (X ) = ρ(X ), we have in either case that (ρ k ) k∈N and the normalized sequence (ρ k /ρ k (X )) k∈N are weakly convergent with limit ρ and ρ/ρ(X ), respectively.For some x 0 ∈ X , we then have Thus, the equivalence of (i) and (ii) follows from [24, Definition 6.8].
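Lemma 3.3. Let $p \ge 1$ and $\mathcal X$ be a Polish space. Let $(\rho^k)_{k \in \mathbb N}$ be a convergent sequence in $\mathcal M_p(\mathcal X)$ and $(q^k)_{k \in \mathbb N}$ be a weakly convergent sequence with $q^k \le \rho^k$ for every $k \in \mathbb N$. Then $(q^k)_{k \in \mathbb N}$ converges in $\mathcal M_p(\mathcal X)$.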
Proof.Write ρ and q for the weak limits of (ρ k ) k∈N and (q k ) k∈N respectively.Consider the sequence qk := ρ k − q k ∈ M p (X ), k ∈ N, which is also weakly convergent with limit q := ρ − q.By Portmanteau's theorem we have Hence,

Approximation of extended martingale couplings: proof of Theorem 2.1
Before stating and proving a strengthened version of Theorem 2.1, let us deduce stability of the set of martingale couplings with respect to the marginals. The Hausdorff distance between two closed subsets The corresponding statement for couplings without the martingale constraint is straightforward, as in this case one even has ) is relatively compact as a consequence of Prokhorov's theorem. On the one hand, any sequence $(\pi^k)_{k \in \mathbb N}$ with $\pi^k \in \Pi_M(\bar\mu^k, \nu^k)$ admits a weakly convergent subsequence $(\pi^{k_j})_{j \in \mathbb N}$ with limit $\pi \in \Pi_M(\bar\mu, \nu)$. Therefore, On the other hand, the map $\pi \mapsto \mathcal W_p(\pi, \Pi_M(\bar\mu^k, \nu^k))$ is $\mathcal W_p$-continuous. Thus, by compactness of the set of martingale couplings, there is for every Again by compactness, any subsequence of $(\pi^k)_{k \in \mathbb N}$ admits a further subsequence converging weakly to some limit in $\Pi_M(\bar\mu, \nu)$. For each of these accumulation points, Theorem 2.1 provides an approximating sequence. Consequently, We will prove the following strengthened version of Theorem 2.1, which takes into account general integrability conditions over Polish subsets of $\mathbb R$ and is, in fact, an extension of the main result in [7]. For $\mu \in \mathcal P(\mathcal X)$ and $\nu \in \mathcal P_g(\mathcal Y)$, $\mu \le_{cx} \nu$ means that the respective extensions $\mu(\cdot \cap \mathcal X)$ and $\nu(\cdot \cap \mathcal Y)$ of $\mu$ and $\nu$ to the Borel sigma-field on with limit $(\bar\mu, \nu)$. Then every coupling $\pi \in \Pi_M(\bar\mu, \nu)$ is the limit in the adapted weak topology of a sequence The proof of Theorem 3.5 relies on the next three auxiliary results, namely Lemma 3.6, Lemma 3.7 and Proposition 3.8.
In order to show Theorem 3.5, it turns out to be beneficial to first demonstrate that a family of couplings with a simpler structure is already dense. We say that a coupling $\pi \in \Pi_M(\bar\mu, \nu)$ is simple if there are $J \in \mathbb N$, a measurable partition $(U_j)_{j=1}^J$ of $\mathcal U$ into $\mathrm{proj}_2 \bar\mu$-continuity sets and, for $j \in \{1, \dots, J\}$, a martingale kernel Put differently, $\pi$ is simple if there exist (classical) martingale couplings $\pi^j \in \Pi_M(\mu, \nu^j)$, $j \in \{1, \dots, J\}$, and a measurable partition $(U_j)_{j=1}^J$ of $\mathcal U$ into $\mathrm{proj}_2 \bar\mu$-continuity sets such that

$$\pi(dx, du, dy) = \sum_{j=1}^J \pi^j(dx, dy)\, \bar\mu_x(du \cap U_j).$$
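Writing $\pi^j(dx, dy) = \mu(dx)\,\pi^j_x(dy)$, the representation above reads $\pi(dx, du, dy) = \bar\mu(dx, du) \sum_j \mathbb 1_{U_j}(u)\, \pi^j_x(dy)$. In other words, the kernel $\pi_{x,u}$ only depends on $u$ through the cell $U_j$ that contains it. The following Python sketch assembles such a simple coupling for finitely supported measures. The names `simple_coupling`, `kernels` and the cell map are hypothetical, and the partition $[0, 0.5) \cup [0.5, 1]$ is an illustrative choice.

```python
from collections import defaultdict

def simple_coupling(mu_bar, kernels, cell):
    """Assemble pi(dx, du, dy) = sum_j 1_{U_j}(u) mu_bar(dx, du) pi^j_x(dy).

    mu_bar:  {(x, u): weight}, a finitely supported measure on R x U;
    kernels: kernels[j] = {x: [(weight, y), ...]}, the martingale kernel x -> pi^j_x;
    cell:    function returning the index j of the partition cell U_j containing u."""
    pi = defaultdict(float)
    for (x, u), w in mu_bar.items():
        for q, y in kernels[cell(u)][x]:
            pi[(x, u, y)] += w * q
    return dict(pi)

mu_bar = {(0.0, 0.2): 0.5, (0.0, 0.7): 0.5}       # U = [0, 1], only x = 0 is charged
kernels = [{0.0: [(1.0, 0.0)]},                     # pi^1_0 = delta_0
           {0.0: [(0.5, -1.0), (0.5, 1.0)]}]        # pi^2_0 = (delta_{-1} + delta_1)/2
pi = simple_coupling(mu_bar, kernels, lambda u: 0 if u < 0.5 else 1)
print(pi)
```

Since each $\pi^j_x$ has barycenter $x$, every kernel $\pi_{x,u}$ of the assembled coupling has barycenter $x$ as well, so the result lies in $\Pi_M(\bar\mu, \mathrm{proj}_3 \pi)$.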
The next lemma establishes that these simple couplings are already dense in Π M (μ, ν).
Proof. We denote $\lambda := \mathrm{proj}_2 \bar\mu \in \mathcal P_1(\mathcal U)$. Let $u_0 \in \mathcal U$ and $\varepsilon > 0$. We claim that there is a finite partition for $j \in \{1, \dots, J-1\}$, and To this end, note that since the map u By inner regularity of $\lambda$ there exists a compact subset Next, we choose for each $u \in K$ a radius $r_u \in (0, \varepsilon/4]$ such that the boundary of the ball $B_{r_u}(u) := \{\hat u \in \mathcal U : d_{\mathcal U}(u, \hat u) < r_u\}$ has zero measure under $\lambda$. Such radii exist because the spheres $\{\hat u : d_{\mathcal U}(u, \hat u) = r\}$, $r > 0$, which contain the boundaries of the corresponding balls, are pairwise disjoint, so at most countably many of them carry positive $\lambda$-mass. The family $(B_{r_u}(u))_{u \in K}$ is an open cover of the compact set $K$, which permits us to extract a finite subcover of $K$, denoted by $(B_j)_{j=1}^I$, $I \in \mathbb N$. Let $J := I + 1$, $U_J := \bigcap_{j=1}^I B_j^c \subset K^c$, and set recursively $U_j := B_j \setminus \bigcup_{i < j} B_i$ for $j \in \{1, \dots, J-1\}$. By this procedure we have constructed a partition $(U_j)_{j=1}^J$ of $\mathcal U$ into measurable sets. Moreover, as for each $i \in \{1, \dots, J\}$ the boundary of $U_i$ is contained in the union of the boundaries of the balls $(B_j)_{j=1}^I$, it has zero $\lambda$-measure. Finally, for each $j \in \{1, \dots, J-1\}$ we get and compute We have shown the claim (3.7).
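The disjointification step, which takes $U_j$ as the part of $B_j$ not covered by earlier balls and the last cell as the complement of all balls, amounts to assigning each point to the first ball that contains it. The following Python sketch illustrates this for $\mathcal U = \mathbb R$ with the absolute-value distance. This setting and the name `partition_cell` are illustrative assumptions, and the sketch does not check the $\lambda$-null boundary condition.

```python
def partition_cell(u, centers, radii):
    """Index of the cell containing u in the partition built from the open balls
    B_i = {v : |v - centers[i]| < radii[i]}, i = 0, ..., I - 1 (0-based here):
    U_i = B_i minus (B_0 u ... u B_{i-1}) for i < I, and U_I = complement of all balls.
    U = R with the absolute-value distance is an illustrative assumption."""
    for i, (c, r) in enumerate(zip(centers, radii)):
        if abs(u - c) < r:
            return i          # first ball containing u
    return len(centers)       # u lies outside every ball: last cell

centers, radii = [0.0, 1.0], [0.75, 0.75]
print([partition_cell(u, centers, radii) for u in (0.5, 0.8, 1.5, 3.0)])  # [0, 1, 1, 2]
```

Because each point receives exactly one index, the cells are automatically disjoint and cover $\mathcal U$.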
Finally, we also require the following approximation result that concerns the marginals.
Proposition 3.8. Let $(\mu^k, \nu^k)_{k \in \mathbb N}$, $\mu^k \le_{cx} \nu^k$, be a sequence in $\mathcal P_1(\mathbb R) \times \mathcal P_1(\mathbb R)$ with irreducible limit $(\mu, \nu)$. For $1 \le j \le J \in \mathbb N$, let $(\mu^k_j)_{k \in \mathbb N}$ be a convergent sequence in $\mathcal M_1(\mathbb R)$ with limit $\mu_j$ and $\sum_{j=1}^J \mu^k_j = \mu^k$. Let $(\nu_j)_{j=1}^J$, $\mu_j \le_{cx} \nu_j$, be a family in $\mathcal M^*_1(\mathbb R)$ such that $\nu = \sum_{j=1}^J \nu_j$. Then, for $1 \le j \le J$, there exists a convergent sequence $(\nu^k_j)_{k \in \mathbb N}$ in $\mathcal M_1(\mathbb R)$ with limit $\nu_j$ such that The proof of Proposition 3.8 is rather technical and is therefore postponed to Subsection 3.7. On closer inspection of the statement, this is not surprising, since Proposition 3.8 contains the main result of [7] as a special case: in the setting of Proposition 3.8, let $(\mu_j)_{j=1}^J$ and $(\mu^k_j)_{j=1}^J$ be families of measures with $\mu_j(\{x_j\}) = \mu_j(\mathbb R)$ and $\mu^k_j(\{x^k_j\}) = \mu^k_j(\mathbb R)$ for some $x_j, x^k_j \in \mathbb R$, where the points $(x_j)_{j=1}^J$ are distinct. For $\pi \in \Pi_M(\mu, \nu)$, we define $\nu_j := \pi_{x_j}$. Invoking Proposition 3.8, we obtain $(\nu^k_j)_{j=1}^J$ and set Since $\mu^k_j$ is concentrated on a single point and $\mu^k_j \le_{cx} \nu^k_j$, $\pi^k$ defines a martingale coupling in $\Pi_M(\mu^k, \nu^k)$ and, as $\nu^k_j \to \nu_j$ and $\mu^k_j \to \mu_j$ in $\mathcal M_1(\mathbb R)$, $(\pi^k)_{k \in \mathbb N}$ converges in $\mathcal{AW}_1$ to $\pi$. Hence, we recover in this particular setting the main result of [7], which states that, as long as Proof of Theorem 3.5. By following the reasoning outlined in [7, Lemma 5.2], incorporating the additional coordinate and replacing [7, Proposition 2.5] by Lemma 3.6, one can confirm that it suffices to establish the conclusion when $(\bar\mu, \nu)$ is such that $(\mathrm{proj}_1 \bar\mu, \nu)$ is irreducible. As the argument is almost verbatim the same as the proof of [7, Lemma 5.2], we omit the details and assume from now on that $(\mathrm{proj}_1 \bar\mu, \nu)$ is irreducible.
Let us suppose that d U denotes some bounded complete metric compatible with the topology on U and check that we may suppose w.l.o.g. that x,u ) # πk ) k converges to J(π) = (x, u, πx,u ) # π in P 1 (R × U × P 1 (R)).Since X , U and Y are Polish, the Borel sigma-fields satisfy . By [8,Lemma A.7], the sequence (J(π k )) k is relatively compact in P f ⊕ĝ (X × U × P g (Y)).Let (J(π kj )) j denote some subsequence converging to Q. Since the injection i : ) and i # J(π) = J(π), we have for any continuous and bounded function ϕ on R × U × P(R), The equality between the left-most and right-most terms remains valid when ϕ is measurable and bounded.
Therefore, we assume from now on that $\mathcal X = \mathbb R = \mathcal Y$, $f(x, u) = 1 + |x| + d_{\mathcal U}(u, u_0)$ and $g(y) = 1 + |y|$. Moreover, by Lemma 3.7 we may assume that $\pi$ admits the representation (3.6). Let $(U_j)_{j=1}^J$ be the associated finite measurable partition of $\mathcal U$. Without loss of generality (e.g. by replacing one element $U_k$ of the partition with $\bar\mu(\mathbb R \times U_k) > 0$ by the union of $U_k$ with all elements $U_j$ that satisfy $\bar\mu(\mathbb R \times U_j) = 0$, and removing the latter), we can assume that $\min_{1 \le j \le J} \bar\mu(\mathbb R \times U_j) > 0$. For $j \in \{1, \dots, J\}$ and $k \in \mathbb N$, we define $\bar\mu_j := \mathbb 1_{\mathbb R \times U_j} \bar\mu$, $\bar\mu^k_j := \mathbb 1_{\mathbb R \times U_j} \bar\mu^k$, $\mu_j := \mathrm{proj}_1 \bar\mu_j$ and $\mu^k_j := \mathrm{proj}_1 \bar\mu^k_j$. As $(U_j)_{j=1}^J$ consists of continuity sets for the second marginal of $\bar\mu$, the weak convergence of $(\bar\mu^k)_{k \in \mathbb N}$ to $\bar\mu$ implies that $(\bar\mu^k_j)_{k \in \mathbb N}$ converges weakly to $\bar\mu_j$ and, by continuity of the first coordinate mapping, $(\mu^k_j)_{k \in \mathbb N}$ converges weakly to $\mu_j$ for each $j \in \{1, \dots, J\}$. All the requirements of Proposition 3.8 are satisfied, allowing us to identify, for each $j \in \{1, \dots, J\}$, a sequence of subprobability measures $(\nu^k_j)_{k \in \mathbb N}$ such that From now on we assume that $k$ is large enough so that $\min_{1 \le j \le J} \mu^k_j(\mathbb R) > 0$. Weak convergence of the original sequences yields, for each $j \in \{1, \dots, J\}$, that the normalized sequence $(\bar\mu^k_j / \mu^k_j(\mathbb R))_{k \in \mathbb N}$ (resp. $(\nu^k_j / \mu^k_j(\mathbb R))_{k \in \mathbb N}$) converges weakly to $\bar\mu_j / \mu_j(\mathbb R)$ (resp. $\nu_j / \mu_j(\mathbb R)$) as $k \to \infty$. As $(\bar\mu^k)_{k \in \mathbb N}$ and $(\nu^k)_{k \in \mathbb N}$ are $\mathcal W_1$-convergent sequences, it then follows from Lemmas 3.2 and 3.3 that the normalized sequences converge in $\mathcal W_1$. Thus, we can apply [7, Theorem 2.6] and obtain an $\mathcal{AW}_1$-convergent sequence $(\gamma^k_j)_{k \in \mathbb N}$ of martingale couplings with limit $\gamma_j$, where and .

Proofs of Corollary 2.4 and Propositions 2.3 and 2.6
We are first going to prove the following stronger variants of Corollary 2.4 and Proposition 2.6 before deducing Proposition 2.3.Let f : X → [1, +∞) be a continuous growth function such that lim inf The topological space P f (X ) is defined like P g (Y) with X and f replacing Y and g.The topological space P f ⊕ĝ (X × P g (Y)) is defined analogously to P f⊕ĝ (X × U × P g (Y)) but without the u coordinate.
Theorem 3.11. Let $C$ satisfy Assumption B and let $C(x, u, \cdot)$ be convex. Then the value function $V_C$ is attained and continuous on {(μ, ν) : proj , so are its accumulation points; (ii) if $C(x, u, \cdot)$ is strictly convex, then optimizers of (2.1) are unique and $(\pi^k)_{k \in \mathbb N}$ converges in the adapted weak topology to the optimizer of (2.1) with marginals $(\bar\mu, \nu)$.
To show (ii), we assume the opposite, that is, that $(\pi^k)_{k \in \mathbb N}$ admits a subsequence which does not have $\pi^\star$ as an accumulation point with respect to the adapted weak topology. By [8, Lemma A.7], this particular subsequence admits a further subsequence $(\pi^{k_j})_{j \in \mathbb N}$ such that $(J(\pi^{k_j}))_{j \in \mathbb N}$ converges in $\mathcal P_{f \oplus \hat g}(\mathcal X \times \mathcal U \times \mathcal P_g(\mathcal Y))$ to some $P$. We define $\hat\pi \in \Pi_M(\bar\mu, \nu)$ by $\hat\pi := \bar\mu \otimes \hat\pi_{x,u}$ with $\hat\pi_{x,u}(dy) := \int \rho(dy)\, P_{x,u}(d\rho)$. As $C(x, u, \cdot)$ is convex and continuous, we have by Jensen's inequality In particular, $\hat\pi$ is an optimizer of $V_C(\bar\mu, \nu)$ and, by strict convexity of $C(x, u, \cdot)$, we have $J(\hat\pi) = P$ and uniqueness of optimizers. Thus, $\hat\pi = \pi^\star$, and we also get $J(\pi^\star) = P$. Hence, $(\pi^{k_j})_{j \in \mathbb N}$ converges in the adapted weak topology to $\pi^\star$, which is a contradiction and completes the proof.

Stability of the shadow couplings: proof of Proposition 2.9
Let us first state a consequence of Proposition 2.9 concerning the shadow couplings.
In view of Sklar's theorem, it is natural to parametrize the dependence structure between $\mu$ and the Lebesgue measure on $[0, 1]$ in the lift $\bar\mu \in \Pi(\mu, \mathrm{Leb})$ of $\mu$ by copulas, i.e., probability measures on $[0, 1] \times [0, 1]$ with both marginals equal to the Lebesgue measure. We call shadow coupling between $\mu$ and $\nu$ with copula $\chi$ the shadow coupling between $\mu$ and $\nu$ with source equal to the image $\bar\mu_\chi$ of $\chi$ under $[0, 1] \times [0, 1] \ni (v, u) \mapsto (F_\mu^{-1}(v), u) \in \mathbb R \times [0, 1]$, where $F_\mu^{-1}$ denotes the quantile function of $\mu$.
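The lift $\bar\mu_\chi$ can be sampled by drawing $(v, u)$ from the copula $\chi$ and mapping $v$ through the quantile function. The following Python sketch does this for a finitely supported $\mu$, using the usual left-continuous quantile $F_\mu^{-1}(v) = \inf\{x : F_\mu(x) \ge v\}$ (assumed here as the convention). The function names are hypothetical.

```python
def quantile(atoms, weights, v):
    """Left-continuous quantile F_mu^{-1}(v) = inf{x : F_mu(x) >= v}, 0 < v <= 1, for
    mu = sum_i weights[i] * delta_{atoms[i]} with atoms sorted increasingly."""
    cumulative = 0.0
    for x, w in zip(atoms, weights):
        cumulative += w
        if cumulative >= v:
            return x
    return atoms[-1]  # guards against round-off when v is close to 1

def lift_copula_samples(samples, atoms, weights):
    """Push copula samples (v, u) through (v, u) -> (F_mu^{-1}(v), u)."""
    return [(quantile(atoms, weights, v), u) for v, u in samples]

atoms, weights = [-1.0, 1.0], [0.5, 0.5]
print(lift_copula_samples([(0.25, 0.9), (0.75, 0.1)], atoms, weights))
# [(-1.0, 0.9), (1.0, 0.1)]
```

For Monte Carlo use, independent uniform pairs $(v, u)$ correspond to the independence copula, whose image is $\mu \otimes \mathrm{Leb}$. Other copulas only change how the pairs are drawn.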
Corollary 3.12. The shadow coupling with copula $\chi$ is continuous on the domain $\{(\mu, \nu) : \mu \le_{cx} \nu\} \subseteq \mathcal P_p(\mathbb R) \times \mathcal P_p(\mathbb R)$ with range $(\mathcal P_p(\mathbb R \times \mathbb R), \mathcal W_p)$. It is even continuous in $\mathcal{AW}_p$ at each pair $(\mu, \nu)$ such that $\mu$ has no atoms.
The proof that the selector SC of the lifted shadow coupling is continuous when the codomain $\mathcal P_p(\mathbb R \times [0, 1] \times \mathbb R)$ is endowed with the adapted Wasserstein distance $\mathcal{AW}_p$ relies on the fact that, by (2.12), the selector SC takes values in the following extremal set of extended martingale couplings The set $\Pi^{ext}_{M,p}$ is extremal in the following sense: when $\pi \in \Pi^{ext}_{M,p}$ and $P \in \mathcal P_p(\mathbb R \times \mathcal U \times \mathcal P_p(\mathbb R))$ with $I(P) = \pi$, where $I(P)$ is the unique measure that satisfies

$$\int f(x, u, y)\, I(P)(dx, du, dy) = \int\!\!\int f(x, u, y)\, \rho(dy)\, P(dx, du, d\rho)$$

for all bounded measurable $f$, and $\mathrm{mean}(\rho) = x$ $P$-a.s., then we already have $P = J(\pi)$. Building on this observation, the next lemma shows that on $\Pi^{ext}_{M,p}$ the $p$-Wasserstein topology coincides with the $p$-adapted Wasserstein topology, which we in turn use to prove Proposition 2.9. with inverse $I$. Using [3, Lemma 2.3], we find that the sequence $(J(\pi^k))_{k \in \mathbb N}$ is $\mathcal W_p$-relatively compact in $\mathcal P_p(\mathbb R \times \mathcal U \times \mathcal P_p(\mathbb R))$. Therefore, there is a subsequence $(\pi^{k_j})_{j \in \mathbb N}$ such that $J(\pi^{k_j}) \to P$. Since $\pi^{k_j} \to I(P) = \pi \in \Pi^{ext}_M$ and $\mathrm{mean}(\rho) = x$ $P(dx, du, d\rho)$-a.e., we get by (3.16) that $P \in J(\Pi^{ext}_{M,p})$, which yields $P = J(\pi)$ by bijectivity of $J|_{\Pi^{ext}_{M,p}}$. Hence, $J(\pi^{k_j}) \to J(\pi)$ in $\mathcal W_p$, which means that $\pi^{k_j} \to \pi$ in $\mathcal{AW}_p$. Since, by the above reasoning, any subsequence of $(\pi^k)_{k \in \mathbb N}$ admits an $\mathcal{AW}_p$-convergent subsequence with limit $\pi$, we conclude that $\pi^k \to \pi$ in $\mathcal{AW}_p$.
The proof of Proposition 2.9 also relies on the following two lemmas, whose proofs are postponed to the end of the current section. Lemma 3.14. Let $\mathcal V, \mathcal Z$ be Polish spaces, let $(\theta^k)_{k \in \mathbb N}$ be a sequence in $\mathcal P(\mathcal V)$ that converges in total variation to $\theta$, and let $\varphi^k : \mathcal V \to \mathcal Z$, $k \in \mathbb N$, and $\varphi : \mathcal V \to \mathcal Z$ be measurable functions. Then Lemma 3.15. Let $x, y, z \in \mathbb R$ with $y < x < z$, and let $((y^k, z^k))_{k \in \mathbb N}$ be a $(-\infty, x] \times [x, +\infty)$-valued sequence such that, for each $k$, either $y^k < x < z^k$ or $y^k = x = z^k$. Then we have Proof of Proposition 2.9. As optimizers of $V_{SC}$ are unique, we immediately obtain from Theorem 3.11, applied with $C(x, u, \rho) = \int_{\mathbb R} (1 - u)\sqrt{1 + y^2}\, \rho(dy)$, continuity of when the domain is endowed with the product of the corresponding $p$-Wasserstein topologies. Since SC is a continuous function taking values in $\Pi^{ext}_M$, Lemma 3.13 ensures that it is still continuous when the codomain is endowed with the stronger $\mathcal{AW}_p$-distance. Therefore when By Proposition 2.9 we have that where $X : \mathbb R \times [0, 1] \ni (x, u) \mapsto x \in \mathbb R$. Applying Lemma 3.14 in the setting There exists a subsequence such that this convergence holds $\bar\mu$-a.s. Hence, we can invoke Lemma 3.15 and derive the assertion in the second statement of the proposition for this particular subsequence. By the above reasoning, any subsequence admits a further subsequence which fulfills the conclusion of the second statement of the proposition, which readily implies the statement.
Proof of Corollary 3.12. For the continuity in $\mathcal W_p$, it is enough to combine Proposition 2.9 with To prove the reinforced continuity in $\mathcal{AW}_p$, we consider a sequence $((\mu^k, \nu^k))_k$ in $\mathcal P_p(\mathbb R) \times \mathcal P_p(\mathbb R)$ with $\mu^k \le_{cx} \nu^k$ converging to $(\mu, \nu)$, where $\mu$ has no atoms. For notational simplicity, we write $SC^k$ and $SC$ in place of $SC(\bar\mu^k_\chi, \nu^k)$ and $SC(\bar\mu_\chi, \nu)$, respectively. By the reinforcement of Proposition 2.9, $\mathcal{AW}_p(SC^k, SC) \to 0$. Let $\eta^k \in \Pi(\bar\mu^k_\chi, \bar\mu_\chi)$ be optimal for $\mathcal{AW}_p(SC^k, SC)$. We have The second term on the right-hand side goes to 0 according to Lemma 3.1 since $\bar\mu_\chi$ is the image of $\chi$ by ,u χ(dv, du) → 0. Let $\pi^k$ (resp. $\pi$) denote the shadow coupling with copula $\chi$ between $\mu^k$ and $\nu^k$ (resp. $\mu$ and $\nu$) and for (x, The image of the Lebesgue measure on $[0, 1] \times [0, 1]$ by $\vartheta^k$ is the Lebesgue measure on $[0, 1]$ and for each $v \in (0, 1)$, , dw a.e.. Hence dv a.e., Since $\mu$ has no atoms, $F_\mu^{-1}$ is one-to-one and π , dv a.e.. By the triangle inequality and Jensen's inequality (see for instance [8, Proposition A.9]), we have Using again that the image of the Lebesgue measure on $[0, 1] \times [0, 1]$ by $\vartheta^k$ is the Lebesgue measure on $[0, 1]$, we deduce that The sum of the first two terms on the right-hand side goes to 0 as $k \to \infty$. By the proof of [19, Proposition 4.2] (see the equation just above (4.12), where $\theta(F_\mu^{-1}(v), w) = v$ since $F_\mu$ is continuous), $\vartheta^k(v, w) \to v$ for $dv\,dw$-a.e. $(v, w)$. Hence $\int_{[0,1] \times [0,1]} |\vartheta^k(v, w) - v|^p\, dv\, dw \to 0$ by Lebesgue's dominated convergence theorem, so that the third term on the right-hand side also goes to 0 by Lemma 3.1 due to Eder [13].
Remark 3.16. As in the proof of [19, Proposition 4.2], we could check that $\mathcal{AW}_p(\pi^k, \pi)$ still goes to 0 as $k \to \infty$ when Proof of Lemma 3.14. As $\theta^k \to \theta$ in total variation, the total variation distance between $(\mathrm{Id}, \varphi^k)_\# \theta^k$ and $(\mathrm{Id}, \varphi^k)_\# \theta$ vanishes as $k \to \infty$. Thus, since $((\mathrm{Id}, \varphi^k)_\# \theta^k)_{k \in \mathbb N}$ converges to $(\mathrm{Id}, \varphi)_\# \theta =: \eta$ in $\mathcal P(\mathcal V \times \mathcal Z)$, the same holds for the sequence $(\eta^k)_{k \in \mathbb N}$, where $\eta^k := (\mathrm{Id}, \varphi^k)_\# \theta$. Without loss of generality, we assume that the metrics $d_{\mathcal V}$ and $d_{\mathcal Z}$ are both bounded, so that $\eta^k \to \eta$ in $\mathcal W_1$, and we can pick couplings By the triangle inequality we have The first summand in (3.19) vanishes as $k \to \infty$ since $\eta^k \to \eta$ in $\mathcal W_1$, whereas the second summand vanishes as a consequence of Lemma 3.1 due to Eder [13] since Hence $u_{\rho^m}$ converges uniformly to $u_\rho$ as $m \to \infty$, which implies that $\mathcal W_1(\rho^m, \rho) \to 0$ as $m \to +\infty$.
Before jumping into the various steps of proving (3.23), we fix the following notation: Let a ∈ {−∞} ∪ R and b ∈ R ∪ {+∞} be the endpoints of the irreducible component I = (a, b) of (µ, ν).Further, let .
Up to modifying x → π j x on a µ-null set, we suppose w.l.o.g. that for all x ∈ (a, b), π j x is concentrated on [a, b] and mean(π j x ) = x.Finally, for m ∈ N, pick a m , b m ∈ I, a m < b m , with a m ց a, and b m ր b, so that µ j ([a m , b m ]) > 0 and µ j ({a m , b m }) = 0 for each j = 1, . . ., J.
Step 1: We claim that when $m$ is sufficiently large, there exists νj ∈ M 1 (R) with To show (3.24), we define $q^m_x$ as the unique probability measure supported on $\{a_m, b_m\}$ with $\mathrm{mean}(q^m_x) = x$ when $x \in [a_m, b_m]$, and as $\delta_x$ otherwise, i.e.,

$$q^m_x := \begin{cases} \dfrac{b_m - x}{b_m - a_m}\, \delta_{a_m} + \dfrac{x - a_m}{b_m - a_m}\, \delta_{b_m}, & x \in [a_m, b_m], \\ \delta_x, & \text{otherwise}. \end{cases}$$

Set $\pi^{j,m}(dx, dy) := \mu_j(dx)\, (\pi^j_x \wedge_c q^m_x)(dy)$. The measure $\pi^{j,m}$ is a martingale coupling between $\mu_j$ and its second marginal, which we denote by $\nu^{j,m}$; thus $\nu^{j,m} \le_c \nu_j$. Thanks to Lemma 3.17, we have $\mathcal W_1(\pi^j_x, \pi^j_x \wedge_c q^m_x) \to 0$ for every $x \in (a, b)$. Furthermore, by the triangle inequality and convexity of the absolute value, we have x , δ 0 ), where the right-hand side is $\mu_j$-integrable. Hence, we get by dominated convergence Letting $m$ be sufficiently large, (3.25) yields that νj : Step 2: Next we construct, for $j \in \{1, \dots, J\}$, sequences (ν With Lemma 3. Summarizing, we have $\sum_{j=1}^J u_{\nu^k_j} \le u_{\nu^k}$ for $k \ge k(\delta)$, which yields (3.30).
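The two-point measure $q^m_x$ can be computed directly from its definition. The Python sketch below returns its weights and checks the barycenter condition $\mathrm{mean}(q^m_x) = x$. The names are hypothetical, and the operation $\wedge_c$ used in Step 1 is not implemented here.

```python
def two_point_kernel(x, a, b):
    """q_x: the probability measure on {a, b} with mean x when a <= x <= b, and
    delta_x otherwise; returned as [(weight, atom), ...]. Assumes a < b."""
    if a <= x <= b:
        return [((b - x) / (b - a), a), ((x - a) / (b - a), b)]
    return [(1.0, x)]

def mean(measure):
    return sum(w * y for w, y in measure)

q = two_point_kernel(0.25, -1.0, 1.0)
print(q, mean(q))  # [(0.375, -1.0), (0.625, 1.0)] 0.25
```

The barycenter identity follows from $a\,\frac{b - x}{b - a} + b\,\frac{x - a}{b - a} = x$. Outside $[a_m, b_m]$ the Dirac mass $\delta_x$ trivially has mean $x$.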

Lemma 3.13.
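The identity map Id on $\mathcal P_p(\mathbb R \times \mathcal U \times \mathbb R)$ is $(\mathcal W_p, \mathcal{AW}_p)$-continuous at any $P \in \Pi^{ext}_{M,p}$. In particular, the metric spaces $(\Pi^{ext}_{M,p}, \mathcal W_p)$ and $(\Pi^{ext}_{M,p}, \mathcal{AW}_p)$ are topologically equivalent. Proof. We follow a similar line of reasoning as in [23, Lemma 7]. As $\mathcal W_p \le \mathcal{AW}_p$, it suffices to show that, given a sequence $(\pi^k)_{k \in \mathbb N}$ in $\mathcal P_p(\mathbb R \times \mathcal U \times \mathbb R)$ with $\mathrm{mean}(\pi^k_{x,u}) = x$ $\pi^k$-a.s. and $\pi \in \Pi^{ext}_{M,p}$,

$$\lim_{k \to \infty} \mathcal W_p(\pi^k, \pi) = 0 \implies \lim_{k \to \infty} \mathcal{AW}_p(\pi^k, \pi) = 0.$$

So, let $(\pi^k)_{k \in \mathbb N}$ and $\pi$ be as above and assume that $\pi^k \to \pi$ in $\mathcal W_p$. Observe that $J|_{\Pi^{ext}_{M,p}}$ is bijective onto

$$J(\Pi^{ext}_{M,p}) = \big\{P \in \mathcal P_p(\mathbb R \times \mathcal U \times \mathcal P_p(\mathbb R)) : I(P) \in \Pi^{ext}_{M,p} \text{ and } \mathrm{mean}(\rho) = x \ P(dx, du, d\rho)\text{-a.e.}\big\}, \tag{3.16}$$

Since $\mu_j(\{a_m, b_m\}) = 0$, we have for every $h \in C_b(\mathbb R)$ that the discontinuities of $h \mathbb 1_{[a_m, b_m]}$ form a $\mu_j$-null set, whence we get by Portmanteau's theorem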