Weak monotone rearrangement on the line

Weak optimal transport has been recently introduced by Gozlan et al. The original motivation stems from the theory of geometric inequalities; further applications concern numerics of martingale optimal transport and stability in mathematical finance. In this note we provide a complete geometric characterization of the 'weak' version of the classical monotone rearrangement between measures on the real line, complementing earlier results of Alfonsi, Corbetta, and Jourdain.


Introduction
Recently, there has been growing interest in weak transport problems as introduced by Gozlan et al. [15]. While the original motivation mainly stems from applications to geometric inequalities (cf. the works of Marton [17,16] and Talagrand [21,22]), weak transport problems also appear in a number of further topics, including martingale optimal transport [2,4,10,6], the causal transport problem [7,1], and stability in mathematical finance [5].
We call the ($\mu$-a.s. unique) map $T$ in Theorem 1.2 the weak monotone rearrangement. A particular consequence of Theorem 1.2 is that the optimizer of (1.2) does not depend on the choice of the convex function $\theta$. We find this fact non-trivial as well as remarkable, and we highlight that it is not new: independent proofs were given by Gozlan et al. [15], Alfonsi, Corbetta, and Jourdain [2], and Shu [20]. Alfonsi, Corbetta, and Jourdain [3, Example 2.4] note that this does not hold in higher dimensions.
The map $T$ can be explicitly characterized in geometric terms using the notion of irreducibility introduced in [9]: measures $\eta, \nu \in \mathcal{P}_1(\mathbb{R})$ are in convex order iff
$$u_\eta(y) := \int_{\mathbb{R}} |x - y|\,\eta(dx) \le \int_{\mathbb{R}} |x - y|\,\nu(dx) =: u_\nu(y) \quad \text{for all } y \in \mathbb{R}, \tag{1.3}$$
and, by continuity, the set $U$ where this inequality is strict is open. Hence $U = \bigcup_n I_n$, where $(I_n)$ is an at most countable family of disjoint open intervals; these intervals $I_n$ are called irreducible with respect to $(\eta, \nu)$.
Theorem 1.3. The weak monotone rearrangement $T$ of $\mu, \nu \in \mathcal{P}_1(\mathbb{R})$ is the unique admissible map which has slope 1 on each interval $T^{-1}(I)$, where $I$ is irreducible wrt $(T(\mu), \nu)$.
Theorem 1.3 represents a necessary and sufficient condition for the optimality of the measure $T(\mu^*)$ in (1.2). We note that the 'necessary' part was first obtained (using somewhat different phrasing) by Alfonsi, Corbetta, and Jourdain [2, Proposition 3.12]. We also refer the reader to the semi-explicit representation of $T$ and $T(\mu^*)$ given in [2].
Figure 1. Blue lines depict contractive parts of the map; purple lines depict areas with (non-trivial) martingale transport.
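As a hedged illustration of (1.3), the following minimal Python sketch computes the potential functions of two finitely supported measures and reads off the irreducible intervals. It assumes each measure is given as a pair (atoms, weights); the names `potential` and `irreducible_intervals` and the tolerance `tol` are illustrative choices, not notation from the text. For finitely supported measures the difference $u_\nu - u_\eta$ is piecewise linear with kinks only at atoms, so it suffices to evaluate it at the atoms.

```python
def potential(atoms, weights, y):
    """Potential u(y) = sum_i w_i * |x_i - y| of the measure sum_i w_i * delta_{x_i}."""
    return sum(w * abs(x - y) for x, w in zip(atoms, weights))


def irreducible_intervals(eta, nu, tol=1e-12):
    """Open intervals on which u_eta < u_nu, for finitely supported eta, nu.

    eta and nu are (atoms, weights) pairs. Raises ValueError if u_eta <= u_nu
    fails at some atom, i.e. if the two measures are not in convex order.
    """
    eta_atoms, eta_weights = eta
    nu_atoms, nu_weights = nu
    # All kinks of the piecewise linear function u_nu - u_eta.
    knots = sorted(set(float(x) for x in eta_atoms) | set(float(x) for x in nu_atoms))
    gap = [potential(nu_atoms, nu_weights, k) - potential(eta_atoms, eta_weights, k)
           for k in knots]
    if min(gap) < -tol:
        raise ValueError("u_eta <= u_nu fails, so eta is not below nu in convex order")
    intervals, left = [], None
    for i, g in enumerate(gap):
        if g > tol and left is None:
            left = knots[max(i - 1, 0)]        # interval opens at the last zero of the gap
        elif g <= tol and left is not None:
            intervals.append((left, knots[i]))  # interval closes at the next zero
            left = None
    if left is not None:
        intervals.append((left, knots[-1]))
    return intervals


# eta = (delta_{-2} + delta_2)/2 is below nu = uniform on {-3, -1, 1, 3} in convex order.
eta = ([-2.0, 2.0], [0.5, 0.5])
nu = ([-3.0, -1.0, 1.0, 3.0], [0.25, 0.25, 0.25, 0.25])
components = irreducible_intervals(eta, nu)
```

In this example the potentials agree on $[-1, 1]$ and outside $[-3, 3]$, so the pair splits into two irreducible components.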
1.2. Connection with martingale transport plans. Intuitively, the irreducible intervals of $(\eta, \nu)$ are the components where we need to 'expand' $\eta$ in order to transform it into $\nu$. In this sense Theorem 1.3 asserts that the mass of $\mu$ can either concentrate between $\mu$ and $T(\mu)$, or it can be expanded between $T(\mu)$ and $\nu$ (see Figure 1).
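To make this precise, we recall from [15] that (1.2) can be reformulated as
$$V_\theta(\mu, \nu) = \inf_{\pi \in \Pi(\mu, \nu)} \int \theta\Big(x - \int y\,\pi_x(dy)\Big)\,\mu(dx), \tag{1.4}$$
where $(\pi_x)_{x \in \mathbb{R}^d}$ denotes a regular disintegration of the coupling $\pi$ wrt its first marginal $\mu$.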
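For finitely supported marginals the barycentric cost of a coupling reduces to a finite sum. The sketch below is a minimal illustration under that assumption: a coupling is a matrix `pi[i][j]` of the mass sent from atom `x[i]` to atom `y[j]`, every row is assumed to carry positive mass, and the names `barycentric_map` and `barycentric_cost` are hypothetical helpers introduced here.

```python
def barycentric_map(pi, y):
    """Return (T, mu) with T[i] = sum_j y[j] * pi[i][j] / mu[i], mu[i] = sum_j pi[i][j]."""
    mu = [sum(row) for row in pi]
    T = [sum(p * yj for p, yj in zip(row, y)) / m for row, m in zip(pi, mu)]
    return T, mu


def barycentric_cost(pi, x, y, theta):
    """Cost sum_i mu[i] * theta(x[i] - T(x[i])) of the coupling pi."""
    T, mu = barycentric_map(pi, y)
    return sum(m * theta(xi - t) for xi, t, m in zip(x, T, mu))


# mu = (delta_0 + delta_1)/2 and nu = (delta_0 + delta_2)/2.
x, y = [0.0, 1.0], [0.0, 2.0]
pi = [[0.375, 0.125],
      [0.125, 0.375]]
T, mu = barycentric_map(pi, y)                            # T(0) = 0.5, T(1) = 1.5
cost = barycentric_cost(pi, x, y, lambda z: z * z)        # 0.25
```

For $\theta(z) = z^2$ this coupling has cost 0.25. Since the means of the two marginals differ by 0.5, Jensen's inequality shows that no coupling can have a cost below $\theta(-0.5) = 0.25$, so this coupling is optimal in this toy example.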
The set of optimizers of (1.4) is also straightforward to express in terms of $T$. Write $\Pi_M(\eta, \nu)$ for the set of martingale couplings (or martingale transport plans), i.e. those $\pi \in \Pi(\eta, \nu)$ which satisfy $\mathrm{barycenter}(\pi_x) = x$, $\eta$-a.s. By Strassen's theorem, $\Pi_M(\eta, \nu)$ is nonempty iff $\eta \le_c \nu$. Using this notation, $\pi \in \Pi(\mu, \nu)$ is optimal iff there exists a martingale coupling $\pi^M \in \Pi_M(T(\mu), \nu)$ such that $\pi$ is the concatenation of the transports described by $T$ and $\pi^M$.
Any $\pi^M \in \Pi_M(\eta, \nu)$ can be decomposed based on the family of irreducible intervals $(I_n)_n$: denoting $F := (\bigcup_n I_n)^c$, by [9, Appendix A] we have the decomposition (1.6). Plainly, (1.6) asserts that any martingale transport plan can move mass only within the individual irreducible intervals, whereas particles $x \in F$ have to stay put.
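The defining property of a martingale coupling, $\mathrm{barycenter}(\pi_x) = x$, is easy to test for finitely supported measures. The following sketch assumes a coupling is given as a matrix `pi[i][j]` of the mass sent from atom `x[i]` to atom `y[j]`; the function name and the tolerance are illustrative assumptions.

```python
def is_martingale_coupling(pi, x, y, tol=1e-9):
    """Check barycenter(pi_x) = x for every atom x that carries positive mass."""
    for xi, row in zip(x, pi):
        mass = sum(row)
        if mass <= 0:
            continue  # atoms without mass impose no constraint
        barycenter = sum(p * yj for p, yj in zip(row, y)) / mass
        if abs(barycenter - xi) > tol:
            return False
    return True


# Coupling of eta = (delta_{0.5} + delta_{1.5})/2 with nu = (delta_0 + delta_2)/2.
x, y = [0.5, 1.5], [0.0, 2.0]
spread = [[0.375, 0.125],
          [0.125, 0.375]]
diagonal = [[0.5, 0.0],
            [0.0, 0.5]]
spread_ok = is_martingale_coupling(spread, x, y)      # each row averages back to x
diagonal_ok = is_martingale_coupling(diagonal, x, y)  # rows average to 0 and 2 instead
```

The first coupling is a martingale coupling, so by Strassen's theorem the two marginals are in convex order; the second couples the same marginals but is not a martingale coupling.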

1.3. A reverse problem. Alfonsi, Corbetta, and Jourdain [2] proved that the same value is obtained when reversing the order of transport and convex order relaxation in (1.2); this is the reverse problem (1.7).
As in the previous section, (1.7) could be interpreted as a concatenation of a martingale transport with the weak monotone rearrangement.
1.4. An auxiliary result. We close this introductory section with an auxiliary result that will be important in the proofs of our results. Since it might be of independent interest, we provide the $d$-dimensional version. We denote the topology induced by the $\rho$-Wasserstein distance on the space of probability measures on $\mathbb{R}^d$ by $W_\rho$.

C-Monotonicity implies geometric characterization
In this part we prove the following.
Theorem 2.1. Let $\mu, \nu \in \mathcal{P}_1(\mathbb{R})$ and let $\theta : \mathbb{R} \to \mathbb{R}$ be strictly convex. If $V_\theta(\mu, \nu)$ is finite, then for any optimizer $\pi \in \Pi(\mu, \nu)$ of (1.4) the map $T(x) := \int_{\mathbb{R}} y\,\pi_x(dy)$ is $\mu$-almost surely uniquely defined and independent of the specific coupling $\pi$; it is admissible in the sense of Definition 1.1, and it has slope 1 on $T^{-1}(I)$ whenever $I$ is an irreducible interval for $(T(\mu), \nu)$.

Since π is a martingale coupling, and by (2 Write d + for the largest and d − for the smallest d such that (2.3) holds. Note then that and we conclude by contradicting (2.4) and such that for some β ∈ (0, 1) the functions are strictly decreasing and increasing, respectively.
Proof. Let α ∈ [0, 1] and define the inverse distribution functions by where F p and F q denote the cumulative distribution functions of p and q, respectively. Define two auxiliary measures Defining probability measures p α and q α by p α :=p α +q 1−α and q α := p + q − p α , yields (2.4) and continuity of α → (p α , q α ). Since p and q satisfy (2.1), we find constants and conclude that for β := 1 − α 3 the maps defined in (2.5) are strictly monotone.
An important tool in the proof of Theorem 2.1 is C-monotonicity, a concept which was introduced for the weak optimal transport problem in [6,13,8].
Definition 2.4 (C-monotonicity). A coupling $\pi \in \Pi(\mu, \nu)$ is C-monotone if there exists a measurable set $\Gamma \subseteq X$ with $\mu(\Gamma) = 1$, such that for any finite number of points $x_1, \dots, x_N$ in $\Gamma$ and measures $m_1, \dots,$
Proof of Theorem 2.1. Let $\pi^*$ be optimal for the weak optimal transport problem (1.4). By the monotonicity principle [8, Theorem 5.2], $\pi^*$ is C-monotone; therefore, there exists a set $\Gamma \subseteq \mathbb{R}$ with $\mu(\Gamma) = 1$ and such that for all $x, y \in \Gamma$, $p_1,$ As an immediate consequence, we find that the map $T(x) = \int_{\mathbb{R}} y\,\pi^*_x(dy)$ is $\mu$-almost surely increasing. Letting $x, y \in \Gamma$, for any $\alpha \in [0, 1]$ we define Plugging $p^\alpha_1$ and $p^\alpha_2$ into (2.6) and computing the right-hand derivative yields which is by strict convexity of $\theta$ equivalent to
Figure 2. Sketch of the usage of Lemma 2.2 and Lemma 2.3 to find a contradiction to C-monotonicity of a C-optimal coupling $\pi$.
Without loss of generality, we can assume $\tilde\pi_{T(x)} = \pi^*_x$, $\mu$-a.e. Let $(I_k)_{k \in \mathbb{N}}$ be the intervals given by the decomposition of $(T(\mu), \nu)$ into irreducible intervals. Assume that there is an interval $I_k$ so that on $T^{-1}(I_k)$ the map $T$ does not have $\mu$-a.s. slope 1. Then Lemma 2.2 provides two points $\tilde x, \tilde y$ in $T(\Gamma \cap I_k)$ and two corresponding points $x, y \in \Gamma \cap I_k$ such that and the overlapping condition (2.1) holds for $\tilde\pi_{T(x)} = \pi^*_x$, $\tilde\pi_{T(y)} = \pi^*_y$. Lemma 2.3 allows us to define measures $p^\alpha$ and $q^\alpha$ on $\mathbb{R}$ such that $\pi^*_x + \pi^*_y = \tilde\pi_{T(x)} + \tilde\pi_{T(y)} = p^\alpha + q^\alpha$. For a graphical depiction compare with Figure 2. Hence, , are strictly monotone, continuous maps on $[\beta, 1]$. Therefore, we find $\alpha \in (\beta, 1)$ with By strict convexity of $\theta$ we find which then yields a contradiction to C-monotonicity:

Sufficiency of the geometric characterization
Naturally the question arises whether any map $T$ satisfying the properties in Theorem 2.1 must be optimal. The aim of this section is to establish this:
Theorem 3.1. Let $\mu, \nu \in \mathcal{P}_1(\mathbb{R})$. Then any coupling $\pi \in \Pi(\mu, \nu)$ for which $T(x) := \int_{\mathbb{R}} y\,\pi_x(dy)$ is admissible (in the sense of Definition 1.1) with slope 1 on each interval $T^{-1}(I)$, where $I$ is irreducible wrt $(T(\mu), \nu)$, is optimal for (1.4), i.e.,
The proof is based on dual optimizers and their explicit representation. As long as $T$ is strictly increasing, [19, Theorem 2.1] provides dual optimizers to $V_\theta(\mu, T(\mu))$. Investigating dual optimizers further, we are able to show here $V_\theta(\mu, T(\mu)) = V_\theta(\mu, \nu)$. First, Lemma 3.2 helps us to carefully approximate the increasing map $T$ with strictly increasing maps $T_\varepsilon$.
Lemma 3.2. Let $T : \mathbb{R} \to \mathbb{R}$ be an increasing map, with $T - \mathrm{id}$ decreasing. Then, for any $\varepsilon > 0$ there is a strictly increasing map $T_\varepsilon : \mathbb{R} \to \mathbb{R}$, with $T_\varepsilon - \mathrm{id}$ decreasing, such that $\|T - T_\varepsilon\|_\infty \le \varepsilon$, and $T$ is affine with slope 1 on an interval $I$ if and only if $T_\varepsilon$ is affine with slope 1 on $I$.
Proof. Since $T$ is increasing, the pre-image of any point under $T$ is an interval. Therefore, we can find at most countably many disjoint intervals $(I_k)_{k \in \mathbb{N}}$ of finite length, where $T(I_k)$ is a singleton and $T$ is strictly increasing on the complement, i.e., on $\bigcap_k I_k^c$. For any $\varepsilon > 0$, we define $g_\varepsilon$ : dy satisfies the desired properties.
Let $T : \mathbb{R} \to \mathbb{R}$ be an increasing 1-Lipschitz function. Then $T$ induces a unique decomposition of $\mathbb{R}$ into at most countably many maximal, closed, disjoint intervals $(I_k)_k$ and a ($G_\delta$-set) $G$ such that for all $k \in \mathbb{N}$ the map $T|_{I_k}$ is affine with slope 1 and $T|_G$ is properly contractive, i.e., for any two distinct points $x, y \in G$ we have $|T(x) - T(y)| < |x - y|$. Below we call the intervals $(I_k)_k$ irreducible wrt $T$.
By monotone convergence we find $\sup_n V_{\theta_n}(\mu, \nu) = V_\theta(\mu, \nu)$. Indeed, if $\pi^n$ optimizes $V_{\theta_n}(\mu, \nu)$ and assuming wlog that $\pi^n \to \pi$, then
$$\lim_n \int \theta_n\Big(x - \int y\,\pi^n_x(dy)\Big)\,\mu(dx) \ge \lim_n \int \theta_m\Big(x - \int y\,\pi^n_x(dy)\Big)\,\mu(dx) \ge \int \theta_m\Big(x - \int y\,\pi_x(dy)\Big)\,\mu(dx),$$
by [8, Proposition 2.8]. Thus the claim follows by taking the supremum in $m$.
From this, it suffices to consider the case when $\theta$ is Lipschitz continuous. By Lemma 3.2 we find for any $\varepsilon > 0$ a strictly increasing map $T_\varepsilon$, such that $T_\varepsilon - \mathrm{id}$ is decreasing, the decompositions of $T$ and $T_\varepsilon$ match, and $\|T_\varepsilon - T\|_\infty \le \varepsilon$. Then [19, Theorem 2.1] provides a convex, Lipschitz continuous function $f_\varepsilon$ : which is even affine on the parts where $T_\varepsilon$ is affine.
In the following we will show that $f_\varepsilon$ is a dual optimizer of the coupling $\pi^\varepsilon$ defined as the push-forward measure of $\pi$ by the function $(x, y) \mapsto (x, y + S_\varepsilon(x))$.
In view of the structure of martingale couplings, see [9, Theorem A.4], we find that for $\mu$-a.e. $x \in T_\varepsilon^{-1}(I^\varepsilon_k)$ we have $\mathrm{supp}(\pi^\varepsilon_x) \subseteq I^\varepsilon_k$ and $\pi^\varepsilon_x = \delta_{T(x) + S_\varepsilon(x)}$ $\mu$-a.e. on $F^\varepsilon$. Since the decompositions of $T$ and $(T(\mu), \nu)$ are complementary, we infer the same for the decompositions of the map $T_\varepsilon$ and the pair $(T_\varepsilon(\mu), \nu^\varepsilon)$. The next computation establishes duality of the pair $(\pi^\varepsilon, f_\varepsilon)$, where we use affinity of $f_\varepsilon$ on the irreducible components of $(T_\varepsilon(\mu), \nu^\varepsilon)$: . This easily proves that $\pi^\varepsilon$ is optimal for the weak optimal transport problem $V_\theta(\mu, \nu^\varepsilon)$. Taking the limit $\varepsilon \searrow 0$, we observe . As $\theta$ is Lipschitz, we can apply the stability result, Theorem 1.5, and obtain optimality of $\pi$.

Geometry of the weak monotone rearrangement
We can summarize Theorems 2.1 and 3.1 as follows: There exists an admissible map $T$ with slope 1 on $T^{-1}(I)$ whenever $I$ is an irreducible interval wrt $(T(\mu), \nu)$, such that $\pi$ is optimal for (1.4) iff $T(x) = \int y\,\pi_x(dy)$ ($\mu$-a.s.). We now show that this map is the weak monotone rearrangement and is therefore the maximum in convex order of the set $\mathcal{M}(\mu, \nu) := \{S : \mathbb{R} \to \mathbb{R} : S \text{ is increasing and 1-Lipschitz},\ S(\mu) \le_c \nu\}$.
Heuristically speaking, if the maximum in convex order of the set $\mathcal{M}(\mu, \nu)$ is again given by an increasing, 1-Lipschitz map, then this map is as close as possible to a shifted identity. In turn, this is favourable when trying to find the minimum in convex order of $\{(\mathrm{id} - S)(\mu) : S \in \mathcal{M}(\mu, \nu)\}$, which explains why there should exist a single optimizer to (1.4) for all convex $\theta$.
As preparation to establishing Theorem 1.3, we prove Lemma 4.1 and Lemma 4.2.
Lemma 4.1. Let $\mu \in \mathcal{P}_1(\mathbb{R})$ and let $T, S : \mathbb{R} \to \mathbb{R}$ be increasing maps with $\int_{\mathbb{R}} T(x)\,\mu(dx) = \int_{\mathbb{R}} S(x)\,\mu(dx)$. Then the maximum (wrt the convex order) of $T(\mu)$ and $S(\mu)$, which is uniquely determined by its potential function, is again given by an increasing map. If, in addition, the maps are $L$-Lipschitz with $L > 0$, then the maximum is also given by an $L$-Lipschitz map.
Proof. The maximum of $T(\mu)$ and $S(\mu)$ wrt the convex order is uniquely determined by the maximum of their potential functions, i.e. $u_{S(\mu)} \vee u_{T(\mu)} = u_{S(\mu) \vee T(\mu)}$. The right-hand derivative of the potential function can be expressed via the cumulative distribution function, namely $\partial_+ u_\mu(x) = 2F_\mu(x) - 1$. By continuity of the potential functions, we find a partition of $\mathbb{R}$ into at most countably many disjoint intervals $(I_k)_{k \in \mathbb{N}}$, where $u_{T(\mu)} = u_{S(\mu)}$ on $\partial I_k$, and restricted onto $I_k$ one of the following holds true: (4.1) By monotonicity, we can define $T^*$ on $\tilde I_k$: Hence, $T^*$ is an increasing map, $F_{T^*(\mu)} = F_{S(\mu) \vee T(\mu)}$ and $S(\mu) \vee T(\mu)$ is given by the map $T^*$. If $T$ and $S$ are in addition $L$-Lipschitz, it follows by construction that $T^*$ is $L$-Lipschitz.
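The identity $\partial_+ u_\mu(x) = 2F_\mu(x) - 1$ used in the proof above can be checked numerically for a finitely supported measure. The sketch below is only a sanity check under stated assumptions: the measure is a list of atoms with weights, the right-hand derivative is approximated by a forward difference with a step `h` smaller than the gaps between atoms, and all function names are illustrative.

```python
def potential(atoms, weights, y):
    """Potential u(y) = sum_i w_i * |x_i - y|."""
    return sum(w * abs(x - y) for x, w in zip(atoms, weights))


def cdf(atoms, weights, x):
    """Cumulative distribution function F(x) = mass of (-inf, x]."""
    return sum(w for a, w in zip(atoms, weights) if a <= x)


def right_derivative(atoms, weights, x, h=1e-6):
    """Forward difference of the potential; exact up to rounding for small h."""
    return (potential(atoms, weights, x + h) - potential(atoms, weights, x)) / h


atoms, weights = [0.0, 1.0, 3.0], [0.2, 0.5, 0.3]
checks = [
    abs(right_derivative(atoms, weights, x) - (2 * cdf(atoms, weights, x) - 1)) < 1e-6
    for x in [-1.0, 0.0, 0.5, 1.0, 2.0, 3.0, 4.0]
]
```

The check includes the atoms themselves, where the potential has a kink and only the right-hand derivative matches $2F_\mu - 1$.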
Lemma 4.2. Let $\eta_1 \le_c \eta_2$ and $T_2(\eta_2) \le_c T_1(\eta_1)$, where $T_1, T_2$ are increasing and 1-Lipschitz. Then $(\mathrm{id} - T_1)(\eta_1) \le_c (\mathrm{id} - T_2)(\eta_2)$. In particular, if $T$ and $S$ are increasing 1-Lipschitz maps s.t. $\int_{\mathbb{R}} T(x)\,\mu(dx) = \int_{\mathbb{R}} S(x)\,\mu(dx)$, and we denote by $R$ the increasing 1-Lipschitz map with $R(\mu) = S(\mu) \vee T(\mu)$, which exists by Lemma 4.1, then
Proof. By approximation, it suffices to settle the case when $\eta_1, \eta_2$ are uniform measures on $n \in \mathbb{N}$ atoms. Let $x^i_1 \le x^i_2 \le \dots \le x^i_n$ denote the atoms of $\eta_i$. Then the vector $z^i := (T_i(x^i_1), \dots, T_i(x^i_n))$ is ordered in an increasing way. What is more, the vector $y^i := (x^i_1 - z^i_1, \dots, x^i_n - z^i_n)$ is likewise ordered increasingly, since $\mathrm{id} - T_i$ is an increasing map. By e.g. [14, Proposition 2.6] we know that for all $k \le n$: $\sum_{\ell \le k} x^2_\ell \le \sum_{\ell \le k} x^1_\ell$ and $\sum_{\ell \le k} z^1_\ell \le \sum_{\ell \le k} z^2_\ell$. But then also $\sum_{\ell \le k} (x^2_\ell - z^2_\ell) \le \sum_{\ell \le k} (x^1_\ell - z^1_\ell)$, so again by [14, Proposition 2.6] we conclude $(\mathrm{id} - T_1)(\eta_1) \le_c (\mathrm{id} - T_2)(\eta_2)$. The second statement easily follows from the first one.
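The partial-sum criterion used in the proof above is straightforward to test on small examples. The following sketch assumes two uniform measures on the same number of atoms, given as plain lists; it returns whether the first is below the second in convex order by comparing partial sums of the sorted atoms, with equal total sums. The function name, the tolerance, and the two example maps are illustrative assumptions.

```python
from itertools import accumulate


def convex_order_uniform(x1, x2, tol=1e-12):
    """True iff uniform(x1) <=_c uniform(x2), for lists with equally many atoms.

    Criterion: equal total sums, and every partial sum of the sorted atoms
    of x2 is at most the corresponding partial sum of the sorted atoms of x1.
    """
    a, b = sorted(x1), sorted(x2)
    if len(a) != len(b):
        raise ValueError("both measures must have the same number of atoms")
    sums_a, sums_b = list(accumulate(a)), list(accumulate(b))
    same_mean = abs(sums_a[-1] - sums_b[-1]) <= tol
    return same_mean and all(sb <= sa + tol for sa, sb in zip(sums_a, sums_b))


# Toy instance of Lemma 4.2 with two increasing 1-Lipschitz maps.
eta1, eta2 = [0.5, 1.5], [0.0, 2.0]
T1 = lambda x: x                 # identity
T2 = lambda x: 0.5 * x + 0.5     # contraction towards the mean
hypothesis_1 = convex_order_uniform(eta1, eta2)
hypothesis_2 = convex_order_uniform([T2(x) for x in eta2], [T1(x) for x in eta1])
conclusion = convex_order_uniform([x - T1(x) for x in eta1],
                                  [x - T2(x) for x in eta2])
```

In this instance both hypotheses of the lemma hold, and the displacement measures $(\mathrm{id} - T_1)(\eta_1) = \delta_0$ and $(\mathrm{id} - T_2)(\eta_2) = \tfrac12(\delta_{-0.5} + \delta_{0.5})$ are in convex order, as the lemma predicts.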
Proof of Theorem 1.3. Existence of an admissible map $T$ which has slope 1 on each interval $T^{-1}(I)$, where $I$ is irreducible wrt $(T(\mu), \nu)$, was already shown in Theorem 2.1. Therefore, it remains to show that the map is maximal. Denote by $T$ the map given by Theorem 2.1 associated with an optimizer of (1.4) and some strictly convex $\theta : \mathbb{R} \to \mathbb{R}$. Let $S$ be an arbitrary map in $\mathcal{M}(\mu, \nu)$. Then Lemma 4.2 states that where $R$ is defined as the increasing, 1-Lipschitz map such that $R(\mu) = S(\mu) \vee T(\mu)$. In addition to existence, strict convexity of $\theta$ ensures $\mu$-almost sure uniqueness of $T$ in the sense that for any optimal coupling $\pi$ we have $\int_{\mathbb{R}} y\,\pi_x(dy) = T(x)$ $\mu$-a.s. Thus, $R(\mu) = T(\mu)$ and $T = R$ $\mu$-almost surely.
Proof of Theorem 1.2. This is a direct consequence of Theorem 2.1, which provides existence of a map with the desired geometric properties, and Theorem 1.3, which provides the equivalence between the geometric properties and maximality.

On the reverse problem of Alfonsi, Corbetta, and Jourdain
We aim to prove Theorem 1.4 pertaining to the reverse problem (1.7).
Denote the minimum in convex order of $\eta_1$ and $\eta_2$ by $\eta$. Then there exists an increasing map $T^*$ such that $T^*(\eta) = \nu$. If, in addition, the maps are $L$-Lipschitz with $L > 0$, then the same holds true for $T^*$.
Proof. Suppose there exist increasing maps $T_i$ and measures $\eta_i$, $i = 1, 2$, such that $T_1(\eta_1) = \nu = T_2(\eta_2)$. Then the potential function of the minimum $\eta$ of $\eta_1$ and $\eta_2$ wrt the convex order is given by the convex hull of $u_{\eta_1}$ and $u_{\eta_2}$. The potential function $u_\eta$ completely specifies the cumulative distribution function through $\partial_+ u_\eta = 2F_\eta - 1$. Thus, we can find a partition of $\mathbb{R}$ into countably many disjoint intervals $(a_k, b_k)$. According to this decomposition, we can define an increasing map $T^*$ via Note that $T^*$ is $L$-Lipschitz if $T_i$, $i = 1, 2$, are $L$-Lipschitz. Let $y \in \mathbb{R}$; due to the continuity of the maps $T_1$ and $T_2$, we can find points $x_1, x_2 \in \mathbb{R}$ with $p = F_\nu(y)$, . Assume that $i = 1$, $j = 2$ with $x_1 \in I_k$. If (a) holds, then we have $F_\eta(x_1) = F_{\eta_1}(x_1)$. Now presume that (b) holds, then $F_{\eta_2}(b_k-) = F_\eta(a_k) \le F_{\eta_1}(x_1)$. Then Hence, by monotonicity of the map $T^*$ we conclude $F_{T^*(\eta)} = F_\nu$.
We finally show that $\nu^*$ is minimal in the convex order as stated. By Lemma 5.1, we can assume $\mu \le_c \eta \le_c \nu^*$ and that $\eta$ can be pushed forward onto $\nu$ via an increasing 1-Lipschitz map $S$. It follows by Lemma 4 and by the uniqueness obtained above we deduce $\eta = \nu^*$.

Stability of barycentric weak transport problems in multiple dimensions
The final part of the article is concerned with stability of the weak optimal transport problem under barycentric costs, see (1.4). Unlike in the rest of the article, we work here on $\mathbb{R}^d$. The final aim is to prove Theorem 1.5.
One surprising aspect of this result is that we only require $\nu^k \to \nu$ in $W_1$ and not necessarily in $W_\rho$. This relates to the conditional expectation in (1.4) being 'inside of $\theta$'. We first prove an illuminating intermediate result:
Proposition 6.1. Let $\theta : \mathbb{R}^d \to \mathbb{R}$ be convex and satisfying the growth condition (1.8). Suppose that $\mu^k \to \mu$ and $\nu^k \to \nu$ in $W_\rho$, and that $\eta \le_c \nu$. Then there exist $\eta^k \le_c \nu^k$ such that
Proof. It is well known that (i) together with the stated convergence of the $\mu^k$'s implies (ii), so we proceed to prove the former. Let $\pi^k$ be an optimal coupling attaining $W_\rho(\nu, \nu^k)$. Let $M$ be any martingale coupling with first marginal $\eta$ and second marginal $\nu$, the existence of which is guaranteed by the assumption $\eta \le_c \nu$ together with Strassen's theorem. We adopt the notation $M(dx, dy)$ and $\pi^k(dy, dz)$, and define the measure $P(dx, dy, dz) = M_y(dx)\,\pi^k_y(dz)\,\nu(dy)$. This measure has $\eta$, $\nu$ and $\nu^k$ as first, second and third marginals. We next define $R^k(x)$ as the conditional expectation under $P$ of the third variable given the first one, namely by the martingale property and two applications of Jensen's inequality. The desired conclusion follows.
Remark 6.2. In the context of the previous proposition, if $\eta$ is supported on finitely many atoms, then the condition that $\nu^k \to \nu$ in $W_\rho$ can be relaxed to convergence in $W_1$. To wit, if $\eta = \sum_i \alpha_i \delta_{x_i}$, one can take $\rho = 1$ in (6.1) and prove
The previous remark shows that we need to reduce to the finite-support setting. We carry this out in the next two lemmas:
Lemma 6.3. Let $\eta \in \mathcal{P}_1(\mathbb{R}^d)$. Then for any $\varepsilon > 0$ there is a compactly supported, positive measure $\tilde\eta$ with $\tilde\eta$
Proof. We first partition $\mathbb{R}^d$ into countably many disjoint $d$-dimensional cubes $(Q^\delta_k)_{k \in \mathbb{N}}$ of side length $\delta > 0$. Define an approximation $\eta^\delta$ of $\eta$ by Note that $\eta^\delta \le_c \eta$ and $\eta^\delta \to \eta$ in $W_1$ when $\delta \searrow 0$. If there exists an approximation $\eta^\delta$ such that the assertion holds, then it is straightforward to construct the corresponding measure for $\eta$, which in turn satisfies the assertion with respect to $\eta$. Wlog, we may assume that x ∈ supp(η) = R d , where $\bar z$ denotes the barycenter of $\eta$. Then we can find $\delta > 0$ such that . . , $z^\delta_{n_{2d}}$ span $\mathbb{R}^d$ in the sense above and $\eta^\delta(z^\delta_{n_j}) = \eta(Q^\delta_{n_j}) > 0$, $j = 1, \dots, 2d$. For any $\varepsilon > 0$ there is a $\tilde\varepsilon \in (0, \varepsilon)$ such that . Besides, there exists a compact set $K \subseteq \mathbb{R}^d$ such that $\eta^\delta(K^c) < \tilde\varepsilon$, $|\bar z - \int_K z\,\eta^\delta(dz)| < \tilde\varepsilon$ and $z^\delta_{n_1}, \dots, z^\delta_{n_{2d}} \in K$. Therefore, we find ( $\sum_{i=1}^{2d} \tilde\alpha_i < \varepsilon$. If $\varepsilon$ is chosen smaller than $\eta^\delta(z^\delta_{n_i})$ for all $i = 1, \dots, 2d$, we can define $\tilde\eta^\delta$ via $\tilde\eta^\delta := \eta^\delta|_K - \sum_{i=1}^{2d} \tilde\alpha_i \delta_{z_{n_i}}$.
Lemma 6.4. Let $\mu, \eta \in \mathcal{P}_\rho(\mathbb{R}^d)$ and let $\theta : \mathbb{R}^d \to \mathbb{R}$ be convex, satisfying the growth condition (1.8). Then there exists a sequence $(\eta^k)_{k \in \mathbb{N}}$ of finitely supported measures with $\eta^k \le_c \eta$, $\eta^k \to \eta$ in $W_\rho$ and $W_\theta(\mu, \eta^k) \to W_\theta(\mu, \eta)$.
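The cube-wise approximation in the proof of Lemma 6.3 can be sketched for a finitely supported measure as follows. The display defining $\eta^\delta$ is not reproduced above, so this sketch rests on an assumption: the mass of each cube is moved to the barycenter of the measure restricted to that cube, which is consistent with the relation $\eta^\delta(z^\delta_{n_j}) = \eta(Q^\delta_{n_j})$ and with $\eta^\delta \le_c \eta$. The cubes are taken half-open, and the function name is hypothetical.

```python
from collections import defaultdict
from math import floor


def cube_barycenter_quantization(points, weights, delta):
    """Collapse the mass in each half-open cube of side delta onto its barycenter.

    points is a list of d-dimensional tuples and weights a list of positive masses.
    Returns a list of (barycenter, mass) pairs, one per occupied cube.
    """
    mass = defaultdict(float)
    moment = {}
    for p, w in zip(points, weights):
        key = tuple(floor(c / delta) for c in p)          # integer index of the cube
        mass[key] += w
        old = moment.get(key, (0.0,) * len(p))
        moment[key] = tuple(m + w * c for m, c in zip(old, p))
    return [(tuple(m / mass[key] for m in moment[key]), mass[key])
            for key in sorted(mass)]


points = [(0.1,), (0.3,), (1.2,)]
weights = [0.25, 0.25, 0.5]
quantized = cube_barycenter_quantization(points, weights, delta=1.0)
```

Here the two atoms in $[0, 1)$ are merged into a single atom at their barycenter 0.2 with mass 0.5, while the atom at 1.2 is unchanged. Total mass and mean are preserved, as they must be for a measure that is smaller in convex order.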
We can now prove a version of Proposition 6.1 under weaker assumptions:
Lemma 6.5. Let $(\nu^k)_{k \in \mathbb{N}}$ be a sequence in $\mathcal{P}_1(\mathbb{R}^d)$ and let $(\mu^k)_{k \in \mathbb{N}}$ be a sequence in $\mathcal{P}_\rho(\mathbb{R}^d)$ with $\nu^k \to \nu$ in $W_1$ and $\mu^k \to \mu$ in $W_\rho$, where $\rho \ge 1$, and let $\theta : \mathbb{R}^d \to \mathbb{R}$ be a convex function satisfying the growth constraint (1.8). Then for any $\eta \le_c \nu$ we find a sequence of $\eta^k \le_c \nu^k$ such that $W_\theta(\mu^k, \eta^k) \to W_\theta(\mu, \eta)$ and $\eta^k \to \eta$ in $W_1$.
If $\theta$ is strictly convex, the infimum in (6.3) is attained by a unique probability measure $\eta^k \le_c \nu^k$, which in turn is the push-forward of $\mu^k$ under a $\mu^k$-a.s. uniquely defined map $T^k$. Moreover, the $W_\theta$-optimal transport plan $\pi^k \in \Pi(\mu^k, \nu^k)$ is uniquely determined by $\mu^k(dx)\,\delta_{T^k(x)}(dy)$: Suppose the contrary and let $T(x) := \int_{\mathbb{R}^d} y\,\pi^k_x(dy)$; then $T(\mu^k) \le_c \nu^k$ and we find the contradiction $V_\theta(\mu^k, \nu^k) \le \int_{\mathbb{R}^d} \theta(x - T(x))\,\mu^k(dx) < \int_{\mathbb{R}^d \times \mathbb{R}^d} \theta(x - y)\,\pi^k(dx, dy) = V_\theta(\mu^k, \nu^k)$. Hence, by convergence of the values of $V_\theta$ and tightness of $(\eta^k)_{k \in \mathbb{N}}$, we deduce the convergence of the $\eta^k$ to the optimal $\eta \le_c \nu$ in $W_1$. Suppose that $\mu^k = \mu$ for all $k \in \mathbb{N}$; then, due to the uniqueness of the optimal transport map $T$ between $\mu$ and $T(\mu)$, we can apply [23, Corollary 5.23] and obtain convergence of the transport maps $T^k$ to $T$.