A note on the adapted weak topology in discrete time

The adapted weak topology is an extension of the weak topology for stochastic processes, designed to adequately capture properties of the underlying filtrations. With the recent work of Bartl--Beiglb\"ock--P. as a starting point, the purpose of this note is to recover, by topological arguments, the intriguing result of Backhoff--Bartl--Beiglb\"ock--Eder that all adapted topologies in discrete time coincide. We also derive new characterizations of this topology, including descriptions of its trace on the sets of Markov processes and of processes equipped with their natural filtration. To emphasize the generality of the argument, we also describe the classical weak topology for measures on $\mathbb R^d$ by a weak Wasserstein metric based on the theory of weak optimal transport initiated by Gozlan--Roberto--Samson--Tetali.


1. Introduction
An essential difference between the study of random variables and that of stochastic processes is that the latter come in conjunction with filtrations, which are designed to model the flow of available information. Let us consider a path space $X := \prod_{t=1}^N X_t$ equipped with the product topology, where $(X_t, d_{X_t})$ are Polish metric spaces and $N \in \mathbb N$ denotes the number of time steps. We write $\mathcal P(X)$ for the set of laws of stochastic processes, i.e., Borel probability measures on $X$. Canonically, we identify the law $P \in \mathcal P(X)$ with the process $\big(X, (\sigma(X_{1:t}))_{t=1}^N, \sigma(X), P, X\big)$, where $X = X_{1:N}$ is the coordinate process on $X$, $X_{1:t}$ denotes the projection from $X$ to $\prod_{s=1}^t X_s =: X_{1:t}$, and $\sigma(X_{1:t})$ the $\sigma$-algebra generated by $X_{1:t}$. For $P, Q \in \mathcal P_p(X)$, that is, probabilities in $\mathcal P(X)$ with finite $p$-th moment, $p \in [1, \infty)$, the $p$-Wasserstein distance $W_p$ is given by
$$ W_p^p(P, Q) = \inf_{\pi \in \mathrm{Cpl}(P, Q)} \int_{X \times X} d_X^p(x, y) \, \pi(dx, dy), $$
where $\mathrm{Cpl}(P, Q)$ denotes the set of probabilities on $X \times X$ with marginals $P$ and $Q$, and $d_X^p(x, y) := \sum_{t=1}^N d_{X_t}^p(x_t, y_t)$. We equip $\mathcal P_p(X)$ with the topology induced by $W_p$ and note that if $d_X$ is bounded, $W_p$ metrizes the weak topology on $\mathcal P(X)$.
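As a minimal illustration of this definition (our example, not from the original text), consider Dirac measures, for which the coupling set is a singleton:

```latex
% For P = \delta_x and Q = \delta_y with x, y \in X, the only coupling is
% \pi = \delta_{(x,y)}, so the infimum is attained trivially:
W_p^p(\delta_x, \delta_y)
  = \int_{X \times X} d_X^p \,\mathrm d\delta_{(x,y)}
  = d_X^p(x, y)
  = \sum_{t=1}^{N} d_{X_t}^p(x_t, y_t).
```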
The starting point for the study of adapted topologies is the fact that probabilistic operations and optimization problems that crucially depend on filtrations, such as the Doob decomposition, the Snell envelope, optimal stopping, utility maximization, and stochastic programming, are typically not continuous w.r.t. weak topologies. These shortcomings have been acknowledged by several authors from different communities; see e.g. [1, 14, 17, 2, 7] for more details. The purpose of this note is to recover and strengthen the main result of Backhoff et al. [5] that all adapted topologies on $\mathcal P(X)$ coincide. In comparison to the original proof, our argument is more conceptual: at its core lies the elementary fact that comparable compact Hausdorff topologies agree.
ETH Zurich, Switzerland, gudmund.pammer@math.ethz.ch.
1.1. Stochastic processes and the adapted weak topology. Subsequently, we want to consider topologies that incorporate the flow of information encoded in filtrations, for processes on general filtered probability spaces. To this end, we follow the approach of [7] by introducing the notion of a filtered process.
Definition 1 (Filtered process). A filtered process $X$ with paths in $X$ is a 5-tuple consisting of a complete filtered probability space $(\Omega^X, (\mathcal F^X_t)_{t=1}^N, \mathcal F^X, P^X)$ and an $(\mathcal F^X_t)_{t=1}^N$-adapted stochastic process $X$ with paths in $X$. We write FP for the class of all filtered processes with paths in $X$, and $FP_p$ for the subclass of filtered processes that finitely integrate $d_X^p(x, X)$ for some $x \in X$. Although, a priori, FP is a proper class (that contains a lot of redundancy), in the following we will consider equivalence classes $[X]$ of filtered processes in the sense of Hoover--Keisler [14], so that the corresponding factor space FP becomes a set; see for example [4]. This factorization can be seen in analogy to classical $L^p$-theory, where one considers equivalence classes modulo almost-sure equality in order to obtain a Banach space. The equivalence relation can be characterized by an adapted version of the Wasserstein distance, cf. [7, Theorem 1.5], the adapted Wasserstein distance $AW_p$, which will be introduced in detail in Section 1.2 below: for $X, Y \in FP_p$, the processes $X$ and $Y$ are equivalent if and only if $AW_p(X, Y) = 0$. Henceforth, we consider the factor space FP and remark that equivalent processes share the same probabilistic properties, e.g. being adapted, having the same Doob decomposition and Snell envelope, etc.
The topology induced by the adapted Wasserstein distance is denoted by $\tau_{AW}$ and called the adapted weak topology. When equipping FP with the adapted weak topology, we obtain a space rich in topological and geometric properties; see [7]. Importantly, we note that, as a consequence of the adapted block approximation introduced in [7], the values of $AW_p(X, Y)$ (and also of $CW_p(X, Y)$, which will be introduced below) are independent of the particular choice of representatives. Similarly, we can equip $FP_p$ with the $p$-th Wasserstein topology by letting $W_p(X, Y) := W_p(\mathcal L(X), \mathcal L(Y))$, and remark that $W_p$ is not point-separating on $FP_p$: processes can have the same law, but very different information structure, see for instance [2, Figure 1]. An important feature of $AW_p$ is the following Prokhorov-type result, which will be applied on several occasions in the proofs:

Theorem 2. A subset of $FP_p$ is relatively compact w.r.t. $AW_p$ if and only if it is relatively compact w.r.t. $W_p$.

To emphasize the significance of Theorem 2 and to convey the idea behind the main results, we formulate the following immediate corollary:

Corollary 3. Let $d$ be a metric on $FP_p$ with $W_p \le d \le AW_p$. Then $d$ metrizes the adapted weak topology $\tau_{AW}$.
Indeed, since $d \le AW_p$, convergence in $AW_p$ implies convergence in $d$. On the other hand, to deduce the reverse implication, let $(X^k)_{k \in \mathbb N}$ be a $d$-convergent sequence with limit $X$. Since $W_p \le d$, the sequence is $W_p$-precompact and therefore $AW_p$-precompact by Theorem 2. Hence, there exist $Y \in FP_p$ and a subsequence with $\lim_{j \to \infty} AW_p(X^{k_j}, Y) = 0$. As $d \le AW_p$, this subsequence also converges w.r.t. $d$; thus, the triangle inequality yields $d(X, Y) = 0$. Finally, as $d$ is a metric, we get $X = Y$ and thus $\lim_{k \to \infty} AW_p(X^k, X) = 0$.
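The gap between $W_p$ and $AW_p$ can be made quantitative on the standard two-step example behind [2, Figure 1]. The following numerical sketch is ours (the tree encoding and the helper names `ot_cost`, `w1_flat`, `aw1_two_steps` are hypothetical, and SciPy is assumed available); it uses the fact that for two time steps the bicausal optimization reduces to a backward induction over first-step couplings. It compares the plain and the adapted distance for $X^\epsilon_1 = \epsilon Z$, $X^\epsilon_2 = Z$ versus $X^0_1 = 0$, $X^0_2 = Z$, with $Z$ uniform on $\{-1, 1\}$:

```python
import numpy as np
from scipy.optimize import linprog


def ot_cost(cost, p, q):
    """Minimal transport cost between discrete marginals p, q via a linear program."""
    m, n = cost.shape
    A_eq = np.zeros((m + n, m * n))
    for i in range(m):            # row sums equal p
        A_eq[i, i * n:(i + 1) * n] = 1.0
    for j in range(n):            # column sums equal q
        A_eq[m + j, j::n] = 1.0
    res = linprog(cost.ravel(), A_eq=A_eq, b_eq=np.concatenate([p, q]),
                  bounds=(0, None), method="highs")
    return res.fun


def w1_flat(paths_x, px, paths_y, py):
    """Plain W_1 between laws on path space, with d(x, y) = sum_t |x_t - y_t|."""
    C = np.array([[sum(abs(a - b) for a, b in zip(x, y)) for y in paths_y]
                  for x in paths_x])
    return ot_cost(C, np.asarray(px, float), np.asarray(py, float))


def aw1_two_steps(tree_x, tree_y):
    """Adapted (nested) W_1 for two-step trees given as lists of
    (prob, x1, successor_values, successor_probs): backward induction,
    i.e. first-step OT with cost |x1 - y1| + W_1(conditional laws)."""
    def w1_cond(a, b):
        C = np.abs(np.subtract.outer(np.asarray(a[2], float),
                                     np.asarray(b[2], float)))
        return ot_cost(C, np.asarray(a[3], float), np.asarray(b[3], float))
    C = np.array([[abs(a[1] - b[1]) + w1_cond(a, b) for b in tree_y]
                  for a in tree_x])
    return ot_cost(C, np.array([a[0] for a in tree_x]),
                   np.array([b[0] for b in tree_y]))


eps = 0.1
# X^eps: the time-1 value eps*Z already reveals the sign of X_2.
tree_eps = [(0.5, eps, [1.0], [1.0]), (0.5, -eps, [-1.0], [1.0])]
# X^0: no information at time 1.
tree_0 = [(1.0, 0.0, [-1.0, 1.0], [0.5, 0.5])]

flat = w1_flat([(eps, 1.0), (-eps, -1.0)], [0.5, 0.5],
               [(0.0, 1.0), (0.0, -1.0)], [0.5, 0.5])
nested = aw1_two_steps(tree_eps, tree_0)
print(flat, nested)   # flat = eps = 0.1, nested = 1 + eps = 1.1
```

While $W_1$ vanishes as $\epsilon \to 0$, the adapted distance stays bounded below by $1$: at time 1 the process $X^\epsilon$ already reveals the terminal value, and a bicausal coupling must pay for this difference in information.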
1.2. Adapted topologies. In order to capture the properties of filtrations, numerous authors have introduced extensions of the weak topology on $\mathcal P(X)$, which we frame in our setting and briefly introduce below. For a thorough overview of the topic and an introduction to these topologies we refer to [5] and the references therein.
(A) Aldous [1] introduces the extended weak topology $\tau_A$ by associating with a process $X \in FP$ a measure-valued martingale $pp^1(X)$, the so-called prediction process, given by
$$ pp^1_t(X) := \mathcal L(X \mid \mathcal F^X_t), \qquad t = 1, \dots, N, $$
where $\mathcal L(X \mid \mathcal F^X_t)$ is the conditional law of $X$ given $\mathcal F^X_t$. Then $\tau_A$ is defined as the initial topology induced by $X \mapsto \mathcal L(pp^1(X))$ when $\mathcal P(\mathcal P(X)^N)$ is equipped with the weak topology.
(HK) Hoover--Keisler [14] introduce an increasing sequence of topologies $\tau^r_{HK}$ on FP, where $r \in \mathbb N \cup \{0, \infty\}$ is called the rank. This is achieved by iterating Aldous' construction of the prediction process: set $pp^0(X) := X$, recursively define $pp^{r+1}(X)$ by applying the prediction-process construction to $pp^r(X)$, and let $pp(X) := pp^\infty(X)$. Analogously to (A), for $r \in \mathbb N \cup \{0, \infty\}$, $\tau^r_{HK}$ is given by the initial topology w.r.t. $X \mapsto \mathcal L\big((pp^k(X))_{k=0}^r\big)$. We remark that $\tau^0_{HK}$ is equivalent to weak convergence of the law, $\tau^1_{HK} = \tau_A$, and $\tau^{N-1}_{HK} = \tau^r_{HK}$ for $r \ge N$ (see [7]); we then simply write $\tau_{HK} := \tau^{N-1}_{HK}$.
(OS) The optimal stopping topology $\tau_{OS}$ is defined in [5] as the initial topology w.r.t. the family of maps given by the values of optimal stopping problems with cost $c$, where $c : \{1, \dots, N\} \times X \to \mathbb R$ is continuous, bounded, and non-anticipative, that is, $c(t, x) = c(t, y)$ if $x_{1:t} = y_{1:t}$ for $(t, x), (t, y) \in \{1, \dots, N\} \times X$.
(H) The information topology $\tau_H$ of Hellwig [13] is based on a similar point of view as (A) and (HK). Properties of the filtration are encoded in the laws $\mathcal L\big(X_{1:t}, \mathcal L(X_{t+1:N} \mid \mathcal F^X_t)\big)$, which are elements of $\mathcal P\big(X_{1:t} \times \mathcal P(X_{t+1:N})\big)$.
(BLO) Let the path space $X$ be the $N$-fold product of a separable Banach space $V$, i.e., $X = V^N$. In this setting, Bonnier--Liu--Oberhauser [9] embed FP into graded linear spaces $V^r$ via higher rank expected signatures, where $r \in \mathbb N \cup \{0, \infty\}$ is again the rank, and define $\tau^r_{BLO}$ as the initial topology w.r.t. the corresponding embedding $\Phi^r : FP \to V^r$.
Remark 4. In case $d_X$ is an unbounded metric on $X$, we fix for the rest of the paper $p \in [1, \infty)$ and consider the subset $FP_p$ with the following topological adaptation: the topologies (A), (HK), (OS), (H) and (BLO) are refined by additionally requiring continuity of the $p$-th moment map $X \mapsto \mathbb E^{P^X}[d_X^p(x, X)]$ for some fixed $x \in X$. To avoid notational excess, we state all results on $FP_p$ for some $p \in [1, \infty)$. All results remain true when replacing $FP_p$ with FP (and, if necessary, $d_X$ with, for example, $d_X \wedge 1$).
Besides using the powerful concept of initial topologies, various authors have constructed adapted topologies based on ideas from optimal transportation. The essence of this approach is to encode filtrations into constraints on the set of couplings and thereby construct modifications of the Wasserstein distance suitable for processes. To illustrate the idea, recall that optimal transport has so-called transport maps $T : X \to X$ at its core, satisfying the push-forward condition $T_\# P = Q$ for $P, Q \in \mathcal P(X)$. We refer to [19] for a comprehensive overview of optimal transport. In our context, where $P$ and $Q$ are laws of processes, causal optimal transport suggests to use adapted maps in order to transport $P$ to $Q$, i.e., $T_\# P = Q$ and $T$ is non-anticipative, which means that $T(x)_{1:t} = T(y)_{1:t}$ whenever $x_{1:t} = y_{1:t}$, for all $1 \le t \le N$. When $X$ resp. $Y$ denote the first resp. second coordinate projection from $X \times X$ to $X$, then this additional adaptedness constraint on couplings can be formulated as
$$ \sigma(Y_{1:t}) \perp_{\sigma(X_{1:t})} \sigma(X) \quad \text{for all } 1 \le t \le N, $$
where, for $\sigma$-algebras $\mathcal A, \mathcal B, \mathcal C$ on some probability space, $\mathcal A \perp_{\mathcal B} \mathcal C$ denotes conditional independence of $\mathcal A$ and $\mathcal C$ given $\mathcal B$. Elements of $\mathrm{Cpl}_c(P, Q)$ are called causal couplings. When one symmetrizes (11) one obtains the set of bicausal couplings $\mathrm{Cpl}_{bc}(P, Q)$, that is, the $\pi \in \mathrm{Cpl}_c(P, Q)$ such that $(Y, X)_\# \pi \in \mathrm{Cpl}_c(Q, P)$. These definitions extend readily to FP.
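To make non-anticipativity concrete, here is a small illustration of ours, for real-valued paths:

```latex
% Non-anticipative maps are exactly those of the triangular form
T(x) \;=\; \big( f_1(x_1),\; f_2(x_{1:2}),\; \dots,\; f_N(x_{1:N}) \big).
% Example: the running maximum T(x)_t := \max_{s \le t} x_s is
% non-anticipative, whereas the time reversal T(x)_t := x_{N+1-t}
% is not for N \ge 2, since its first coordinate already uses x_N.
```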

Definition 5 (Causal and bicausal couplings). Let $X, Y \in FP$. A coupling $\pi \in \mathrm{Cpl}(P^X, P^Y)$ is called causal (from $X$ to $Y$) if, for every $1 \le t \le N$, under $\pi$ the $\sigma$-algebras $\mathcal F^Y_t$ and $\mathcal F^X_N$ are conditionally independent given $\mathcal F^X_t$. We call $\pi$ bicausal if it additionally satisfies the analogous property with the roles of $X$ and $Y$ exchanged. Finally, we write $\mathrm{Cpl}_c(X, Y)$ resp. $\mathrm{Cpl}_{bc}(X, Y)$ for the set of causal resp. bicausal probabilities with first marginal $P^X$ and second marginal $P^Y$.
(SCW) Lassalle [16] and Backhoff et al. [6] coin the notion of causality, see Definition 5, and introduce the causal Wasserstein ``distance'' $CW_p$ on $\mathcal P_p(X)$. For $X, Y \in FP_p$ we have
$$ CW_p^p(X, Y) = \inf_{\pi \in \mathrm{Cpl}_c(X, Y)} \mathbb E_\pi[d_X^p(X, Y)]. $$
Clearly, $CW_p$ is not a metric as it lacks symmetry, which motivates the so-called symmetrized causal Wasserstein distance, see [5],
$$ SCW_p(X, Y) := \max\big\{ CW_p(X, Y),\, CW_p(Y, X) \big\}, $$
which constitutes a metric on $FP_p$. We write $\tau_{SCW}$ for the induced topology.
(AW) Instead of symmetrizing as in (15), one can directly symmetrize the definition on the level of couplings via the notion of bicausal couplings. Approaches in this spirit, but to different extents, go back to R\"uschendorf [18], Pflug--Pichler [17], Bion-Nadal--Talay [8], and Bartl et al. [7]. We define the adapted Wasserstein distance of $X, Y \in FP_p$ by
$$ AW_p^p(X, Y) := \inf_{\pi \in \mathrm{Cpl}_{bc}(X, Y)} \mathbb E_\pi[d_X^p(X, Y)]. $$
The adapted Wasserstein distance is a metric on $FP_p$ and we denote its induced topology by $\tau_{AW}$.
(CW) Finally, we introduce a new mode of convergence, the so-called topology of causal convergence $\tau_{CW}$: a neighbourhood basis of $X \in FP_p$ is given by the sets $\{ Y \in FP_p : CW_p(X, Y) < \epsilon \}$, where $\epsilon > 0$. Hence, a sequence $(X^k)_{k \in \mathbb N}$ converges to $X$ in $\tau_{CW}$ if and only if $CW_p(X, X^k) \to 0$.
Remark 6. It is apparent from the definitions in (2), (14), (15) and (16) that
$$ W_p(X, Y) \le CW_p(X, Y) \le SCW_p(X, Y) \le AW_p(X, Y) $$
for $X, Y \in FP_p$. Hence, we have $\tau_W \subseteq \tau_{CW} \subseteq \tau_{SCW} \subseteq \tau_{AW}$.
1.3. Characterizations of the adapted weak topology. In this subsection we formulate the main results of this paper. The core ingredient in the proofs of the main results, Theorems 8 and 11, as well as Proposition 12, is the following simple observation of a topological nature.
Lemma 7. Let $\tau$ and $\tau'$ be topologies on a set $A$ satisfying:
(1) $(A, \tau)$ and $(A, \tau')$ are sequential spaces;
(2) the topology $\tau$ is at least as fine as $\tau'$;
(3) every sequence converging in $(A, \tau')$ is relatively sequentially compact in $(A, \tau)$;
(4) $(A, \tau')$ is Hausdorff.
Then $\tau = \tau'$.
Note that Lemma 7 in combination with Theorem 2 has Corollary 3 as a consequence.
Next, we provide characterizations of the adapted weak topology on $FP_p$. The equivalence of $\tau_{HK}$ and the adapted Wasserstein topology $\tau_{AW}$ is due to [7], whereas the characterization in terms of the symmetrized causal Wasserstein topology $\tau_{SCW}$ is novel. Moreover, we remark that the equivalence of the higher rank expected signature topology $\tau_{BLO}$ and $\tau_{HK}$ was already known when, for $t \in \{1, \dots, N\}$, $X_t = V$ and $V$ is a compact subset of a separable Banach space; see [9, Theorem 2].

Theorem 8. On $FP_p$ we have $\tau_{HK} = \tau_{SCW} = \tau_{AW}$. If $X = V^N$ for a separable Banach space $V$, then these topologies also coincide with $\tau^{N-1}_{BLO}$, and $\tau^r_{BLO} = \tau^r_{HK}$ for $r \in \mathbb N \cup \{0, \infty\}$.

When restricting to sets of processes that have a simpler information structure, e.g. Markov processes or processes equipped with their natural filtration, there are simpler ways to characterize the adapted weak topology. This motivates the next definition of higher-order Markov processes, whose transition probabilities are allowed to depend on more than the current state.

Definition 9. Let $n \in \mathbb N \cup \{\infty\}$. We call a process $X \in FP_p$ $n$-th order Markovian (or an $n$-th order Markov process) if, for all $1 \le t \le N - 1$,
$$ \mathcal L(X_{t+1} \mid \mathcal F^X_t) = \mathcal L\big(X_{t+1} \mid X_{1 \vee (t - n + 1):t}\big) \quad \text{almost surely}. $$
The set of all $n$-th order Markov processes is denoted by $FP^{Markov}_{p,n}$. Moreover, we may call $\infty$-th order Markov processes plain and write $FP^{plain}_p$ for their collection.

We endow $FP^{Markov}_{p,n}$ with the initial topology $\tau^n_{Markov}$ given by the maps $X \mapsto \mathcal L(T^n_t(X))$, $1 \le t \le N - 1$, where $T^n_t(X) := \big(X_{1:t}, \mathcal L(X_{t+1} \mid X_{1 \vee (t-n+1):t})\big)$.
Remark 10. To illustrate Definition 9, let $n = 1$. Clearly, $FP^{Markov}_{p,1}$ is the subset of (time-inhomogeneous) Markov processes in $FP_p$. A sequence of Markov processes $(X^k)_{k \in \mathbb N}$ converges to a Markov process $X$ w.r.t. $\tau^1_{Markov}$ if and only if, for $1 \le t \le N - 1$, $\mathcal L(T^1_t(X^k)) \to \mathcal L(T^1_t(X))$. In particular, if there exist continuous kernels $\kappa_t : X_t \to \mathcal P_p(X_{t+1})$ which satisfy $\kappa_t(X_t) = \mathcal L(X_{t+1} \mid X_t)$ almost surely, then convergence in $\tau^1_{Markov}$ can be characterized in terms of the marginal laws and the kernels, for all $1 \le t \le N - 1$ and $\epsilon > 0$, and some $x \in X$. This can be easily deduced, e.g., by using continuity of the kernels $(\kappa_t)_{t=1}^{N-1}$ and Skorokhod's representation theorem.
The next result recovers and generalizes the main result of [5]. Its novelty is two-fold: on the one hand, the case $n = \infty$ recovers the results of [5] and additionally gives a new description in terms of $\tau^\infty_{Markov}$; on the other hand, the case $n \in \mathbb N$ extends this result to the subset of $n$-th order Markov processes.
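As a concrete instance of Remark 10 (our illustration; the coefficients $a$, $\sigma$ are hypothetical), consider Gaussian AR(1) dynamics on $X_t = \mathbb R$:

```latex
X_{t+1} \;=\; a\,X_t + \sigma\,\xi_{t+1},
\qquad \xi_{t+1} \sim \mathcal N(0, 1) \text{ i.i.d.},
% i.e. \kappa_t(x) = \mathcal N(a x, \sigma^2).  The translation coupling gives
W_p\big(\kappa_t(x), \kappa_t(x')\big) \;\le\; |a|\,|x - x'|,
% so the kernels are W_p-continuous and Remark 10 applies: convergence of a
% sequence of such processes in \tau^1_{Markov} reduces to convergence of the
% time-t marginal laws together with the conditional kernels.
```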
Theorem 11 (All adapted topologies are equal). Let $n, r \in \mathbb N \cup \{\infty\}$. Then the traces on $FP^{Markov}_{p,n}$ of the topologies $\tau_A$, $\tau^r_{HK}$, $\tau_{OS}$, $\tau_H$, $\tau_{CW}$, $\tau_{SCW}$ and $\tau_{AW}$ coincide. In particular, they all agree with the trace of $\tau^n_{Markov}$.
1.4. Characterization of the weak topology. The line of reasoning prescribed by Lemma 7 can also be utilized outside the framework of the adapted weak topology, as the proposition below demonstrates.
Proposition 12. The $p$-Wasserstein topology on $\mathcal P_p(\mathbb R^d)$ can be metrized by
$$ V_p(P, Q) := \max\big\{ \tilde V_p(P, Q),\, \tilde V_p(Q, P) \big\}, \qquad \tilde V_p^p(P, Q) := \inf_{\pi \in \mathrm{Cpl}(P, Q)} \int_{\mathbb R^d} \Big\| x - \int y \, \pi_x(dy) \Big\|^p P(dx), $$
where $\mathbb R^d$ is equipped with the Euclidean norm $\|\cdot\|$ and $(\pi_x)_{x \in \mathbb R^d}$ denotes a disintegration of $\pi$ w.r.t. the first coordinate.
Remark 13. The minimization problem in (27) is a so-called weak optimal transport problem [12], which is a generalization of optimal transport. In particular, (27) vanishes if and only if there exists a martingale coupling between $P$ and $Q$. For more details we refer to [11, 3] and the references therein.
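The martingale-coupling criterion of Remark 13 can be checked on atoms; the following computation is our illustration:

```latex
% Let P = \delta_0 and Q = \tfrac12(\delta_{-1} + \delta_1) on \mathbb R.
% The coupling \pi = P \otimes Q satisfies \int y\,\pi_0(\mathrm dy) = 0, so it
% is a martingale coupling from P to Q and the directional cost vanishes:
\inf_{\pi \in \mathrm{Cpl}(P, Q)} \int \Big\| x - \int y \, \pi_x(\mathrm dy) \Big\|^p P(\mathrm dx) \;=\; 0,
% although P \neq Q.  In the reverse direction every coupling sends \pm 1 to 0,
% giving cost 1.  This asymmetry is why a symmetrization is needed before the
% weak transport cost can separate points.
```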

2. Proofs
In order to prove the main results, we will verify the assumptions of Lemma 7. In doing so, we will encounter various martingales, which can be properly treated thanks to the next well-known fact. We recall that a process $X = (X_t)_{t=1}^N$ taking values in $\mathcal P(X)$ is called a measure-valued martingale if, for every $f \in C_b(X)$, the real-valued, bounded process $(X_t(f))_{t=1}^N$ is a martingale. Here, we write $p(f)$ for the integral $\int f \, dp$ when $p \in \mathcal P(X)$ and $f \in C_b(X)$.
Lemma 14. Let $(X^1, X^2, X^3)$ be a measure-valued martingale taking values in $\mathcal P(X)$, where $X$ is a Polish space. If $X^1 \sim X^3$, then $X^1 = X^2 = X^3$ almost surely.
Proof. Since there exists a countable family in $C_b(X)$ that separates points in $\mathcal P(X)$, it suffices to show that, for $f \in C_b(X)$, $X^1(f) = X^2(f) = X^3(f)$ almost surely. As $(X^1, X^2, X^3)$ is a measure-valued martingale, $Y^i := X^i(f)$, $i = 1, 2, 3$, defines a real-valued, bounded martingale with $Y^1 \sim Y^3$. By Jensen's inequality, $\mathbb E[(Y^1)^2] \le \mathbb E[(Y^2)^2] \le \mathbb E[(Y^3)^2] = \mathbb E[(Y^1)^2]$, so these second moments all agree. By orthogonality of martingale increments, $\mathbb E[(Y^3 - Y^1)^2] = \mathbb E[(Y^3)^2] - \mathbb E[(Y^1)^2] = 0$, and hence $Y^1 = Y^2 = Y^3$ almost surely.

2.1. The space $FP^{Markov}_{p,n}$. First, we justify that the $n$-Markov property is preserved under equivalence.
Proof. By Definition 9, the property of being $n$-Markovian can be deduced from the law of the corresponding first-order prediction process. Hence, we conclude by the fact that $X \equiv Y$ readily implies $\mathcal L(pp^1(X)) = \mathcal L(pp^1(Y))$.

Proof. By the definition of a filtered process, $Y$ is adapted; therefore the coupling $\pi$ given by $(\mathrm{id}_{\Omega^Y}, Y)_\# P^Y$ is causal from $Y$ to $X := (X, (\sigma(X_{1:t}))_t, \sigma(X), \mathcal L(Y), X)$, where $X$ denotes the canonical process on $X$.
If $Y$ is plain, cf. Definition 9, then $\mathcal L(Y \mid \mathcal F^Y_t) = \mathcal L(Y \mid Y_{1:t})$ $P^Y$-almost surely. Again, as $X$ is adapted, this translates to the corresponding conditional independence, which means that $\pi$ is bicausal and $AW_p(X, Y) = 0$.

Corollary 17. For $n, m \in \mathbb N \cup \{\infty\}$ with $n \le m$ we have $FP^{Markov}_{p,n} \subseteq FP^{Markov}_{p,m}$. Moreover, processes in $FP^{Markov}_{p,n}$ are uniquely determined by their law.

Proof. The first claim is a direct consequence of the definition of $n$-th resp. $m$-th order Markov processes. The second claim then readily follows from Lemma 16.

Lemma 18. $(FP^{Markov}_{p,n}, \tau^n_{Markov})$ is a sequential Hausdorff space.

Proof. First, we remark that, for $1 \le t \le N - 1$, the map $X \mapsto \mathcal L(T^n_t(X))$ takes values in the Polish (and therefore first countable) space $\mathcal P_p\big(X_{1:t} \times \mathcal P_p(X_{t+1})\big)$; in particular, $\tau^n_{Markov}$, as the initial topology of countably many maps into first countable spaces, is first countable and hence sequential. For the Hausdorff property, let $X, Y \in FP^{Markov}_{p,n}$ with $\mathcal L(T^n_t(X)) = \mathcal L(T^n_t(Y))$ for all $t$. The $n$-Markov property yields the existence of a measurable map $f_t : X_{1 \vee (t-n+1):t} \to \mathcal P(X_{t+1})$ such that, almost surely, $f_t(X_{1 \vee (t-n+1):t}) = \mathcal L(X_{t+1} \mid \mathcal F^X_t)$ and $f_t(Y_{1 \vee (t-n+1):t}) = \mathcal L(Y_{t+1} \mid \mathcal F^Y_t)$. In particular, we have for $t = n$ that $\mathcal L(X_{1:n+1}) = \mathcal L(Y_{1:n+1})$. We proceed to show $\mathcal L(X) = \mathcal L(Y)$. Assume that we have already shown $\mathcal L(X_{1:t}) = \mathcal L(Y_{1:t})$ for some $n + 1 \le t \le N - 1$. By the disintegration theorem and the definition of $n$-th order Markovianity, we may write
$$ \mathcal L(X_{1:t+1}) = \mathcal L(X_{1:t}) \otimes \big(f_t \circ \mathrm{pr}_{1 \vee (t-n+1):t}\big) = \mathcal L(Y_{1:t}) \otimes \big(f_t \circ \mathrm{pr}_{1 \vee (t-n+1):t}\big) = \mathcal L(Y_{1:t+1}), $$
where we use the notation $\mu \otimes k$, for $\mu \in \mathcal P(X_{1:t})$ and a measurable kernel $k : X_{1:t} \to \mathcal P(X_{t+1})$, to denote the gluing of $\mu$ with $k$, that is, the probability defined by $(\mu \otimes k)(dx_{1:t}, dx_{t+1}) := \mu(dx_{1:t}) \, k(x_{1:t}, dx_{t+1})$. This concludes the inductive step.
Finally, we can apply Lemma 16 and conclude X = Y.
By [7] we may assume w.l.o.g. that $\mathcal F^X_N = \mathcal F^X$, $\mathcal F^Y_N = \mathcal F^Y$, and that $(\Omega^X, \mathcal F^X)$ and $(\Omega^Y, \mathcal F^Y)$ are standard Borel spaces. This allows us to consider the conditionally independent product of $\pi$ and $\pi'$, denoted by $\bar\pi := \pi \otimes \pi' \in \mathrm{Cpl}(X, Y, \bar X)$, see Definition 22. Here, $\mathrm{Cpl}(X, Y, \bar X)$ denotes the set of couplings with marginals $P^X$, $P^Y$ and $P^X$, and we write $\bar X$ for the second $X$-coordinate in order to distinguish the two. By induction we show that
$$ pp^k(\bar X) = pp^k(Y) = pp^k(X) \quad \bar\pi\text{-almost surely} \tag{28} $$
for all $k \in \mathbb N \cup \{0\}$. Since we know that $X = Y = \bar X$ $\bar\pi$-almost surely, we have verified (28) for $k = 0$. Assume that (28) holds for some $k$. By causality of $\pi'$ and Lemma 24 we find, for $1 \le t \le N$, the conditional independence (29), where we naturally extend the notation introduced in Definition 5 in order to write products of multiple $\sigma$-algebras. Since $pp^k(X)$ is $\mathcal F^{X,Y,\bar X}_{N,0,0}$-measurable and $pp^k(Y)$ is $\mathcal F^{X,Y,\bar X}_{N,N,0}$-measurable, we obtain, by combining (28), (29), and the tower property, that the corresponding conditional laws agree, and similarly for $\bar X$. Hence, the triplet $\big(pp^{k+1}_t(\bar X), pp^{k+1}_t(Y), pp^{k+1}_t(X)\big)$ satisfies the assumptions of Lemma 14, which concludes the inductive step. In particular, we have shown that $pp(X) \sim pp(Y)$, whence $X = Y$ by [7, Theorem 4.11].
Proof. First, we convince ourselves that $(\mathcal L(X^k))_{k \in \mathbb N}$ converges to $\mathcal L(X)$. Assume that we have already shown that $\mathcal L(X^k_{1:t}) \to \mathcal L(X_{1:t})$ for some $1 \le t \le N - 1$. The conditionally independent product $\otimes$, see [10, Definition 2.8], allows us to rewrite $\mathcal L(X^k_{1:t+1})$ as the gluing of $\mathcal L(X^k_{1:t})$ with the conditional kernels encoded in $\mathcal L(T^n_t(X^k))$. By [10, Theorem 4.1], that is, in our context, continuity of $\otimes$ at $(\mathcal L(X_{1:t}), \mathcal L(T^n_t(X)))$, we obtain that $\mathcal L(X^k_{1:t+1}) \to \mathcal L(X_{1:t+1})$. Hence, $(\mathcal L(X^k))_{k \in \mathbb N}$ is convergent and therefore tight. Thus, by Theorem 2 there exists a subsequence of $(X^k)_{k \in \mathbb N}$ converging in $\tau_{AW}$ to some $Y \in FP_p$. Due to $\tau_{AW}$-continuity, we get $\mathcal L(T^n_t(Y)) = \lim_k \mathcal L(T^n_t(X^k)) = \mathcal L(T^n_t(X))$ for all $t$. Hence, there exist measurable maps $f_t : X_{1 \vee (t-n+1):t} \to \mathcal P(X_{t+1})$ with the property that $\mathcal L(Y_{t+1} \mid \mathcal F^Y_t) = f_t(Y_{1 \vee (t-n+1):t})$ almost surely. In other words, $Y \in \Lambda_{n, Markov}$. Therefore the sequence $(X^k)_{k \in \mathbb N}$ is also relatively compact in $(FP^{Markov}_{p,n}, \tau^n_{Markov})$, and we conclude with Lemma 16 that $Y (= X) \in FP^{plain}_p$.
2.2. Causal gluing. This section is devoted to developing auxiliary results concerning the composition of causal couplings with matching intermediate marginal. We recall that, due to [7], we can always assume w.l.o.g. that all spaces under consideration are standard Borel. Therefore, we assume for the rest of the section that we have chosen representatives of $X, Y, Z \in FP$ whose underlying probability spaces are standard Borel with $\mathcal F^X = \mathcal F^X_N$, $\mathcal F^Y = \mathcal F^Y_N$, and $\mathcal F^Z = \mathcal F^Z_N$.

Definition 22. Let $\gamma \in \mathrm{Cpl}(X, Y)$ and $\eta \in \mathrm{Cpl}(Y, Z)$. We define the conditionally independent product of $\gamma$ and $\eta$ as the probability $\gamma \otimes \eta$ on $\Omega^X \times \Omega^Y \times \Omega^Z$ satisfying, for any bounded, $\mathcal F^{X,Y,Z}_{N,N,N}$-measurable $U$, that
$$ \int U \, d(\gamma \otimes \eta) = \int \int U(\omega_X, \omega_Y, \omega_Z) \, \eta_{\omega_Y}(d\omega_Z) \, \gamma(d\omega_X, d\omega_Y), \tag{31} $$
where $\eta_{\omega_Y}$ is a disintegration kernel of $\eta$ w.r.t. the projection on $\Omega^Y$. By symmetry, the analogous formula holds with the roles of $\gamma$ and $\eta$ exchanged. The formula (31) clarifies the naming of $\gamma \otimes \eta$ as the conditionally independent product: conditionally on $\omega_Y$, the knowledge of $\omega_X$ does not affect $\omega_Z$ and vice versa. This suggests the following probabilistic formulation.
Proof. Let $U$, $V$, $W$ be bounded and $\mathcal F^{X,Y,Z}_{N,0,0}$-, $\mathcal F^{X,Y,Z}_{0,N,0}$-, and $\mathcal F^{X,Y,Z}_{0,0,N}$-measurable, respectively. Write $\hat W$ for the bounded, $\mathcal F^{X,Y,Z}_{0,N,0}$-measurable random variable given by $\hat W(\omega_Y) := \int W(\omega_Z) \, \eta_{\omega_Y}(d\omega_Z)$. By Definition 22 and the tower property, the conditional expectation of $W$ given $\mathcal F^{X,Y,Z}_{0,N,0}$ equals $\hat W$; since $V$ was arbitrary, we derive the first statement.
The second statement is a consequence of applying [15, Proposition 5.8] to what was previously shown.
Lemma 24. Let $\gamma \in \mathrm{Cpl}(X, Y)$ and $\eta \in \mathrm{Cpl}_c(Y, Z)$. We have, for $1 \le t \le N$, the following conditional independences under $\gamma \otimes \eta$.

Proof. To show item (1), let $W$ be bounded and $\mathcal F^{X,Y,Z}_{0,t,t}$-measurable. We obtain from Lemma 23 the first equality in (32), whereas the second stems from causality of $\eta$: here causality yields, under $\gamma \otimes \eta$, that, conditionally on $\mathcal F^{X,Y,Z}_{0,t,0}$, the $\sigma$-algebra $\mathcal F^{X,Y,Z}_{0,N,0}$ is independent of $\mathcal F^{X,Y,Z}_{0,t,t}$. Since the last term in (32) is $\mathcal F^{X,Y,Z}_{t,t,0}$-measurable, the tower property yields item (1). To establish item (2), let $W$ be as above. Note that causality of $\gamma$ provides, under $\gamma \otimes \eta$, that, conditionally on $\mathcal F^{X,Y,Z}_{t,0,0}$, the $\sigma$-algebra $\mathcal F^{X,Y,Z}_{N,0,0}$ is independent of $\mathcal F^{X,Y,Z}_{t,t,0}$. Using this in addition to item (1) and the tower property, we conclude.

Proof. This result is a direct consequence of item (2) of Lemma 24.
Lemma 26.Let X ∈ FP p .The map by Corollary 25.Hence, we compute

2.3. Postponed proofs of Section 1.
Proof of Lemma 7. Due to (2) it remains to show that convergence in $(A, \tau')$ implies convergence in $(A, \tau)$. To this end, let $(y_k)_{k \in \mathbb N}$ be a sequence in $(A, \tau')$ converging to $y$. By (3) we find a subsequence $(y_{k_j})_{j \in \mathbb N}$ that converges in $(A, \tau)$ to some element $z$. Again by (2), $(y_{k_j})_{j \in \mathbb N}$ also converges in $(A, \tau')$ to $z$, which yields by (4) that $y = z$. Therefore, $y$ is the only $(A, \tau)$-accumulation point of $(y_k)_{k \in \mathbb N}$, from which we conclude that $(y_k)_{k \in \mathbb N}$ has to converge to $y$ in $(A, \tau)$.

Proposition 8], where $\tau_W$ is the topology of $p$-Wasserstein convergence of the laws. Since $\tau_W$ and $\tau$ have the same relatively compact sets by Theorem 2, we conclude the same for $\tau'$. Hence, all assumptions of Lemma 7 are met, which yields the first two assertions of the theorem.
The last assertion of the theorem follows mutatis mutandis.

$\tau^n_{Markov}$ is coarser than $\tau_H$, $\tau_A$, $\tau^r_{HK}$, $\tau_{OS}$, $\tau_{AW}$ and $\tau_{SCW}$. Similarly, we have that all of these topologies are coarser than $\tau_{AW}$. We remark that $\tau_{AW} \supseteq \tau_{OS}$ follows from the fact that the map sending $X \in FP_p$ to its Snell envelope is $\tau_{AW}$-continuous.
Proof of Proposition 12. Let $A = \mathcal P_p(\mathbb R^d)$ and $\tau = \tau_W$. It is straightforward to check that $V_p$ is a pseudometric and that $V_p \le W_p$. Moreover, as a simple consequence of Lemma 14, we find that $V_p$ separates points: if $V_p(P, Q) = 0$, then there exist martingale couplings $\pi \in \mathrm{Cpl}(P, Q)$ and $\pi' \in \mathrm{Cpl}(Q, P)$. Let $X = (X_t)_{t=1}^3$ be a Markov process with $(X_1, X_2) \sim \pi$ and $(X_2, X_3) \sim \pi'$. Then $X$ is a martingale with $X_1 \sim X_3$, so Lemma 14 yields $X_1 = X_2$ almost surely, that is, $P = Q$, and $V_p$ is a metric on $\mathcal P_p(\mathbb R^d)$. We write $\tau_V$ for the topology induced by $V_p$ and get $\tau_V \subseteq \tau_W$. It remains to verify item (3) of Lemma 7.
To this end, let $(P_k)_{k \in \mathbb N}$ converge to $P$ in $\tau_V$; we want to show $W_p$-relative compactness of the sequence. By [3, Lemma 6.1], $V_p(P_k, P)$ admits a representation as an infimum over measures in convex order, where $\le_{cx}$ denotes the convex order on $\mathcal P_1(\mathbb R^d)$. Recall that, for $\mu, \nu \in \mathcal P_1(\mathbb R$
