ON STRASSEN’S THEOREM ON STOCHASTIC DOMINATION

The purpose of this note is to make available a reasonably complete and straightforward proof of Strassen's theorem on stochastic domination, and to draw attention to the original paper. We also point out that the maximal possible value of $\mathbf{P}(Z = Z')$ is actually not reduced by the requirement $Z \preceq Z'$. Here, $Z, Z'$ are stochastic elements that Strassen's theorem states exist under a stochastic domination condition. The consequence of that observation to stochastically monotone Markov chains is pointed out. Usually the theorem is formulated with the assumption that $\preceq$ is a partial ordering; the proof reveals that a pre-ordering suffices.

The purpose of this note is to make available a reasonably complete and straightforward proof of Strassen's theorem on stochastic domination, and to draw attention to the natural and beautiful strand of ideas that leads to the domination result. Polish spaces suffice for most purposes in probability theory, and the proof is not more complicated than that in Liggett (1985) for the compact space case. Due to the lack of highlighting of the domination result in Strassen (1965), there is a risk that too few have cared to consult the original paper. Indeed: a sample among some competent and honest colleagues revealed that the word "risk" is not really the right one... This section continues with the preliminaries and proof of the theorem; the proof of a crucial supporting result is, however, deferred to Section 4.
The new observation offered by this note, presented in Section 2, is that the maximal possible value of $\mathbf{P}(Z = Z')$ is actually not reduced by the requirement $Z \preceq Z'$. Here, $Z, Z'$ are stochastic elements that Strassen's theorem states exist under a stochastic domination condition. The consequence of that observation to stochastically monotone Markov chains is pointed out in Section 3.
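Let $E$ be a Polish space endowed with a partial ordering $\preceq$, i.e., a relation satisfying
(i) $x \preceq x$ for all $x \in E$,
(ii) $x \preceq y$ and $y \preceq z$ imply $x \preceq z$,
(iii) $x \preceq y$ and $y \preceq x$ imply $x = y$.
When (iii) is dropped, the relation is called a pre-ordering; from now on, we assume $\preceq$ to be such an ordering.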
A real-valued function $f$ on $E$ is said to be increasing if $x \preceq y$ implies $f(x) \le f(y)$. We let $\mathcal{E}$ denote the Borel σ-field on $E$ as well as the class of real-valued functions measurable w.r.t. that σ-field; $ib\mathcal{E}$ is the class of functions in $\mathcal{E}$ which are increasing and bounded.
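For probability measures $P$ and $P'$ on $(E, \mathcal{E})$, $P$ is stochastically dominated by $P'$, written $P \stackrel{D}{\preceq} P'$, if
$$\int f\,dP \le \int f\,dP' \quad \text{for all } f \in ib\mathcal{E}.$$
We assume that the partial ordering is closed, i.e., the set
$$M = \{(x, x') \in E^2 ;\ x \preceq x'\}$$
is closed in the product topology on $E^2$; we abbreviate $E \times E$, $\mathcal{E} \times \mathcal{E}$ to $E^2$ and $\mathcal{E}^2$, as is standard. Strassen's theorem on stochastic domination states that if $P$ and $P'$ satisfy $P \stackrel{D}{\preceq} P'$, then there exists a probability measure $\hat{P}$ on $(E^2, \mathcal{E}^2)$ with marginals $P$ and $P'$ such that
$$\hat{P}(M) = 1. \qquad (2)$$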
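On a finite partially ordered set, the defining inequality only needs to be checked on indicators of up-sets (sets closed under moving upwards in the order), since every increasing function there is a constant plus a non-negative combination of such indicators. The following Python sketch is not from the original text: it checks domination in that way on the four-point set {0,1}^2 with the componentwise order. The names `leq`, `up_sets`, `dominated` and the two example distributions are illustrative assumptions.

```python
from fractions import Fraction
from itertools import combinations, product

# Hypothetical example space: {0,1}^2 with the componentwise partial order.
points = list(product([0, 1], repeat=2))

def leq(x, y):
    """x precedes y in the componentwise order."""
    return all(a <= b for a, b in zip(x, y))

def up_sets(pts):
    """All subsets U such that x in U and x <= y imply y in U (brute force)."""
    result = []
    for r in range(len(pts) + 1):
        for subset in combinations(pts, r):
            s = set(subset)
            if all(y in s for x in s for y in pts if leq(x, y)):
                result.append(s)
    return result

def dominated(p, p_prime):
    """True if p(U) <= p_prime(U) for every up-set U."""
    return all(
        sum(p[x] for x in u) <= sum(p_prime[x] for x in u)
        for u in up_sets(points)
    )

quarter = Fraction(1, 4)
p = {x: quarter for x in points}                      # uniform distribution
p_prime = {(0, 0): Fraction(0), (0, 1): quarter,
           (1, 0): quarter, (1, 1): Fraction(1, 2)}   # mass moved upwards

print(dominated(p, p_prime), dominated(p_prime, p))
```

The brute-force enumeration is exponential in the number of points and is only meant to make the definition concrete on a very small example.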
The random element version is the following: under the assumption of (2), there exist random elements $Z$ and $Z'$ with distribution $P$, $P'$ respectively, on some probability space, such that
$$Z \preceq Z' \ \text{a.s.} \qquad (3)$$
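When E is a finite subset of the real line with its usual order, a coupling of the kind promised by the random element version can be written down directly: feed one uniform variable through both quantile functions. The sketch below (Python, exact rational arithmetic) computes the resulting joint law. The function name and the example distributions are assumptions made for illustration and are not part of the original text.

```python
from fractions import Fraction
from itertools import accumulate

def quantile_coupling(points, p, p_prime):
    """Joint law of (Z, Z') obtained by pushing one uniform variable through
    both quantile functions. `points` must be sorted increasingly, and p is
    assumed to be stochastically dominated by p_prime."""
    cdf = list(accumulate(p))
    cdf_prime = list(accumulate(p_prime))
    cuts = sorted(set([Fraction(0)] + cdf + cdf_prime))
    joint = {}
    for lo, hi in zip(cuts, cuts[1:]):
        # On the interval (lo, hi] both quantile functions are constant.
        i = next(k for k, c in enumerate(cdf) if c >= hi)
        j = next(k for k, c in enumerate(cdf_prime) if c >= hi)
        key = (points[i], points[j])
        joint[key] = joint.get(key, Fraction(0)) + (hi - lo)
    return joint

# Hypothetical example on E = {0, 1, 2}.
points = [0, 1, 2]
p = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
p_prime = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]

joint = quantile_coupling(points, p, p_prime)
diagonal_mass = sum(m for (x, y), m in joint.items() if x == y)
```

In this example every pair in the support satisfies x ≤ x', but the coupling puts only mass 1/2 on the diagonal, while the two distributions share mass 3/4. An order-preserving coupling built this way therefore need not maximize the probability of equality.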
The result (2) is often referred to as "Strassen's Theorem", which is formally misleading: in Strassen (1965) it is only briefly mentioned as one possible application of Theorem 11 in that paper, and the condition $P \stackrel{D}{\preceq} P'$ does not appear explicitly. The crucial result of Strassen (1965), that leads to the mentioned Theorem 11 among other things, is his Theorem 7. It states that if $P$ and $P'$ are probability measures on $(E, \mathcal{E})$, $\Pi$ is the class of such measures on $(E^2, \mathcal{E}^2)$, and $\Lambda$ a convex subset of $\Pi$ closed with respect to weak convergence, then there exists a $\hat{P} \in \Lambda$ with marginals $P, P'$ if and only if
$$\int f\,dP + \int g\,dP' \le \sup\Big\{ \int \big(f(x) + g(x')\big)\,\lambda(dx, dx') ;\ \lambda \in \Lambda \Big\} \qquad (4)$$
for all continuous functions $f, g$ on $E$ satisfying $0 \le f, g \le 1$.
To prove this theorem, which certainly has a strong appeal to intuition, one must inevitably rely on some functional analysis. The proof, a version of Strassen's original one, is deferred to Section 4.
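For the proof of (2), let $\Lambda = \{\lambda \in \Pi ;\ \lambda(M) = 1\}$. Since $M$ is closed, it is readily checked that $\Lambda$ is indeed convex and weakly closed in $\Pi$ (throughout, we use the term "weak" in the standard way of probability theory). Now assume $P \stackrel{D}{\preceq} P'$. To verify (4), we pick the relevant ideas from the proof of the mentioned Theorem 11 in Strassen (1965). Let $f$ and $g$ be continuous functions on $E$, with $0 \le f, g \le 1$.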
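We have
$$\int f\,dP + \int g\,dP' \le \int f\,dP + \int g^*\,dP' \le \int (f + g^*)\,dP \le \sup_{x} \big(f(x) + g^*(x)\big) = \sup\{f(x) + g(y) ;\ x \preceq y\},$$
where $g^*(x) = \sup\{g(y) ;\ x \preceq y\}$; this is due to the facts that $g \le g^*$ and that $g^*$ is decreasing. The last expression does not exceed the right-hand side of (4), since the point mass at any $(x, y)$ with $x \preceq y$ gives full mass to $M$.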

The maximal diagonal probability
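For probability measures $Q$ and $Q'$ on $(E, \mathcal{E})$, we know that
$$\mathbf{P}(Y \ne Y') \ge \tfrac{1}{2}\,\|Q - Q'\| \qquad (5)$$
for any random elements $Y$ and $Y'$ such that $Y \stackrel{D}{=} Q$ and $Y' \stackrel{D}{=} Q'$, defined on some space with probability measure $\mathbf{P}$; here $\|\cdot\|$ denotes the total variation norm for signed bounded measures on $(E, \mathcal{E})$. For the difference of two probability measures, we have $\|Q - Q'\| = 2 \cdot \sup_{A \in \mathcal{E}} (Q(A) - Q'(A))$. The inequality (5) is the well-known basic coupling inequality, cf., e.g., Lindvall (1992), §I.2. Hence we always have
$$\mathbf{P}(Y = Y') \le 1 - \tfrac{1}{2}\,\|Q - Q'\|. \qquad (6)$$
But equality may be achieved in (5)-(6) using the so-called γ-coupling; cf. Lindvall (1992), §I.5. The idea is the following. Let $Q_0$ have density $g \wedge g'$ with respect to a reference measure $\lambda$ relative to which both $Q$ and $Q'$ are absolutely continuous, with densities $g$ and $g'$ respectively (we may take $\lambda = Q + Q'$). Lift the sub-probability $Q_0$ to the diagonal $\Delta = \{(x, x) \in E^2\}$ using the mapping $\psi : E \to E^2$ defined by $\psi(x) = (x, x)$:
$$Q^* = Q_0 \psi^{-1}. \qquad (7)$$
Now
$$Q^*(\Delta) = Q_0(E) = \int g \wedge g'\,d\lambda = 1 - \tfrac{1}{2}\,\|Q - Q'\|, \qquad (8)$$
Lindvall (1992), p. 19. So (6), (7) and (8) yield that if $(Y, Y')$ takes values on $\Delta$ according to $Q^*$, we have maximized $\mathbf{P}((Y, Y') \in \Delta)$. Denote the maximal possible value
$$\gamma = 1 - \tfrac{1}{2}\,\|Q - Q'\|.$$
Now the question is: if we know that $Q \stackrel{D}{\preceq} Q'$, is it possible to sharpen (2) to the effect that the coupling has mass $\gamma$ on $\Delta$? To that end, assume $\gamma < 1$ (the case $\gamma = 1$ is trivial), and let
$$\nu = Q - Q_0, \qquad \nu' = Q' - Q_0.$$
Then
$$Q_1 = \nu/(1 - \gamma), \qquad Q_1' = \nu'/(1 - \gamma)$$
are probability measures.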
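On a finite set, with counting measure as reference measure, the quantities above reduce to finite sums. The following Python sketch, with made-up example distributions and hypothetical helper names, checks numerically that the two expressions for the total variation norm agree and that the mass of the common part equals one minus half the norm.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical example distributions on a three-point set.
q = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
q_prime = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]

def tv_norm(a, b):
    """Total variation norm of a - b: sum of absolute mass differences."""
    return sum(abs(x - y) for x, y in zip(a, b))

def sup_over_sets(a, b):
    """Supremum over all subsets A of a(A) - b(A), by brute force."""
    indices = range(len(a))
    return max(
        sum(a[i] - b[i] for i in subset)
        for r in range(len(a) + 1)
        for subset in combinations(indices, r)
    )

# Mass of the common part, i.e. the sum of the pointwise minima of the densities.
overlap = sum(min(x, y) for x, y in zip(q, q_prime))

print(tv_norm(q, q_prime), sup_over_sets(q, q_prime), overlap)
```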
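The crucial observation is that $Q_1 \stackrel{D}{\preceq} Q_1'$. To prove this, let $f \in ib\mathcal{E}$. We get
$$\int f\,dQ_1 = \frac{1}{1 - \gamma}\Big(\int f\,dQ - \int f\,dQ_0\Big) \le \frac{1}{1 - \gamma}\Big(\int f\,dQ' - \int f\,dQ_0\Big) = \int f\,dQ_1'.$$
Now let $\hat{Q}_1$ be a coupling of $Q_1$ and $Q_1'$ according to (2), and
$$\hat{Q} = Q^* + (1 - \gamma)\,\hat{Q}_1.$$
Then $\hat{Q}$ is indeed a coupling of $Q$ and $Q'$, $\hat{Q}(M) = 1$, and we have maximized the probability of $\Delta$.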
There is a theorem to formulate. We return to the setting of Section 1; the meaning of $\gamma$ and $\hat{P}$ is the obvious one.
What we have achieved so far may be summed up in terms of random elements as follows. If $Z \stackrel{D}{=} P$ and $Z' \stackrel{D}{=} P'$, we always have that $\mathbf{P}(Z = Z') \le \gamma$, and equality is obtained using γ-coupling. That equality may be retained for a sharpening of (3): if we also have $P \stackrel{D}{\preceq} P'$, $(Z, Z')$ may actually be constructed such that
$$Z \preceq Z' \ \text{a.s., and } \mathbf{P}(Z = Z') = \gamma. \qquad (10)$$
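The construction of Section 2 can be traced step by step when E is a finite subset of the real line. The Python sketch below is an illustration under that assumption, not the general construction: it places the common part on the diagonal and couples the normalized remainders with a quantile coupling, which here stands in for the coupling that Strassen's theorem provides in general. All function names and the example distributions are hypothetical.

```python
from fractions import Fraction
from itertools import accumulate

def quantile_coupling(points, p, p_prime):
    """Order-preserving coupling on a sorted finite subset of the real line;
    p is assumed to be stochastically dominated by p_prime."""
    cdf, cdf_prime = list(accumulate(p)), list(accumulate(p_prime))
    cuts = sorted(set([Fraction(0)] + cdf + cdf_prime))
    joint = {}
    for lo, hi in zip(cuts, cuts[1:]):
        i = next(k for k, c in enumerate(cdf) if c >= hi)
        j = next(k for k, c in enumerate(cdf_prime) if c >= hi)
        key = (points[i], points[j])
        joint[key] = joint.get(key, Fraction(0)) + (hi - lo)
    return joint

def maximal_monotone_coupling(points, q, q_prime):
    """Common part on the diagonal plus (1 - gamma) times an order-preserving
    coupling of the normalized remainders."""
    common = [min(a, b) for a, b in zip(q, q_prime)]        # masses of Q_0
    gamma = sum(common)
    joint = {(x, x): m for x, m in zip(points, common) if m > 0}
    if gamma < 1:
        q1 = [(a - m) / (1 - gamma) for a, m in zip(q, common)]
        q1_prime = [(b - m) / (1 - gamma) for b, m in zip(q_prime, common)]
        for key, mass in quantile_coupling(points, q1, q1_prime).items():
            joint[key] = joint.get(key, Fraction(0)) + (1 - gamma) * mass
    return gamma, joint

# Hypothetical example on E = {0, 1, 2}.
points = [0, 1, 2]
q = [Fraction(1, 2), Fraction(1, 4), Fraction(1, 4)]
q_prime = [Fraction(1, 4), Fraction(1, 4), Fraction(1, 2)]

gamma, joint = maximal_monotone_coupling(points, q, q_prime)
diagonal_mass = sum(m for (x, y), m in joint.items() if x == y)
```

For these example distributions the resulting joint law has the prescribed marginals, is supported on pairs with x ≤ x', and carries mass γ = 3/4 on the diagonal.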

An application to Markov chains
An application of Theorem 2 in Kamae et al. (1977) to time-homogeneous Markov chains is the following. Let the assumptions established above be in force, with $E$ endowed with a closed partial ordering $\preceq$. Let us assume that the Markov kernel $P(x, A)$, $x \in E$, $A \in \mathcal{E}$, on $(E, \mathcal{E})$ is monotone in the sense that
$$x \preceq x' \ \Longrightarrow\ P(x, \cdot) \stackrel{D}{\preceq} P(x', \cdot). \qquad (11)$$
Then, for initial distributions $\lambda$ and $\lambda'$ satisfying $\lambda \stackrel{D}{\preceq} \lambda'$, we may produce Markov chains $X = (X_n)_0^\infty$ and $X' = (X'_n)_0^\infty$, both governed by $P(\cdot, \cdot)$, such that
$$(X_n, X'_n),\ n = 0, 1, \ldots, \text{ is a Markov chain and } X_n \preceq X'_n \text{ for all } n \ge 0. \qquad (12)$$
The method is the natural one: use Strassen's theorem first to get $X_0$ and $X'_0$ satisfying $X_0 \preceq X'_0$ and with distributions $\lambda$, $\lambda'$ respectively, then repeatedly for $n = 0, 1, \ldots$ That is, after $n$ steps, $n = 0, 1, \ldots$, we have obtained $X_n \preceq X'_n$. Another application of Strassen's theorem using (11) renders $X_{n+1} \preceq X'_{n+1}$, and (12) is settled by induction.
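Using the theorem of Section 2, we find that $X$ and $X'$ may be constructed, in addition to having the property (12), such that
$$\mathbf{P}(X_{n+1} = X'_{n+1} \mid X_n = x,\ X'_n = x') = 1 - \tfrac{1}{2}\,\|P(x, \cdot) - P(x', \cdot)\| \qquad (13)$$
for all $x \preceq x'$ and for all $n = 0, 1, \ldots$, and that probability can never be strictly larger, under the condition that $(X_n, X'_n)$, $n = 0, 1, \ldots$, is a Markov chain.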
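As a concrete illustration of the step-by-step construction, the Python sketch below builds the coupled transition law for a small monotone kernel on {0, 1, 2, 3} and simulates the pair of chains. The kernel (a lazy reflecting walk), the function names and the seed are assumptions chosen for this example. The order-preserving coupling of the remainders is a quantile coupling, which is available only because the state space here is totally ordered.

```python
import random
from fractions import Fraction
from itertools import accumulate

N = 4                       # states 0, 1, ..., N-1 with the usual order
THIRD = Fraction(1, 3)

def kernel_row(x):
    """Row P(x, .) of a lazy reflecting walk (a hypothetical monotone kernel)."""
    row = [Fraction(0)] * N
    for y in (max(x - 1, 0), x, min(x + 1, N - 1)):
        row[y] += THIRD
    return row

def quantile_coupling(p, p_prime):
    """Order-preserving coupling on {0..N-1}; p must be dominated by p_prime."""
    cdf, cdf_prime = list(accumulate(p)), list(accumulate(p_prime))
    cuts = sorted(set([Fraction(0)] + cdf + cdf_prime))
    joint = {}
    for lo, hi in zip(cuts, cuts[1:]):
        i = next(k for k, c in enumerate(cdf) if c >= hi)
        j = next(k for k, c in enumerate(cdf_prime) if c >= hi)
        joint[(i, j)] = joint.get((i, j), Fraction(0)) + (hi - lo)
    return joint

def coupled_row(x, x_prime):
    """Joint law of the next pair given the current pair (x, x_prime), x <= x_prime."""
    q, q_prime = kernel_row(x), kernel_row(x_prime)
    common = [min(a, b) for a, b in zip(q, q_prime)]
    gamma = sum(common)
    joint = {(y, y): m for y, m in enumerate(common) if m > 0}
    if gamma < 1:
        q1 = [(a - m) / (1 - gamma) for a, m in zip(q, common)]
        q1_prime = [(b - m) / (1 - gamma) for b, m in zip(q_prime, common)]
        for key, mass in quantile_coupling(q1, q1_prime).items():
            joint[key] = joint.get(key, Fraction(0)) + (1 - gamma) * mass
    return gamma, joint

def simulate(x0, x0_prime, steps, rng):
    """Run the coupled chain from an ordered starting pair."""
    path = [(x0, x0_prime)]
    for _ in range(steps):
        _, joint = coupled_row(*path[-1])
        pairs = list(joint)
        weights = [float(joint[pair]) for pair in pairs]
        path.append(rng.choices(pairs, weights=weights)[0])
    return path

path = simulate(0, 3, 50, random.Random(0))
```

Each coordinate of the simulated pair moves according to the original kernel, the order between the coordinates is preserved along the whole path, and at each step the conditional probability of the two chains coinciding equals the overlap of the two kernel rows.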
Also: we noticed in Section 2 that (12)-(13) hold just as well in a pre-ordered space.
We content ourselves with this sketchy account. Besides Kamae et al. (1977), see also Lindvall (1992), §IV.2, for Markov chain domination.

The proof of Strassen's Theorem 7
For a Banach space $B$ with norm $\|\cdot\|$, the dual space $B^*$ is the set of continuous linear functionals on $B$. The weak* topology on $B^*$ is defined as the weakest one such that for all $z \in B$ the mapping $L \mapsto Lz$ is continuous. The weak* topology renders the linear space $B^*$ locally convex. Here $B$ is the space of pairs $(f, g)$ of bounded continuous functions on $E$; we may use $\|(f, g)\| = |f| + |g|$ as norm on $B$, where $|\cdot|$ is the supremum norm for functions on $E$.
In (4), the restrictions 0 ≤ f, g ≤ 1 were introduced to facilitate the estimates that followed; they are now dropped without effect on the problem. So from now on, f and g denote any bounded continuous functions.
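Notice that a pair $\mu, \mu'$ of probability measures on $(E, \mathcal{E})$ defines a linear functional $[\mu, \mu'] \in B^*$ through the definition
$$[\mu, \mu'](f, g) = \int f\,d\mu + \int g\,d\mu'.$$
Let $H_\Lambda$ be the set of all $[\mu, \mu']$ such that $\mu, \mu'$ are the marginals of some $\lambda \in \Lambda$.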
Due to the convexity assumption on $\Lambda$, we have that $H_\Lambda$ is convex in $B^*$. The reader is now referred to Rudin (1973), Theorems 3.4(b) and 3.10, and §3.14, for proofs of the following: […] Then find $(f_0, g_0)$ representing $T_0$ according to (i). This means […] But this is just another formulation of […] This means that in order to prove that $H_\Lambda$ is weak* closed, it suffices to show that the limit of a convergent sequence of elements in $H_\Lambda$ still is in $H_\Lambda$. For that, let $[\mu_n, \mu'_n]$, $n \ge 1$, be a sequence in $H_\Lambda$ which is weak* convergent to $[P, P']$. We adopt the $\Rightarrow$ notation for weak convergence of sequences of probability measures: cf. Billingsley (1968), where all the standard theory needed is easily found.
We have $\mu_n \Rightarrow P$ and $\mu'_n \Rightarrow P'$. This implies that any sequence $(\lambda_n)_1^\infty$ such that $\lambda_n$ has marginals $\mu_n, \mu'_n$ for $n \ge 1$ is tight. Indeed, take any $\varepsilon > 0$ and let $K$, $K'$ be compact sets such that $\inf_n \mu_n(K) > 1 - \varepsilon/2$ and $\inf_n \mu'_n(K') > 1 - \varepsilon/2$.
It is immediate that $\inf_n \lambda_n(K \times K') > 1 - \varepsilon$, and $K \times K'$ is compact in $E^2$. Now apply Prohorov's theorem to find a cluster point $\lambda$ of $(\lambda_n)_1^\infty$. Since $\Lambda$ is closed, we have $\lambda \in \Lambda$; that $\lambda$ has marginals $P$ and $P'$ follows from the continuous mapping theorem, applied to the mappings $(x, x') \mapsto x$ and $(x, x') \mapsto x'$.
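The bound called immediate above follows from a union bound on complements, using only that each $\lambda_n$ has marginals $\mu_n$ and $\mu'_n$. Written out as a worked step added for the reader, with $\varepsilon$ the tolerance fixed above:

```latex
\begin{align*}
\lambda_n(K \times K')
  &\ge 1 - \lambda_n(K^{c} \times E) - \lambda_n\bigl(E \times (K')^{c}\bigr) \\
  &= \mu_n(K) + \mu'_n(K') - 1 \\
  &\ge \inf_m \mu_m(K) + \inf_m \mu'_m(K') - 1 .
\end{align*}
```

The last expression does not depend on $n$ and exceeds $1 - \varepsilon$ by the choice of $K$ and $K'$.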
It seems that Strassen's domination theorem has an easy proof only when $E$ is either finite or a subset of the real line.