On the notion(s) of duality for Markov processes

We provide a systematic study of the notion of duality of Markov processes with respect to a function. We discuss the relation of this notion with duality with respect to a measure as studied in Markov process theory and potential theory and give functional analytic results including existence and uniqueness criteria and a comparison of the spectra of dual semi-groups. The analytic framework builds on the notion of dual pairs, convex geometry, and Hilbert spaces. In addition, we formalize the notion of pathwise duality as it appears in population genetics and interacting particle systems. We discuss the relation of duality with rescalings, stochastic monotonicity, intertwining, symmetries, and quantum many-body theory, reviewing known results and establishing some new connections.


1 Introduction
Duality of Markov processes with respect to a duality function first appeared in the literature in the late 1940s and early 1950s [Lev48, KMG57, Lin52], and has been formalized and generalized over the following decades [Sie76, HS79, CR84, CS85, EK86, Ver88, SL95]. It has since been applied in a variety of fields ranging from interacting particle systems, queueing theory, SPDEs, and superprocesses to mathematical population genetics. In spite of this wide interest and applicability, there are so far few attempts at developing a systematic theory of duality of Markov processes with respect to a function, unlike for the different although related notion of duality with respect to a measure, for which there exists a rather complete theory, see [CW05, Ge10] for recent surveys. Overviews of the method of duality with respect to a function generally focus on certain aspects or on applications to particular fields [DG14, Lig05, EK86, DF90, Möh99, S00, Asm03], and presentations of the manifold connections to fundamental structures or properties of Markov processes, such as time reversal, stochastic monotonicity, symmetries, or conserved quantities, are often restricted to specific problems. The interest in a general theory of duality has further increased in recent years, but even basic questions such as giving necessary and sufficient conditions for the existence of a dual process of a given Markov process have not yet been fully resolved: "finding dual processes is something of a black art" [Eth06, p. 519]. The theory has, however, seen substantial development, for example in the work of Möhle, illuminating relations with symmetries and conserved quantities [Möh99, Möh11], and of Giardina, Redig, Kurchan and Vafayi [GKRV09], which presents a deep connection with symmetries and representations of Lie algebras, using quantum mechanics formalisms. On the other hand, the lookdown construction [DK96, DK99] has triggered new and powerful applications.
However, a unified treatment of the theory presenting fundamental connections is still missing.
The present paper is on the one hand a survey of the notion of duality of Markov processes with respect to a function, and on the other hand presents new results in this field. A particular focus is on the question of existence and uniqueness of dual processes (Section 3). These results are formulated in functional analytic language on the level of Markovian semi-groups, and relate the problem to the invariance of certain (convex) sets and to the existence of certain unique integral representations via the concept of cone duality. We also formalize the notion of pathwise duality, which is of particular importance in applications (Section 4). Moreover, connections with time reversal (duality with respect to a measure, Section 2), stochastic monotonicity (Section 5), and intertwinings and symmetries (Section 6) are discussed.
The aim of the paper is to give an overview of the theoretical background of the concept of duality of Markov processes, and to present fundamental connections in a unified way. We hope that it provides a reference for probabilists applying duality techniques to various situations, who might be interested in the fundamental principles of this theory. We also try to assist understanding of certain results from mathematical physics, Hilbert space theory or quantum mechanics for researchers who might not be familiar with the jargon of these fields. Last but not least, we hope that this article triggers new research in this multifaceted and widely applicable area of probability theory.

Setting and definitions
In the following, X and Y are Markov processes with stationary transition probabilities and state spaces E and F. The state spaces are assumed to be Polish and are endowed with the Borel σ-algebras B(E) and B(F). For our purpose, a Markov process with state space E is a collection X = (Ω, F, (X_t)_{t≥0}, {P_x}_{x∈E}) consisting of a measure space (Ω, F), measurable maps X_t : (Ω, F) → (E, B(E)), and probability measures P_x on (Ω, F) such that:
• for all x ∈ E, X_0 = x, P_x-a.s.;
• for every Borel-measurable bounded function f : E → R, the map x ↦ E_x f(X_t) is Borel-measurable as well;
• the process satisfies the Markov property with respect to the natural filtration F^0_t = σ(X_s, 0 ≤ s ≤ t).
We do not assume that the strong Markov property holds, and unless explicitly stated otherwise, we do not assume any regularity of the sample paths. Thus our processes have less structure than commonly assumed in the theory of Markov processes, as exposed in classical textbooks such as [Dyn65, EK86, RW]. The reason is that we will only need properties determined by the finite-dimensional distributions.
The basic concept we are interested in is the following:
Definition 1.1 (Duality with respect to a duality function). Let X = (Ω_1, F_1, (X_t)_{t≥0}, {P_x}_{x∈E}) and Y = (Ω_2, F_2, (Y_t)_{t≥0}, {P_y}_{y∈F}) be two Markov processes with respective state spaces E and F, and H : E × F → R a bounded, measurable function. Then X and Y are dual with respect to H if and only if for all x ∈ E, y ∈ F and t ≥ 0,
E_x H(X_t, y) = E_y H(x, Y_t). (1)
Remark. Here and throughout this paper we assume boundedness of the duality function H for the sake of simplicity of the exposition. Of course duality can in principle be defined for suitable unbounded functions as well.
From now on we drop the explicit mention of the underlying measure spaces and simply speak of Markov processes (X t ) and (Y t ). Throughout P x refers to the law of X started in x and P y to the law of the dual process started in y.
Let (P_t)_{t≥0} and (Q_t)_{t≥0} denote the semi-groups of (X_t) and (Y_t), respectively, that is, P_t f(x) = E_x f(X_t), and similarly for Q_t. A different way of writing the duality formula is
P_t H(·, y)(x) = Q_t H(x, ·)(y). (2)
Note that duality implies that for every t,
P_t H(·, y)(x) = P_s[P_{t−s} H(·, y)](x) = P_s[Q_{t−s} H(·, ·)(y)](x), 0 ≤ s ≤ t,
where in the first equality we have used the Chapman-Kolmogorov equation, and in the second the duality property. Note that P_t always acts on H as a function of the first variable, Q_t as a function of the second variable. Assume now that (X_t) and (Y_t) have generators L_X and L_Y with domains D(L_X) and D(L_Y) respectively, and that H(x, ·) ∈ D(L_Y), H(·, y) ∈ D(L_X). Equation (2) then implies
L_X H(·, y)(x) = L_Y H(x, ·)(y), x ∈ E, y ∈ F. (3)
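On a finite state space E = F, Eq. (2) for a discrete-time chain is simply a matrix identity, P^t H = H (Q^t)^T, with H viewed as a matrix. The following minimal sketch (the two-state chain and the diagonal duality function are illustrative choices, anticipating the "cheap" self-duality discussed in Section 2, not an example from the text) checks the identity in plain Python:

```python
# Duality as a matrix identity: for a reversible two-state chain, Q = P is a
# dual with respect to the diagonal function H(x, y) = delta_{x,y} / mu(y).

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(row) for row in zip(*A)]

P = [[0.7, 0.3],
     [0.4, 0.6]]                      # reversible w.r.t. mu = (4/7, 3/7)
mu = [4 / 7, 3 / 7]
H = [[1 / mu[0], 0.0],
     [0.0, 1 / mu[1]]]                # "cheap" self-duality function

Pt = [[1.0, 0.0], [0.0, 1.0]]
for _ in range(5):                    # Pt = P^5
    Pt = matmul(Pt, P)

lhs = matmul(Pt, H)                   # x, y  ->  E_x H(X_5, y)
rhs = matmul(H, transpose(Pt))        # x, y  ->  E_y H(x, Y_5), self-dual: Q = P
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12 for i in range(2) for j in range(2))
```

The identity holds for every power of P because reversibility, mu(x) P(x, y) = mu(y) P(y, x), propagates from P to P^t.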
The converse is true as well, under certain conditions:
Proposition 1.2. Let (X_t), (Y_t) be Markov processes with generators L_X, L_Y, and let H : E × F → R be bounded and continuous. If H(x, ·), P_t H(x, ·) ∈ D(L_Y) for all x ∈ E, t ≥ 0 and H(·, y), Q_t H(·, y) ∈ D(L_X) for all y ∈ F, t ≥ 0, and if
L_X H(·, y)(x) = L_Y H(x, ·)(y) for all x ∈ E, y ∈ F,
then (X_t) and (Y_t) are dual with respect to H.
Proof. Let u_1(t, x, y) := E_x H(X_t, y) = P_t H(x, y), and u_2(t, x, y) := E_y H(x, Y_t) = Q_t H(x, y). Note first that by Fubini, P_t Q_s H(x, y) = Q_s P_t H(x, y) for s, t ≥ 0, x ∈ E, y ∈ F; therefore, by our assumptions,
P_t L_Y H(x, ·)(y) = L_Y P_t H(x, ·)(y). (4)
Using the Kolmogorov forward equation and (4), we get
d/dt u_1(t, x, y) = P_t L_X H(·, y)(x) = P_t L_Y H(x, ·)(y) = L_Y u_1(t, x, y).
Since also d/dt u_2(t, x, y) = L_Y u_2(t, x, y) and u_1(0, x, y) = H(x, y) = u_2(0, x, y) for all x ∈ E, y ∈ F, the claim follows from the uniqueness of the solution of the initial value problem associated with L_Y (see [Dyn65, Thm. 1.3]).
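In the finite-state, continuous-time setting, Proposition 1.2 reduces to linear algebra: if L_X H = H L_Y^T for rate matrices L_X, L_Y, then L_X^n H = H (L_Y^T)^n by induction, and hence exp(tL_X) H = H exp(tL_Y)^T term by term in the exponential series. A sketch with illustrative matrices (a random walk on {0, 1, 2} absorbed at the boundary and a Siegmund-type dual; the concrete matrices are our choices, not from the text):

```python
# Generator duality L_X H = H L_Y^T implies semigroup duality exp(tL_X) H = H exp(tL_Y)^T.

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def transpose(A):
    return [list(r) for r in zip(*A)]

def expm(L, t, terms=40):            # truncated Taylor series for exp(tL)
    n = len(L)
    result = [[float(i == j) for j in range(n)] for i in range(n)]
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[t / k * sum(term[i][m] * L[m][j] for m in range(n))
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

LX = [[0, 0, 0], [1, -2, 1], [0, 0, 0]]      # random walk absorbed at 0 and 2
LY = [[-1, 1, 0], [1, -1, 0], [0, 0, 0]]     # a dual chain, 2 absorbing
H  = [[1, 1, 1], [0, 1, 1], [0, 0, 1]]       # H(x, y) = 1{x <= y}

assert matmul(LX, H) == matmul(H, transpose(LY))    # generator duality (exact)
Pt, Qt = expm(LX, 0.7), expm(LY, 0.7)
lhs, rhs = matmul(Pt, H), matmul(H, transpose(Qt))
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-9 for i in range(3) for j in range(3))
```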
There is another notion of duality. In Section 2 we shall see that we can think of it, roughly, as a specialisation of the previous definition to diagonal duality functions. The definition is usually given in terms of the resolvent R_λ(x, A) := ∫_0^∞ exp(−λt) P_t(x, A) dt, and allows processes with a finite life-time, corresponding to sub-Markov semi-groups.
Definition 1.3 (Duality with respect to a measure). Let (X_t), (Y_t) be two (sub-)Markov processes with common state space E, semi-groups (P_t), (Q_t), and resolvents R_λ, R̂_λ. Then (X_t) and (Y_t) are said to be in duality with respect to the σ-finite measure µ if
(i) for all λ > 0 and all non-negative f, g ∈ L^∞(E),
∫_E f(x) (R_λ g)(x) µ(dx) = ∫_E g(x) (R̂_λ f)(x) µ(dx), (5)
and
(ii) the resolvents are absolutely continuous with respect to µ, that is, R_λ(x, ·) ≪ µ(·) and R̂_λ(y, ·) ≪ µ(·) for all λ > 0 and x, y ∈ E.
If only (i) holds, then (X_t) and (Y_t) are said to be in weak duality with respect to µ.
If (P t ) = (Q t ) and Eq. (5) holds, then µ is called a symmetrizing measure for (X t ). When (X t ) and (Y t ) have right-continuous paths, we can replace the resolvents in eq. (5) by the semi-groups.
This type of duality will not be the focus of this paper, and we refer the reader to [BG68, CW05, Ge10] and references therein for detailed accounts of this theory.
Remark (Time reversal). When µ is a probability measure and (X_t) and (Y_t) are Markov processes (not only sub-Markov), the previous notion coincides with the usual notion of time reversal with respect to the probability measure µ. Similarly, in this case a symmetrizing probability measure is the same as a reversible measure.
Remark (Feynman-Kac corrections). Definition 1.1 can be generalized in the following way. Let H : E_1 × E_2 → R, F : E_1 → R, G : E_2 → R be bounded, measurable functions such that ∫_0^T |F(X_t)| dt < ∞ and ∫_0^T |G(Y_t)| dt < ∞ for all T > 0. We say that (X_t), (Y_t) are dual with respect to (H, F, G) if for every x ∈ E_1, y ∈ E_2, and t ≥ 0,
E_x[ H(X_t, y) exp( ∫_0^t F(X_s) ds ) ] = E_y[ H(x, Y_t) exp( ∫_0^t G(Y_s) ds ) ]
(see [EK86] for a discussion of such dualities in the context of martingale problems). In the present article, however, we will not use this more general definition, and restrict ourselves to duality in the sense of Definition 1.1.

Examples
We now give some examples of dual processes and typical duality functions, and hint at some applications of this concept. The list of examples is very far from being exhaustive. It is meant as a motivating illustration of the wide use of duality before we address more theoretical questions. In order to keep the exposition short, we restrict ourselves to simple examples that do not necessitate much notation, such as interacting particle systems, and we do not try to find the most general setting for the various types of duality functions.
Example 1.1 (Siegmund duality). Assume E = F, and let ≤ be a partial order on E. For x, y ∈ E let H(x, y) := 1_{x≤y}. Two processes (X_t), (Y_t) on E are dual with respect to this duality function if and only if
P_x(X_t ≤ y) = P_y(x ≤ Y_t) for all x, y ∈ E, t ≥ 0.
Of course, exchanging the roles of X and Y we could equivalently choose H(x, y) = 1_{x≥y}. This is a classical duality and occurs in many contexts. For example, it was observed by Lévy (1948) [Lev48] that Brownian motion reflected at 0 and Brownian motion absorbed at 0 are dual with respect to this duality function. It was applied in different fields such as queueing theory [Lin52, Asm03], birth and death processes [KMG57], or interacting particle systems [CS85]. Siegmund [Sie76] proved that it holds for stochastically monotone Markov processes (cf. Section 5). It is therefore sometimes called "Siegmund duality", a name that we will also adopt throughout this paper for duality with respect to 1_{x≤y} or 1_{x≥y}. This type of duality is related to time reversal in the sense that it reverses the role of entrance and exit laws [CR84], and to other forms of duality such as Wall duality [DFPS] or strong stationary duality [DF90, Fil92] that will not be discussed here in detail.
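For a stochastically monotone chain on {0, ..., N} whose lowest state is absorbing, Siegmund's construction can be made explicit: the dual transition matrix is given by the differences Q(y, x) = P_x(X_1 ≤ y) − P_{x+1}(X_1 ≤ y). A sketch, where the concrete chain (a random walk absorbed at 0 and lazily reflected at N) is an illustrative choice of ours:

```python
# Siegmund dual via differences of the distribution functions G(x, y) = P_x(X_1 <= y).

N = 3
P = [[1.0, 0.0, 0.0, 0.0],          # 0 absorbing
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.5, 0.5]]          # lazy reflection at N

def G(x, y):                        # G(x, y), with the convention G(N + 1, y) = 0
    return sum(P[x][:y + 1]) if x <= N else 0.0

Q = [[G(x, y) - G(x + 1, y) for x in range(N + 1)] for y in range(N + 1)]

# monotonicity of P makes every Q(y, x) >= 0; 0 absorbing makes rows sum to 1
assert all(q >= 0 for row in Q for q in row)
assert all(abs(sum(row) - 1.0) < 1e-12 for row in Q)

# duality P H = H Q^T with H(x, y) = 1{x <= y}
H = [[float(x <= y) for y in range(N + 1)] for x in range(N + 1)]
lhs = [[sum(P[x][z] * H[z][y] for z in range(N + 1)) for y in range(N + 1)]
       for x in range(N + 1)]
rhs = [[sum(H[x][z] * Q[y][z] for z in range(N + 1)) for y in range(N + 1)]
       for x in range(N + 1)]
assert all(abs(lhs[i][j] - rhs[i][j]) < 1e-12
           for i in range(N + 1) for j in range(N + 1))
```

The duality check is a telescoping sum: summing Q(y, z) over z ≥ x recovers G(x, y) = P_x(X_1 ≤ y).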
Example 1.2 (Coalescing dual, interacting particle systems). Duality has found a particularly wide application in the field of interacting particle systems [Spi70, HL75, Lig05, Gri79, Har78]. We focus here on a simple setup. Consider E = {0, 1}^G for some graph G. A partial order on E is given by x ≤ y if and only if x(i) ≤ y(i) for all i ∈ G. Let (X_t) denote the voter model on G, that is, the interacting particle system where a particle at site i flips from 0 to 1 with rate given by the number of type-1 neighbours, and from 1 to 0 at rate given by the number of type-0 neighbours. It is well-known that the voter model is dual to a system of coalescing random walks in the following way: Let F = {A : A is a finite subset of G}. We consider the Markov process (A_t) with values in F with dynamics such that each i ∈ A is removed at rate 1 and replaced by one of its neighbours j ∈ G if j ∉ A. If j ∈ A, then the particles coalesce, that is, i is removed from A and j remains. Then (X_t) and (A_t) are dual with respect to
H(x, A) = 1_{x(i)=0 for all i∈A} = ∏_{i∈A} (1 − x(i)),
see [HS79]. Identifying (X_t) and (B_t) with B_t = {i ∈ G : X_t(i) = 1}, this can be written as H(A, B) = 1_{A∩B=∅}. Conversely, (A_t) can as well be interpreted as a particle system (Y_t) taking values in E, by setting Y_t(i) = 1_{i∈A_t}. Then the duality function becomes
H(x, y) = 1_{x∧y=0},
where x ∧ y denotes the component-wise minimum. All of these forms are usually referred to as coalescing duals, cf. [Lig05, SL95].
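The voter/coalescing duality can be checked at generator level on the smallest non-trivial graph, the complete graph on two sites. A minimal sketch (the encoding of states and the helper names are ours):

```python
# Generator check of L_X H = H L_Y^T for the two-site voter model and
# coalescing random walk, with H(x, A) = 1 if x(i) = 0 for all i in A.

sites = (0, 1)
xs = [(a, b) for a in (0, 1) for b in (0, 1)]          # voter configurations
As = [frozenset(s) for s in ([], [0], [1], [0, 1])]    # dual particle sets

def voter_generator():
    L = [[0] * 4 for _ in xs]
    for i, x in enumerate(xs):
        for s in sites:
            o = 1 - s                                  # the unique neighbour
            if x[o] != x[s]:                           # site s flips at rate 1
                y = list(x); y[s] = x[o]
                j = xs.index(tuple(y))
                L[i][j] += 1; L[i][i] -= 1
    return L

def coalescing_generator():
    L = [[0] * 4 for _ in As]
    for i, A in enumerate(As):
        for s in A:
            B = (A - {s}) | {1 - s}                    # jump; coalesce if occupied
            j = As.index(B)
            L[i][j] += 1; L[i][i] -= 1
    return L

H = [[int(all(x[i] == 0 for i in A)) for A in As] for x in xs]
LX, LY = voter_generator(), coalescing_generator()
LXH  = [[sum(LX[i][k] * H[k][j] for k in range(4)) for j in range(4)] for i in range(4)]
HLYt = [[sum(H[i][k] * LY[j][k] for k in range(4)) for j in range(4)] for i in range(4)]
assert LXH == HLYt                                     # L_X H = H L_Y^T, exactly
```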
Note that this special case is actually equivalent to a Siegmund duality: we have x ∧ y = 0 if and only if x ≤ 1 − y componentwise, since x, y ∈ {0, 1}^G. Therefore, for spin systems, (X_t) and (Y_t) are dual with respect to the function 1_{x∧y=0} if and only if (X_t) and (1 − Y_t) are dual with respect to the Siegmund duality function 1_{x≤y}.
Example 1.3 (Moment duality). Let E = [0, 1], F = N_0, and H(x, n) := x^n. For obvious reasons, dualities with respect to this function are called moment dualities. There are many examples, such as the classical duality between the Wright-Fisher diffusion and the block-counting process of Kingman's coalescent, which can be easily checked by applying Proposition 1.2 to the generators
L_X f(x) = (1/2) x(1 − x) f''(x), x ∈ [0, 1] (generator of the Wright-Fisher diffusion), and
L_Y g(n) = (n(n−1)/2) (g(n − 1) − g(n)), n ∈ N (generator of the block-counting process).
The importance of this kind of duality stems of course from the fact that a probability measure on [0, 1] is uniquely determined by its moments (Hausdorff moment problem). The notion of moment duality clearly makes sense for any bounded subset E of R. In Appendix B we provide an example of a moment duality with E = [−1, 1], compare also Definition 4.10. Moment duality can, at least formally, be defined for any E ⊆ R; however, in general this duality might not determine the one-dimensional distributions of the process completely.
There are many connections between moment and coalescing duality. The coalescing duality of interacting particle systems can be cast in the form of a moment duality by writing, for x, y ∈ {0, 1}^G with y of finite support,
1_{x∧y=0} = ∏_{i∈G} (1 − x(i))^{y(i)}.
In order to generalize this, let E = F = {0, 1}^N and assume that both (X_t) and (Y_t) are exchangeable for all t ≥ 0, that is, L(X^1_t, . . . , X^N_t) = L(X^{π(1)}_t, . . . , X^{π(N)}_t) for all permutations π of {1, . . . , N}, and similarly for (Y_t). Then duality with respect to H(x, y) = 1_{x∧y=0} is equivalent to duality with respect to a symmetrized duality function H̃ that depends on x and y only through their empirical measures. This generalizes moment dualities to the context of measure-valued processes, see for example [DK82, BLG03, DK96, DG14, EK95].
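The generator computation behind the Wright-Fisher/Kingman moment duality is elementary and can be verified in exact rational arithmetic (the function names are ours; the formulas are the standard generators):

```python
# Exact check of L_X H(., n)(x) = L_Y H(x, .)(n) for H(x, n) = x^n, where
# L_X f(x) = (1/2) x (1 - x) f''(x)   (Wright-Fisher diffusion) and
# L_Y g(n) = binom(n, 2) (g(n-1) - g(n))   (Kingman block-counting process).

from fractions import Fraction

def wright_fisher_on_moment(x, n):
    # uses the exact second derivative (x^n)'' = n (n - 1) x^(n - 2)
    return Fraction(1, 2) * x * (1 - x) * n * (n - 1) * x ** (n - 2)

def kingman_on_moment(x, n):
    # jump n -> n - 1 at rate n (n - 1) / 2 applied to n -> x^n
    return Fraction(n * (n - 1), 2) * (x ** (n - 1) - x ** n)

for n in range(2, 8):
    for x in (Fraction(1, 3), Fraction(2, 5), Fraction(9, 10)):
        assert wright_fisher_on_moment(x, n) == kingman_on_moment(x, n)
```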
Moment dualities can be used to show uniqueness of solutions of martingale problems, see [EK86, Section 4], which has been applied for example in the context of SPDEs [AT00].
Example 1.4 (Annihilating dual). Take
H(x, y) = 1_{Σ_{i∈G} x(i)y(i) odd},
or in the set-valued notation, H(A, B) = 1_{|A∩B| odd}. It is well-known that the voter model and annihilating random walk are dual with respect to this duality function [Gri79, SL95]. Related dualities are used for example in [AS12, JK12].
Example 1.5 (Laplace dual). Here, the duality function is of the form H(x, y) = e^{−⟨x,y⟩}, where ⟨·, ·⟩ denotes a scalar product, or more generally just a bilinear form. This duality is related to moment dualities; variants of it have been used to prove uniqueness of solutions of certain SPDEs [Myt98].
2 Duality with respect to a measure
In this section we elucidate the relationship between duality in the sense of Def. 1.1 and duality in the sense of Def. 1.3. The starting point is the following observation: if (X_t) is a Markov chain with discrete state space and a reversible probability measure µ, then (X_t) is self-dual with duality function H(x, y) := µ(x)^{−1} δ_{x,y}, provided µ(x) > 0 for all x ∈ E. This is the "cheap" self-duality function of [GKRV09], where it serves as the starting point for more interesting dualities. Propositions 2.1 and 2.2 address the situation when the measure µ does not have full support or the state space is not discrete. We refer the reader to [DF90, Section 5], [CPY98, Section 5.1], and [KRS07] for further comparisons of dualities.
We start with an example in discrete state spaces. Fix L ∈ N and consider a simple random walk on {0, 1, . . . , L + 1} with absorption at 0 and L + 1. Any reversible measure µ must be invariant, hence satisfies µ(x) = 0 for x = 1, . . . , L. On the other hand, a straightforward computation shows that the diagonal function with H(x, x) = 1 for x = 1, . . . , L, H(0, 0) = H(L + 1, L + 1) = 0, and H(x, y) = 0 for x ≠ y, is a self-duality function. Clearly H(x, x) is not of the form 1/µ_rev(x) for a reversible measure µ_rev. Nevertheless, the measure µ(x) = 1/H(x, x) = 1 on {1, . . . , L} is a duality measure for a sub-Markov process, namely, the simple random walk killed at the boundary. Note that a duality measure for strictly sub-Markov kernels need not be invariant, but only excessive, (µP)(y) ≤ µ(y).
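The absorbed-random-walk example can be verified directly in matrix form (L = 4 is an illustrative choice):

```python
# Simple random walk on {0, ..., L+1} absorbed at the boundary: the diagonal
# function with H(x, x) = 1 for x = 1, ..., L is a self-duality function, and
# mu = 1 on {1, ..., L} is excessive (not invariant) for the killed walk.

L = 4
n = L + 2                                   # states 0, ..., L + 1
P = [[0.0] * n for _ in range(n)]
P[0][0] = P[n - 1][n - 1] = 1.0             # absorbing boundary
for x in range(1, n - 1):
    P[x][x - 1] = P[x][x + 1] = 0.5

H = [[1.0 if (x == y and 1 <= x <= L) else 0.0 for y in range(n)] for x in range(n)]

# self-duality: P H = H P^T entrywise (exact, since all entries are dyadic)
lhs = [[sum(P[x][z] * H[z][y] for z in range(n)) for y in range(n)] for x in range(n)]
rhs = [[sum(H[x][z] * P[y][z] for z in range(n)) for y in range(n)] for x in range(n)]
assert lhs == rhs

# the killed walk on {1, ..., L}: mu = 1 satisfies (mu P)(y) <= mu(y),
# with strict inequality in the columns next to the boundary
col_sums = [sum(P[x][y] for x in range(1, n - 1)) for y in range(1, n - 1)]
assert all(s <= 1.0 for s in col_sums)
assert col_sums[0] == 0.5 and col_sums[-1] == 0.5
```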
Remark. The previous example should be seen in the original context of the notion of duality with respect to a measure [Hu58]: the initial motivation came from potential theory, where Markov processes killed upon reaching the boundary of the domain play an important role.
The next proposition generalizes the relation explained in the previous example. The key subtlety concerns the invariance of subsets of the state space: if µ is a reversible measure of a Markov process (X t ), then (X t ) cannot leave supp µ -it can, however, jump from E\ supp µ into supp µ. The situation is the other way round if H(x, x) is a diagonal self-duality function of (X t ): in this case (X t ) may very well leave supp H, but it cannot leave E\ supp H, which is therefore a trap for (X t ).
Proposition 2.1. Let (X t ) and (Y t ) be Markov processes with identical discrete state space E.
1. Suppose that (X t ) and (Y t ) are dual with respect to a diagonal duality function H(x, x).
Then |H(x, x)| is a duality function too. Define the measure
µ({x}) := 1/|H(x, x)| if H(x, x) ≠ 0, µ({x}) := 0 otherwise. (12)
Then E \ supp µ is a trap for (X_t) and (Y_t), and the sub-Markov processes (X̂_t) and (Ŷ_t) obtained by killing (X_t) and (Y_t) upon leaving supp µ, i.e., with transition kernels
P̂_t(x, A) := P_t(x, A), Q̂_t(x, A) := Q_t(x, A), x ∈ supp µ, A ⊂ supp µ, (13)
are in duality with respect to the measure µ.
2. Conversely, let µ be a measure on E and let H(x, x) ≥ 0 be the diagonal function given by Eq. (12). Suppose that E \ supp µ is a trap for (X_t) and (Y_t), and that (X̂_t) and (Ŷ_t) defined as in Eq. (13) are in duality with respect to µ. Then (X_t) and (Y_t) are dual with respect to H.
Proof. We start with the proof of 2. We need to check that for all x, y ∈ E and all t ≥ 0,
H(y, y) P_t(x, y) = H(x, x) Q_t(y, x). (14)
The weak duality of (X̂_t) and (Ŷ_t) with respect to the measure µ tells us that
µ(x) P̂_t(x, y) = µ(y) Q̂_t(y, x), x, y ∈ supp µ.
Dividing by µ(x) and µ(y) on both sides, we find that Eq. (14) holds when x and y are both in supp µ. If x and y are both outside supp µ, Eq. (14) obviously holds since in this case H(x, x) = H(y, y) = 0. If x ∈ supp µ but y ∉ supp µ, then H(y, y) = 0 and, because (Y_t) cannot leave E \ supp µ, Q_t(y, x) = 0. Thus Eq. (14) holds. The symmetric case x ∉ supp µ, y ∈ supp µ can be treated in a similar way.
Proof of 1.: We know that Eq. (14) holds for all x, y ∈ E. Therefore, if H(y, y) = 0 but H(x, x) ≠ 0, we have Q_t(y, x) = 0. Thus (Y_t) cannot leave E \ supp µ = {y | H(y, y) = 0}, and (Ŷ_t) is a well-defined sub-Markov process; the analogous statement for (X_t) follows in a similar way. Moreover, when x and y are both in supp µ, we obtain from Eq. (14) that
µ(x) P̂_t(x, y) = µ(y) Q̂_t(y, x),
which proves the claim when H is non-negative. Next, suppose that there are x and y such that H(x, x)H(y, y) < 0, i.e., H(x, x) and H(y, y) have opposite signs. Then Eq. (14) shows that for all t, P_t(x, y) = 0 and Q_t(y, x) = 0. Put differently, in Eq. (14) either both sides have the same sign or both sides vanish; as a consequence we can take absolute values and deduce that |H(x, x)| is a diagonal duality function.
Next let us look at non-discrete state spaces. In this case expressions such as 1/µ(x) do not make sense and it is not clear what a good diagonal function H would be. A way around this issue is to replace the candidate diagonal duality function by a family of functions H_λ such that, formally, as λ → ∞, the family converges to the correct object, much in the same way as a Dirac measure can be approximated by a family of Gaussian densities with variances going to 0. To this aim recall the following: in finite state spaces, for strongly continuous Markov chains, the resolvent R_λ satisfies λR_λ → id as λ → ∞. Thus if we set H_λ(x, y) = λ R_λ(x, y)/µ(y), then H_λ → diag(1/µ(x)). For non-discrete state spaces, 1/µ(x) does not make sense as a function, but R_λ(x, y)/µ(y) can be interpreted as a Radon-Nikodým derivative: if (X_t) and (X̂_t) are in duality with respect to µ, there is a function r_λ(x, y) such that the resolvents satisfy
R_λ(x, dy) = r_λ(x, y) µ(dy), R̂_λ(x, dy) = r_λ(y, x) µ(dy), (16)
for all x, y ∈ E. Moreover, we may assume that for all λ the functions x ↦ r_λ(x, y) and y ↦ r_λ(x, y) are λ-excessive for the resolvents (R_α)_{α>0} and (R̂_α)_{α>0} respectively,¹ and under this assumption the functions r_λ(x, y) are uniquely determined by the resolvents [BG68, Chapter VI.1].
Proposition 2.2. Let µ be a σ-finite measure on E. Let (X_t) and (X̂_t) be Markov processes with càdlàg paths. Suppose that (X_t) and (X̂_t) are in duality with respect to µ, and let r_λ : E × E → [0, ∞) be the unique function that satisfies Eq. (16) and turns x ↦ r_λ(x, y) and y ↦ r_λ(x, y) into λ-excessive functions for (R_α) and (R̂_α) respectively. Then for all λ > 0, x, y ∈ E and t > 0,
(P_t r_λ(·, y))(x) = (P̂_t r_λ(x, ·))(y), (17)
i.e., every r_λ is a duality function for (X_t) and (X̂_t).
The proof uses that two λ-excessive functions that are equal µ-almost everywhere are in fact equal everywhere [BG68].² It is analogous to the proof of Theorem 1.16 in [BG68, Chapter VI.1], which shows that, under additional regularity assumptions, in Eq. (17) we may replace t by the hitting times T_B and T̂_B of Borel sets B ⊂ E.
Proof of Prop. 2.2. We note first that if the processes have càdlàg paths, then for all t > 0 and all non-negative, bounded functions f and g,
∫_E f(x) (P_t g)(x) µ(dx) = ∫_E g(y) (P̂_t f)(y) µ(dy). (18)
Indeed, let L(t) and R(t) be the left and right-hand sides of this equation. The definition of duality with respect to µ says that L and R have the same Laplace transforms. As a consequence, L(t) = R(t) for Lebesgue-almost all t > 0. Now, since (X̂_t) has càdlàg paths, dominated convergence shows that for bounded continuous functions f, the map t ↦ (P̂_t f)(y) is càdlàg as well; similarly for t ↦ (P_t g)(x). Thus for non-negative, bounded, continuous f and g, L(t) and R(t) are càdlàg, and equality Lebesgue-almost everywhere implies equality for all t > 0. Hence Eq. (18) holds for continuous f and g, and it extends to bounded non-negative f and g by a density argument.
Now let f, g ∈ L^∞(E) be non-negative. We integrate Eq. (17) against f(x)g(y)µ(dx)µ(dy). The left-hand side becomes, using r_λ(x, y)µ(dy) = R_λ(x, dy),
∫_E ∫_E f(x) g(y) (P_t r_λ(·, y))(x) µ(dx) µ(dy) = ∫_E f(x) (P_t R_λ g)(x) µ(dx).
Similarly, the right-hand side becomes, using r_λ(x, y)µ(dx) = R̂_λ(y, dx) and the duality of the semi-groups with respect to µ,
∫_E ∫_E f(x) g(y) (P̂_t r_λ(x, ·))(y) µ(dx) µ(dy) = ∫_E g(y) (P̂_t R̂_λ f)(y) µ(dy) = ∫_E f(x) (P_t R_λ g)(x) µ(dx).
Thus the left-hand side and the right-hand side of Eq. (17), integrated against f(x)g(y)µ(dx)µ(dy), are equal, for all non-negative f, g ∈ L^∞(E). It follows that for all non-negative g and µ-almost all x,
∫_E (P_t r_λ(·, y))(x) g(y) µ(dy) = ∫_E (P̂_t r_λ(x, ·))(y) g(y) µ(dy).
Both sides are λ-excessive functions of x, therefore the functions agree for all x. It follows that Eq. (17) holds for all x and µ-almost all y; the identity extends to all y ∈ E by using again the λ-excessivity.

3 Functional analytic theory
This section presents an analytic framework for duality of Markov processes. In Prop. 3.3, duality is recast as duality of operators with respect to a bilinear form; convex geometry enters in Section 3.3. As an application, we present abstract criteria for the existence and uniqueness of a dual to a given process with respect to a given duality function H. The main results are Propositions 3.6 and 3.9, and Theorems 3.13 and 3.20. We also give a criterion under which the dual of a Feller semi-group is a Feller semi-group (Theorem 3.14) and show that reversible processes that are dual with respect to a non-degenerate duality function have the same spectrum (Theorem 3.24). All results of this section can be formulated in terms of the semi-groups only, and all Markov semi-groups will be on Polish spaces equipped with their Borel σ-algebra. By Markov semi-group we mean a family of kernels P_t(x, A), t ≥ 0, such that for all x, P_0(x, ·) = δ_x, x ↦ P_t(x, A) is Borel-measurable for all measurable A, P_t(x, ·) is a probability measure, and the Chapman-Kolmogorov equations hold. We do not assume any additional regularity such as existence of realizations with càdlàg paths. We write M(E) for bounded signed measures and M_{1,+}(E) for probability measures on E.

Duality of semi-groups with respect to a bilinear form. Uniqueness
The natural setting for duality of Markov processes is that of dual pairs, as encountered in the treatment of weak topologies and locally convex spaces [RS, Chapter V.7]. Let V and W be real vector spaces and B : V × W → R a bilinear form. The form B is called non-degenerate if B(v, w) = 0 for all w ∈ W implies v = 0, and B(v, w) = 0 for all v ∈ V implies w = 0. When B(·, ·) is non-degenerate, it is common to use a scalar product notation B(·, ·) = ⟨·, ·⟩, and the triple (V, W, ⟨·, ·⟩) is referred to as a dual pair.
Example 3.1 ("The" dual). Let X be a Banach space and X′ the dual space, i.e., the space of continuous linear functionals from X to R. Then ⟨ϕ, x⟩ := ϕ(x) defines a non-degenerate bilinear form on X′ × X. Every bounded operator S : X → X has a dual operator S′ : X′ → X′, often simply called "the" dual, sometimes also the Banach space adjoint.
Example 3.2 (Adjoint operator in a Hilbert space). Let H be some real Hilbert space. The scalar product ⟨·, ·⟩ is a non-degenerate bilinear form on H × H, and every bounded operator A has a unique dual operator A*, the (Hilbert space) adjoint of A.
As a further example, let E be a Polish space, M(E) the space of bounded signed measures on E, and L^∞(E) the space of bounded measurable functions. Then ⟨µ, f⟩ := ∫_E f dµ defines a non-degenerate bilinear form on M(E) × L^∞(E). Let P(x, A) be a transition kernel, acting on functions via (Pf)(x) = ∫_E P(x, dx′) f(x′) and on measures via (P*µ)(A) = (µP)(A) = ∫_E µ(dx) P(x, A). Then P and P* are dual with respect to ⟨·, ·⟩.
Proposition 3.3. Let H : E × F → R be bounded and measurable, and define the bilinear form
B_H(µ, ν) := ∫_E ∫_F H(x, y) µ(dx) ν(dy), µ ∈ M(E), ν ∈ M(F).
Then (X_t) and (Y_t) are dual with respect to H if and only if for all t > 0 and all µ ∈ M(E), ν ∈ M(F),
B_H(P*_t µ, ν) = B_H(µ, Q*_t ν),
i.e., if and only if for all t > 0, P*_t and Q*_t are dual with respect to B_H.
Proof. If the processes are dual with respect to H, then by Fubini's theorem
B_H(P*_t µ, ν) = ∫∫ P_t H(·, y)(x) µ(dx) ν(dy) = ∫∫ Q_t H(x, ·)(y) µ(dx) ν(dy) = B_H(µ, Q*_t ν).
Conversely, if the semi-groups are dual with respect to the bilinear form B_H, then choosing µ = δ_x and ν = δ_y recovers the duality of the processes.
Most of the standard duality functions have non-degenerate associated bilinear forms; this is the content of Proposition 3.4, which is proven in Appendix A. The proposition is of interest because non-degeneracy implies uniqueness of the dual. Another closely related condition for uniqueness is that the family of functions H(x, ·), x ∈ E, separates M_{1,+}(F), see [EK86]; in [Sw06], the duality is called informative if both H(x, ·), x ∈ E, and H(·, y), y ∈ F, are separating (in M_{1,+}(F) and M_{1,+}(E), respectively).
Definition 3.5. Let E be a Polish space and X ⊂ L^∞(E). We call X separating if X separates M_{1,+}(E), i.e., if whenever two probability measures P, Q ∈ M_{1,+}(E) satisfy ∫_E g dP = ∫_E g dQ for all g ∈ X, then P = Q.
We have the following implications.
Proposition 3.6. Let (P_t) be a Markov semi-group on E and H : E × F → R bounded and measurable. Suppose that one of the following conditions holds:
1. The family of functions H(x, ·), x ∈ E, is separating.
2. The associated bilinear form B_H is non-degenerate.
Then the dual semi-group, if it exists, is unique.
Despite Proposition 3.4, not all duality functions are associated with non-degenerate bilinear forms. The following proposition describes a situation which is typical for pathwise duality, where processes often initially live on bigger spaces and we construct (X t ) and (Y t ) by forgetting a part of the information, see Section 4.
Proposition 3.7. Let (X_t) and (Y_t) be two Markov processes with Polish state spaces E and F, dual with respect to H : E × F → R. Suppose that (Y_t) can be lifted to a bigger space, i.e., there is a Polish space G, a measurable map π : G → F that is surjective but not injective, and a Markov process (Z_t) on G such that (π(Z_t))_{t≥0} is equivalent to (Y_t). Let H̃(x, z) := H(x, π(z)). Then (Z_t) is dual to (X_t) with duality function H̃, and B_{H̃} is degenerate.
Here "equivalence" means that whenever π(Z_0) and Y_0 have the same distribution, all finite-dimensional distributions of (π(Z_t)) and (Y_t) agree.
Proof. Since π is not injective, there are z and z′ such that z ≠ z′ and π(z) = π(z′). It follows that δ_z − δ_{z′} is in the right null space of B_{H̃}, and B_{H̃} is degenerate. Moreover, for every z ∈ G and x ∈ E, we have
E_z H̃(x, Z_t) = E_{π(z)} H(x, Y_t) = E_x H(X_t, π(z)) = E_x H̃(X_t, z),
hence (X_t) and (Z_t) are dual with duality function H̃.
We will come back to this situation in Section 3.3, where we will characterize dualities that can be obtained as stochastic lifts of non-degenerate dualities.

Existence. Feller semi-groups
As mentioned in the previous section, any bounded operator T in a Banach space has a unique dual operator T ′ , and any bounded operator in a Hilbert space has a unique adjoint operator T * . For general dual pairs (V, W, ·, · ), however, it might be difficult to determine whether a given concrete operator has a dual.
The existence of a dual Markov process adds a layer of difficulty, as it is not enough to ask whether P*_t has a B_H-dual operator T_t in M(F): we also need to know whether the dual operator is of the form T_t µ = ∫ µ(dx) Q_t(x, ·) with Q_t(x, A) the transition kernels of some Markov process. As it turns out, the existence of a dual operator semi-group is tied to the invariance of some linear subspace V; the existence of a dual Markov semi-group is tied to the stronger requirement that some convex subset V_{1,+} ⊂ V be invariant under (P_t) [KRS07, Möh11].
Definition 3.8. Fix H : E × F → R bounded and measurable. We define
V_{1,+} := { x ↦ ∫_F H(x, y) ν(dy) | ν ∈ M_{1,+}(F) }, W_{1,+} := { y ↦ ∫_E H(x, y) µ(dx) | µ ∈ M_{1,+}(E) }.
The spaces V and W are defined in a similar way, replacing M_{1,+}(E) and M_{1,+}(F) by M(E) and M(F) respectively.
Thus V 1,+ ⊂ V ⊂ L ∞ (E) and W 1,+ ⊂ W ⊂ L ∞ (F ). Clearly V 1,+ and W 1,+ are convex, and V and W are linear subspaces that arise as the linear hulls of V 1,+ and W 1,+ . When E and F are finite state spaces, we may think of H as a matrix, and the four spaces correspond to linear and convex combinations of the rows and columns of the matrix.
We have the following necessary condition for the existence of a dual.
Proposition 3.9. Let E and F be Polish state spaces, H : E × F → R measurable and bounded, and (P t ) a Markov semi-group in E. Suppose that (P t ) has a Markov semi-group in F dual with respect to H. Then (P t ) leaves V 1,+ and V invariant, and any dual semi-group leaves W 1,+ and W invariant.
Proof. Let (Q_t) be a dual Markov semi-group and f(·) = ∫_F H(·, y) ν(dy) ∈ V_{1,+}, ν ∈ M_{1,+}(F). Then, for all t > 0 and x ∈ E,
P_t f(x) = ∫_F P_t H(·, y)(x) ν(dy) = ∫_F Q_t H(x, ·)(y) ν(dy) = ∫_F H(x, y′) (νQ_t)(dy′),
where νQ_t = ∫_F ν(dy) Q_t(y, ·) ∈ M_{1,+}(F); thus P_t f ∈ V_{1,+} and V_{1,+} is invariant under (P_t). Since V is the linear hull of V_{1,+}, the invariance of V follows from the invariance of V_{1,+}. We can invert the roles played by E and F, (P_t) and (Q_t), and apply what we have just proven to (Q_t): this gives that any dual (Q_t) must leave W_{1,+} and W invariant.
A list of spaces V 1,+ and V for some common duality functions is given in Tables 1 and 2. We note that invariance of V is related to regularity properties (e.g., does the semi-group map continuous functions to continuous functions?), while the invariance of V 1,+ is sometimes associated with monotonicity properties. We shall come back to the latter aspect in Section 5.
Remark. For finite state spaces and a non-degenerate duality function, V is the image of R^F under an invertible matrix, hence V = R^E. As a consequence, V is automatically invariant. A trick around the invariance of V_{1,+} is to define the dual Markov process in an artificially doubled state space F̂, where it always exists, see e.g. the paragraph after Eq. (2.3) in [HS79].
Table 1: List of convex subsets V_{1,+} ⊂ L^∞(E) associated with some common duality functions. For definitions of absolute and complete monotonicity, see Sections VII.2, VII.3 and XIII.4 in [Fe71].
Table 2: List of linear subspaces V ⊂ L^∞(E) associated with some common duality functions; see [Ho62] for more on the Wiener algebra and Hardy spaces.
When both space and time are discrete, the invariance of V^{1,+} is actually sufficient for the existence of a dual. Recall that in the discrete case duality is equivalent to
∀x ∈ E ∀y ∈ F : Σ_{x′∈E} P(x, x′)H(x′, y) = Σ_{y′∈F} Q(y, y′)H(x, y′),    (20)
where P and Q are the transition matrices of Markov chains, i.e., to the matrix equation PH = HQ^T. The signed measures on F correspond to ℓ^1(F), and V becomes the image of ℓ^1(F) under the linear map with matrix H.
Proposition 3.10 (Discrete time and discrete state space). Let E and F be countable spaces, endowed with the discrete topology, H : E × F → R bounded, and P a stochastic matrix on E. Then:
1. There is a matrix Q with rows Q(y, ·) ∈ ℓ^1(F) solving Eq. (20) if and only if V is invariant under P.
2. Q can be chosen as a stochastic matrix if and only if V^{1,+} is invariant under P.
Proof. 1. The necessity of the invariance of V is proven as in Prop. 3.9. For the sufficiency, given y ∈ F, let h_y ∈ V be the function h_y(x) := H(x, y) and let (Q(y, z))_{z∈F} be a vector in ℓ^1(F) such that P h_y = Σ_{z∈F} Q(y, z)h_z. Such a vector exists because of the invariance of V under P, and one can check that the matrix Q defined in this way solves Eq. (20). 2. The necessity of the invariance of V^{1,+} is proven as in Prop. 3.9. For the sufficiency, we proceed as in 1., noting that P h_y ∈ V^{1,+}, so that we can choose Q(y, ·) ∈ M^{1,+}(F) such that P h_y = Σ_{z∈F} Q(y, z)h_z.
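The construction in the proof can be carried out explicitly on a toy instance (our own, not from the text): the Siegmund duality function H(x, y) = 1_{x≤y} on E = F = {0, 1, 2} and a stochastically monotone chain P with 0 absorbing. Since this H is invertible, the matrix Q is forced: Q^T = H^{-1}PH, and one checks that it is stochastic and solves the duality equation.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

def inverse(M):
    """Gauss-Jordan inverse for a small square matrix over the rationals."""
    n = len(M)
    A = [row[:] + [Fr(int(i == j)) for j in range(n)] for i, row in enumerate(M)]
    for i in range(n):
        p = next(r for r in range(i, n) if A[r][i] != 0)  # pivot search
        A[i], A[p] = A[p], A[i]
        A[i] = [v / A[i][i] for v in A[i]]
        for r in range(n):
            if r != i and A[r][i] != 0:
                A[r] = [v - A[r][i] * w for v, w in zip(A[r], A[i])]
    return [row[n:] for row in A]

# Siegmund duality function H(x, y) = 1_{x <= y} on E = F = {0, 1, 2},
# and a monotone chain P with 0 absorbing (so that the row sums of Q work out).
H = [[Fr(1), Fr(1), Fr(1)], [Fr(0), Fr(1), Fr(1)], [Fr(0), Fr(0), Fr(1)]]
P = [[Fr(1), Fr(0), Fr(0)],
     [Fr(1, 4), Fr(1, 2), Fr(1, 4)],
     [Fr(0), Fr(1, 2), Fr(1, 2)]]

# Each column h_y of H satisfies P h_y = sum_z Q(y, z) h_z; since H is
# invertible, Q is uniquely determined by Q^T = H^{-1} P H.
QT = matmul(inverse(H), matmul(P, H))
Q = [list(row) for row in zip(*QT)]

assert matmul(P, H) == matmul(H, QT)           # duality: P H = H Q^T
assert all(sum(row) == 1 for row in Q)         # Q is stochastic ...
assert all(q >= 0 for row in Q for q in row)   # ... with non-negative entries
```

Changing P(0, 0) to anything less than 1 destroys the invariance of V^{1,+}, and the computed Q is no longer stochastic.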
For non-discrete state spaces and continuous time, the situation is more complicated, as we need to be able to choose the dual transition kernel Q_t(y, dz) in such a way that the Chapman-Kolmogorov equations hold and that y ↦ Q_t(y, B) is measurable for every measurable B ⊂ F. The next theorem addresses non-degenerate dualities; in this case non-degeneracy ensures that there is a unique possible choice of Q_t(y, B), and the Chapman-Kolmogorov equations automatically hold. When F is not discrete, we impose an additional condition to ensure measurability.
Assumption 3.11. The duality function H is such that the associated linear space W from Definition 3.8 satisfies the following: Every measurable function g ∈ L ∞ (F ) is the pointwise limit of a uniformly bounded sequence of functions (g n ) from R1 + W.
Clearly, if H satisfies Assumption 3.11, then W is separating. The converse requires additional conditions. We recall the definition of continuous functions vanishing at infinity, useful in locally compact spaces: f ∈ C_0(F) if and only if f is continuous and for every ε > 0 there is a compact set K ⊂ F such that |f(x)| ≤ ε for all x ∈ F \ K.

Lemma 3.12. Suppose that W is separating on M(F) and in addition one of the following conditions holds:
1. F is Polish and locally compact, and W ⊂ C_0(F).
2. There is a subset W′ ⊂ W that is closed under multiplication and generates the Borel σ-algebra, σ(W′) = B(F).
Then H satisfies Assumption 3.11.
Remember that for a locally compact space to be Polish as well, it is necessary and sufficient that its topology has a countable base; moreover every such space is σ-compact.
Proof of Lemma 3.12. We first prove 2. Suppose that W′ is closed under multiplication and σ(W′) = B(F). The functional monotone class theorem tells us that the closure of R1 + W′ under uniformly bounded pointwise monotone limits of sequences contains every bounded σ(W′)-measurable function; since σ(W′) = B(F), Assumption 3.11 follows.

Next we prove 1. First we note that W is dense in C_0(F) with respect to uniform convergence. Indeed, if this was not the case, the Hahn-Banach theorem would allow us to find a non-zero continuous functional ϕ : C_0(F) → R vanishing on the closure of W. By the Riesz-Markov theorem, there is a bounded signed measure µ such that ϕ(f) = ∫_F f dµ for all f, and in particular ∫_F f dµ = 0 for all f ∈ W. But this contradicts that W separates M(F). Thus W is dense in C_0(F). Since for Polish locally compact spaces C_0(F) generates the full Borel σ-algebra and C_0(F) is closed under multiplication, the proof is concluded with a monotone class theorem just as in the proof of item 2.

Remark. A naive reasoning suggests that in item 2. of Lemma 3.12 the condition σ(W′) = B(F) could be dropped: if σ(W′) were strictly smaller than B(F), it is tempting to think that there must be some probability measure P on (F, σ(W′)) that has two distinct extensions Q_1, Q_2 to (F, B(F)). We would have ∫ g dQ_1 = ∫ g dQ_2 for every g ∈ W′ but Q_1 ≠ Q_2, contradicting that W′ is separating.
The following example, which we owe to M. Scheutzow, shows however that strict inclusion of σ-algebras in general does not imply the existence of a probability measure with non-unique extensions: Let F = [0, 1], B the Borel σ-algebra, and 𝒫 the collection of all subsets of [0, 1]. A Borel measure P is either purely discrete, in which case it has a unique extension to 𝒫, or has a continuous component, in which case it has no extension to 𝒫. Thus even though the inclusion B ⊂ 𝒫 is strict, there is no probability measure on B with more than one extension to 𝒫; this is why in item 2. of Lemma 3.12 we explicitly require that σ(W′) = B(F).
The standard duality functions examined in Prop. 3.4 satisfy Assumption 3.11. Note that for the moment and Laplace duals, the families {H(x, ·) | x ∈ E} are closed under multiplication, so that item 2. of Lemma 3.12 applies.

Theorem 3.13. Let E, F be Polish spaces, H : E × F → R bounded and measurable, and (X_t) a Markov process with state space E and semi-group (P_t). Suppose that either W is separating and F discrete, or Assumption 3.11 holds. Then (P_t) has a dual Markov semi-group with respect to H if and only if V^{1,+} is invariant, and the dual with respect to H is unique.
See also [Möh11, Prop. 2.5]. Before we come to the proof of Theorem 3.13 and Lemma 3.12, we formulate a result on Feller semi-groups. Recall that (P_t) is a Feller semi-group on C_0(E) if it maps C_0(E) to itself and is strongly continuous with respect to the supremum norm.

Theorem 3.14. Assume that E and F are Polish and locally compact. Suppose that H ∈ C_0(E × F) and B_H is non-degenerate. Then (P_t) has a dual Markov semi-group with respect to H if and only if V^{1,+} is invariant, and the dual is unique. Furthermore, the dual is a Feller semi-group if and only if (P_t) is.
Proof of Theorem 3.13. The uniqueness follows from Prop. 3.6. For the existence, let y ∈ F and t ≥ 0; note f_y(·) := H(·, y) ∈ V^{1,+}. By the invariance of V^{1,+}, there is a measure ν_{t,y} ∈ M^{1,+}(F) such that
(P_t f_y)(x) = ∫_F H(x, z) ν_{t,y}(dz)  for all x ∈ E.
The measure ν_{t,y} is unique because W is separating. We set Q_t(y, B) := ν_{t,y}(B). By construction, for every t and y, Q_t(y, ·) is a probability measure, and we have, for all t, x, y,
∫_E P_t(x, dx′)H(x′, y) = ∫_F H(x, z)Q_t(y, dz).
In order to show that (Q_t) is a Markov semi-group dual to (P_t), it remains to check that y ↦ Q_t(y, B) is measurable and that the Chapman-Kolmogorov equations hold.

We start with the measurability. Let g(·) = ∫_E µ(dx)H(x, ·) ∈ W, µ a signed measure on E. Then
(Q_t g)(y) = ∫_F Q_t(y, dz)g(z) = ∫_E µ(dx) ∫_F H(x, z)Q_t(y, dz) = ∫_E µ(dx)(P_t f_y)(x).
Since, by Fubini, the right-hand side is a measurable function of y, we find that if g ∈ W, then Q_t g is Borel-measurable. Let B ⊂ F be a Borel-measurable set. By Assumption 3.11, there is a sequence of functions (g_n) in R1 + W such that sup_{n∈N} ||g_n||_∞ < ∞ and g_n → 1_B pointwise. By dominated convergence, we find that Q_t(·, B) is the pointwise limit of the measurable functions Q_t g_n; in particular, Q_t(·, B) is measurable. The Chapman-Kolmogorov equations follow from the uniqueness of the representing measure: both Q_{t+s}(y, ·) and ∫_F Q_s(y, dz)Q_t(z, ·) represent the function P_{t+s} f_y, hence they coincide.

For the proof of Theorem 3.14 we need two lemmas.
Lemma 3.15. Let E and F be Polish and locally compact, and H ∈ C_0(E × F). Then V ⊂ C_0(E). Moreover, for every ε > 0, there is a compact set K ⊂ E such that |f(x)| ≤ ε for all f ∈ V^{1,+} and x ∈ E \ K.
Proof. If f(·) = ∫_F H(·, y)ν(dy) ∈ V and x_n → x in E, then
f(x_n) = ∫_F H(x_n, y)ν(dy) → ∫_F H(x, y)ν(dy) = f(x).
The inversion of limits and integrals is justified by the dominated convergence theorem (recall that H is bounded and ν is a finite signed measure). Thus every function in V is continuous.

Next, fix ε > 0 and choose a compact set K ⊂ E × F such that |H| ≤ ε on (E × F) \ K; let π_E be the projection onto the first coordinate. For f = ∫_F H(·, y)ν(dy) ∈ V^{1,+} and x ∈ E \ π_E(K) we have |f(x)| ≤ ∫_F |H(x, y)| ν(dy) ≤ ε. Moreover π_E(K), as the image of a compact set under a continuous map, is itself compact. Thus f ∈ C_0(E) and we have checked that V ⊂ C_0(E). Since the compact set π_E(K) depends on H alone and not on f, the uniformity statement of the lemma follows.
Our last lemma is a variant of the well-known fact that integral operators are typically compact operators [Wer00].
Lemma 3.16. Let E and F be Polish and locally compact, and H ∈ C_0(E × F). The following holds:
1. Let (ν_n) be a sequence in M^{1,+}(F) converging weakly to ν ∈ M^{1,+}(F). Then ∫_F H(·, y)ν_n(dy) → ∫_F H(·, y)ν(dy) uniformly on E.
2. V^{1,+} is relatively compact in C_0(E); if F is compact, V^{1,+} is compact.
Here weak convergence of measures means, as usual, that for every bounded continuous function g, ∫_F g dν_n → ∫_F g dν.
Remark. The following example shows that V^{1,+} is in general not closed. Let E = F = N and H(m, n) = δ_{m,n} n^{-1}. Then
V^{1,+} = {(a_n)_{n∈N} | a_n ≥ 0 for all n, Σ_{n=1}^∞ n a_n = 1}.
The closure of V^{1,+} is the set of non-negative sequences (a_n) with Σ_{n=1}^∞ n a_n ≤ 1.
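The escape of mass behind this remark can be seen numerically; the sketch below (our own) takes ν = δ_n, for which the corresponding element of V^{1,+} has sup-norm 1/n and hence converges uniformly to the zero function, which lies in the closure but not in V^{1,+} itself.

```python
from fractions import Fraction as Fr

def element_of_V(nu, N):
    """f(m) = sum_n H(m, n) nu(n) = nu(m)/m for H(m, n) = delta_{m,n}/n, truncated at N."""
    return [nu.get(m, Fr(0)) / m for m in range(1, N + 1)]

N = 50
for n in (1, 10, 50):
    f = element_of_V({n: Fr(1)}, N)                         # nu = Dirac measure at n
    assert sum(m * f[m - 1] for m in range(1, N + 1)) == 1  # f is in V^{1,+}
    assert max(f) == Fr(1, n)                               # but its sup-norm is 1/n

# The uniform limit f = 0 satisfies sum_m m f(m) = 0 <= 1: it lies in the
# closure of V^{1,+} but not in V^{1,+} itself.
```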
Proof. Suppose that ν_n → ν weakly. By the continuity of H and the definition of weak convergence, we see that ∫_F H(·, y)ν_n(dy) → ∫_F H(·, y)ν(dy) pointwise, as n → ∞. In order to show that the convergence is uniform, we are going to mimic the proof of the compactness of integral operators with continuous integral kernel, using the Arzelà-Ascoli theorem as in [Wer00].

First we show that V^{1,+} is relatively compact in C_0(E). Let d be a metric on E generating the topology. H is uniformly continuous on every compact set K ⊂ E × F; the uniform continuity extends to all of E × F because H vanishes at infinity. Fix ε > 0 and let δ > 0 be such that for all y ∈ F and all x, x′ ∈ E with d(x, x′) ≤ δ, |H(x, y) − H(x′, y)| ≤ ε; then every f ∈ V^{1,+} satisfies |f(x) − f(x′)| ≤ ε whenever d(x, x′) ≤ δ. Thus V^{1,+} is an equicontinuous family of functions. It is also uniformly bounded, by sup_{E×F} |H|. If E and F are compact, we can conclude right away that V^{1,+} is relatively compact.
For the general case, by Lemma 3.15, for every j ∈ N there is a compact set K_j ⊂ E such that for all f ∈ V^{1,+}, sup_{E\K_j} |f| ≤ 1/j. In particular, every function in V^{1,+} vanishes outside ∪_{j∈N} K_j. We may assume without loss of generality that K_j ⊂ K_{j+1} for all j. Let g_n(·) = ∫_F H(·, y)ν_n(dy) be a sequence in V^{1,+}. By Arzelà-Ascoli, there is a function g ∈ C(K_1) and a subsequence g_{φ_1(n)}, φ_1 : N → N strictly increasing, converging to g uniformly on K_1. Iterating the procedure, we find that there are strictly increasing maps (φ_j) and a continuous function g ∈ C(∪_{j∈N} K_j) such that for every j ∈ N, g_{φ_1∘···∘φ_j(n)} → g uniformly on K_j as n → ∞. Passing to the diagonal sequence n_k := φ_1∘···∘φ_k(k), an ε/3-argument shows that g_{n_k} converges to g, uniformly in all of E, which also implies that g ∈ C_0(E) (we had not yet checked that g is continuous in all of E). This proves that V^{1,+} is relatively compact.
Next, suppose that (ν_n) converges weakly to some probability measure ν on F. Let g(·) := ∫_F H(·, y)ν(dy); we know that g_n → g pointwise. If the convergence is not uniform, because of the relative compactness of V^{1,+} we can find a subsequence (g_{n_k}) and a function h ∈ C_0(E) with h ≠ g such that g_{n_k} → h uniformly. Since uniform convergence implies pointwise convergence, it follows that g = h, a contradiction. Thus g_n → g uniformly.
It remains to check that V^{1,+} is closed when F is compact. Let (g_n) = (∫_F H(·, y)ν_n(dy)) be a sequence in V^{1,+} converging uniformly to some function g ∈ C_0(E). If F is compact, we conclude from Prohorov's theorem that there is a subsequence (ν_{n_j}) converging weakly to some probability measure ν on F, and then g = ∫_F H(·, y)ν(dy) ∈ V^{1,+} by item 1.
Proof of Theorem 3.14. The existence of a unique dual (Q_t) follows from Theorem 3.13, whose assumptions are satisfied because of Lemma 3.12 1. Suppose that the dual Markov semi-group (Q_t) is strongly continuous on C_0(F). By Prop. 3.9, the set V is invariant under (P_t). Since ||P_t f||_∞ ≤ ||f||_∞, the closure of V in C_0(E) is invariant as well; by Lemma 3.15 and the density of V in C_0(E) (which follows from the non-degeneracy of B_H by the Hahn-Banach argument in the proof of Lemma 3.12), this closure is all of C_0(E). Thus C_0(E) is invariant under (P_t).

Next, let ν ∈ M^{1,+}(F). For every g ∈ C_0(F), we have
∫_F g d(Q_t^*ν) = ∫_F (Q_t g) dν → ∫_F g dν
as t → 0. Therefore Q_t^*ν → ν vaguely and, since ν is a probability measure, Q_t^*ν → ν weakly. Let f(·) := ∫_F H(·, y)ν(dy). We deduce from Lemma 3.16 that, as t → 0,
P_t f = ∫_F H(·, y)(Q_t^*ν)(dy) → f uniformly.
Since the span of V^{1,+} is dense in C_0(E) and the operators P_t are contractions, (P_t) is strongly continuous on C_0(E), i.e., a Feller semi-group. Inverting the roles of E and F, (P_t) and (Q_t), we see that the converse follows from the same arguments.

Cone duality and lifts of non-degenerate dualities
Here we discuss the notion of cone duality. It was introduced in [KRS07] and further discussed in [Möh11]. Both references were primarily interested in non-degenerate dualities. This section's main result, Theorem 3.20, shows that the notion is particularly useful for the understanding of a certain type of degenerate dualities. The natural framework for cone duality is convex geometry and Choquet theory [Ph01]. Recall that if C is a convex set in some vector space, a point x ∈ C is extremal if it cannot be written as a convex combination of two distinct points from C. The set of extremal points of C is denoted ex C. A simplex in R^n is a non-empty convex, compact set C such that every point in C can be written in a unique way as a convex combination of the extremal points of C; a filled closed triangle is a simplex, but a square is not. We are interested in the following generalization of a simplex.

Definition 3.17 (Unique integral representation). Let E be a Polish space, C ⊂ L^∞(E) a convex set, and F a σ-algebra on ex C such that the evaluation maps e ↦ e(x), x ∈ E, are measurable. We say that C has a unique integral representation on (ex C, F) if:
• Every function f ∈ C can be represented as
f(x) = ∫_{ex C} e(x) µ(de), x ∈ E,    (21)
for a unique probability measure µ on (ex C, F).
• C contains all functions of the form (21).
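To see concretely why a square fails the uniqueness requirement while a triangle does not, a quick numerical check (our own illustration): the center of the unit square admits two distinct representations as a convex combination of the four corners.

```python
from fractions import Fraction as Fr

# Extremal points of the unit square in R^2:
corners = [(Fr(0), Fr(0)), (Fr(1), Fr(0)), (Fr(0), Fr(1)), (Fr(1), Fr(1))]

def combine(weights):
    """Convex combination of the corners with the given probability weights."""
    assert sum(weights) == 1 and all(w >= 0 for w in weights)
    return tuple(sum(w * c[i] for w, c in zip(weights, corners)) for i in range(2))

# Two *different* probability vectors on the extremal points ...
mu1 = [Fr(1, 2), Fr(0), Fr(0), Fr(1, 2)]   # mix (0,0) and (1,1)
mu2 = [Fr(0), Fr(1, 2), Fr(1, 2), Fr(0)]   # mix (1,0) and (0,1)

# ... represent the same point, so the square is not a simplex:
assert combine(mu1) == combine(mu2) == (Fr(1, 2), Fr(1, 2))
```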
Definition 3.18 (Cone duality). Let E be a Polish space and C ⊂ L ∞ (E) a convex set with a unique integral representation on (ex C, F). Let (P t ) be a Markov semi-group on E and (Q t ) a semi-group on (ex C, F). Then (Q t ) is the cone dual of (P t ) if it is dual with respect to the duality function H C (x, e) := e(x).
Thus Q_t(e, ·) is the measure on ex C such that (P_t e)(x) = ∫ Q_t(e, de′)e′(x). The cone dual, if it exists, is unique. A necessary condition for the existence of the cone dual is the invariance of C under (P_t). If this condition is satisfied, the only potential source of problems is the measurability of e ↦ Q_t(e, B).
Remark. The name "cone duality" can be motivated in different ways. Recall that a cone, in the usual sense of convex geometry, is a set C such that if x ∈ C, then the whole ray of points tx, t ≥ 0, is in C. Formally, this resembles the definition of invariance of a subset, f ∈ C ⇒ ∀t ≥ 0 : P_t f ∈ C; Möhle [Möh11] therefore calls an invariant set a cone for the semi-group. Another motivation [KRS07, Section 1.5] is to look at the convex cone generated by the functions H(·, y), y ∈ F.

When H : E × F → R is a duality function such that W is separating, the cone dual and the usual dual can be identified via the bijection F → ex V^{1,+}, y ↦ H(·, y), as done in [KRS07] for the Siegmund dual and the associated cone dual. The situation becomes more interesting when V^{1,+} has a unique integral representation even though W is not separating. In this case the notions differ: the cone dual and the usual dual have different state spaces, and the cone dual is unique, while the usual dual in general is not.
We will restrict to H ∈ C 0 (E ×F ) and compact F , so that by Lemma 3.16, V 1,+ is a compact convex subset of C 0 (E) (in the uniform topology). Let F be the Borel σ-algebra on ex V 1,+ corresponding to the uniform topology. V 1,+ has a unique integral representation with respect to F if and only if it is a Choquet simplex [Ph01].
Lemma 3.19. Let E be Polish and locally compact, F a compact separable Hausdorff space, and H ∈ C_0(E × F). Suppose in addition that V^{1,+} is a Choquet simplex. For y ∈ F, let Π(y, ·) be the unique probability measure on ex V^{1,+} such that for all x ∈ E,
H(x, y) = ∫_{ex V^{1,+}} e(x) Π(y, de).
Then Π is a probability kernel, i.e., y ↦ Π(y, B) is measurable for every measurable B ⊂ ex V^{1,+}.

For example, in a finite-dimensional situation where V^{1,+} is a simplex with two extremal points, say the column vectors (2, 0)^T and (0, 2)^T, the transition kernel Π can be identified with the stochastic matrix that expresses each column of H as a convex combination of the two extremal columns.

Theorem 3.20. Let E be a Polish and locally compact space, F a compact separable Hausdorff space, H ∈ C_0(E × F), and (P_t) a Markov semi-group in E. Suppose that V^{1,+} is a Choquet simplex invariant under (P_t). Then:
1. The cone dual exists and is unique. It is defined on ex V^{1,+} with the uniform topology and the Borel σ-algebra; ex V^{1,+} is Polish.
2. (P t ) has at least one dual Markov semi-group (Q t ).
3. Let (R_t(e, de′)) be the semi-group of the cone dual. A Markov semi-group (Q_t) in F is dual to (P_t) with respect to H if and only if
∫_F Q_t(y, dy′) Π(y′, B) = ∫_{ex V^{1,+}} Π(y, de) R_t(e, B)    (24)
for all y ∈ F and all measurable B ⊂ ex V^{1,+}.
Eq. (24) essentially says that any dual Markov process (Y_t) and the cone dual (Z_t) can be defined on a common probability space (Ω, P) in such a way that for all t > 0,
P(Z_t ∈ B | Y_t) = Π(Y_t, B) for all measurable B ⊂ ex V^{1,+},    (25)
provided the relation holds at t = 0. An interesting special case is when every H(·, y) is extremal, so that the map π(y) := H(·, y) is a surjection from F onto ex V^{1,+}. In this case Π(y, B) = δ_{π(y)}(B) and Eq. (25) becomes Z_t = π(Y_t) P-almost surely. As a consequence, we can think of (Y_t) as a lift of the cone dual from ex V^{1,+} to the "bigger" space F. This is a kind of converse to Prop. 3.7.
Remark. Relations of the type (24) are often called intertwining relations. For references and applications of intertwining in the context of duality, see [DM09,Sw11,HM11] and Section 6.1. An example of an intertwining relation with invertible kernel (note that our Π is not invertible) is the concept of thinning in interacting particle systems, which has been related to duality theory as well; there the kernel encodes the transition probabilities with which particles are thrown away.
In general, the duals constructed for the proof of Theorem 3.20 have rather bad continuity properties. There will be a subset F 0 ⊂ F such that Q t (y, F 0 ) = 1 for all y ∈ F and all t > 0; in particular, if F 0 is not dense in F , the dual process (Y t ) constructed in 2. will leave any neighborhood of y ∈ F \ F 0 immediately: the constructed process has branch points.
If we have more information than in Theorem 3.20, we can slow down the jumps from F \ F 0 to F 0 and obtain a dual with nice continuity properties. We prove a statement for finite state spaces only.
Theorem 3.21. Let E and F be finite state spaces and H ∈ R E×F . Let (P t ) be a strongly continuous Markov semi-group in E. Suppose that V 1,+ is a simplex and invariant under (P t ). Then (P t ) admits a strongly continuous dual (Q t ).
Now we come to proofs. The proof of Theorem 3.20 requires two lemmas.

Lemma 3.22. Let X and Y be topological vector spaces, C ⊂ X convex and compact, and T : X → Y linear and continuous. Then ex T(C) ⊂ T(ex C).

Later we are going to apply this lemma to X = M(F) with the topology of weak convergence of measures, Y = C_0(E) with the topology from the supremum norm, and T : ν ↦ ∫_F H(·, y)ν(dy).
Proof. Let y ∈ ex T(C) and F_y := {x ∈ C | Tx = y}. Then F_y is a face in C, i.e., if x ∈ F_y is a convex combination of two points in C, then these two points must be in F_y. Indeed, if x ∈ F_y is of the form x = (1 − t)x_1 + tx_2 with t ∈ (0, 1) and x_1, x_2 ∈ C, then y = (1 − t)Tx_1 + tTx_2. Since y is extremal in T(C), it follows that y = Tx_1 = Tx_2, thus x_1 and x_2 are in F_y.
F_y is closed (because T is continuous) and, as a closed subset of the compact set C, itself compact. Therefore ex F_y ≠ ∅ (this is a part of the Krein-Milman theorem [Wer00, Theorem VIII.4.4]). Moreover, because F_y is a face, ex F_y = (ex C) ∩ F_y [Wer00, Lemma VIII.4.2]. Thus we can pick x ∈ ex F_y and find that y = Tx and x ∈ ex C. This proves ex T(C) ⊂ T(ex C).
Lemma 3.23. Let E be Polish and locally compact, F a compact separable Hausdorff space, and H ∈ C_0(E × F). Set F_0 := {y ∈ F | H(·, y) ∈ ex V^{1,+}} and π : F_0 → ex V^{1,+}, π(y) := H(·, y). Then:
• F_0 is a measurable subset of F.
• π is surjective and continuous.
• π admits a measurable right inverse ι : ex V^{1,+} → F_0, i.e., π ∘ ι = id.
The right inverse ι will be used to prove the existence of a Π-lift as in Eq. (24), see Eq. (26) below.
Proof. The surjectivity is a consequence of Lemma 3.22, applied to X = M(F), Y = C_0(E) and T : ν ↦ ∫_F H(·, y)ν(dy), where M(F) is endowed with the topology of weak convergence of measures and C_0(E) with the (supremum) norm topology. By Lemma 3.16, V^{1,+} is a compact subset of C_0(E). Moreover, T maps weakly convergent sequences to convergent sequences; since the topology of weak convergence on probability measures on Polish spaces is metrizable, it follows that T is continuous. Thus Lemma 3.22 can be applied to the convex, compact set C = M^{1,+}(F), whose image is V^{1,+}, and we find that every extremal f ∈ V^{1,+} is of the form f = Tν for some ν ∈ ex M^{1,+}(F). Now, ν is extremal in M^{1,+}(F) if and only if it is a Dirac measure ν = δ_y. Thus f(·) = H(·, y) for some y ∈ F_0, and π is surjective. The continuity of π follows from Lemma 3.16.
The set ex V^{1,+} is a G_δ subset (countable intersection of open sets) of the compact metric space V^{1,+} [BR87, Theorem 4.1.11], hence in particular measurable. Thus F_0, as the preimage of the measurable set ex V^{1,+} under the continuous map y ↦ H(·, y), is measurable.
Proof of Lemma 3.19. We only need to prove the measurability of y ↦ Π(y, B). For f ∈ V^{1,+}, let Π̃(f, ·) be the unique probability measure on ex V^{1,+} such that f(x) = ∫ Π̃(f, de)e(x) for every x ∈ E. Let S(V^{1,+}) be the set of convex, continuous functions on V^{1,+} and for φ ∈ C(V^{1,+}), define the upper envelope by
φ̃(f) := inf{h(f) | h : V^{1,+} → R continuous, affine, h ≥ φ}.
For φ ∈ S(V^{1,+}) we have (Π̃φ)(f) := ∫ φ(e)Π̃(f, de) = φ̃(f) [BR87, Corollary 4.1.18]. In particular, Π̃φ, as an infimum of continuous functions, is upper semicontinuous and therefore measurable. Since differences of continuous convex functions are dense in C(V^{1,+}) [BR87, Lemma 4.1.14], and every measurable function is a pointwise limit of continuous functions, it follows that Π̃ maps Borel measurable functions to Borel measurable functions. In particular, for every measurable B ⊂ ex V^{1,+}, f ↦ Π̃(f, B) is measurable.
Since Π(y, B) = Π̃(H(·, y), B), we find that y ↦ Π(y, B) is the composition of a measurable map with the continuous map y ↦ H(·, y), and therefore measurable, for every fixed measurable B.
Proof of 1. in Theorem 3.20. Endow ex V 1,+ with the uniform topology and the Borel σ-algebra F as described above. By assumption, V 1,+ is a Choquet simplex and therefore has a unique integral representation over (ex V 1,+ , F). Note that if e n → e uniformly and x n → x, then e n (x n ) → e(x). It follows that the function H C (x, e) = e(x) and the evaluation maps e → e(x) are continuous, hence measurable. Furthermore, ex V 1,+ is a G δ subset of the compact separable Hausdorff space V 1,+ [BR87, Theorem 4.1.11], hence Polish.
The uniqueness of the cone dual follows from the uniqueness in the integral representation, see also Prop. 3.6. For e ∈ ex V^{1,+} and t > 0, let R_t(e, ·) be the unique probability measure on ex V^{1,+} such that
(P_t e)(x) = ∫ ẽ(x) R_t(e, dẽ), x ∈ E;
such a measure exists because V^{1,+} is invariant under (P_t) and has a unique integral representation.
In order to check the measurability, let Π̃(f, de) be as in the proof of Lemma 3.19. Fix B ⊂ ex V^{1,+} measurable and t > 0. Note R_t(e, B) = Π̃(P_t e, B). Thus e ↦ R_t(e, B) is the composition of the continuous map e ↦ P_t e with the measurable map f ↦ Π̃(f, B); it follows that it is measurable.
Proof of 3. in Theorem 3.20. Suppose that (Q_t) is dual to (P_t). Then, for every x ∈ E and y ∈ F, we have
∫_F Q_t(y, dy′)H(x, y′) = (P_t H(·, y))(x) = ∫ Π(y, de)(P_t e)(x) = ∫ e′(x) (ΠR_t)(y, de′),
while on the other hand ∫_F Q_t(y, dy′)H(x, y′) = ∫ e′(x)(Q_tΠ)(y, de′). Eq. (24) thus follows from the uniqueness of the integral representation. Conversely, if Eq. (24) holds, computations in the same spirit show that (Q_t) is dual to (P_t) with respect to H.
Now we turn to the proof of Theorem 3.21. Recall that a Q-matrix is a matrix with row sums 0 and non-negative off-diagonal entries.
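The two defining properties of a Q-matrix, and the fact that exp(tL) is then a stochastic matrix, can be checked mechanically; a small self-contained sketch with a hypothetical 2-state generator (helper names are ours):

```python
def is_q_matrix(L, tol=1e-12):
    """Row sums zero and non-negative off-diagonal entries."""
    n = len(L)
    return (all(abs(sum(row)) <= tol for row in L)
            and all(L[i][j] >= -tol for i in range(n) for j in range(n) if i != j))

def expm(L, t, terms=60):
    """exp(tL) via the truncated power series; fine for small, well-scaled matrices."""
    n = len(L)
    result = [[float(i == j) for j in range(n)] for i in range(n)]  # identity
    term = [row[:] for row in result]
    for k in range(1, terms):
        term = [[sum(term[i][m] * t * L[m][j] for m in range(n)) / k
                 for j in range(n)] for i in range(n)]
        result = [[result[i][j] + term[i][j] for j in range(n)] for i in range(n)]
    return result

# A hypothetical 2-state generator: row sums 0, non-negative off-diagonal entries.
L = [[-1.0, 1.0], [2.0, -2.0]]
assert is_q_matrix(L)

P = expm(L, 0.5)
# exp(tL) is a stochastic matrix: non-negative entries, row sums 1.
assert all(abs(sum(row) - 1.0) < 1e-9 for row in P)
assert all(p >= 0 for row in P for p in row)
```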
Proof of Theorem 3.21. Assume (P_t) is strongly continuous with generator L, i.e., P_t = exp(tL) for all t ≥ 0. Theorem 3.14 shows that the cone dual is strongly continuous. Let F_1 ⊂ F be a set such that the columns H(·, y_1), y_1 ∈ F_1, are extremal in V^{1,+}, and every column H(·, y), y ∈ F, is a unique convex combination of columns indexed by y_1 ∈ F_1. In the notation of Lemma 3.23, π maps F_1 bijectively onto ex V^{1,+}. From here on we identify F_1 with ex V^{1,+} and consider the cone dual (R_t) as a process with state space F_1. Set F_2 := F \ F_1 and write H = (H_1 H_2), where H_1 and H_2 are |E| × |F_1| resp. |E| × |F_2| matrices, and ker H_1 ∩ {1}^⊥ = {0}. The transition kernel Π from Lemma 3.19 becomes a |F| × |F_1| matrix with block structure Π = ( id ; Π_21 ), with id a |F_1| × |F_1| identity matrix and Π_21 a |F_2| × |F_1| matrix. We can define a |F| × |F| stochastic matrix Π̂ by the block structure
Π̂ = ( id 0 ; Π_21 0 ).
Note H = HΠ̂^T. The dual (Q_t) constructed for the proof of Theorem 3.20 is of the form
Q_t = ( R_t 0 ; Π_21 R_t 0 ).    (29)
It describes a process that jumps from F_2 to F_1 right away; we want to slow down this jump in order to obtain a strongly continuous process. Thus we are looking for a Q-matrix L̂ such that LH = HL̂^T.

Let (Q_t) be a dual as in Eq. (29). Since Q_t H^T = H^T P_t^T, we find that Q_t, restricted to W := ran H^T ⊂ R^F, is strongly continuous. Moreover, by Prop. 3.9, (Q_t) leaves W invariant. As a consequence, there is a linear map B from R1 + W to itself such that Q_t g = exp(tB)g for all t ≥ 0 and g ∈ R1 + W, and we have BH^T = H^T L^T. Next, we note that Π̂ is a projection (Π̂² = Π̂) with ran Π̂ = R1 + W. Indeed, the inclusion R1 + W ⊂ ran Π̂ follows from Π̂1 = 1 and Π̂H^T = H^T. The reverse inclusion is equivalent to {1}^⊥ ∩ ker H ⊂ ker Π̂^T (remember that in finite dimensions, (V + W)^⊥ = V^⊥ ∩ W^⊥ and ran A^T = (ker A)^⊥). Indeed, if v ∈ {1}^⊥ ∩ ker H, there are a constant c ≥ 0 and two probability measures µ, ν on F (considered as column vectors) such that v = c(µ − ν) and Hµ = Hν; using the definition of Π and that every column of H is a unique convex combination of the columns H(·, y), y ∈ F_1, we deduce that Π̂^Tµ = Π̂^Tν, hence Π̂^Tv = 0.
As a consequence, BΠ̂ is a well-defined linear map in R^F and we can define, for λ > 0 to be specified later,
L̂ := BΠ̂ + λ(Π̂ − id), Q̂_t := exp(tL̂).
We have L̂1 = 0, and L̂ and Q̂_t have F_1 × F_2 block structures of the form
L̂ = ( L̂_11 0 ; * * ),  Q̂_t = ( exp(tL̂_11) 0 ; * * ).
Here and below the star * is a generic abbreviation for a matrix block not necessarily equal to zero, to which we do not assign names because the precise values do not matter; different star blocks need not be equal. By construction,
HL̂^T = HΠ̂^T B^T + λ(HΠ̂^T − H) = HB^T = (BH^T)^T = (H^T L^T)^T = LH,
thus Q̂_t is dual to (P_t). It remains to check that L̂ is a Q-matrix. We already know that L̂1 = 0; choosing λ large enough, we can ensure that the lower left block of L̂ has only non-negative entries. Thus we are left with L̂_11. For f ∈ R^F, Q̂_tΠ̂f = exp(tB)Π̂f, because L̂Π̂ = BΠ̂ and, inductively, L̂^nΠ̂ = B^nΠ̂. On the other hand, since ran Π̂ = R1 + W, we have Q̂_tΠ̂f = Q_tΠ̂f. Therefore if f_1 ≥ 0 componentwise, then also exp(tL̂_11)f_1 ≥ 0 componentwise, for all t ≥ 0. As a consequence, L̂_11 has non-negative off-diagonal entries.

Spectrum and unitary equivalence. Fourier transforms
Let (P_t) and (Q_t) be Markov semi-groups on finite state spaces. Suppose that they are dual with respect to some non-degenerate duality function H. Then H defines an invertible E × F matrix and we can write P_t = HQ_t^T H^{-1}, which implies in particular that P_t and Q_t have the same eigenvalues and eigenvalue multiplicities. As noted in the paragraphs preceding Theorem 3.2 in [Möh99], this observation has been used to compute eigenvalues for some models of population genetics. Besides this computational application, the observation is interesting because of the relation between the spectral gap and mixing properties of the Markov chain. Moreover, some chains allow for a stochastic interpretation of all eigenvalues: for example, [DF90, Remark 4.22] relates the eigenvalues to parameters of geometric random variables characterizing how fast a chain becomes stationary.
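The identity P_t = HQ_t^T H^{-1} and the resulting equality of spectra can be verified on a toy example (our own, not from the text): a 2-state chain with state 0 absorbing, the Siegmund-type duality matrix H(x, y) = 1_{x≤y}, and the similarity invariants trace and determinant.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)] for row in A]

# A 2-state chain with state 0 absorbing, and the (invertible) Siegmund-type
# duality matrix H(x, y) = 1_{x <= y}; all choices here are our own toy example.
P = [[Fr(1), Fr(0)], [Fr(1, 2), Fr(1, 2)]]
H = [[Fr(1), Fr(1)], [Fr(0), Fr(1)]]
Hinv = [[Fr(1), Fr(-1)], [Fr(0), Fr(1)]]

# P = H Q^T H^{-1}  <=>  Q^T = H^{-1} P H:
QT = matmul(Hinv, matmul(P, H))
Q = [list(row) for row in zip(*QT)]
assert all(sum(row) == 1 and min(row) >= 0 for row in Q)  # Q is again stochastic

# Similar matrices share trace and determinant, hence the same two eigenvalues:
trace = lambda M: M[0][0] + M[1][1]
det = lambda M: M[0][0] * M[1][1] - M[0][1] * M[1][0]
assert trace(P) == trace(Q) and det(P) == det(Q)          # eigenvalues 1 and 1/2
```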
Theorem 3.24 below states that an analogous relation holds for infinite state spaces if both Markov processes are known to be reversible; this applies to many models studied in the context of hydrodynamic limits [DMP91]. Without the assumption of reversibility, non-degenerate duals can have drastically different spectra: in the following example, taken from [GKRV09, Sec. 3.5], one infinitesimal generator has discrete, real spectrum, while the other has the complex half-plane ℜλ ≤ 0 as its spectrum.
(Q_t) describes a process of independent random walkers on two sites. As shown in [GKRV09, Sec. 3.5], (P_t) and (Q_t) are dual with duality function H(x, n) = x_1^{n_1} x_2^{n_2}. The associated bilinear form is non-degenerate.
In order to determine the spectrum of the generator L̂ of (Q_t), we note that L̂ leaves the finite sets F_n := {(n_1, n_2) | n_1 + n_2 = n}, n ∈ N_0, invariant; on each of these, L̂ has the uniform distribution as a reversible measure. Thus L̂ is block diagonal with finite-dimensional symmetric matrices as blocks; as a consequence, it has real, discrete spectrum. In fact, using the duality between L̂ and an Ornstein-Uhlenbeck process [GKRV09, Remark 3.1], one can show that σ(L̂) = {−n/2 | n ∈ N_0}.

On the other hand, let λ ∈ C with ℜλ < 0 and define the complex-valued function f_λ by f_λ(x_1, x_2) := (x_2 − x_1)^{−λ} = exp(−λ log(x_2 − x_1)) for x_2 > x_1 and f_λ(x_1, x_2) := 0 for x_2 = x_1; f_λ is continuous on E. We have Lf_λ = λf_λ, where L is the infinitesimal generator of (P_t) in C(E; C), the space of complex-valued continuous functions on E. We have thus shown that λ ∈ σ(L), for any λ with ℜλ < 0. The theory of contraction semi-groups tells us that σ(L) is a closed set contained in the complex half-plane ℜz ≤ 0. Thus σ(L) = {λ ∈ C | ℜλ ≤ 0}.

Now let us turn to the reversible case. Recall that if µ is a reversible measure for the semi-group (P_t), then P_t is self-adjoint as an operator in L²(E, µ), the Hilbert space of complex-valued square-integrable functions with scalar product ⟨f, g⟩ = ∫_E f̄g dµ. A bounded operator U : H_1 → H_2 between Hilbert spaces is unitary if U*U = id_{H_1} and UU* = id_{H_2}; equivalently, if it is bijective and norm preserving, ||Ux|| = ||x||.
Theorem 3.24. Let (P_t) and (Q_t) be dual with respect to H. Suppose that (P_t) and (Q_t) have reversible probability measures µ and ν, and that the bilinear form associated with H is non-degenerate. Then there is a unitary operator U : L²(F, ν) → L²(E, µ) such that for all t > 0, Q_t = U*P_tU.
Thus (P_t) and (Q_t), as operators in L²(E, µ) and L²(F, ν), are unitarily equivalent. An immediate consequence of Theorem 3.24 is the following corollary.

Corollary 3.25. Under the assumptions of Theorem 3.24, the operators P_t in L²(E, µ) and Q_t in L²(F, ν) have the same spectrum and the same eigenvalue multiplicities.
Before we turn to the proof of Theorem 3.24, let us point out that another interesting connection between duality and unitary transformation appears, without reversibility, for stochastic processes on locally compact abelian groups; the duality function is chosen as the kernel of the Fourier transform -see [SC01] for an introduction to random walks and diffusions on groups and [HR70, Chapter 8] for theoretical background on harmonic analysis. This type of duality was considered by Holley and Stroock [HS76,HS79], who noticed that this setup includes dualities between diffusions and random walks as well as dualities of interacting particle systems.
Example 3.6 (Fourier duality for diffusions on a circle and random walk [HS79]). Let E = {exp(iθ) | θ ∈ R} = U(1), F = Z, and consider the diffusion in E with formal generator L = (1 − cos θ)∂²/∂θ², and the random walk on Z with generator (L̂f)(n) = (n²/2)(f(n + 1) + f(n − 1) − 2f(n)). These two processes are dual with respect to H(θ, n) = exp(inθ).

Proof of Theorem 3.24. Let T : L²(F, ν) → L²(E, µ) be the integral operator (Tg)(x) := ∫_F H(x, y)g(y)ν(dy). Write ||·||_p for the L^p-norms and ⟨·, ·⟩ for the scalar products in L²(E, µ) and L²(F, ν). Using Cauchy-Schwarz, we have
||Tg||_2 ≤ sup|H| ||g||_1 ≤ sup|H| ||g||_2,
thus T is a bounded operator. T is defined so that ⟨f, Tg⟩ = B_H(f̄µ, gν). The non-degeneracy of B_H shows that both T and T* are injective and have dense ranges; e.g., ran T is dense because (ran T)^⊥ = ker T* = {0}. Because of the reversibility of µ and ν, we have P_t^*(fµ) = (P_tf)µ and Q_t^*(gν) = (Q_tg)ν; moreover, P_t and Q_t are self-adjoint in the respective L²-spaces. Therefore
⟨f, P_tTg⟩ = ⟨P_tf, Tg⟩ = B_H((P_t f̄)µ, gν) = B_H(f̄µ, (Q_tg)ν) = ⟨f, TQ_tg⟩
for all f ∈ L²(E, µ) and g ∈ L²(F, ν). It follows that P_tT = TQ_t, i.e., T intertwines P_t and Q_t.
If T is unitary, we are done. If T is not unitary, we construct a unitary by adapting the constructions from the polar decomposition [RS, Section VI.4]. Let A := √(T*T) be the positive semi-definite operator in L²(F, ν) with A² = T*T. We note that ||Ag||²_2 = ⟨g, T*Tg⟩ = ||Tg||²_2, thus A is injective because T is. Moreover, ran A is dense: this follows from ran T*T = ran A² ⊂ ran A and
g ∈ (ran T*T)^⊥ ⇔ ∀h ∈ L²(F, ν) : ⟨Tg, Th⟩ = 0 ⇔ Tg ∈ (ran T)^⊥ ⇔ Tg = 0 (because ran T is dense) ⇔ g = 0 (because T is injective).
By adapting the proof of Theorem VI.10 in [RS], we find that there is a unique unitary operator U : L 2 (F, ν) → L 2 (E, µ) such that T = U A.
Next, we want to deduce from P_tT = TQ_t the intertwining relation P_tU = UQ_t, using that P_t and Q_t are self-adjoint. Write P_t = ∫ λ dE_λ and Q_t = ∫ λ dÊ_λ for the spectral decompositions of P_t and Q_t [RS, Section VII]; we suppress the t-dependence in the notation.
A simple induction over n shows that P_t^nT = TQ_t^n for all n ∈ N, which in turn implies that the spectral projections are intertwined, E_λT = TÊ_λ. It follows that T(ker Ê_λ) ⊂ ker E_λ and T(ran Ê_λ) ⊂ ran E_λ. Applying the polar decomposition construction to the restrictions, we obtain unitaries U_1 : ker Ê_λ → ker E_λ and U_2 : ran Ê_λ → ran E_λ. The uniqueness statement in the polar decomposition can be used to show that U_1 and U_2 must coincide with the restrictions of U to the corresponding subspaces, and it follows that E_λU = UÊ_λ. Since this holds for every λ ∈ R, we deduce P_tU = UQ_t.

Pathwise duality
In this section, we discuss various notions of pathwise duality, which strengthen the basic notion of dual processes. Often one is interested in knowing whether a duality holds in some pathwise sense, which has to be specified. We will introduce strong pathwise duality as well as weaker notions. As a first step, we show that in principle dual processes can always be coupled. In the next proposition, given two probability measures µ and ν on E and F, P^µ and P^ν refer to the laws of (X_s) and (Y_s) with initial conditions µ and ν.
Proposition 4.1. Let (X_t) and (Y_t) be two Markov processes with respective Polish state spaces E and F, and H : E × F → R measurable and bounded. Then (X_t) and (Y_t) are dual with respect to H if and only if for all t > 0 and every choice of initial conditions µ of (X_s) and ν of (Y_s) there are processes (X̃_s)_{s∈[0,t]} and (Ỹ_s)_{s∈[0,t]}, defined on a common probability space (Ω, F, P), such that (X̃_s) has law P^µ, (Ỹ_s) has law P^ν, and
E[H(X̃_t, Ỹ_0)] = E[H(X̃_0, Ỹ_t)].
The probability space (Ω, F, P) may depend on t, µ, and ν.

For applications, and in order to transfer properties from one process to its dual, simple duality and the trivial coupling of (X t ) and (Y t ) on the product space are often not enough. We would like to find a stronger, pathwise coupling of the two processes. In general, two processes which are dual with respect to a duality function H are called pathwise dual if they can be coupled using an (explicitly constructed) auxiliary driving stochastic process, so that, for given initial conditions, one process runs forward and the other runs backward in time, driven by the same realization of the auxiliary process. This concept has been widely used in many concrete cases, but it seems to us that there is no general treatment, or even a generally accepted precise definition, of this somewhat vague notion in the literature so far. It should be mentioned that in the context of duality with respect to a measure (recall Definition 1.3), there is a powerful notion of pathwise duality via Kuznetsov realizations of Markov processes (cf. [Ge10], Thm. 2.4), which will not be discussed here.
In this section, we define some notions of pathwise duality with respect to a function, and discuss a few general concepts for the construction of pathwise duals, in particular the graphical representation for dual interacting particle systems. However, there are few general results, and for many questions we restrict ourselves to concrete examples, mostly from interacting particle systems.

4.1 Strong pathwise duality
To start with, we propose the following definition of strong pathwise duality; the reader should contrast it with Proposition 4.1. In this context, some statements can be formulated more easily in terms of Markov families instead of Markov processes. Recall that every Markov family defines a Markov process via the canonical construction [Dyn65].

Definition 4.2 (Strong pathwise duality). Let (X t ) and (Y t ) be two Markov processes with Polish state spaces E and F , and H : E × F → R measurable and bounded. Suppose that for every t > 0 there are families of processes {(X^x_s) s∈[0,t] } x∈E and {(Y^y_s) s∈[0,t] } y∈F defined on a common probability space (Ω, F, P) such that the following holds: (i) For all x ∈ E and y ∈ F , the finite dimensional distributions of (X^x_s) s∈[0,t] and (Y^y_s) s∈[0,t] under P agree with those of (X s ) s∈[0,t] under P x and (Y s ) s∈[0,t] under P y , respectively. (ii) For all s ∈ [0, t] and all x ∈ E, y ∈ F , H(X^x_s , Y^y_{t−s}) = H(x, Y^y_t) P-almost surely. Then (X t ) and (Y t ) are called strongly pathwise dual with respect to H.
In most examples, (Ω, F, P) can be chosen independently of t, but the precise form of the maps X^x_s , Y^y_s will depend on the fixed time horizon t. Moreover, in the examples below X^x_s and Y^y_{t−s} are independent, and there is a collection of independent random variables (Z s ) s∈[0,t] - for example, Poisson arrows in graphical representations - such that X^x_s is measurable with respect to σ(Z α , α ∈ [0, s]) and Y^y_{t−s} is measurable with respect to σ(Z α , α ∈ [s, t]).
Lemma 4.3. Let (X t ) and (Y t ) be strong pathwise duals with respect to H. Then they are dual.
Proof. For every x ∈ E, y ∈ F , by strong pathwise duality there exist (X^x_s), (Y^y_s) such that E^x[H(X t , y)] = E[H(X^x_t , Y^y_0)] = E[H(X^x_0 , Y^y_t)] = E^y[H(x, Y t )].

As opposed to the coupling in Proposition 4.1, which worked for each fixed set of initial conditions, we now ask for a coupling that works for all initial conditions at once. The important task is to explicitly describe (Ω, F, P) in a useful way. There are many situations where this can be achieved in terms of a graphical representation.
Example 4.1 (Absorbed and reflected random walks). Let (X n ) be the Markov process associated with simple symmetric random walk on N 0 absorbed upon first hitting 0. Let (Y n ) be discrete time simple symmetric random walk reflected at 0 via the transition rule P (0, 1) = P (0, 0) = 1/2. As for Brownian motion, it is a classical result that (X n ) and (Y n ) are Siegmund duals (compare also Section 5). We give a simple pathwise construction of this duality. Let W n , n ∈ N 0 , be iid random variables on a space (Ω, F, P) with P(W 1 = −1) = P(W 1 = 1) = 1/2. We think of W n as a sequence of arrows pointing upwards if W n = 1 and downwards if W n = −1 (cf. Figure 1). Fixing N ∈ N, x, y ∈ N and a realization (W n ) 0≤n≤N−1 , we define X^x_0 := x, and X^x_{n+1} := X^x_n + W n if X^x_n > 0, X^x_{n+1} := 0 if X^x_n = 0. In words, (X^x_n) is constructed by following the arrows up and down until the first hitting time of 0, after which it stays in 0. (Y^y_n) is constructed using the same realization of (W n ), but going backward in time and following the arrows in the converse direction, unless Y^y_n = 0, in which case the process either stays in 0 or is reflected. That is, we set Y^y_0 := y, and Y^y_{n+1} := max(Y^y_n − W_{N−1−n}, 0). By construction, (X^x_n) and (Y^y_n) have the same finite dimensional distributions as (X n ) and (Y n ) under P x , P y , respectively. Since we have used the same arrows in the construction of both processes, the paths of (Ŷ^y_n) n defined by Ŷ^y_n := Y^y_{N−n} and (X^x_n) do not cross, and therefore x ≤ Y^y_N if and only if X^x_N ≤ y, hence P x (X N ≤ y) = P y (x ≤ Y N ).

Proposition 4.4 (Siegmund duals are pathwise dual). Let (X t ) and (Y t ) be right continuous Markov processes in discrete or continuous time on a totally ordered state space, which are dual with respect to H(x, y) = 1 {x≤y} . Then they are strongly pathwise dual.
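The coupling of Example 4.1 is easy to simulate. In the following minimal sketch (ours; the function names are hypothetical), the absorbed walk runs forward and the reflected walk runs backward through the same realization of the arrows, and the pathwise identity {x ≤ Y^y_N} = {X^x_N ≤ y} is checked for a range of initial conditions:

```python
import random

def coupled_walks(x, y, N, W):
    """Run the absorbed walk forward and the reflected walk backward
    through the same arrow sequence W (entries +1/-1)."""
    X = x
    for n in range(N):
        if X > 0:
            X += W[n]            # follow the arrow
        # X == 0: absorbed, stays at 0
    Y = y
    for n in range(N):
        w = W[N - 1 - n]         # same arrows, backward in time
        Y = max(Y - w, 0)        # converse direction; reflected at 0
    return X, Y

random.seed(1)
N = 20
for _ in range(200):
    W = [random.choice([-1, 1]) for _ in range(N)]
    for x in range(6):
        for y in range(6):
            X, Y = coupled_walks(x, y, N, W)
            # Siegmund duality holds realizationwise: {x <= Y} = {X <= y}
            assert (x <= Y) == (X <= y)
print("pathwise Siegmund duality verified")
```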
The proof of this result is given in Clifford and Sudbury [CS85], where a general construction in the spirit of the above example is given. Clearly strong pathwise duality implies duality, but not every duality is strongly pathwise. We now give an example of a duality which is not strongly pathwise, but later we will see that also in this case there is an underlying pathwise construction.

Example 4.2 (Wright-Fisher diffusion and Kingman's coalescent). Let (X t ) be the Wright-Fisher diffusion on [0, 1], and let (N t ) be the block-counting process of Kingman's coalescent (with values in N), see e.g. [Eth11]. It is well-known that (X t ) and (N t ) are dual with respect to H(x, n) = x^n . However, the duality is not strongly pathwise. To see why, let x ∈ (0, 1), n ∈ N and t > 0. Suppose that (X s (ω)) s≥0 started in x and (N s (ω)) s≥0 started in n are defined on a common probability space (Ω, P). We know that N t (ω) ≤ n almost surely, since the number of blocks of the coalescent decreases.
The last inequality is strict because the paths of the Wright-Fisher diffusion are not monotone functions, unless started in the absorbing states 0 and 1. Eq. (31) shows that the moment duality of (X t ) and (N t ) is not strongly pathwise. However, there is an underlying pathwise structure, in a sense that will become more evident in Sections 4.2 and 4.3.

The construction given in the example of absorbed and reflected random walks is probably the easiest instance of a general construction of the underlying auxiliary processes for (strong) pathwise dualities, which, depending on the field, are also called graphical representation or driving sequence. We indicate the general idea of such a representation.

Example 4.3 (Pathwise duality, discrete time). Let (X n ) n∈N , (Y n ) n∈N be Markov processes in discrete time with state spaces E and F respectively. Assume that there exists a sequence of iid random variables (W n ) n∈N on Ω, taking values in some measurable space W, and transition functions f : E × W → E, g : F × W → F such that X n = f(X n−1 , W n ) and Y n = g(Y n−1 , W n ); that is, given the values of W n and X n−1 , the value of X n is uniquely determined (and not random any more). See for example [AS96]. Assume further that H(f(x, w), y) = H(x, g(y, w)) for all x ∈ E, y ∈ F, w ∈ W. (32) Then (X n ) and (Y n ) are pathwise dual: Fix a realization of (W n ). Use (W n ) n=0,...,N as a driving sequence for (X n ) n=0,...,N and (W N−n ) n=0,...,N for (Ŷ n ) n=0,...,N . Clearly Ŷ is equal in distribution to Y , and by (32), H(X N , y) = H(X N−1 , g(y, W N )). Iterating this proves strong pathwise duality. The transition probabilities of (X n ) are given by P (X n = y | X n−1 = x) = P(f(x, W n ) = y) = P(f(x, W 1 ) = y).
Hence the Markov chains are necessarily time-homogeneous.
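A random mapping representation X n = f(X n−1 , W n ) exists for any Markov chain on a finite state space, for instance via the inverse-CDF construction. The following sketch (ours; the transition matrix is an arbitrary toy example) builds f from a given transition matrix and checks that the one-step law of f(x, W 1 ) is indeed P (x, ·):

```python
import random
from collections import Counter

# inverse-CDF random mapping: X_n = f(X_{n-1}, W_n), W_n iid Uniform(0,1)
P = [[0.5, 0.5, 0.0],
     [0.2, 0.3, 0.5],
     [0.0, 0.4, 0.6]]

def f(x, u):
    """Smallest y whose cumulative transition probability exceeds u."""
    acc = 0.0
    for y, p in enumerate(P[x]):
        acc += p
        if u < acc:
            return y
    return len(P[x]) - 1

random.seed(0)
n_samples = 200_000
counts = Counter(f(1, random.random()) for _ in range(n_samples))
for y in range(3):
    assert abs(counts[y] / n_samples - P[1][y]) < 0.01
print("empirical one-step law matches P; since the W_n are iid,")
print("the constructed chain is time-homogeneous")
```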
Example 4.4 (Continuous time, discrete space). Clearly this example in discrete time can be extended to continuous time and discrete state space, by considering independent Poisson processes with rates λ i , 1 ≤ i ≤ k for some k ≥ 1, adapted to the jump rates of the processes, and transition functions f i : E → E, g i : F → F . Assume that (X t ), (Y t ) change their state at a jump time τ of the i-th Poisson process according to the rule X τ = f i (X τ− ), Y τ = g i (Y τ− ). This is certainly well-defined if |E|, |F | < ∞. For countably infinite state spaces, the construction works if the interactions are sufficiently local [Har78]. If H(f i (x), y) = H(x, g i (y)) for all x ∈ E, y ∈ F , then X and Y are (strongly pathwise) dual with duality function H.
In the case of spin systems (E = {0, 1} G ) and coalescing or annihilating duals, this kind of construction goes back to Harris [Har78] and is in widespread use. It is usually referred to as a graphical representation, since it can be represented by drawing a line of length t for every element of G, and representing the Poisson processes by arrows between pairs of lines. A detailed account can be found in Griffeath [Gri79] and Liggett [Lig05], see also [S00]. Usually, the interpretation of the mechanisms f i , g i is such that one thinks of a particle at the tail of an arrow in the graphical representation having some effect on the configuration at the tip, for example by jumping there, or by branching with subsequent coalescence, or annihilation, or death. The rates of the Poisson processes then naturally have the interpretation of giving a rate per particle for some event to happen. We now give a variant of this construction, which differs only slightly from the previous one in the sense that the Poisson processes give us the rate of an event happening per pair of sites on a graph, which may or may not be occupied by a particle. We discuss this construction here in some detail, in the case of a complete graph.

Example 4.5 (q-duality). Consider processes with state space E = F = {0, 1}^N and duality functions of the form H(x, y) = q^{|x∧y|} for q ∈ R \ {1}, where |x ∧ y| denotes the number of sites occupied in both configurations. We call such a duality a q-duality. Special cases are q = 0, which leads to a coalescing duality, and q = −1, which is equivalent to an annihilating duality, see [SL95] for a discussion of this type of duality functions. The graphical representation is constructed as follows. For each i ∈ {1, ..., N }, draw a vertical line of length T , which represents time up to a finite end point T . For each ordered pair (i, j), i, j ∈ {1, ..., N }, i ≠ j, run m ∈ N independent Poisson processes with parameters (λ^k_{ij}), k = 1, ..., m. At the time of an arrival, draw an arrow from the line corresponding to i to the line corresponding to j, marked with the index k of the process. For each k, define functions f k , g k : {0, 1}² → {0, 1}².
A Markov process (X N t ) with càdlàg paths is then constructed by specifying an initial condition x = (x i ) i=1,...,N , and the following dynamics: X N t = x for all t < τ 1 , where τ 1 is the time of the first arrow in the graphical representation (which is clearly well-defined, since we consider a finite time horizon, and a finite number N of lines). If this arrow points from i to j and is labelled k, then the pair (x i , x j ) is changed to f k (x i , x j ), and the other coordinates remain unchanged. Go on until the next arrow, and proceed exactly in the same way. The dual process (Y N t ) is constructed using the same Poisson processes, but started at the final time T > 0, running time backwards, inverting the order of all arrows, and using the functions g k instead of f k .
Lemma 4.5. Assume that the mechanisms are dual, i.e., q^{|f k (x i ,x j ) ∧ (y i ,y j )|} = q^{|(x i ,x j ) ∧ g k (y i ,y j )|} for all k and all (x i , x j ), (y i , y j ) ∈ {0, 1}², and that (Y N t ) is constructed from Poisson processes with the same rates (λ^k_{ij}). Then (X N t ) and (Y N t ) are strongly pathwise dual with respect to H(x, y) = q^{|x∧y|}.

Proof. Since we assume that the Poisson processes have the same rates, we can construct Ŷ^N_t from the graphical representation of (X N t ), using the same realization of the Poisson processes, reversing time and the directions of all the arrows. It is clear from the construction that then Ŷ^N_t equals Y^N_t in distribution, and that q^{|X t ∧ Ŷ 0 |} = q^{|X 0 ∧ Ŷ t |} holds (see Figure 2). For some more details, in the case of coalescing mechanisms, compare the proof of Proposition 2.3 of [AH07].
We give a list of relevant dual mechanisms in the appendix. As an example, we consider the voter model.

Example 4.6 (Voter model and coalescing random walks). Let E N be the complete graph with N vertices. In the above construction, we choose λ ij = λ > 0 for all 1 ≤ i, j ≤ N , i ≠ j, and we use the resampling mechanism f R described above. This means that at each arrival of the Poisson process attached to the pair (i, j), the process (X t ) = (X¹ t , ..., X^N t ) changes in the following way: X j takes on the same value as X i , and all the other values remain unchanged. We know that this process is dual to a system of coalescing random walks, given by (Y t ) = (Y¹ t , ..., Y^N t ), where Y^i_t = 1 if there is a particle at time t at site i. This process can indeed be constructed using the same driving Poisson processes and the mechanism f C , which is the coalescing dual mechanism of f R : at each arrival, the particle at site j (if any) jumps to site i and merges with the particle at that site. With this procedure, we obtain the well-known duality 1 {X 0 ∧Y t =0} = 1 {X t ∧Y 0 =0} a.s.
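For the voter model, the mechanisms f R and f C can be encoded on bitmask configurations, and the pathwise duality of Example 4.6 can be verified realization by realization. A minimal sketch (ours; the encoding of configurations as Python integers is an illustration, not the paper's notation):

```python
import random

def f_R(x, i, j):
    """Voter resampling: site j adopts the opinion of site i."""
    bit_i = (x >> i) & 1
    return (x & ~(1 << j)) | (bit_i << j)

def f_C(y, i, j):
    """Coalescing dual: a particle at j jumps to i and merges there."""
    if (y >> j) & 1:
        y = (y & ~(1 << j)) | (1 << i)
    return y

def H(x, y):
    return int(x & y == 0)     # coalescing duality: 1_{A and B disjoint}

random.seed(2)
N = 6
for _ in range(500):
    arrows = [tuple(random.sample(range(N), 2)) for _ in range(10)]
    x0 = random.randrange(2 ** N)
    y0 = random.randrange(2 ** N)
    X = x0
    for (i, j) in arrows:                 # voter model, forward in time
        X = f_R(X, i, j)
    Y = y0
    for (i, j) in reversed(arrows):       # coalescing walks, backward
        Y = f_C(Y, i, j)
    assert H(X, y0) == H(x0, Y)           # pathwise, arrow by arrow
print("voter model / coalescing random walks: pathwise dual")
```

The one-step identity H(f_R(x), y) = H(x, f_C(y)) for a single arrow telescopes along the arrow configuration, exactly as in Example 4.3.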

4.2 Weaker notions of pathwise duality
In this section, we weaken the notion of pathwise duality from above. Note that simple duality and strong pathwise duality can be cast into the following general form: If (X t ) and (Y t ) are strongly pathwise dual, then for every t > 0 and for all x ∈ E, y ∈ F they can be realized on a common probability space (Ω, F, P) such that for all s ∈ [0, t], H(X^x_s , Y^y_{t−s}) = H(x, Y^y_t) P-almost surely. If they are dual in the usual sense, then for fixed initial conditions x, y they can be realized on a common probability space (Ω, F, P) (note that P = P x,y usually depends on x and y) such that E[H(X s , Y t−s )] does not depend on s ∈ [0, t]. Interpolating between these two extreme cases and using families (X^x_s) and (Y^y_s) as in Definition 4.2 leads to the following definition.
Definition 4.6 (Conditional pathwise duality). Let (X t ) and (Y t ) be two Markov processes with Polish state spaces E and F , and H : E × F → R measurable and bounded. Suppose that for every t > 0 there are families of processes {(X x s ) s∈[0,t] } x∈E and {(Y y s ) s∈[0,t] } y∈F defined on a common probability space (Ω, F, P) and a σ-algebra D ⊂ F such that the following holds: (i) For all x ∈ E and y ∈ F , the finite dimensional distributions of (X x s ) s∈[0,t] and (Y y s ) s∈[0,t] under P agree with those of (X s ) s∈[0,t] under P x and (Y s ) s∈[0,t] under P y , respectively.
(ii) For all s ∈ [0, t] and all x ∈ E, y ∈ F , E[H(X^x_s , Y^y_{t−s}) | D] does not depend on s, P-almost surely. (36) Then (X t ) and (Y t ) are called conditionally pathwise dual with respect to H.
An example of this concept is given in Ex. 4.9 below. Clearly conditional pathwise duality implies duality, and strong pathwise duality implies conditional pathwise duality. Moreover, by the tower property of the conditional expectation, if Eq. (36) holds for a σ-algebra D, it holds for every smaller σ-algebra D ′ ⊂ D.
Another interesting situation arises when one looks at functions of processes. More precisely, we place ourselves in the situation of the following lemma.
Lemma 4.7. Let (X t ) be a Markov process with Polish state space E and semi-group (P t ), let Ê be a Polish space, f : E → Ê measurable and surjective, and (P̂ t ) a semi-group of transition kernels on Ê. Let Λ : Ê × B(E) → [0, 1] be a transition kernel such that:
• Λ is supported on the level sets of f : Λ(x̂, f^{−1}({x̂})) = 1 for all x̂ ∈ Ê;
• Λ intertwines (P̂ t ) and (P t ): for all t ≥ 0, x̂ ∈ Ê, and measurable A ⊂ E, (P̂ t Λ)(x̂, A) = (Λ P t )(x̂, A).
Then for every x̂ ∈ Ê, the finite-dimensional distributions of (f (X t )) t≥0 when (X t ) has initial law µ x̂ = Λ(x̂, ·) are those of a Markov process (X̂ t ) with semi-group (P̂ t ) started in x̂.
This notion is applied in Examples 4.7 and 4.8. By a slight abuse of notation we sometimes write X̂ t = f (X t ).
Proof of Lemma 4.7. Let x̂ ∈ Ê, s ≥ 0, t > 0, and Â, B̂ ⊂ Ê measurable. A direct computation based on the support and intertwining conditions then shows that, when (X t ) has initial law µ x̂ = Λ(x̂, ·), it has the same two-dimensional distributions as a Markov process (X̂ t ) with semi-group (P̂ t ) started in x̂. The other finite-dimensional distributions can be computed in a similar way, and the claim follows.
The kernel Λ(x̂, dx) has the interpretation of a conditional probability, and Lemma 4.7 can be rephrased as follows. Suppose that the process (X t ) preserves the conditional structure Λ, i.e., whenever the initial law of (X t ) is such that Eq. (38) holds at t = 0, then it holds for all t ≥ 0. (In the examples below, this condition becomes: if X 0 is exchangeable, then X t is exchangeable for all t ≥ 0.) Then f (X t ) is Markovian with transition semi-group (P̂ t ).

Now let (X t ), (Y t ), (X̂ t ), (Ŷ t ) be Markov processes with respective Polish state spaces E, F, Ê, F̂ and semi-groups (P t ), (Q t ), (P̂ t ), (Q̂ t ). Let f : E → Ê and g : F → F̂ be measurable surjective maps, and Λ : Ê × B(E) → [0, 1], K : F̂ × B(F ) → [0, 1] transition kernels such that Λ, (P t ), (P̂ t ), f satisfy the support and intertwining conditions of Lemma 4.7, and K, (Q t ), (Q̂ t ), g satisfy the analogous conditions. Fix H : E × F → R measurable and bounded.
Proposition 4.8. In the situation described above the following holds: If (X t ) and (Y t ) are dual with respect to H, then (X̂ t ) and (Ŷ t ) are dual with respect to Ĥ(x̂, ŷ) := ∫ E×F Λ(x̂, dx) K(ŷ, dy) H(x, y).
In view of Eq. (38) and its analogue for (Y t ) and K(ŷ, dy), the new function Ĥ should be thought of as a conditional expectation of H, see Eq. (39) below.
Proof. First we note that Ĥ is measurable; this follows from the measurability of x̂ ↦ Λ(x̂, A) and ŷ ↦ K(ŷ, B) for all measurable A ⊂ E and B ⊂ F . Suppose that (X t ) and (Y t ) are dual with respect to H. Writing P t ⊗ id and id ⊗ Q t for the actions of the semi-groups on the first and second variables of H, we thus have (P t ⊗ id)H = (id ⊗ Q t )H. Moreover, writing Ĥ = (Λ ⊗ K)H, the intertwining relations give (P̂ t ⊗ id)Ĥ = (Λ ⊗ K)(P t ⊗ id)H and (id ⊗ Q̂ t )Ĥ = (Λ ⊗ K)(id ⊗ Q t )H. Therefore (X̂ t ) and (Ŷ t ) are dual with respect to Ĥ.
It is instructive to give a probabilistic construction when (X t ) and (Y t ) are in strong pathwise duality with respect to H. Fix t > 0 and let (Ω, F, P) and (X^x_s), (Y^y_s) be as in Definition 4.2. Assume in addition that the maps (x, ω) ↦ X^x_s(ω) are measurable, and similarly for Y^y_s . Enlarging the probability space, we may assume that there are families of random variables V x̂ : Ω → E and W ŷ : Ω → F such that V x̂ has law Λ(x̂, ·), W ŷ has law K(ŷ, ·), and the variables V x̂ , W ŷ are independent among themselves and independent of the variables X^x_s , Y^y_s . We define X̂^x̂_s := f(X^{V x̂}_s) and Ŷ^ŷ_s := g(Y^{W ŷ}_s). Thus we randomize the parameters x and y, and apply the surjective maps f and g. Lemma 4.7 shows that the finite-dimensional distributions of (X̂^x̂_s) s∈[0,t] are those of (f (X s )) s∈[0,t] when X 0 has law Λ(x̂, ·), and similarly for (Ŷ^ŷ_s). Moreover X̂^x̂_0 = x̂ and Ŷ^ŷ_0 = ŷ P-almost surely. Assume in addition that for every s ∈ [0, t] the map (x, y, ω) ↦ H(X^x_s(ω), Y^y_{t−s}(ω)) is jointly measurable. As a consequence, E[H(X̂^x̂_s , Ŷ^ŷ_{t−s})] is independent of s ∈ [0, t], for all x̂ ∈ Ê and ŷ ∈ F̂. This provides a pathwise construction for the duality of (X̂ t ) and (Ŷ t ), though the latter duality need not be strongly or conditionally pathwise in the sense of Definitions 4.2 and 4.6.
Example 4.7 (Moran model and block-counting process). Recall the strong pathwise duality of the voter model and coalescing random walk obtained at the end of the last section from a graphical representation. Fix N ∈ N. Let A t := {i : X^i_t = 1} and B t := {i : Y^i_t = 1}, and recall from Example 1.2 that coalescing duality in this context means duality with respect to H(A, B) = 1 {A∩B=∅} . Note that |A t | = N Z t ({1}), where Z t is the empirical measure of X t . We choose f = g = | · |, and define Λ(a, ·) as the uniform distribution over all configurations A such that |A| = a, and analogously define K(b, ·). This choice of measure means that we choose exchangeable initial conditions. Assume that the driving Poisson processes λ ij in the graphical representation have the same intensity for all i, j. Then A t and B t remain exchangeable for all times t, and the intertwining relation of Lemma 4.7 is satisfied. The function Ĥ from Prop. 4.8 then is Ĥ N (a, b) = P(Z N,a,b = 0), where Z N,a,b ∼ Hyp(N, a, b) denotes a hypergeometric random variable, i.e., Z N,a,b has the law of |A ∩ B| for A uniform with |A| = a and B fixed with |B| = b. Therefore this proposition yields that (|A t |) and (|B t |) are dual with respect to Ĥ N . This duality was derived in [Möh99].
Example 4.8 (q-dual processes). Generalizing Example 4.7, we consider set-valued q-dual processes (A t ), (B t ) (cf. Example 4.5) constructed from a graphical representation. This means we work with Poisson processes λ ij (s), i, j = 1, ..., N , 0 ≤ s ≤ t, realized on some probability space (Ω P , F P , P P ). If the intensities are the same for each pair i, j, then the sigma-algebra generated by the Poisson processes is exchangeable. Generalizing the situation of Example 4.7, we let Z N,|A|,|B| ∼ Hyp(N, |A|, |B|), and define H̃(|A|, |B|) := E[q^{Z N,|A|,|B|}], the generating function of Z N,|A|,|B| evaluated at q. For fixed a, b ∈ N we can realize (A s ), (B s ) on (Ω P , F P , P P ). Choosing Λ, K as in Example 4.7, we obtain (cf. Lemma 4.5) the duality of (|A t |) and (|B t |) with respect to H̃.

In the graphical construction of q-dualities we have so far used different Poisson processes for different types of transitions or mechanisms. By basic properties of Poisson processes, we could use one Poisson process per pair of lines, even if we allow for more than one mechanism. The graphical representation then has only one type of arrow, and the processes are constructed from the arrow-configuration by following time forwards; whenever an arrow is encountered, the type of transition is determined by an additional random variable. We give an example in the above context of q-duality of particle systems.
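The duality functions Ĥ N and H̃ of Examples 4.7 and 4.8 are explicit hypergeometric quantities and can be computed directly. The following sketch (ours; function names hypothetical) evaluates H̃(a, b) = E[q^{Z N,a,b}] from the hypergeometric weights and checks it against a brute-force average of q^{|A∩B|} over all subsets A with |A| = a:

```python
from math import comb
from itertools import combinations

def hyp_pmf(N, a, b, k):
    """P(|A ∩ B| = k) for A uniform with |A| = a and B fixed, |B| = b."""
    return comb(b, k) * comb(N - b, a - k) / comb(N, a)

def H_tilde(N, a, b, q):
    """Generating function E[q^{Z_{N,a,b}}] of the hypergeometric law;
    q = 0 recovers the function H_N(a, b) = P(no overlap) of Example 4.7."""
    return sum(hyp_pmf(N, a, b, k) * q ** k
               for k in range(max(0, a + b - N), min(a, b) + 1))

# brute-force check: average q^{|A ∩ B|} over all subsets A of size a
N, a, b, q = 7, 3, 4, -1.0
B = set(range(b))
avg = sum(q ** len(set(A) & B) for A in combinations(range(N), a)) / comb(N, a)
assert abs(H_tilde(N, a, b, q) - avg) < 1e-12

# q = 0: closed form P(A ∩ B = ∅) = C(N-b, a)/C(N, a)
assert abs(H_tilde(N, a, b, 0.0) - comb(N - b, a) / comb(N, a)) < 1e-12
print("hypergeometric duality functions verified")
```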
Example 4.9 (Non-deterministic mechanisms, conditional duality). In the set-up of q-dual mechanisms on {0, 1} G we assume that (X t ), (Y t ) are constructed from a graphical representation of Poisson processes (cf. Ex. 4.8) using two different mechanisms f 1 and f 2 . We want the mechanism f 1 to happen at rate α 1 , and f 2 at rate α 2 . Assume that the Poisson process for each pair has rate λ = α 1 + α 2 . This means that at each time, the law of the arrow in the graphical representation is exchangeable. Assume that at a given time τ there is an arrow from i to j. We give it type 1, corresponding to mechanism f 1 , with probability q = α 1 /(α 1 + α 2 ), and type 2 with probability 1 − q. We can thus think of the arrow as a random mechanism, mixing f 1 and f 2 with weights q and 1 − q. In [JK12], a concrete example is given where such a construction leads to a q-duality, which is pathwise in the sense that it is constructed from one realization of the graphical representation, but not strongly pathwise, as it is obtained by averaging over the random mechanisms. It can be viewed as a conditional duality by fixing a sequence (Z n ) n∈N of iid Bernoulli random variables with parameter q, independent of the Poisson processes, and defining D := σ({Z n , n ∈ N}). Then (X t ) and (Y t ) are dual conditional on D.
This construction makes use of the thinning property of Poisson processes. A related approach, also in the context of interacting particle systems, is described in [SL97], [AS12], using thinnings of the (particle) processes instead of the Poisson processes, leading to similar results.

4.3 Rescaled processes
So far we have considered processes with discrete state space, mostly interacting particle systems. A natural question is whether a (pathwise) duality is preserved in some sense after rescaling. Such ideas were exploited for example in [Sw06,AH07] and in a more sophisticated way in [DK96,DK99]. One simple approach is to approximate the hypergeometric distribution showing up in the context of q-duality by the binomial distribution. One obtains (see [JK12] for the proof):

Proposition 4.9. Let (X N t ), (Y N t ) be Markov processes with state space E N that are q N -dual for some q N ∈ [−1, 1). Choose exchangeable initial conditions X N 0 , Y N 0 ∈ E N , fixing |X N 0 | = k N , |Y N 0 | = n N , and suppose that X N t and Y N t stay exchangeable for all t > 0. Assume that n N /N → 0 and E[|Y N t N |/N ] → 0 as N → ∞, for some time scale t N ≥ 0. Then the rescaled processes are, in the limit N → ∞, dual with respect to the limit of the duality functions H̃ N , provided that the limits exist.
Depending on the scaling, this result may lead to a moment duality, for example if k N /N converges while the dual particle numbers stay of order one. If X N and Y N have the same scaling, we may get a Laplace duality, see [AH07] for an example.
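The binomial approximation behind this rescaling can be illustrated numerically in the coalescing case q = 0: as |A|/N → x with |B| = n fixed, the duality function P(A ∩ B = ∅) approaches the moment-duality function (1 − x)^n. A quick sketch (ours):

```python
from math import comb

def no_overlap(N, k, n):
    """P(A ∩ B = ∅) for A uniform with |A| = k and B fixed with |B| = n."""
    return comb(N - k, n) / comb(N, n)

# rescale |A|/N -> x while |B| = n stays fixed: the coalescing duality
# function converges to the moment-duality function (1 - x)^n
x, n = 0.3, 5
for N in (100, 1_000, 10_000):
    k = int(x * N)
    print(N, abs(no_overlap(N, k, n) - (1 - x) ** n))
assert abs(no_overlap(10_000, 3_000, 5) - 0.7 ** 5) < 1e-3
```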
Definition 4.10 (q-moment duality). Let (p t ) t≥0 and (n t ) t≥0 be moment dual Markov processes with values in R and N respectively. Assume that there exist q-dual interacting particle systems (X N t ), (Y N t ) as in Proposition 4.9 such that a suitable rescaling of (X N t ) converges weakly to (p t ), and (Y N t ) converges weakly to (n t ). Then we say that (p t ) and (n t ) are q-moment dual. If q = 0, we say that (p t ) and (n t ) are coalescing moment duals; for q = −1 we call them annihilating moment duals.
In the next section we will see that moment dualities obtained in this manner retain some of the properties of the approximating particle systems, such as monotonicity. An example of a moment duality obtained in this way is the duality of the (1, b, c, d)−braco-process and the (1, b, c, d)−resem-process, see [AH07,AS12].
Remark (Non-consistency). Note that using basic mechanisms gives a (strong) pathwise construction of all the approximating processes (X N t ), (Y N t ). In passing to the limit, this pathwise construction suffers from two problems: the duality with respect to the hypergeometric distribution is not strongly pathwise, and we need to pass from N to N + 1, which necessitates a new choice of the realization of the graphical representation. Hence the construction given here is only pathwise for finite N , and not for the limiting processes.
Remark (Lookdown-construction). If this step from N to N + 1 can be done in a consistent way, keeping the graphical representation of step N and adding arrows in step N + 1, one obtains a pathwise construction for the limiting duality. This was achieved by Donnelly and Kurtz [DK96] via the so-called lookdown-construction, which provides a pathwise construction of the duality of the Fleming-Viot process and Kingman's coalescent, and has since been successfully applied in many different situations [DK96,DK99,BBMST]. It is outside the scope of the present paper to do the lookdown-construction full justice. Roughly, it works as follows: As before, we have a graphical representation, with lines of length T representing time for each particle, labelled by i ∈ N. In the lookdown construction, these are usually drawn horizontally instead of vertically. We also have Poisson processes with rates λ ij , and we draw arrows from i to j, but now they are only allowed to point in one direction: from top to bottom, which for exchangeable models is always possible. We can say that λ ij = 0 if i ≤ j. Mechanisms now work in only one direction: If an arrow from i to j is encountered, meaning that i > j, then the site i changes, according to a mechanism and according to the state at j that is seen when following the arrow down - hence the name "lookdown construction". In order to apply the lookdown construction, it is crucial to choose the rates in a consistent way, that is, such that the arrows for the first N lines can be kept if more lines are added. This means that we can study the genealogy of a sample of size n < N within the construction of N lines. The result of [DK96] tells us that it is possible to take the limit as N → ∞.

5 Monotonicity
An interesting question is which properties of Markov processes are preserved under duality. One such property is (stochastic) monotonicity. Its intrinsic relation to duality has been investigated for example in [Sie76,AS96,Möh11]. In particular, there is a connection between monotonicity and the existence of a Siegmund dual. We state this connection in Theorem 5.2, and prove it using the results from Section 3. Moreover, we discuss other duality functions, in particular q−moment duals in Corollary 5.6.
Definition 5.1 (Monotonicity). A stochastic process (X t ) on a partially ordered state space is called monotone if x ↦ E x [f (X t )] is monotone increasing for every continuous increasing function f and every t ≥ 0. Hence, a Feller process (X t ) with semigroup P t is monotone if and only if P t f is continuous and monotone for every continuous and monotone function f .

Remark.
In the language of spin systems, monotone processes are often called attractive.
Monotonicity is in some situations equivalent to having a coalescing dual or Siegmund dual [Sie76], see also [Asm03,CS85]. We give below a proof of this result which illustrates the connection to the invariance of the set V 1,+ = { ∫ F H(·, y)ν(dy) | ν ∈ M 1,+ (F )} from Section 3. In the case of Siegmund duality, that is, E = F = R and H(x, y) = 1 {x≥y} , we see that f ∈ V 1,+ if and only if there exists ν ∈ M 1,+ (F ) such that f (x) = ν((−∞, x]), i.e., if and only if f is the distribution function of a probability measure.

Theorem 5.2. (a) Let (X t ) and (Y t ) be dual with respect to H(x, y) = 1 {x≥y} . Then (X t ) and (Y t ) are monotone. (b) Let (X t ) be a monotone process with state space E = R, E = [0, ∞) or E = N, such that x → P x (X t ≥ y) is right-continuous for every t ≥ 0, y ∈ E. Then there exists a process (Y t ) on E ∪ {∞} such that (X t ) and (Y t ) are dual with respect to H(x, y) = 1 {x≥y} .
Proof of Theorem 5.2. (a) Let x ≤ y. Then P x (X t ≥ z) = P z (Y t ≤ x) ≤ P z (Y t ≤ y) = P y (X t ≥ z). Hence (X t ) is monotone. Exchanging the roles of X and Y completes the proof. (b) If x ↦ P x (X t ≥ y) is monotone increasing and right-continuous, it is the distribution function of a probability measure Q t,y (·) on F . Let f ∈ V 1,+ , and let ν ∈ M 1,+ (F ) be such that f (x) = ∫ H(x, y) ν(dy), with H(x, y) = 1 {x≥y} (cf. the previous example). Let (P t ) denote the semigroup of (X t ). We get P t f (x) = ∫ F P x (X t ≥ y) ν(dy). Defining a probability measure ν t on F by ν t (·) := ∫ F Q t,y (·) ν(dy), we see that P t f ∈ V 1,+ , so V 1,+ is invariant under P t . By Proposition 3.13, this implies existence of a Siegmund dual (note that Assumption 3.11 is satisfied).
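For a monotone chain on a finite state space, the construction in the proof of (b) is completely explicit: the increments of the distribution functions x ↦ P^x(X 1 ≥ y) define the dual kernel, with any defect mass sent to the extra absorbing state ∞. A minimal sketch (ours; the chain is a toy example, a lazy walk reflected at both ends, which is stochastically monotone):

```python
M = 3
P = [[0.5, 0.5, 0.0, 0.0],
     [0.5, 0.0, 0.5, 0.0],
     [0.0, 0.5, 0.0, 0.5],
     [0.0, 0.0, 0.5, 0.5]]

def F(x, y):                       # F(x, y) = P^x(X_1 >= y)
    return sum(P[x][z] for z in range(y, M + 1))

INF = M + 1                        # extra absorbing state "infinity"
Q = [[0.0] * (M + 2) for _ in range(M + 2)]
for y in range(M + 1):
    for z in range(M + 1):
        Q[y][z] = F(z, y) - (F(z - 1, y) if z > 0 else 0.0)
    Q[y][INF] = 1.0 - F(M, y)      # defect mass escapes to infinity
Q[INF][INF] = 1.0

# monotonicity of P makes all increments, hence all entries, nonnegative
assert all(q >= -1e-12 for row in Q for q in row)
assert all(abs(sum(row) - 1.0) < 1e-12 for row in Q)

# one-step Siegmund duality: P^x(X_1 >= y) = P^y(Y_1 <= x)
for x in range(M + 1):
    for y in range(M + 1):
        assert abs(F(x, y) - sum(Q[y][z] for z in range(x + 1))) < 1e-12
print("Siegmund dual kernel is stochastic; one-step duality holds")
```

By the semigroup property, the one-step duality propagates to all times.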
Corollary 5.3. Let (X t ) and (Y t ) be coalescing dual spin systems. Then (X t ) and (Y t ) are monotone.
Proof. By Theorem 5.2, (X t ) and (1 − Y t ) are both monotone. For spin systems, the map y ↦ 1 − y is order reversing, hence the monotonicity of (Y t ) follows from that of (1 − Y t ).
The converse is not true, as can be seen by analysing once more V 1,+ . In the case of coalescing duals, we obtain with H(A, B) = 1 {A∩B=∅} that f ∈ V 1,+ if and only if f (A) = Σ B⊂A c ν({B}), or equivalently f (A) = Σ C λ({C}) 1 {A⊂C} for some λ ∈ M 1,+ (P(Λ)). In particular, V 1,+ is a subset of the functions that are monotone decreasing with respect to the partial order given by A ≤ B iff A ⊂ B. If (X t ) has a dual with respect to H, then the semi-group maps every indicator function 1 {·⊂C} into V 1,+ , so it preserves monotone functions, and we see again that (X t ) is monotone. But the converse is wrong, because the semi-group of a monotone Markov process might map a function 1 {·⊂C} to a monotone function not in V 1,+ .
For duality functions other than the Siegmund (or coalescing) duality, monotonicity need not be preserved under duality. Example 5.1 (Monotone process with non-monotone dual). Consider the voter model on Z d . This is clearly a monotone process, as is either seen directly or via Corollary 5.3, since it has coalescing random walk as a coalescing dual. It also has an annihilating dual, which is annihilating random walk; this is seen from Lemma B.1 in the appendix. This dual is not monotone: Let x, y ∈ {0, 1} Z d with x = δ i + δ j , i ≠ j, and y = δ i . Then we have x ≥ y, but annihilating random walk started at x will almost surely reach the state 0, while the process started in y will always have at least one 1.
If we have a graphical representation, monotonicity of the process follows from monotonicity of the mechanisms. This will help us to prove monotonicity of coalescing moment duals, cf. Section 4.3.

Definition 5.4 (Monotone mechanism). Let E be partially ordered. A function f : E → E is called a monotone mechanism if x ≤ y implies f (x) ≤ f (y).

Proposition 5.5. Let (X t ) be a process defined from a graphical representation as described in Section 4.1, using Poisson processes and basic mechanisms. Then (X t ) is monotone if and only if all basic mechanisms are monotone.
Proof. If all basic mechanisms are monotone, then it is clear that the resulting process is monotone. For the converse direction, consider a monotone process X, and let f be a non-monotone basic mechanism used in the construction of X. Let τ > 0 be the time of the first arrow in the graphical representation, and assume that f is the first type of transition to happen; this event has positive probability. Since f is not monotone, we can couple two versions X, X̃ with initial conditions X 0 = x, X̃ 0 = y ∈ E, x ≤ y, such that X τ ≤ X̃ τ fails on this event. This is a contradiction to the monotonicity of X.
Consequently, processes that are derived as in Proposition 4.8 via a monotone function are also monotone. Monotonicity is moreover preserved under rescaling of the duality, hence moment duals obtained by rescaling a coalescing duality (Section 4.3) are again monotone.
Corollary 5.6. Let (p t ), (n t ) be coalescing moment duals. Then they are both monotone.
Proof. By assumption, there are coalescing dual processes (X N t ), (Y N t ) such that (p t ) is the weak limit of a rescaling of (|X N t |) and (n t ) is the weak limit of (|Y N t |). (X N t ) and (Y N t ) are monotone by Corollary 5.3. Therefore, (|Y N t |) and (N − |X N t |) = (|1 − X N t |) are also monotone. We get, for x ≤ y, x, y ∈ R, that E x [f (p t )] ≤ E y [f (p t )] for continuous increasing f , by passing to the limit in the corresponding inequalities for the approximating particle systems. Hence (p t ) is monotone, and similarly for (n t ).
Other moment dualities need not imply monotonicity. An example of a moment duality of non-monotone processes is given by rescaling the branching annihilating process from Appendix B and its dual, leading to a Wright-Fisher diffusion with a certain type of balancing selection, cf. [JK12].

6 Symmetries and intertwining
This section complements the functional analytic theory from Section 3 and explains the relations of duality with the notions of intertwining of Markov processes [DF90, CPY98, HM11, Sw13], symmetries, and methods borrowed from quantum mechanics. Although most of the material is known, there are some new aspects: the main point of Section 6.1 is that some of the quantum symmetries of the physics literature can be given a stochastic interpretation with the help of an argument from [HM11]. Section 6.2 carves out the rigged Hilbert space / Gelfand triple structure (see Eq. (43)) of the representations of creation and annihilation operators used for example in [Doi76,SS94,GKRV09], noting that birth and death are time reversals with respect to natural reference measures.

6.1 Intertwining of Markov processes
Let P and Q be stochastic matrices on finite state spaces that are dual with non-degenerate duality function H, so that PH = HQ^T with invertible matrix H. Then there is a bijection between the set of duality functions for P and Q and the matrices commuting with P (symmetries): if SP = PS, then SH is a duality function for P and Q; conversely, if H̃ is a duality function, then S := H̃H^{-1} commutes with P. An interesting special case is obtained if H = diag(1/π(x)) for some probability measure π with π(x) > 0: in this case every other duality function is of the form H̃(x, y) = S(x, y)/π(y) for some matrix S commuting with P. Relations of this type have been studied in [Möh99,GKRV09]. In both articles, the symmetries enter in an algebraic way and need not have a stochastic interpretation.
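This bijection is purely linear-algebraic and can be sanity-checked numerically. In the following sketch, the stochastic matrix P, the invertible matrix H, and the symmetry S = I + P + P² are arbitrary illustrative choices; the matrix Q defined through PH = HQ^T is only an algebraic dual and need not be stochastic.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 4
P = rng.random((n, n))
P /= P.sum(axis=1, keepdims=True)          # a stochastic matrix
H = rng.random((n, n)) + n * np.eye(n)     # an invertible "duality function"

# Define Q through the duality relation P H = H Q^T
# (an algebraic dual only: this Q need not be stochastic).
Q = (np.linalg.inv(H) @ P @ H).T
assert np.allclose(P @ H, H @ Q.T)

# A symmetry of P (any polynomial in P commutes with P) ...
S = np.eye(n) + P + P @ P
assert np.allclose(S @ P, P @ S)

# ... yields a new duality function S H:
Htilde = S @ H
assert np.allclose(P @ Htilde, Htilde @ Q.T)

# Conversely, a duality function Htilde recovers the symmetry Htilde H^{-1}:
S2 = Htilde @ np.linalg.inv(H)
assert np.allclose(S2 @ P, P @ S2)
```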
In contrast, the notion of intertwining starts from a stochastic matrix Λ, sometimes referred to as a link between two Markov processes (P_t) and (Q_t). The relation of intertwining and duality has been studied by Carmona, Petit, and Yor [CPY98, Section 5.1] and by Diaconis and Fill [DF90, Section 5]: if a stochastic matrix Λ intertwines (P_t) and (Q_t), i.e., P_tΛ = ΛQ_t for all t, and (Q̂_t) is the time reversal of (Q_t) with respect to a measure π such that π(x) > 0 for all x, then H(x, y) := Λ(x, y)/π(y) is a duality function for (P_t) and (Q̂_t). At first sight, it looks like the restriction that Λ is a stochastic matrix excludes many dualities. Huillet and Martinez [HM11], however, have given a general argument proving that for irreducible Markov chains on finite state spaces, all duality functions are associated with stochastic matrices.
Quickly summarized, the argument is as follows. Let P and Q be stochastic matrices, and let π be the stationary distribution of the irreducible matrix Q. By adding to H multiples of the matrix with all entries equal to 1, we can ensure that H has non-negative entries only, and that the function h := Hπ is strictly positive. Furthermore, because H is a duality function and π is invariant, h is harmonic for P, i.e., Ph = h. If P is irreducible, h must be constant and, up to a constant factor, the matrix Λ(x, y) := H(x, y)π(y) is stochastic and intertwines P and Q̂, the time reversal of Q with respect to π. If P is not irreducible, Λ(x, y) := h(x)^{-1}H(x, y)π(y) intertwines the Doob h-transform P^h with the time reversal Q̂, see also [DF90, Section 5].
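The steps of this argument can be traced on a small example. In the sketch below, the birth-death chain P and the symmetry S = P + P² are illustrative choices; since P is reversible with stationary distribution π, it is self-dual with duality function H = diag(1/π(x)), so Q = P and the time reversal of Q is again P.

```python
import numpy as np

# Birth-death chain on {0,1,2}: reversible, hence self-dual w.r.t. H = diag(1/pi).
P = np.array([[0.50, 0.50, 0.00],
              [0.25, 0.50, 0.25],
              [0.00, 0.50, 0.50]])
pi = np.array([0.25, 0.50, 0.25])           # stationary distribution
assert np.allclose(pi @ P, pi)
D = np.diag(pi)
assert np.allclose(D @ P, P.T @ D)          # detailed balance

H = np.diag(1.0 / pi)                       # duality function: P H = H P^T
assert np.allclose(P @ H, H @ P.T)

S = P + P @ P                               # a symmetry: S commutes with P
Htilde = S @ H                              # hence another duality function
assert np.allclose(P @ Htilde, Htilde @ P.T)

# Huillet-Martinez steps: h := Htilde pi is harmonic, here constant;
# Lambda(x,y) := Htilde(x,y) pi(y) / h(x) is a stochastic link.
h = Htilde @ pi
assert np.allclose(h, h[0])
Lam = (Htilde * pi) / h[:, None]
assert np.allclose(Lam.sum(axis=1), 1.0) and (Lam >= 0).all()

# Lambda intertwines P with the time reversal of Q = P (= P, by reversibility).
Qhat = np.diag(1.0 / pi) @ P.T @ np.diag(pi)
assert np.allclose(P @ Lam, Lam @ Qhat)
```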
The previous argument can be applied to many dualities of interacting particles with invariant measures, showing that some of the "algebraic" or "quantum" symmetries of [SS94,GKRV09] can be interpreted in terms of commuting Markov chains. We refrain from an abstract description and content ourselves with the following example. Let (P_t) be the semigroup of the symmetric simple exclusion process on {0, 1}^M, which is self-dual with duality function H(A, η) := Π_{k∈A} η_k, where A ⊂ {1, . . . , M} labels the occupied sites of the dual configuration. The uniform distribution π on {0, 1}^M is invariant, and h(A) = Σ_η H(A, η)π(η) = 2^{-|A|}. The stochastic matrix Λ(A, η) := h(A)^{-1}H(A, η)π(η) associated with H and π factorizes over sites, with single-site matrix q given by q(0, 0) = q(0, 1) = 1/2, q(1, 0) = 0, q(1, 1) = 1.
Λ describes a chain where at each unoccupied site, a particle is born with probability 1/2. Since (P_t) has no transitions between configurations with differing particle numbers, the Doob h-transform P^h_t(A, B) = h(A)^{-1}P_t(A, B)h(B) is equal to P_t. Thus we obtain, in the end, that P_tΛ = ΛP_t: the symmetric simple exclusion process commutes with the birth mechanism described by Λ. This provides a stochastic interpretation of the "quantum" symmetry exp(Σ_k S^+_k) (see p. 43 below for an explanation of the notation) studied in [SS94,GKRV09].
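The commutation relation P_tΛ = ΛP_t is equivalent to LΛ = ΛL for the generator L of the exclusion process, which can be verified directly on a few sites (M = 3 below is an illustrative choice):

```python
import numpy as np
from itertools import product

M = 3
states = list(product([0, 1], repeat=M))     # configurations in {0,1}^M
idx = {s: i for i, s in enumerate(states)}

# Generator of the symmetric simple exclusion process: swap neighbors at rate 1.
L = np.zeros((2**M, 2**M))
for s in states:
    for k in range(M - 1):
        t = list(s)
        t[k], t[k + 1] = t[k + 1], t[k]
        L[idx[s], idx[tuple(t)]] += 1.0
        L[idx[s], idx[s]] -= 1.0

# Birth mechanism: at every empty site a particle is born with probability 1/2.
lam_site = np.array([[0.5, 0.5],
                     [0.0, 1.0]])
Lam = lam_site
for _ in range(M - 1):
    Lam = np.kron(Lam, lam_site)

# The SSEP commutes with the birth mechanism: L Lam = Lam L,
# hence P_t Lam = Lam P_t for the semigroup P_t = exp(tL).
assert np.allclose(L @ Lam, Lam @ L)
```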

6.2 Quantum many-body representations of interacting particle systems
There is a close relationship between problems from quantum mechanics and probability, which helps the understanding of duality [SS94,SL95,GKRV09]. Let H be a separable Hilbert space, for example L²(E, µ), a space of complex-valued square-integrable functions. We write ⟨·, ·⟩ for the scalar product, linear in the second entry and conjugate linear in the first entry. The quantum mechanics analogues of contraction semi-groups (P_t)_{t≥0} are one-parameter unitary groups (U_t)_{t∈R} in H. Stone's theorem [RS] says that every such (U_t) can be written as U_t = exp(−itH), for a unique self-adjoint operator H. The Hamilton operator H and Stone's theorem take the place of the infinitesimal generator and the Hille-Yosida theorem in probability. Some quantum mechanical Hamiltonians admit stochastic representations, by which we mean the following:

Definition 6.1. Let H be a self-adjoint operator in a Hilbert space H and (P_t) a Markov semigroup with symmetrizing σ-finite measure µ in a Polish space E; we assume that µ(O) > 0 for every non-empty open set O ⊂ E. We say that (U, E, µ, (P_t)) is a stochastic representation of H if U : H → L²(E, µ) is a unitary operator and ∀t ≥ 0 : exp(−tH) = U* P_t U.
Example 6.2 (Ornstein-Uhlenbeck process and harmonic oscillator). Let H = L²(R, dx) (dx is Lebesgue measure) and Hf := −½ f''+ ½ x²f − ½ f the so-called harmonic oscillator. H with a suitable definition of its domain is self-adjoint. As is well-known, it has a stochastic representation given by the Ornstein-Uhlenbeck process: let ϕ(x) := π^{-1/4} exp(−x²/2) and define U : H → L²(R, ϕ(x)² dx) by Uf := f/ϕ. Then exp(−tH) = U*P_tU, where (P_t) = (exp(tL)) with L the infinitesimal generator of the Ornstein-Uhlenbeck process; equivalently, UHU* = −L.
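The identity UHU* = −L reduces to H(ϕf) = −ϕ·Lf, which can be checked symbolically. The normalization Hf = −½f'' + ½x²f − ½f (ground-state energy subtracted) and the Ornstein-Uhlenbeck generator Lf = ½f'' − xf' are the assumed conventions here.

```python
import sympy as sp

x = sp.symbols('x')
f = sp.Function('f')(x)

# Ground state of the harmonic oscillator.
phi = sp.pi**sp.Rational(-1, 4) * sp.exp(-x**2 / 2)

# Harmonic oscillator (ground-state energy subtracted) applied to phi*f ...
Hphif = -sp.diff(phi * f, x, 2) / 2 + (x**2 / 2 - sp.Rational(1, 2)) * phi * f

# ... equals -phi * (L f), with L the Ornstein-Uhlenbeck generator
# L f = f''/2 - x f'.
Lf = sp.diff(f, x, 2) / 2 - x * sp.diff(f, x)
assert sp.simplify(Hphif + phi * Lf) == 0
```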
Example 6.3 (Simple symmetric exclusion process and quantum spin chain [GS92, SS94, LSD96, GKRV09]). Fix M ∈ N and let H = C² ⊗ · · · ⊗ C² (M times) with the standard scalar product. Consider the Hermitian 2 × 2-matrices S^x, S^y, S^z given by S^x = ½(0 1; 1 0), S^y = ½(0 −i; i 0), S^z = ½(1 0; 0 −1), and set S^± := S^x ± iS^y; S^± correspond to birth and death of a particle, and S^+ + S^− corresponds to a spin flip. Let S^α_k be the operator S^α acting on the k-th coordinate, e.g., S^x_1(e_1 ⊗ · · · ⊗ e_M) = (S^x e_1) ⊗ e_2 ⊗ · · · ⊗ e_M. Let H := −Σ_{k=1}^{M−1} (S^x_k S^x_{k+1} + S^y_k S^y_{k+1} + S^z_k S^z_{k+1} − ¼) be the Hamiltonian of an isotropic spin 1/2-quantum spin chain. Then for every t > 0, the matrix exp(−tH) in the canonical basis of H is doubly stochastic; −H is the infinitesimal generator of the simple symmetric exclusion process. The relationship is precisely of the form given in Definition 6.1, with underlying state space E = {0, 1}^M and µ the counting measure (or, if we normalize and change the scalar product in H by a multiplicative constant, the uniform distribution on E).
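These claims can be verified numerically. The sketch below assumes the standard isotropic Hamiltonian H = −Σ_k(S^x_kS^x_{k+1} + S^y_kS^y_{k+1} + S^z_kS^z_{k+1} − ¼); with this normalization −H generates nearest-neighbor swaps at rate 1/2, and M = 3 is an illustrative choice.

```python
import numpy as np
from functools import reduce
from scipy.linalg import expm

# Spin-1/2 matrices.
Sx = 0.5 * np.array([[0, 1], [1, 0]], dtype=complex)
Sy = 0.5 * np.array([[0, -1j], [1j, 0]])
Sz = 0.5 * np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2)

M = 3
def site_op(A, k):
    # A acting on the k-th tensor factor of (C^2) tensored M times.
    return reduce(np.kron, [A if j == k else I2 for j in range(M)])

H = sum(-(site_op(Sx, k) @ site_op(Sx, k + 1)
          + site_op(Sy, k) @ site_op(Sy, k + 1)
          + site_op(Sz, k) @ site_op(Sz, k + 1)
          - 0.25 * np.eye(2**M))
        for k in range(M - 1))

# In the canonical basis, -H is a real matrix with nonnegative off-diagonal
# entries and zero row sums: a Markov generator (rate-1/2 neighbor swaps).
L = -H.real
assert np.allclose(H.imag, 0)
assert np.allclose(L.sum(axis=1), 0)
assert (L - np.diag(np.diag(L)) >= -1e-12).all()

# Hence exp(-tH) is doubly stochastic (L is symmetric).
Pt = expm(1.3 * L)
assert np.allclose(Pt.sum(axis=1), 1) and np.allclose(Pt.sum(axis=0), 1)
assert (Pt >= -1e-12).all()
```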

Remark.
The previous example is closely related to the topic of stochastic representations of quantum spin chains, going back to Tóth [Tó93] and Aizenman and Nachtergaele [AN94], see also [GSW11] and the references therein. In this context the emphasis is usually put on stochastic geometric aspects that arise in a way similar to percolation problems in graphical representations of interacting particle systems [Har78,Gri79,Lig05].
Theorem 3.24 just says that two dual reversible Markov processes give rise to unitarily equivalent Hamiltonians, H_2 = UH_1U*; put differently, if a Hamiltonian has two different stochastic representations, then the representing Markov processes are dual with non-degenerate duality function, provided some regularity conditions hold. Self-dualities correspond to unitaries U such that UHU* = H, i.e., symmetries of the Hamiltonian. The quantum mechanist's ansatz for finding dualities is, therefore, to look for symmetries and alternative representations of a Hamiltonian. To this aim it is convenient to reinterpret the infinitesimal generators of stochastic processes as Hamiltonians and, for interacting particle systems, to rewrite the operator using the notation of quantum many-body theory, notably creation and annihilation operators. In probabilistic terms, this means that we break down transitions into birth and death of particles; for example, in Eq. (42), the hopping of a particle from site k to site k + 1 is written as death at site k followed by birth at site k + 1.
We refer the reader to [GKRV09] for a systematic treatment of the method for finding dualities; the remainder of this section clarifies the representation theoretic structure underpinning the method.
First, let us stress that the quantum mechanics translation can be useful in a broader context than in Definition 6.1, allowing for non-reversible Markov processes and "Hamiltonians" that are not necessarily self-adjoint; for example, a non-symmetric simple exclusion process leads to a non-self-adjoint Hamiltonian [GS92]. In fact, we might wish to study processes before even knowing whether they have reversible or invariant measures. For this reason it is useful to work with an a priori reference measure (e.g., Poisson or Bernoulli) and to make explicit that we work in three different spaces: bounded measurable functions, signed measures, and a Hilbert space.
Thus let E be a Polish space and µ_0 a reference probability measure on E. Let H := L²(E, µ_0). We have the embeddings ι : L^∞(E) → H and ι* : H → M(E), given by ιf := f and ι*f := fµ_0. The notation ι* is justified by the identity ⟨ιf, ϕ⟩_H = ⟨f, ι*ϕ⟩_{L^∞×M}, where ⟨f, µ⟩_{L^∞×M} := ∫_E f dµ. A bounded operator A in H acts on measures that have an L²-density with respect to µ_0 and on functions via ℓ(A)(gµ_0) := (Ag)µ_0 and r(A)f := A*f. When E is finite and µ_0 is the counting measure, we may think of functions as row vectors, measures as column vectors (note the unfortunate inversion with respect to the usual probabilist's conventions), and operators as matrices; then ℓ(A)µ = Aµ and r(A)f = fA, i.e., ℓ(A) and r(A) correspond to multiplication from the left and multiplication from the right, respectively. If (P_t) is a Markov semigroup with infinitesimal generator L, we may associate an operator H in H by asking that P*_t(fµ_0) = (exp(−tH)f)µ_0; thus formally, ℓ(H) = −L*. In practical applications, the distinction between an operator in Hilbert space and its representation in the spaces of measures or functions is often suppressed, and we find the relation H = −L*, compare also [GKRV09, Eq. (74)].
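On a finite state space these conventions can be made concrete; the three-point generator below is an arbitrary illustrative choice.

```python
import numpy as np
from scipy.linalg import expm

# Finite state space E = {0, 1, 2} with counting reference measure mu0;
# functions act as row vectors, measures as column vectors.
L = np.array([[-1.0, 1.0, 0.0],
              [0.5, -1.0, 0.5],
              [0.0, 2.0, -2.0]])       # a Markov generator (rows sum to zero)
t = 0.7
Pt = expm(t * L)                        # the semigroup

f = np.array([1.0, -2.0, 3.0])          # a "function"
mu = np.array([0.2, 0.5, 0.3])          # a "measure"

# Pairing <f, mu> = int f dmu; left and right multiplication by the same
# matrix are adjoint to each other: <f A, mu> = <f, A mu>.
assert np.isclose((f @ Pt) @ mu, f @ (Pt @ mu))

# The Hilbert space operator H of the text: P_t^*(f mu0) = (exp(-tH) f) mu0.
# With mu0 the counting measure this forces exp(-tH) = P_t^T, i.e. H = -L^*.
H = -L.T
assert np.allclose(expm(-t * H), Pt.T)
```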
For interacting particle systems, there is a standard dictionary towards quantum many-body theory. For example, exclusion processes (at most one particle per lattice site) can be expressed in terms of spin 1/2-operators as in Eq. (42). The other two cases (partial exclusion and unbounded number of particles per site) are explained in the next two examples. The point we wish to stress is that there are natural reference measures, turning birth and death into time reversals or, on a quantum mechanical level, ensuring that creation and annihilation are Hilbert space adjoint operators in some Hilbert space. This fact, to the best of our knowledge, has not been highlighted before.
Example 6.4 (Partial exclusion and spin m/2-chains). Fix m ∈ N and let E := {0, 1, . . . , m}; think of E as the single-site state space for an interacting particle system with at most m particles per lattice site. Let µ_0(η) := 2^{−m} (m choose η) be the binomial measure with parameter 1/2, and H = L²(E, µ_0). We specify J^+, J^− and J^z via their action on measures: ℓ(J^+)δ_η := (m − η)δ_{η+1}, ℓ(J^−)δ_η := η δ_{η−1}, ℓ(J^z)δ_η := (η − m/2)δ_η. The binomial reference measure turns birth and death into time reversals of each other, so that J^+ and J^− are mutually adjoint in H.

Example 6.5 (Doi-Peliti formalism for systems with unbounded occupation numbers). Let Λ be a discrete set and E = N_0^Λ. For k ∈ Λ, let e_k be the vector in E with all entries vanishing except for the k'th, which is equal to 1. Define the action of creation and annihilation operators on measures as ℓ(c†_k)δ_n = δ_{n+e_k}, ℓ(c_k)δ_n = n_k δ_{n−e_k}. The action on functions is r(c†_k)f(n) = f(n + e_k), r(c_k)f(n) = n_k f(n − e_k).
c†_k is creation (birth) of a particle, and c_k is annihilation (death) of one of the n_k particles at site k. This representation is often used by physicists in order to reformulate stochastic problems as a model of bosons, and is part of the so-called Doi-Peliti formalism [Doi76,Pel85]. The underlying Hilbert space is, to the best of our knowledge, in general not specified. A natural choice is H = L²(E, µ_0) with µ_0 = ⊗_{k∈Λ} Poiss(1) the tensor product of Poisson measures with parameter 1. This reference measure has the crucial property that the birth and death mechanisms are time reversals of each other with respect to µ_0, so that the underlying operators c_k and c†_k are adjoints in H, c†_k = c*_k.

Remark. An alternative to the bosonic creation and annihilation operators for unbounded occupation numbers is the use of operators connected to representations of SU(1, 1) [GKRV09].
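The adjointness of c_k and c†_k with respect to the Poisson reference measure can be checked numerically for a single site. The truncation level and test functions below are illustrative choices; the supports are kept away from the cutoff so that the truncated matrices satisfy the identity exactly.

```python
import numpy as np
from math import factorial

Nmax = 20
# Poisson(1) reference measure on {0, ..., Nmax}.
mu0 = np.array([np.exp(-1.0) / factorial(n) for n in range(Nmax + 1)])

# Matrix forms of r(c^dagger) f(n) = f(n+1) and r(c) f(n) = n f(n-1).
Cdag = np.zeros((Nmax + 1, Nmax + 1))
C = np.zeros((Nmax + 1, Nmax + 1))
for n in range(Nmax):
    Cdag[n, n + 1] = 1.0          # (c^dagger f)(n) = f(n+1)
    C[n + 1, n] = n + 1           # (c f)(n) = n f(n-1)

rng = np.random.default_rng(0)
f = np.zeros(Nmax + 1)
g = np.zeros(Nmax + 1)
f[:10] = rng.normal(size=10)      # supports kept away from the cutoff
g[:10] = rng.normal(size=10)

# Scalar product of L^2(N_0, Poisson(1)); birth and death are adjoints.
inner = lambda u, v: np.sum(mu0 * u * v)
assert np.isclose(inner(Cdag @ f, g), inner(f, C @ g))
```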
We conclude this section with a remark on group representations; the remark is not of direct relevance for the probabilistic setting, but might be of interest from an algebraic point of view. Groups enter in two ways: first, in relation to self-dualities, the set of unitaries U commuting with a given Hamiltonian H forms a group, the symmetry group. Second, many of the basic operators considered above are connected to the Lie algebras of Lie groups, and choices of alternative representations of the Hamiltonian are closely related to alternative representations of the Lie group; see [Kir08] for relevant background on Lie groups. In quantum mechanics, one is especially interested in unitary representations. Therefore we note that our functional analytic setup for interacting particles typically yields three representations: a (unitary, left) representation in the Hilbert space, a (left) representation on measures, and a right representation on functions. "Right" refers to the product rule r(AB) = r(B)r(A). The unitarity of the representation depends on the choice of the reference measure µ_0, mirroring the time reversal relations mentioned earlier.

For a proof see [JK12]. Let us now consider an example, taken from [JK12]. We assume that the following mechanisms occur: f_R with rate r_N/N for each ordered pair (i, j), i, j ∈ {1, . . . , N}, and f_BA with rate b_N/N, where r_N/N → α ≥ 0 and b_N/N → β ≥ 0 as N → ∞. Writing G̃_N for the generator of the rescaled process (|X^N_t|/N) and assuming k/N → x as N → ∞, we see that G̃_N f(k/N) converges to

G̃f(x) := αx(1 − x)f''(x) + βx(1 − 2x)f'(x),

which is the generator of the one-dimensional diffusion given by the SDE

dX_t = βX_t(1 − 2X_t) dt + √(2αX_t(1 − X_t)) dB_t.

This is a Wright-Fisher diffusion with local drift βx(1 − 2x), which has the effect of pushing X_t towards the values 0 and 1/2. Consider now the dual process. According to Lemma B.1, (Y^N_t), where f_A happens at rate r_N/N and f_BA at rate b_N/N, is an annihilating dual of (X^N_t). As N → ∞, its generator converges to

Gf(k) := βk (f(k + 1) − f(k)) + αk(k − 1) (f(k − 2) − f(k)),

which is the generator of a branching annihilating process on N.
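The two limiting generators are indeed moment duals, and the relation can be verified symbolically at the level of generators. The sketch below assumes the duality function H(x, k) = (1 − 2x)^k, the standard choice for annihilating duals (not shown in this excerpt), and the diffusion generator G̃f = αx(1 − x)f'' + βx(1 − 2x)f', whose diffusion coefficient is forced by the duality.

```python
import sympy as sp

x, a, b = sp.symbols('x alpha beta')

# Generator duality: G_diff applied to x -> H(x, k) equals
# G_dual applied to k -> H(x, k), with H(x, k) = (1 - 2x)^k (assumed).
for k in range(2, 7):
    H = lambda j: (1 - 2*x)**j
    Gdiff = a*x*(1 - x)*sp.diff(H(k), x, 2) + b*x*(1 - 2*x)*sp.diff(H(k), x)
    Gdual = b*k*(H(k + 1) - H(k)) + a*k*(k - 1)*(H(k - 2) - H(k))
    assert sp.expand(Gdiff - Gdual) == 0
```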