Convergence of Markov chain transition probabilities

Consider a discrete time Markov chain with a rather general state space which has an invariant probability measure µ. There are several sufficient conditions in the literature which guarantee convergence of all, or µ-almost all, transition probabilities to µ in the total variation (TV) metric: irreducibility plus aperiodicity, equivalence properties of transition probabilities, or coupling properties. In this work, we review and improve some of these criteria in such a way that they become necessary and sufficient for TV convergence of all, respectively µ-almost all, transition probabilities. In addition, we discuss so-called generalized couplings.


Introduction
It is a classical result that all transition probabilities of a discrete time Markov chain with invariant probability measure (ipm) µ on a rather general state space E converge to µ in the total variation metric, provided that the chain is irreducible, recurrent and aperiodic ([12]). Further, Doob's theorem states that, under appropriate additional conditions, ultimate equivalence of every pair of transition probabilities implies the same result (see [3, Theorem 4.2.1] or [10]). Finally, the existence of couplings of chains starting at different initial conditions entails total variation convergence to µ. The goal of this paper is to modify the sufficient conditions in the literature in such a way that they become equivalent. It will turn out, for example, that asymptotic equivalence of transition probabilities (which seems to be a new concept) is equivalent to total variation convergence of all transition probabilities. It is also of interest to find weaker conditions which only imply total variation convergence of the transition probabilities starting from µ-almost every x ∈ E. Again, we will provide necessary and sufficient conditions similar to those described above. We will also address a convergence property strictly between these two, and again we will provide necessary and sufficient conditions. Apart from couplings, we will also formulate equivalent conditions in terms of generalized couplings for each of the convergence properties.
We define

Q(x, A) := P_x(X_n ∈ A for infinitely many n ∈ N).

We start by defining three properties, of increasing generality, concerning the convergence of Markov chain transition probabilities, in which we will be interested.
Properties 2.2. We say that
• Property P_1 holds if P^n(x, ·) converges to µ for every x ∈ E.
• Property P_2 holds if P^n(x, ·) converges to µ for µ-almost all x ∈ E and lim_{n→∞} d(P^n(x, ·), µ) < 1 for all x ∈ E.
• Property P_3 holds if P^n(x, ·) converges to µ for µ-almost all x ∈ E.

Remark 2.3. Note that Properties P_1 and P_2 both imply uniqueness of µ (we will show the latter claim in Remark 5.1). Note also that lim_{n→∞} d(P^n(x, ·), µ) always exists, since µ is invariant and the total variation distance can never increase when a Markov kernel is applied to both measures. Therefore, we could replace "lim_{n→∞} d(P^n(x, ·), µ) < 1 for all x ∈ E" in P_2 by "for each x ∈ E there exists some n ∈ N_0 such that d(P^n(x, ·), µ) < 1" without changing the class of chains for which P_2 holds. One might also be interested in a modification P̃_2 of Property P_2 in which the last condition, lim_{n→∞} d(P^n(x, ·), µ) < 1 for all x ∈ E, is replaced by uniqueness of µ. Clearly, P_2 is stronger than P̃_2, and it is easy to see that it is strictly stronger. Property P̃_2 was studied in [10], for example, but P_2 is more closely related to conditions studied in the literature. We will see, in particular, that the assumptions of [10, Corollary 1] imply not only P̃_2 but even P_2. Example 5.2 shows that one cannot delete the first part of property P_2 without changing the class of chains for which it holds.
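Remark 2.3 uses the fact that n ↦ d(P^n(x, ·), µ) is non-increasing. On a finite state space this is easy to illustrate numerically; the following Python sketch is a toy illustration only (the 5-state kernel and all numerical choices are assumptions of ours, not part of the paper's general setting):

```python
import numpy as np

rng = np.random.default_rng(0)

# An arbitrary 5-state transition matrix P (rows sum to 1).
P = rng.random((5, 5))
P /= P.sum(axis=1, keepdims=True)

# Invariant probability measure mu: left eigenvector of P for eigenvalue 1.
w, V = np.linalg.eig(P.T)
mu = np.real(V[:, np.argmin(np.abs(w - 1))])
mu /= mu.sum()

def tv(p, q):
    # Total variation distance d(p, q) = (1/2) * sum_i |p_i - q_i|.
    return 0.5 * np.abs(p - q).sum()

# Check that n -> d(P^n(x, .), mu) is non-increasing, as noted in Remark 2.3.
row = np.zeros(5)
row[0] = 1.0                      # start in state x = 0, i.e. row = P^0(x, .)
dists = []
for n in range(30):
    dists.append(tv(row, mu))
    row = row @ P                 # row is now P^(n+1)(x, .)
assert all(dists[i + 1] <= dists[i] + 1e-12 for i in range(29))
```

The assertion checks empirically what invariance of µ gives in general: d(P^{n+1}(x, ·), µ) = d(P^n(x, ·)P, µP) ≤ d(P^n(x, ·), µ).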
We will define four sets of assumptions: one in terms of equivalence or non-singularity of transition probabilities, one in terms of aperiodicity and recurrence or irreducibility properties, one in terms of couplings, and one in terms of generalized couplings. It will turn out that all assumptions with index i, i ∈ {1, 2, 3}, not only imply property P_i but are also necessary for P_i to hold. In some cases we formulate conditions with an additional prime (or some other symbol) which are formally stronger than the same condition without prime but which will in fact turn out to be equivalent (at least when the state space is Borel). Before we state the various assumptions, we define the (possibly new) concept of asymptotic equivalence of transition probabilities.

Definition 2.4. We say that the states x ∈ E and y ∈ E are asymptotically equivalent if for each ε > 0 there exist some n ∈ N and a set A ∈ E such that P^n(x, A) ≥ 1 − ε, P^n(y, A) ≥ 1 − ε, and the measures P^n(x, ·) and P^n(y, ·) restricted to the set A are equivalent.
Remark 2.5. Note that if for given x, y ∈ E, ε > 0 and n ∈ N there exists a set A as in the previous definition, then there exists a set Ā as in the previous definition (with the same ε) if n is replaced by n + 1 (and, by iteration, the same holds for all integers larger than n). This implies, in particular, that asymptotic equivalence induces an equivalence relation on E.
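On a finite state space, asymptotic equivalence can be tested directly from powers of the transition matrix: the largest admissible set A consists of the points charged by both P^n(x, ·) and P^n(y, ·) (compare the sets A_n(x, y) in the proof of Lemma A.7). A minimal sketch under these simplifying assumptions (the kernel, tolerance and horizon are illustrative choices of ours):

```python
import numpy as np

def asympt_equiv(P, x, y, eps=1e-3, n_max=200):
    """Numerically check whether states x and y of a finite chain with
    transition matrix P are asymptotically equivalent in the sense of
    Definition 2.4: look for n and a set A carrying mass >= 1 - eps under
    both P^n(x, .) and P^n(y, .) on which the two measures share null sets."""
    p = np.eye(len(P))[x]
    q = np.eye(len(P))[y]
    for n in range(1, n_max + 1):
        p, q = p @ P, q @ P
        # Largest set on which the restrictions are equivalent: the points
        # charged by both measures.
        A = (p > 0) & (q > 0)
        if p[A].sum() >= 1 - eps and q[A].sum() >= 1 - eps:
            return True, n
    return False, None

# A 2-state chain with all entries positive: every pair of states is
# asymptotically equivalent after a single step.
P = np.array([[0.5, 0.5], [0.2, 0.8]])
print(asympt_equiv(P, 0, 1))  # -> (True, 1)
```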
The first set of assumptions is formulated in terms of equivalence or non-singularity of transition probabilities.

Assumption 2.6. We say that
• Assumption A_1 holds if all pairs (x, y) ∈ E × E are asymptotically equivalent.
• Assumption A_2 holds if for all (x, y) ∈ E × E there exists some n = n_{x,y} ∈ N such that P^n(x, ·) and P^n(y, ·) are not mutually singular.
• Assumption A_3 holds if for µ ⊗ µ-almost all (x, y) ∈ E × E there exists some n = n_{x,y} ∈ N such that P^n(x, ·) and P^n(y, ·) are not mutually singular.
Lemma A.7 states that the set of all (x, y) ∈ E × E which are asymptotically equivalent is a measurable subset of (E × E, E ⊗ E).

Remark 2.7. Obviously, Property P_1 implies that any two states x, y are asymptotically equivalent (i.e. A_1 holds), while the simple Example 5.3 shows that it does not imply the stronger property "for all x, y ∈ E there exists some n = n_{x,y} ∈ N_0 such that P^n(x, ·) ∼ P^n(y, ·)", under which P_1 was shown in [10, Theorem 1].
Before we state the second set of assumptions, we define the concepts of aperiodicity, irreducibility and the Harris property for a Markov kernel P with invariant measure µ.
The chain is called aperiodic if no such d ≥ 2 exists.

Definition 2.9. The Markov kernel P is called φ-irreducible if φ is a non-trivial σ-finite measure on (E, E) such that for all A ∈ E with φ(A) > 0 and all x ∈ E we have L(x, A) > 0 (or, equivalently, there exists some n = n(x, A) ∈ N such that P^n(x, A) > 0). P is called irreducible if P is φ-irreducible for some non-trivial φ. We say that P is weakly irreducible (with respect to the given ipm µ) if there exist a non-trivial σ-finite measure φ on (E, E) and a set E_0 ∈ E satisfying µ(E_0) = 1 such that for every x ∈ E_0 and every A ∈ E with φ(A) > 0 we have L(x, A) > 0.

Remark 2.10. It is straightforward to check that if φ is as in the definition (either part), then φ ≪ µ. Further, if P is (weakly) µ-irreducible, then P is (weakly) φ-irreducible for every non-trivial σ-finite measure φ on (E, E) satisfying φ ≪ µ. We will show in Proposition A.1 the less obvious fact that (φ-)irreducibility implies µ-irreducibility (which, in the terminology of [12, Proposition 4.2.2], means that µ is the maximal irreducibility measure). We will use Proposition A.1 only in the proof of Theorem 2.17.

Definition 2.11 ([12, p. 199]). P, or the associated Markov chain X, is called Harris (or Harris recurrent) if there exists a non-trivial σ-finite measure φ on (E, E) such that for all A ∈ E with φ(A) > 0 and all x ∈ E we have Q(x, A) = 1 (or, equivalently, L(x, A) = 1 for all x ∈ E and A ∈ E with φ(A) > 0).

Assumption 2.12. We say that
• Assumption B_1 holds if P is aperiodic and Harris.
• Assumption B_2 holds if P is aperiodic and irreducible.
• Assumption B_3 holds if P is aperiodic and weakly irreducible.
Note that Harris recurrence implies irreducibility, so B_1 implies B_2.
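For a finite chain, aperiodicity can be checked by computing the period of a state as the gcd of its return times. The following sketch is a heuristic illustration only (it inspects finitely many matrix powers, which suffices for the small examples below but is an assumption in general):

```python
import numpy as np
from math import gcd
from functools import reduce

def period(P, x):
    """Period of state x of a finite chain with transition matrix P:
    gcd of the return times n with P^n(x, x) > 0 (computed here from the
    first 2 * len(P) powers). For an irreducible P, the chain is aperiodic
    iff this gcd equals 1."""
    n_states = len(P)
    Pn = np.eye(n_states)
    return_times = []
    for n in range(1, 2 * n_states + 1):
        Pn = Pn @ P
        if Pn[x, x] > 0:
            return_times.append(n)
    return reduce(gcd, return_times) if return_times else 0

# Deterministic 3-cycle: period 3, hence not aperiodic.
C = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [1.0, 0.0, 0.0]])
print(period(C, 0))  # -> 3

# Adding a holding probability makes the chain aperiodic.
A = 0.5 * C + 0.5 * np.eye(3)
print(period(A, 0))  # -> 1
```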
Assumption 2.13. We say that
• Assumption C_1 holds if for all x, y ∈ E there exists a coupling (X, Y) of P_x and P_y on some space (Ω, F, P) such that lim_{n→∞} P(X_k = Y_k for all k ≥ n) = 1.
• Assumption C_2 holds if for all x, y ∈ E there exist some k ∈ N_0 and a coupling ζ ∈ C(P^k(x, ·), P^k(y, ·)) such that ζ(∆) > 0.
We chose Condition C_i to be as weak as possible and C′_i to be as strong as possible, subject to the requirement that both are equivalent to all other conditions with the same index i (in case the state space is Borel). Note that there are several natural conditions in between C_i and C′_i (i = 1, 2, 3) which there is no need to state, since they will all turn out to be equivalent (at least in the Borel case).
Finally, we define the concept of a generalized coupling and formulate the fourth set of assumptions.

Definition 2.14. For probability measures ν_1 and ν_2 on (Ē, Ē), define

Assumption 2.15. We say that

Example 5.4 below shows that G_2 does not imply G_1, and that neither does the stronger version of G_2 in which "> 0" is replaced by "= 1".
We now state the three main results of our study, in increasing generality; they relate the convergence of Markov chain transition probabilities to the four sets of assumptions defined above.
is Borel, then all these conditions are equivalent.

Remark 2.19.
We do not know if the equivalence of all conditions with the same index holds even under our general conditions on the space (E, E). We will comment on this in Remark 5.8.

First results and the proof of Theorem 2.16
Let us first state those implications in the theorems which are obvious from the definitions or are well-known.
Proof. Statement a) is a classical result; a proof can be found, for example, in [12, p. 328]. The remaining implications are either obvious or easy consequences of the coupling equality stated in the introduction. If, for example, A_2 holds and x, y ∈ E, then there exists some n ∈ N such that P^n(x, ·) and P^n(y, ·) are not mutually singular. Hence d(P^n(x, ·), P^n(y, ·)) < 1, and C_2 follows from the coupling equality.
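The coupling equality used here, d(ν_1, ν_2) = min_{ζ ∈ C(ν_1, ν_2)} ζ((E × E) \ ∆), can be realized explicitly for discrete distributions by the classical maximal-coupling construction. A sketch (the distributions p, q and the sample size are illustrative assumptions of ours):

```python
import numpy as np

def maximal_coupling(p, q, rng):
    """Sample (X, Y) with X ~ p, Y ~ q and P(X != Y) = d(p, q), i.e. a
    coupling attaining the minimum in the coupling equality."""
    overlap = np.minimum(p, q)
    common = overlap.sum()            # = 1 - d(p, q)
    if rng.random() < common:
        z = rng.choice(len(p), p=overlap / common)
        return z, z                   # coordinates agree
    # With the remaining probability, draw from the normalized residuals,
    # whose supports are disjoint, so the coordinates disagree.
    x = rng.choice(len(p), p=(p - overlap) / (1 - common))
    y = rng.choice(len(q), p=(q - overlap) / (1 - common))
    return x, y

rng = np.random.default_rng(1)
p = np.array([0.6, 0.4, 0.0])
q = np.array([0.1, 0.4, 0.5])
d = 0.5 * np.abs(p - q).sum()         # total variation distance = 0.5
samples = [maximal_coupling(p, q, rng) for _ in range(20000)]
freq_neq = np.mean([x != y for x, y in samples])
assert abs(freq_neq - d) < 0.02       # empirical P(X != Y) matches d(p, q)
```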
We continue by providing a slightly generalized version of the Recurrence Lemma from [10, Lemma 2] that will turn out to be useful later.
If, moreover, P satisfies Assumption A_2, then for every x ∈ E there exists some n ∈ N such that P^n(x, {y ∈ E : Q(y, B) = 1}) > 0. If, moreover, P satisfies Assumption A_1, then (3.1) holds for every x ∈ E.
Proof. Define ψ(x) := Q(x, B), x ∈ E. Starting X_0 with law µ, we see that ψ(X_n), n ∈ N_0, is a stationary process and, for n ∈ N, ψ(X_n) = P(X_k ∈ B for infinitely many k | F_n) almost surely, by the Markov property, where F_n = σ(X_0, ..., X_n). Hence ψ(X_n), n ∈ N_0, is a bounded martingale which converges almost surely to the indicator of the event {X_k ∈ B for infinitely many k}; by stationarity, ψ(X_0) ∈ {0, 1} almost surely. Define Ψ_i := {x ∈ E : ψ(x) = i}, i ∈ {0, 1}. Then, by the martingale property, P^n(x, Ψ_i) = 1 for all n ∈ N_0 and for µ-almost all x ∈ Ψ_i, i ∈ {0, 1}. If A_3 holds, or P is µ-irreducible, then (at least) one of the sets Ψ_0, Ψ_1 has µ-measure zero. Since µ(B) > 0, Birkhoff's ergodic theorem implies µ(Ψ_1) > 0, so µ(Ψ_0) = 0 and µ(Ψ_1) = 1, finishing the proof of the first statement. Let Assumption A_2 hold and fix x ∈ E. Since P^n(y, Ψ_1) = 1 for µ-almost all y and all n ∈ N_0, there exists some y_0 ∈ E such that P^n(y_0, Ψ_1) = 1 for all n ∈ N_0. Now A_2 applied to x and y_0 shows that there exists some n ∈ N such that P^n(x, Ψ_1) > 0, finishing the proof of the second claim.
Proof. Lemma 3.2, Proposition 3.3 and Remark 2.10 immediately imply the first two implications (with φ := µ) but not the last one, since the conclusion of the Recurrence Lemma under the assumption A_3 (or the stronger assumption A′_3) is weaker than weak irreducibility (the exceptional sets of µ-measure 0 may depend on the set B, and there may be uncountably many such sets). Therefore, we argue as follows: for x ∈ E, let R_x := {y ∈ E : x and y are asymptotically equivalent}. Assumption A_3 and Lemma A.7 imply that R_x ∈ E and µ(R_x) = 1 for µ-almost all x ∈ E. Fix x ∈ E such that µ(R_x) = 1. Since asymptotic equivalence is an equivalence relation by Remark 2.5, property A_1 holds on R_x. Using Lemma A.4, we see that B_1 holds on R_x and hence B_3 holds on E.
Before we step into the proofs of Theorems 2.16, 2.17, and 2.18, we sketch how one can see that A_i implies C_i for i ∈ {1, 2, 3}. The proofs are largely identical to those in [10], where the implications Ã_1 ⇒ P_1, A_2 ⇒ P̃_2, and A_3 ⇒ P_3 were shown (with Ã_1 slightly stronger than A_1, P̃_2 slightly weaker than P_2, and without the assumption that the state space is Borel). We will need the Borel property only at the end of the proof, when we apply the gluing lemma (Lemma A.3).

Proposition 3.5. We have
Idea of the proof. Under A_3, we define, for N ∈ N and p ∈ (0, 1),

C_{N,p} := {(x, y) ∈ E × E : d(P^N(x, ·), P^N(y, ·)) ≤ 1 − p}.

Then C_{N,p} ∈ E ⊗ E by Proposition A.6, and Assumption A_3 implies µ ⊗ µ(C_{N,p}) > 0 for some N and p. Fix such N and p and write C := C_{N,p}. Let us first assume that N = 1 (this is without loss of generality for proving A_3 ⇒ P_3 but not for proving A_3 ⇒ C_3). In [10], the authors proceed by constructing a Markov chain (Z_n), n ∈ N_0, on the product space E × E, which is a coupling of two chains with Markov kernel P, with transition kernel S. Here, R((x, y), ·) is the product of P(x, ·) and P(y, ·), and the kernel Q satisfies Q((x, y), ∆) = 1 − d(P(x, ·), P(y, ·)), while Q((x, y), ·) restricted to (E × E) \ ∆ is absolutely continuous with respect to the product of P(x, ·) and P(y, ·) (the fact that such a kernel Q exists is stated in [10, Lemma 1]). The idea behind the definition of the kernel S is the following: whenever the chain on E × E is in a state (x, y) ∈ C, we try to couple the two coordinates in the next step by applying Q, which maximizes the coupling probability. Otherwise, we let the two coordinates move independently until the pair hits the set C. As soon as the chain (Z_n) hits the diagonal ∆, it remains in that set forever. It remains to ensure that the set C is hit infinitely many times, so that the process (Z_n) will almost surely eventually hit ∆. The fact that (Z_n) will hit the set C almost surely in finite time can be seen as follows: consider an independent coupling (W_n) of two copies of the chain. Since µ ⊗ µ(C) > 0, the Recurrence Lemma shows that (W_n) will hit the set C almost surely in finite time for almost all initial conditions, and even for all initial conditions if we assume A_1. Since, up to the first hitting time of the set C, the processes (W_n) and (Z_n) have the same law, (Z_n) will also hit the set C almost surely in finite time.
ECP 26 (2021), paper 27.
If the coupling attempt at that time is unsuccessful, then the chain (Z_n) again performs an independent coupling up to the next hit of C, which, by the same argument (and the strong Markov property and the assumptions on the kernel Q), is an almost sure event. The constructed coupling therefore shows that C_1 holds under A_1 and that both C_3 and P_3 hold under A_3. Further, under A_2, for any pair x, y ∈ E, the probability that the constructed coupling is successful is strictly positive by the second part of the Recurrence Lemma, so C_2 holds. This proves the claims in case N in the definition of the set C_{N,p} can be chosen to be 1. Finally, we assume that N ≥ 2. The first claim follows from the case N = 1, since n ↦ d(P^n(x, ·), µ) is non-increasing. To see the remaining claims, we apply the previous considerations to the skeleton chain evaluated at integer multiples of N and obtain corresponding couplings Z_{nN} = (X_{nN}, Y_{nN}), n ∈ N_0, for the skeleton chains as above. We have to make sure that these can be appropriately interpolated between successive multiples of N. This follows from an application of the gluing lemma in the appendix (which requires the state space to be Borel) to each gap between successive multiples of N (with conditionally independent interpolations); see [14, p. 43] for a similar construction (it seems that the authors forgot to mention that this construction requires the space to be Borel, see Remark 5.8).
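The construction sketched above can be imitated numerically. In the toy example below, all entries of the one-step kernel are positive, so one may take N = 1 and C = E × E, and the "independent phase" of the construction never occurs; the kernel Q is realized by the maximal coupling of the two rows. All numerical choices are illustrative assumptions of ours:

```python
import numpy as np

rng = np.random.default_rng(2)

# An illustrative 4-state chain with all entries positive, so every pair of
# one-step rows overlaps and each coupling attempt succeeds with positive
# probability.
P = rng.random((4, 4)) + 0.1
P /= P.sum(axis=1, keepdims=True)

def step(x, y):
    """One step of the kernel S on E x E: on the diagonal, move together;
    off the diagonal, attempt a maximal coupling of P(x, .) and P(y, .)
    (the role of the kernel Q), succeeding with prob. 1 - d(P(x,.), P(y,.))."""
    if x == y:
        z = rng.choice(len(P), p=P[x])
        return z, z
    p, q = P[x], P[y]
    overlap = np.minimum(p, q)
    common = overlap.sum()
    if rng.random() < common:                       # successful attempt
        z = rng.choice(len(P), p=overlap / common)
        return z, z
    nx = rng.choice(len(P), p=(p - overlap) / (1 - common))
    ny = rng.choice(len(P), p=(q - overlap) / (1 - common))
    return nx, ny

# The pair chain (Z_n) hits the diagonal almost surely; estimate the
# coupling time over repeated runs.
times = []
for _ in range(2000):
    x, y, n = 0, 3, 0
    while x != y:
        x, y = step(x, y)
        n += 1
    times.append(n)
print("mean coupling time:", np.mean(times))
```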
We are now ready to prove Theorem 2.16.
Proof of Theorem 2.16. In view of Proposition 3.1, Corollary 3.4 and Proposition 3.5, the claim follows once we prove that G_1 ⇒ A_1.

G_1 ⇒ A_1: Fix a pair (x, y) ∈ E × E. We show that x and y are asymptotically equivalent. Fix ε > 0. By assumption there exists some ξ ∈ Č(P_x, P_y) such that lim_{k→∞} ξ_k(∆) = 1. Since ξ^2 and P_y are equivalent, we can find some δ > 0 such that for every Γ ∈ E^{⊗N_0} satisfying ξ^2(Γ) < δ we have P_y(Γ) < ε. Let n_0 ∈ N_0 be such that ξ_k(∆) > 1 − δ for every k ≥ n_0. Then, for B ∈ E and n ≥ n_0,

P^n(x, B) = 0 ⟹ ξ^1_n(B) = 0 ⟹ ξ^2_n(B) ≤ ξ_n((E × E) \ ∆) < δ ⟹ P^n(y, B) < ε,

where we used absolute continuity of ξ^1_n with respect to P^n(x, ·) in the first step. Reversing the roles of x and y, we get P^n(y, B) = 0 ⟹ P^n(x, B) < ε for all n ≥ n_1 (for a suitable n_1 ∈ N_0). Fix n ≥ n_0 ∨ n_1, let B_0 ∈ E be a set which maximizes P^n(y, B) among all sets B ∈ E satisfying P^n(x, B) = 0, and let C_0 ∈ E be a set which maximizes P^n(x, C) among all sets C ∈ E satisfying P^n(y, C) = 0. Such sets exist: pick an increasing sequence of sets B_m ∈ E such that P^n(x, B_m) = 0 and P^n(y, B_m) ≥ sup{P^n(y, B) : P^n(x, B) = 0} − 1/m. Then B_0 := ∪_m B_m does the job (and similarly for C_0). Define A := E \ (B_0 ∪ C_0). Then P^n(x, A) ≥ 1 − ε, P^n(y, A) ≥ 1 − ε, and the restrictions of P^n(x, ·) and P^n(y, ·) to A are equivalent. The claim follows since ε > 0 was arbitrary.
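On a finite state space, the sets B_0, C_0 and A of this argument are easy to exhibit: B_0 is the set of points not charged by P^n(x, ·), C_0 the set of points not charged by P^n(y, ·), and on A = E \ (B_0 ∪ C_0) the two measures charge exactly the same points. A sketch with two arbitrary illustrative distributions p and q of our choosing, playing the roles of P^n(x, ·) and P^n(y, ·):

```python
import numpy as np

p = np.array([0.0, 0.5, 0.3, 0.2, 0.0])
q = np.array([0.1, 0.4, 0.5, 0.0, 0.0])

# B0: largest-mass (under q) set that is p-null; on a finite space this is
# simply the set of points not charged by p (and similarly for C0 and q).
B0 = (p == 0)
C0 = (q == 0)
A = ~(B0 | C0)

# On A, p and q charge exactly the same points, so their restrictions to A
# are equivalent; the mass discarded is q(B0), respectively p(C0).
assert p[A].sum() == 1 - p[C0].sum()
assert q[A].sum() == 1 - q[B0].sum()
print(p[A].sum(), q[A].sum())  # mass retained under p and under q
```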

Proofs of Theorems 2.17 and 2.18
Building on the results from the previous section, we can now prove Theorems 2.17 and 2.18.
Proof of Theorem 2.17. Thanks to Proposition 3.1, Corollary 3.4 and Proposition 3.5, the theorem is proved once we establish B_2 ⇒ P_2. Rather than adapting the proof of B_1 ⇒ P_1, we prefer to argue along the following lines: if B_2 holds, then we show that there exists an invariant set E_0 ⊂ E (i.e. E_0 ∈ E and P(x, E_0) = 1 for all x ∈ E_0) of full µ-measure on which B_1 holds and hence, by Theorem 2.16, P_1 holds. Then we show that P_2 holds on the full space E.

B_2 ⇒ P_2: We are not aware of a simple direct proof that there exists a subset of full µ-measure on which B_1 holds. Even though, thanks to Proposition A.1, irreducibility implies µ-irreducibility, which, by the Recurrence Lemma, implies that Q(x, B) = 1 for every B ∈ E with µ(B) > 0 and µ-almost every x ∈ E, the exceptional sets may depend on B, and there are (typically) uncountably many such sets B.
Since P is µ-irreducible, Proposition A.2 shows that there exists a small set C ∈ E (with ν and m as stated there). We can and will assume that ν(E \ C) = 0. Define G := {x ∈ E : Q(x, C) = 1}. Then G ∈ E, G is invariant, and, by the Recurrence Lemma, µ(G) = 1. We claim that property B_1 holds on G. All we have to show is that Q(x, B) = 1 for all x ∈ G and all B ∈ E such that µ(B) > 0. Fix such a set B and let

H_{ε,l} := {x ∈ G ∩ C : P_x(∪_{i=0}^{l} {X_i ∈ B}) ≥ ε}

for l ∈ N_0 and ε > 0. Since L(x, B) > 0 for all x, there exist l ∈ N_0 and ε > 0 such that for H := H_{ε,l} we have ν(H) > 0. Then P^m(y, H) ≥ ν(H) > 0 for every y ∈ C.
This means that whenever the chain is in the set C, the probability of hitting the set B within the next m + l steps is bounded away from 0. Therefore, by the strong Markov property, the chain starting in y ∈ G will almost surely hit B infinitely often.
By Lemma A.4, G equipped with the trace σ-field satisfies our assumptions on the state space, and we see that property B_1 holds on G.
Theorem 2.16 shows that property P_1 holds on G. Then, clearly, property P_3 holds on E. Since P is irreducible, we have L(x, G) > 0 and hence lim_{n→∞} d(P^n(x, ·), µ) < 1 for every x ∈ E; therefore P_2 holds on E.
Proof of Theorem 2.18. By Proposition 3.1, Corollary 3.4 and Proposition 3.5, it suffices to show that B_3 ⇒ P_3.

B_3 ⇒ P_3: We can argue as in the proof of B_2 ⇒ P_2 (the present argument is even easier). Using the very definition of weak irreducibility, we find an invariant set E_0 of full µ-measure on which B_2 and hence, by Theorem 2.17, P_2 holds. Therefore, P_3 holds on E.

Complements, examples, and open problems
Remark 5.1. We show that Property P_2 implies uniqueness of µ (as claimed in Remark 2.3): assume that µ and µ̄ are different ipms and let µ̂ := (1/2)(µ + µ̄). Since P_2 ⇔ A_2 and property A_2 does not depend on the chosen ipm, we see that P_2 holds with respect to both µ and µ̂, so P^n(x, ·) converges to µ for µ-almost all x and to µ̂ for µ̂-almost all x. Since µ ≪ µ̂ and µ̂ ≠ µ, this is a contradiction (this proof is adapted from [10, Proof of Corollary 1]).

the unique invariant probability measure, and P^n(x, ·) does not converge to µ if x > 0, so P satisfies P_2 but not P_1. Note that for each x, y ∈ E and k ≥ x ∧ y, ζ := δ_0 ⊗ δ_0 satisfies ζ ∈ Č(P^k(x, ·), P^k(y, ·)) and ζ(∆) = 1, showing that if "> 0" in Assumption G_2 is replaced by "= 1", then the condition does not imply G_1.
Remark 5.5. Note that Assumption G_1 is formally weaker than requiring that for each pair (x, y) ∈ E × E there exists some ξ ∈ Č(P_x, P_y) such that ξ^1 ∼ P_x and ξ^2 ∼ P_y, but these two conditions are in fact equivalent: according to G_1 we find, for each pair (x, y), some ξ̂ ∈ Č(P_x, P_y) such that lim_{k→∞} ξ̂_k(∆) = 1 and some ξ̃ ∈ Č(P_y, P_x) such that lim_{k→∞} ξ̃_k(∆) = 1. Then ξ := (1/2)ξ̂ + (1/2)ξ̃ (with the coordinates of ξ̃ swapped) satisfies the formally stronger condition.
Remark 5.6. One may ask whether it is sufficient for P_1 to hold that for each pair (x, y) ∈ E × E and each k ∈ N_0 there exists some probability measure ζ_k on (E × E, E ⊗ E), whose marginals are equivalent to P^k(x, ·) and P^k(y, ·) respectively, such that lim_{k→∞} ζ_k(∆) = 1. Again, Example 5.4 provides a negative answer. Consider ξ as in the previous example. Then lim_{k→∞} ξ_k(∆) ≥ lim_{k→∞} ξ_k({(0, 0)}) = 1. Note that the marginals of the measures ξ_k are equivalent to P^k(x, ·) and P^k(y, ·) respectively, but that ξ^1 and ξ^2 are not equivalent to P_x respectively P_y.
Remark 5.7. From Theorem 2.16 we know that C_1 ⇒ P_1 holds, since C_1 ⇒ A_1 ⇒ B_1 ⇒ P_1. Here we present an essentially well-known direct proof. For x ∈ E, n ∈ N, and A ∈ E we have

|P^n(x, A) − µ(A)| = |∫_E (P^n(x, A) − P^n(y, A)) µ(dy)| ≤ ∫_E d(P^n(x, ·), P^n(y, ·)) µ(dy),

which converges to 0 by dominated convergence (note that Proposition A.6 shows that the last integrand is measurable with respect to y), so the claim follows.
In fact, a slight modification of the proof shows the result without employing Proposition A.6 (and without assuming that E is countably generated): fix x and let R_n(y, A) := |P^n(y, A) − P^n(x, A)|, n ∈ N. There exist sets A_n ∈ E such that

U_n := sup_{A ∈ E} ∫_E R_n(y, A) µ(dy) ≤ ∫_E R_n(y, A_n) µ(dy) + 2^{−n},

which converges to 0 as n → ∞ by dominated convergence.

Remark 5.8. The gluing lemma (Lemma A.3) can not be applied in the case of a general (non-Borel) state space: [1] contains an example of a separable metric space, equipped with its Borel σ-field, for which the conclusion of the gluing lemma fails.
Remark 5.9. It may be of interest to generalize some of our results from Markov chains to stochastic recursive sequences driven by a stationary sequence. Such models have been investigated, for example, in [7], where they were applied to some economic models.

A Auxiliary results and measurability issues
A.1 µ-irreducibility and the existence of small sets

We start with a proposition which was announced in Remark 2.10 and whose proof is inspired by that of […].

Proof. Let P be φ-irreducible. Then φ ≪ µ (see Remark 2.10) and, by the Lebesgue decomposition theorem, there exists a set B ∈ E such that φ and µ restricted to B are equivalent and φ(B^c) = 0. Note that µ(B) > 0. If µ(B^c) = 0, then φ ∼ µ and we are done, so we assume that µ(B^c) > 0. We have to show that for any measurable set C ⊂ B^c with µ(C) > 0 we have L(x, C) > 0 for every x ∈ E. Fix such x and C and define the measure ν. Invariance of µ implies ν ≪ µ and that the restrictions of both measures to B are equivalent. Let G ∈ E be a set such that ν ∼ µ on G, ν(G^c) = 0 and B ⊂ G.
In this case µ ∼ ν, and so ν(C) > 0, which implies that there exist some ε_2 > 0 and m_2 ∈ N such that D := {y ∈ B : P^{m_2}(y, C) ≥ ε_2} satisfies µ(D) > 0. Next, φ-irreducibility and the definition of the set B imply L(x, D) > 0, which, together with the definition of D, implies L(x, C) > 0, so the proof of the proposition is complete.
The following proposition is an easy consequence of the rather deep Theorem 5.2.2 in [12] (which is a key step in the proof of B 1 ⇒ P 1 (in our notation)) and of the (not so deep) previous proposition.
Proposition A.2 ([12, Theorem 5.2.2]). Let P be irreducible. Then there exists a small set C, i.e. a set C ∈ E with µ(C) > 0 for which there exist a finite measure ν and some m ∈ N such that ν(C) > 0 and P^m(x, B) ≥ ν(B) for all x ∈ C and B ∈ E.
Proof. Theorem 5.2.2 in [12] assumes that P is ψ-irreducible where ψ is a maximal irreducibility measure. By the previous proposition we can take ψ = µ and therefore the conclusions of [12, Theorem 5.2.2] and of Proposition A.2 are the same.
Proof. The last statement is clear. To see the first, define E_0 := Ẽ and E_{i+1} := {x ∈ E_i : P(x, E_i) = 1}, i ∈ N_0. Then Ê := ∩_i E_i does the job.
In the following two statements we assume that (E, E) satisfies our general assumptions spelled out in the introduction and that Q and Q̃ are Markov kernels on E. This lemma is used in [9] to prove a result which, in particular, implies the following proposition (which is not immediate, since the supremum of an uncountable family of real-valued measurable functions need not be measurable).

Proposition A.6. For each n ∈ N, the map (x, y) ↦ d(P^n(x, ·), P^n(y, ·)) is measurable.
Lemma A.7. The set of all (x, y) ∈ E × E for which x and y are asymptotically equivalent is a measurable subset of (E × E, E ⊗ E).
Proof. Applying Lemma A.5 with Q = Q̃ = P^n, we see that there exists a jointly measurable function f_n such that

P^n(x, A) = ∫_A f_n(x, y; z) Λ_n(x, y; dz),   P^n(y, A) = ∫_A f_n(y, x; z) Λ_n(x, y; dz),

for all x, y ∈ E (with Λ_n defined as in Lemma A.5). Defining A_n(x, y) := {z ∈ E : f_n(x, y; z) f_n(y, x; z) > 0}, we see that A_n(x, y) ∈ E and that P^n(x, ·) and P^n(y, ·) restricted to A_n(x, y) are equivalent. Further, A_n(x, y) is the largest set (up to sets of measure 0 with respect to Λ_n(x, y; ·)) with this property. Observe that the map

(x, y) ↦ P^n(x, A_n(x, y)) = ∫_E 1_{A_n(x,y)}(z) P^n(x, dz)

is measurable (by a well-known application of the monotone class theorem), since the integrand is jointly measurable. The claim follows since x and y are asymptotically equivalent if and only if lim_{n→∞} P^n(x, A_n(x, y)) = lim_{n→∞} P^n(y, A_n(x, y)) = 1.