Examples of Convergence and Non-convergence of Markov Chains Conditioned Not to Die

In this paper we give two examples of evanescent Markov chains which exhibit unusual behaviour on conditioning to survive for large times. In the first example we show that the conditioned processes converge vaguely in the discrete topology to a limit with a finite lifetime, but converge weakly in the Martin topology to a non-Markovian limit. In the second example, although the family of conditioned laws is tight in the Martin topology, it possesses multiple limit points, so that weak convergence fails altogether.

Almost all the examples we are aware of (the exception being Bertoin and Doney [1]) have the property that the sequence of laws of the process conditioned to survive until at least time $T$ converges to an honest limit. In other words, the limiting conditioned process has an infinite lifetime or, equivalently, $P^\infty = P^h$, where $h$ is $P$-harmonic, and $P$ and $P^\infty$ are the transition kernels of the original and limiting processes respectively. In this paper we present two examples where this is not true.

In the first, described in §3, we show that the law of a symmetric Markov chain on $\mathbb{Z}_+$ with a certain transition structure (the infinite star or Kolmogorov K2 chain; see Kendall and Reuter [13]), when conditioned to survive for large times, converges vaguely to a defective (substochastic) law, corresponding to the limit of the conditioned processes having a finite lifetime. We also show that the laws converge weakly (in the Skorokhod topology derived from the Martin topology), but to a non-Markovian limit. In the second counterexample, described in §4, we show that the conditioned laws of another symmetric Markov chain, living on two copies of $\mathbb{Z}_+$ (the double infinite star), are tight (in the Skorokhod topology derived from the Martin topology) but have multiple limit points, showing that weak convergence fails altogether.

Our understanding of why the examples produce such behaviour can be summarised as follows. Because of the structure of the examples, essentially the only way the processes can survive for a time $T$ is to jump to a large state and wait there for a time $S \sim T$. In the first example this means that 'in the limit the process has to make an infinite jump', which, thanks to the form of the Martin compactification, corresponds to the behaviour outlined above. To establish this we require only very moderate constraints on the jump rates of the process. In the second example the same situation pertains, but the transition structure subdivides the large states into two classes (with odd and even numbering respectively). By making strong assumptions on the transition rates we obtain extraordinarily tight estimates on the eigenvalues of the Q-matrix which enable us to show that, for suitable times $T_n$, survival for time $T_n$ 'requires' jumping to state $n$ and waiting there for time $S \sim T_n$. Thus as $T \to \infty$, the requirement to make an 'infinite jump' cycles between 'infinite even states' and 'infinite odd states'; behaviours which, thanks to the Martin compactification, can be distinguished in the corresponding limit laws.

Preliminaries on evanescent chains and the Martin topology
2.1 Evanescent chains

Recall that a Markov chain $X$, on a state space $E$, with law $\mathbb{P}$ and transition kernel $P$, is said to be evanescent if $P$ is dishonest and $\zeta$, the lifetime of $X$, is almost surely finite.
For each $T > 0$, we define $\mathbb{P}^T$ by $\mathbb{P}^T(A) = \mathbb{P}(A \mid \zeta > T)$.
It is shown in Jacka and Roberts [11] that if $\mathbb{P}^T$ converges weakly to an honest limit $\mathbb{P}^\infty$, then $\mathbb{P}^\infty$ must be the law of a time-homogeneous Markov chain with transition kernel $P^\infty$, and $P^\infty$ is the Doob $h$-transform of $P$ by some (space-time) $P$-harmonic function $h(t, x)$, of the form $e^{\lambda t}\beta(x)$.
The probability measure $\mathbb{P}^T$ is the law of a time-inhomogeneous Markov chain with the same initial law as $\mathbb{P}$ and with transition matrix $P^T(\cdot, \cdot)$ given by
$$P^T_{i,j}(s,t) = \frac{s_j(T-t)}{s_i(T-s)}\, P_{i,j}(t-s), \qquad 0 \le s \le t \le T,$$
where $s$ is the survival function:
$$s_i(t) = \mathbb{P}_i(\zeta > t).$$
So, to examine convergence of the $\mathbb{P}^T$s it is obviously of interest to consider convergence of $s_i(T-t)/s_j(T)$. It is worth noting at this point that, for any fixed $i_0$, the function $i \mapsto s_i(t)/s_{i_0}(t)$ is superharmonic.
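As a brief check on the form of the conditioned transition matrix, the following sketch derives it from the Markov property. The notation $\mathbb{P}_i$ for the law started from state $i$ and the convention $s_i(t) = \mathbb{P}_i(\zeta > t)$ are assumptions made for the illustration.

```latex
% Sketch: conditioned transition probabilities for 0 <= s <= t <= T.
% Assumed notation: s_i(t) = P_i(zeta > t); P_{i,j}(t) is the (dishonest) kernel.
\begin{align*}
\mathbb{P}(X_t = j \mid X_s = i,\ \zeta > T)
  &= \frac{\mathbb{P}(X_t = j,\ \zeta > T \mid X_s = i)}
          {\mathbb{P}(\zeta > T \mid X_s = i)} \\
  &= \frac{P_{i,j}(t-s)\, s_j(T-t)}{s_i(T-s)} .
\end{align*}
% Superharmonicity of i -> s_i(t): for every u >= 0,
\[
\sum_j P_{i,j}(u)\, s_j(t) = s_i(t+u) \le s_i(t).
\]
```

Dividing the last display by the constant $s_{i_0}(t)$ gives the superharmonicity of the normalised survival function.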

The Martin topology
The set of (non-negative) superharmonic functions on $E$ is (as a subset of $\mathbb{R}^E$) a convex cone. The set $S$, consisting of functions $h$ in this cone which are normalised so that $h(0) = 1$, is a section through this cone. According to the theory of Martin boundary (see Williams [25] or Meyer [14]), any $h \in S$ can be written as a unique convex combination of the extremal elements of $S$, which are denoted $S_e$. Thus
$$h = \int_{S_e} \xi \, d\nu(\xi),$$
where $\nu$ is a probability measure on $S_e$.

Elements of $S_e$ are of two types. Firstly they can arise by normalising the Green's functions,
$$\xi(\cdot) = \kappa(\cdot, j) = \frac{\Gamma_{\cdot, j}}{\Gamma_{i_0, j}}$$
(where $i_0$ is a fixed reference point and we assume that $\Gamma_{i_0,j} > 0$ for all $j \in E$), or, alternatively, $\xi$ is a harmonic function, which may be obtained as a pointwise limit of extreme points of the first kind. The collection of points of the first kind, together with all limit points, is (using the identification of $j \in E$ with $\kappa(\cdot, j)$) the Martin compactification, $\bar{E}$, of $E$; whilst in our examples below (but not in general) we shall see that there are no harmonic $h$ and so, in these cases, $S_e$ may be identified with $E$.

Whenever this happens $S_e$ is a compact subset of $\mathbb{R}^E$ endowed with the topology of pointwise convergence and this determines a topology for $E$ (sometimes called the Martin topology):
$$j_n \to j \text{ in } E \iff \kappa(i, j_n) \to \kappa(i, j) \text{ for every } i \in E.$$
In general, for every (positive) superharmonic function $h : E \to \mathbb{R}$, there is a unique Choquet-Martin representation of $h$, of the form
$$h(i) = \int_{\bar{E}} \kappa(i, \xi)\, d\nu_h(\xi),$$
and the representing (probability) measure $\nu_h$ lives on $S_e \subseteq \bar{E}$.

Proof: From Lemma 2.2.1 we need only prove that $h_t$ is superharmonic, and this follows immediately from the fact that $\sum_j P_{i,j}(u) s_j(t) = s_i(t+u) \le s_i(t)$.

Since we shall need to study the convergence of the normalised survival functions, Corollary 2.2.2 shows that it is of interest to study their representing measures, and we shall do so in §3 and again in §4.

Notation, reminders and the canonical setup
Recall that a sequence of probability measures $(P_n)$ on (the Borel sets of) a topological space $(E, \mathcal{E})$ is said to converge vaguely to a limit measure $L$ if, for every continuous, compactly supported function $f$,
$$\int f \, dP_n \to \int f \, dL;$$
we denote this by $P_n \stackrel{v}{\Rightarrow} L$. We denote weak convergence by $\stackrel{w}{\Rightarrow}$. We introduce some more notation here: if $P$ is a measure on $(\Omega, \mathcal{F})$ and $\rho : \Omega \to \mathbb{R}_+$ is measurable then $\rho \bullet P$ denotes the measure $Q$ on $(\Omega, \mathcal{F})$ such that $\frac{dQ}{dP} = \rho$.

As announced earlier, the type of convergence we get for the $(\mathbb{P}^T)$s depends on which topology we use on our state space, $E$. In order to render $P$ honest, we extend our initial state space $\mathbb{Z}_+$ by adding an isolated cemetery state $\partial$, so that $E_\partial \stackrel{\mathrm{def}}{=} \mathbb{Z}_+ \cup \{\partial\}$. We shall be concerned with the Skorokhod spaces of càdlàg paths on $E_\partial$ under the discrete topology and under $M$ (where $M$ denotes the Martin topology of $E_\partial$ or, more precisely, the topology on $E_\partial$ obtained by taking the Martin topology on $\mathbb{Z}_+$ and expanding it by adding back in the isolated cemetery state $\partial$), which we shall denote by $D_E(\mathbb{R}_+)$ and $D_M(\mathbb{R}_+)$ respectively.
Similarly, we define For our probabilistic setup we shall take Ω = D E (R + ); F is taken to be the Borel σ-algebras of D E (R + ) (under the Skorokhod J 1 metric) and , where π t is the restriction map: π t : ω → (ω s : s ≤ t).
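For our Markov process, $X$, we shall take the canonical process: $X_t(\omega) = \omega_t$. Our probability measure $\mathbb{P}$ has an arbitrary initial distribution (on $\mathbb{Z}_+$) and has a transition kernel $P$ on $\mathbb{R}_+ \times \mathbb{Z}_+ \times \mathbb{Z}_+$ which is Feller minimal. We then make the standard extension of $P$ to $P^\partial$ by setting $P^\partial_{i,\partial}(t) = 1 - \sum_j P_{i,j}(t)$ and $P^\partial_{\partial,\partial}(t) = 1$. We denote the restriction of any (sub-)probability measure $Q$ on $\mathcal{F}$ to $\mathcal{F}_t$ by $Q_t$.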
The following lemma also demonstrates why the Martin compactification is helpful when considering weak convergence of the $\mathbb{P}^T$s:

Lemma 2.3.1 Suppose that $X$ is an evanescent chain on a countable state space, with law $\mathbb{P}$ and bounded Q-matrix $Q$ (so $q_i \le q < \infty$ for all $i$). Then the collection $(\mathbb{P}^T)_{T \ge 0}$ is tight in $D_M(\mathbb{R}_+)$.
Proof: It follows from Theorems 3.7.2, 3.8.8 and Remark 3.8.9(b) of Ethier and Kurtz [8] (see also Billingsley [2]) that we need only establish that for every $\eta > 0$ and rational $t \ge 0$ there exists a compact set $\Gamma$ in $\bar{E}$ such that
$$\inf_T \mathbb{P}^T(X_t \in \Gamma) \ge 1 - \eta \qquad (2.3.1)$$
and there exist $\epsilon, C, \theta > 0$ such that for all $0 < h \le t$ and for all $\lambda > 0$ the supremum bound (2.3.2) holds, where $d$ is the metric on $\bar{E}$ (which may be taken to be any metric corresponding to pointwise convergence, and hence may be assumed bounded by 1).

Since $\bar{E}$ is compact, (2.3.1) follows automatically. To prove (2.3.2) we may clearly restrict attention to the case where $t + h \le T$ and $\lambda < 1$. Now, for any $\lambda > 0$, and hence (2.3.2) holds for any $\epsilon > 0$.

3 Convergence of the star process

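3.1 The infinite star

The (evanescent) Markov chain we consider in this section is the infinite star (or Kolmogorov K2 chain). The state space is $\mathbb{Z}_+$ and the (symmetric) Q-matrix for the chain is
$$Q = \begin{pmatrix} -(q+\delta) & q_1 & q_2 & q_3 & \cdots \\ q_1 & -q_1 & 0 & 0 & \cdots \\ q_2 & 0 & -q_2 & 0 & \cdots \\ \vdots & & & \ddots & \end{pmatrix},$$
where $q = \sum_i q_i < \infty$, the $q_i$s are strictly decreasing and $\delta > 0$.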
Our results on the infinite star are as follows: for each t > 0 .Then we may define P ∞ , a probability measure on F, by P t ∞ = ρ t • P t , and then, for each t ≥ 0 in the original Skorokhod topology, D E ([0, t]), while

Remark:
The existence and uniqueness of $\mathbb{P}^\infty$ follow from the fact that the $(\mathbb{P}^\infty_t)$ are consistent (see §3.4) and the existence and uniqueness of a projective limit (see Parthasarathy [15]).
Remark: As we shall see later, $\mathbb{P}^\infty$ is a non-Markovian law which corresponds to the process "sticking in 0".
The rest of this section is taken up with a lengthy proof of Theorem 3.1.1, but it seems appropriate to sketch the proof here, since parts of it are (we believe) rather novel.

Sketch of proof of Theorem 3.1.1
We first prove that $s_j(t)/s_i(t) \to 1$ as $t \to \infty$, for all $i, j \in \mathbb{Z}_+$. We do this via the method indicated in §2 (showing weak convergence of the representing probability measures). We then show that $s_j(T-t)/s_j(T) \to 1$ as $T \to \infty$, for each $t$ and $j$. This establishes that $s_j(T-t)/s_i(T) \to 1$ as $T \to \infty$, for each $i$, $j$ and $t$. From this we can deduce directly the vague convergence by identifying the limiting behaviour of $d\mathbb{P}^T_t / d\mathbb{P}_t$. We also use this result to deduce the weak convergence of the finite dimensional distributions in the Martin topology and then establish the weak convergence on path space by Lemma 2.3.1.
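The way the two ratio limits in the sketch combine can be written out in one line. The display below is only an illustration; it assumes the conditioned kernel has the form $P^T_{i,j}(0,t) = \frac{s_j(T-t)}{s_i(T)} P_{i,j}(t)$.

```latex
% For fixed i, j, t, letting T -> infinity:
\[
\frac{s_j(T-t)}{s_i(T)}
  = \underbrace{\frac{s_j(T-t)}{s_j(T)}}_{\to\, 1}
    \cdot
    \underbrace{\frac{s_j(T)}{s_i(T)}}_{\to\, 1}
  \;\longrightarrow\; 1 .
\]
% Consequently (under the assumed form of the conditioned kernel):
\[
P^T_{i,j}(0,t) = \frac{s_j(T-t)}{s_i(T)}\, P_{i,j}(t)
  \;\longrightarrow\; P_{i,j}(t).
\]
```

Since $P$ is dishonest, the limiting kernel loses mass, which is consistent with the vague limit being a defective law.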

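3.2 The Martin topology of the infinite star

First note that $\Gamma$, the Green's kernel corresponding to the infinite star, is given by
$$\Gamma_{i,j} = \begin{cases} \dfrac{1}{\delta} & \text{if } i \ne j \text{ or } j = 0, \\[1ex] \dfrac{1}{\delta} + \dfrac{1}{q_j} & \text{if } i = j > 0, \end{cases}$$
so that, taking 0 as our reference state,
$$\kappa(i,j) = \begin{cases} 1 & \text{if } i \ne j \text{ or } j = 0, \\[0.5ex] 1 + \dfrac{\delta}{q_j} & \text{if } i = j > 0. \end{cases}$$
It follows immediately from this fact that the Martin boundary is empty, that $\bar{E} = E = S_e$ and that the Martin topology, which we shall denote by $M$, is characterised as follows: each state $n \ge 1$ is isolated, while a sequence converges to 0 in $M$ if and only if it visits each state $n \ge 1$ only finitely often.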
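The Green's kernel of the infinite star can be obtained by counting visits. The sketch below assumes that the chain jumps from the centre 0 to leaf $i \ge 1$ at rate $q_i$, returns from leaf $i$ to 0 at rate $q_i$, and is killed from 0 at rate $\delta$; this reading of the Q-matrix is an assumption made for the illustration.

```latex
% Assumed rates: 0 -> i at rate q_i, i -> 0 at rate q_i, killing from 0 at rate delta.
% Each departure from 0 is a killing with probability delta/(q+delta), so the
% number of visits to 0 (started at 0) is geometric with mean (q+delta)/delta.
\begin{align*}
\Gamma_{0,0} &= \frac{q+\delta}{\delta}\cdot\frac{1}{q+\delta} = \frac{1}{\delta}, \\
\Gamma_{0,j} &= \frac{q+\delta}{\delta}\cdot\frac{q_j}{q+\delta}\cdot\frac{1}{q_j}
              = \frac{1}{\delta} \qquad (j \ge 1), \\
\Gamma_{i,j} &= \Gamma_{0,j} + \frac{1}{q_j}\,\mathbf{1}_{\{i = j\}} \qquad (i \ge 1), \\
\kappa(i,j) &= \frac{\Gamma_{i,j}}{\Gamma_{0,j}}
             = 1 + \frac{\delta}{q_j}\,\mathbf{1}_{\{i = j \ge 1\}} .
\end{align*}
% For fixed i and every j > i we get kappa(i,j) = 1 = kappa(i,0), hence
% kappa(., j) -> kappa(., 0) pointwise as j -> infinity.
```

Under this assumption the leaves accumulate at 0 in the Martin topology, so no new boundary points arise.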

Weak convergence of the representing measure
As we noted in §2.2, the function $h_t$, defined by $h_t(i) = s_i(t)/s_0(t)$, is superharmonic, and hence has a representing measure $\nu_t$. We want to show that $\nu_t \Rightarrow \partial_0$, where $\partial_i$ denotes the (unit) point mass at $i$. Given the structure of $M$, it is clearly sufficient to show that $\nu_t(n) \to 0$ for each $n \ge 1$. We achieve this with the help of the eigenfunction expansion of $P_{i,j}(t)$. We may find such an expansion by virtue of:

Lemma 3.3.1
$$P_{i,j}(t) = \sum_n e^{-\lambda_n t}\, \hat\beta_n(i)\, \hat\beta_n(j),$$
where $\lambda_n$ and $\hat\beta_n$ are respectively the (negative of the) eigenvalues and the eigenfunctions ($l^2$-normalised) of $Q$.
Proof: Set $Q^{(n)}$ to be the approximation of $Q$ obtained by setting all the rows of $Q$ after the $n$th to 0. Then $Q^{(n)}$ has finite-dimensional range and it is easy to check that $\|Q - Q^{(n)}\| \to 0$ as $n \to \infty$, so by Theorem 4.18, part (c), of Rudin [22], $Q$ is a compact operator. It thus possesses (by part (c) of Theorem 4.25 of Rudin [22]) a countable spectrum and, being self-adjoint, it follows, by part (d) of Theorem 12.29 of Rudin [22], that the eigenfunctions of $Q$ can be chosen to be a complete orthonormal basis for $l^2$. Since $P(t) = e^{Qt}$, the representation of $P_{i,j}(t)$ follows.

It is easy to check that, normalising so that $\beta_n(0) = 1$,
$$\beta_n(i) = \frac{q_i}{q_i - \lambda_n} \qquad (i \ge 1),$$
and the eigenvalue equation is $S(\lambda_n) = 0$, where
$$S(\lambda) = \lambda - \delta + \lambda \sum_{i \ge 1} \frac{q_i}{q_i - \lambda}.$$
Since $S(q_n-) = \infty$, $S(q_n+) = -\infty$ and $S$ is continuous and increasing on $(q_{n+1}, q_n)$ and on $(q_1, \infty)$, it follows that $S$ has a unique zero on $(q_1, \infty)$ and on $(q_{n+1}, q_n)$ for each $n \ge 1$.

Proof: Given $\epsilon > 0$ take $N$ such that $\min(a_n, b_n) > 0$ and $a_n/b_n < 1 + \epsilon$ for all $n \ge N$. Now take $T$ such that $\sum_{n<N} e^{-\lambda_n t}|a_n| < e^{-\lambda_N t} a_N$ and $\sum_{n<N} e^{-\lambda_n t}|b_n| < e^{-\lambda_N t} b_N$ for all $t \ge T$; then, for $t \ge T$: Similarly, we may deduce the same inequality with the roles of $a_n$ and $b_n$ reversed and, since $\epsilon$ is arbitrary, we deduce the required convergence.

The following lemma is the penultimate step in the argument in this section:

Proof: Since $\nu_t$ is the unique probability measure such that to prove (3.3.5) we need only check that the proposed solution satisfies (3.3.6) and is a probability measure. Summing the right hand side of (3.3.5) gives 1, whilst if we recall that $\Gamma$ commutes with $P$ we see that If we now apply Lemma 3.3.4, we see that It follows from this that, for any $n$ and for any $i \le n$: and hence, since $n$ is arbitrary, that $\lim_t \nu_t(i) = 0$. As we remarked earlier, this is sufficient to establish the convergence of $\nu_t$ to the point mass at 0.
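The eigenvalue equation used in this section can be recovered by a direct calculation. The sketch below assumes the Q-matrix has entries $Q_{0,0} = -(q+\delta)$, $Q_{0,i} = Q_{i,0} = q_i$ and $Q_{i,i} = -q_i$ for $i \ge 1$; the particular way $S$ is written is one natural choice and may differ from the authors' by rearrangement.

```latex
% Assumed Q-matrix: Q_{0,0} = -(q+delta), Q_{0,i} = Q_{i,0} = q_i, Q_{i,i} = -q_i.
% Solve Q beta = -lambda beta with the normalisation beta(0) = 1.
\begin{align*}
\text{row } i \ge 1:\quad & q_i - q_i\,\beta(i) = -\lambda\,\beta(i)
   \;\Longrightarrow\; \beta(i) = \frac{q_i}{q_i - \lambda}, \\
\text{row } 0:\quad & -(q+\delta) + \sum_{i \ge 1} q_i\,\beta(i) = -\lambda
   \;\Longrightarrow\;
   S(\lambda) := \lambda - \delta + \lambda \sum_{i \ge 1} \frac{q_i}{q_i - \lambda} = 0 .
\end{align*}
% Monotonicity between consecutive poles:
\[
S'(\lambda) = 1 + \sum_{i \ge 1} \frac{q_i^2}{(q_i - \lambda)^2} > 0 .
\]
```

With this form, $S$ increases from $-\infty$ to $+\infty$ on each interval $(q_{n+1}, q_n)$ and on $(q_1, \infty)$, so each interval contains exactly one eigenvalue, and $\beta_n(i) > 0$ whenever $\lambda_n < q_i$.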

Weak convergence of the finite dimensional distributions
We first prove that the family (P t ∞ ) t≥0 are consistent.

Lemma 3.4.1
The process ρ t is a non-negative P-martingale with ρ 0 = 1.

Proof:
The process ρ is clearly a finite variation process and, setting we see that It follows immediately from the characterisation of Q that the dual previsible projection of ρ is and the martingale property is now immediate Remark: Since ρ is a non-negative martingale with initial value 1, P ∞ is a probability measure.
and hence It follows from this that This justifies the remark after Theorem 3.1.1.
We now prove: The finite dimensional distributions of P T converge to those of P ∞ .
Proof: As we observed in §2, $P^T$ is given by
$$P^T_{i,j}(s,t) = \frac{s_j(T-t)}{s_i(T-s)}\, P_{i,j}(t-s).$$
Given the Markov property of the $\mathbb{P}^T$ it is clearly sufficient to show convergence of the conditional one-dimensional distributions. Now, given an $f$, continuous (and bounded) in the Martin topology, then if $P^T_{i,j}(s,t) \to P_{i,j}(t-s)$ as $T \to \infty$, for each $i, j, s, t$, we can conclude that (the last equality following from (3.4.1)), in other words that the finite dimensional distributions converge.

Weak and vague convergence of the conditioned laws
We shall use the following lemma: Lemma 3.5.1 Suppose that Q is a finite measure on an arbitrary topological measure space by dominated convergence and the assumption on ρ * Proof of vague convergence It follows from the Markov property and the form of P T that P T t = ( s i (T ) ≤ (P i,j (t)) −1 and inf i,j≤n P i,j (t) > 0 (by irreducibility), we obtain the vague convergence by Lemma 3.

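4.1 The double star

The (evanescent) Markov chain we consider in this section is two copies of the infinite star of §3, linked at their centres. For ease of notation we denote the states in the first star by the even integers in $\mathbb{Z}_+$ (with 0 the centre), whilst the states in the second star are denoted by the odd integers (with 1 as the centre). Thus the Q-matrix for the double star process is given by
$$Q_{0,1} = Q_{1,0} = r, \qquad Q_{0,2k} = Q_{2k,0} = q_{2k}, \qquad Q_{1,2k+1} = Q_{2k+1,1} = q_{2k+1} \quad (k \ge 1),$$
$$Q_{i,i} = -q_i \quad (i \ge 2), \qquad Q_{0,0} = -\Big(\delta_0 + r + \sum_{k \ge 1} q_{2k}\Big), \qquad Q_{1,1} = -\Big(\delta_1 + r + \sum_{k \ge 1} q_{2k+1}\Big),$$
with all other entries zero. We shall assume that each $q_i$ is strictly positive and that $q_n \to 0$ as $n \to \infty$. The following additional assumptions will be needed: Note that if $q_n = 2^{-2^n}$ for each $n$, $r = 1$ and $\delta_0 = \delta_1 = \delta > 0$ then all our assumptions are satisfied.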
Our results on the double star are as follows: where κ is the normalised Green's function for the double star then, (i) for each α ∈ [0, 1], there exist sequences (s α n ) n≥1 such that for each t ≥ 0, in the original Skorokhod topology, D E ([0, t]), ), and P α by P α t = ρ α t • P t , with Then the collection (P T ) T ≥0 is tight in D M (R + ) and the collection of limit laws is

Remark: As we shall see later, $\left(\frac{h^\alpha(X_t)}{h^\alpha(X_0)} 1_{(\zeta > t)}\right) \bullet \mathbb{P}_t$ is a dishonest law, corresponding to the $h^\alpha$ Doob transform of $\mathbb{P}$, which just "loses the process when it dies". For each $\alpha$, $\mathbb{P}^\alpha$ is a non-Markovian law, which coincides with $\mathbb{P}^{h^\alpha}$ up to the "disappearance time" and which then makes the process "die to 0 or 1, rather than disappearing". We outline the differences between the proofs of Theorems 4.1.1 and 3.1.1:

Sketch of proof of Theorem 4.1.1

We first identify the set of limit functions for $s_j(t)/s_i(t)$ (by identifying the set of weak limits for the representing probability measures).
The task is then very similar to that confronted in §3.

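4.2 The Martin topology of the double infinite star

Notice first that $\Gamma$, the Green's kernel corresponding to the double infinite star, is given, for $j$ even, by
$$\Gamma_{i,j} = \begin{cases} \dfrac{r}{\delta_0\delta_1 + r(\delta_0+\delta_1)} & \text{if } i \text{ is odd}, \\[1.5ex] \dfrac{r+\delta_1}{\delta_0\delta_1 + r(\delta_0+\delta_1)} & \text{if } i \text{ is even and either } i \ne j \text{ or } j = 0, \\[1.5ex] \dfrac{r+\delta_1}{\delta_0\delta_1 + r(\delta_0+\delta_1)} + \dfrac{1}{q_j} & \text{if } i = j > 0, \end{cases}$$
and, for $j$ odd, by
$$\Gamma_{i,j} = \begin{cases} \dfrac{r}{\delta_0\delta_1 + r(\delta_0+\delta_1)} & \text{if } i \text{ is even}, \\[1.5ex] \dfrac{r+\delta_0}{\delta_0\delta_1 + r(\delta_0+\delta_1)} & \text{if } i \text{ is odd and either } i \ne j \text{ or } j = 1, \\[1.5ex] \dfrac{r+\delta_0}{\delta_0\delta_1 + r(\delta_0+\delta_1)} + \dfrac{1}{q_j} & \text{if } i = j > 1. \end{cases}$$
It follows that, taking 0 as our reference state, $\kappa$ is given, for $j$ even, by
$$\kappa(i,j) = \begin{cases} \dfrac{r}{r+\delta_1} & \text{if } i \text{ is odd}, \\[1.5ex] 1 & \text{if } i \text{ is even and either } i \ne j \text{ or } j = 0, \\[1ex] 1 + \dfrac{\delta_0\delta_1 + r(\delta_0+\delta_1)}{q_j (r+\delta_1)} & \text{if } i = j > 0, \end{cases}$$
and, for $j$ odd, by
$$\kappa(i,j) = \begin{cases} 1 & \text{if } i \text{ is even}, \\[1ex] \dfrac{r+\delta_0}{r} & \text{if } i \text{ is odd and either } i \ne j \text{ or } j = 1, \\[1.5ex] \dfrac{r+\delta_0}{r} + \dfrac{\delta_0\delta_1 + r(\delta_0+\delta_1)}{q_j\, r} & \text{if } i = j > 1. \end{cases}$$
It follows immediately from this fact that the Martin boundary is empty and the Martin topology, which we shall again denote by $M$, is characterised as follows: each state $n \ge 2$ is isolated, a sequence of even states converges to 0 if and only if it visits each even state $n \ge 2$ only finitely often, and a sequence of odd states converges to 1 if and only if it visits each odd state $n \ge 3$ only finitely often.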

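4.3 Bounding the eigenvalues of P

As before, $h_t(\cdot)$ has a representing measure $\nu_t$. We want to show that $L$, the set of weak limits of $(\nu_t)_{t \ge 0}$, is of the form
$$L = \{\alpha\,\partial_0 + (1-\alpha)\,\partial_1 : \alpha \in [0,1]\}. \qquad (4.3.1)$$
We proceed in a similar fashion to that in §3.3. Note first that the proof of Lemma 3.3.1 applies in the case of the double star so that
$$P_{i,j}(t) = \sum_n e^{-\lambda_n t}\, \hat\beta_n(i)\, \hat\beta_n(j).$$
Some elementary calculations establish that, with the normalisation $\beta_n(0) = 1$, the eigenfunction equation is
$$\beta_n(i) = \frac{q_i}{q_i - \lambda_n} \ (i \ge 2 \text{ even}), \qquad \beta_n(i) = \beta_n(1)\,\frac{q_i}{q_i - \lambda_n} \ (i \ge 3 \text{ odd}),$$
with $\beta_n(1) = \frac{r}{T(\lambda_n)}$, where
$$S(\lambda) = \delta_0 + r - \lambda - \lambda \sum_{k \ge 1} \frac{q_{2k}}{q_{2k} - \lambda}, \qquad T(\lambda) = \delta_1 + r - \lambda - \lambda \sum_{k \ge 1} \frac{q_{2k+1}}{q_{2k+1} - \lambda},$$
and that the (residual) eigenvalue equation is $S(\lambda)T(\lambda) - r^2 = 0$.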
It follows that $ST - r^2$ has a unique zero on $(q_2, \infty)$ and on $(q_{n+1}, q_n)$, for each $n \ge 2$. Now, for any $c > 0$, there exists a $C$ such that for large $n$, Taking $c^{-1}$ respectively greater than and less than $\delta_1 + r - \frac{r^2}{\delta_0 + r}$ we obtain the second convergence in (4.3.4); in a similar fashion, we deduce that and we obtain the first convergence in (4.3.4).

In the next section we shall need some estimates on the $l^2$-normalised eigenvectors, which we collect together in the following lemma: for suitable non-zero limits $m_0$ and $m_1$.
Notice also that, since λ n < q n ≤ q i /2, for any i < n, we may deduce from (4.On the other hand, given > 0, there exist c and n( ) such that, for i > n > n( ), Thus, to establish (4.4.3), we need only prove that (since the λs are decreasing and kq k → 0) We are now in a position to prove Proof: Since ν t (i) t−→∞ −→ 0 for each i, it follows from the form of the Martin topology M that the only possible weak limits are those in L. Lemma 4.4.2shows that the extreme laws in L are limit laws (ν t 2n ⇒ ∂ 0 and ν t 2n+1 ⇒ ∂ 1 ), so it only remains to establish that for each α ∈ (0, 1) there exists a sequence s α n such that ν sn ⇒ α∂ where f Γ : i → Γ 0,i f (i).Now if f is bounded and continuous then so is f Γ and hence the weak continuity of t → ν t follows from that of P (t), which in turn is an immediate consequence of the Feller property (lim t→0 P (t)f = f for bounded continuous f ).
To find a sequence (s α n ): given an α, take < min(α, 1 − α) then it follows from Lemma 4.  s i (T ) P i,j (t) and (4.5.1) holds it is clear that the only possible limits for the FDDs correspond to limit points of the function h t .

Proof of vague convergence
It is clear from the analysis in §3.5 that all we need show is that for all $t \ge 0$. This follows from Lemma 4.5.1 and the proof of Lemma 4.5.2.

To conclude the proof of Theorem 4.1.1 we just apply Lemma 2.3.1.

Concluding remarks
The study of the (space-time) Martin boundary and its relation to questions of conditioning in a general setting seems to have been initiated by Breyer [5]. In this paper we have been able to establish results about weak convergence of conditioned processes solely by studying the spatial Martin boundary, which is somewhat more "accessible". It may be of relevance that the two processes studied have no space-time harmonic functions. It might be possible to construct an example of an evanescent chain with multiple limit points which are space-time harmonic (which might then give examples of more complex limit behaviour than that in Jacka and Roberts [11]).
The authors speculate that the family $(\mathbb{P}^T)$ should be tight in the Martin topology without the condition that $Q$ is bounded.
We have not explored the issue of the existence of quasistationary distributions in relation to our examples; it seems fairly clear, however, that these do not exist.
We think of the behaviour we have described above as corresponding to different behaviours of the chain on an infinite set of different timescales, and believe that this situation is present in more general classes of Markov chains. However, our method of proof, which requires detailed knowledge of the eigenvalues of the chain's Q-matrix, means that we need to make very strong assumptions as to the form of $Q$ in the second example. It would be nice to obtain substantially weaker conditions on $Q$ which ensure the behaviour described in §4.

Lemma 3.3.3 For $n > i$, $\beta_n(i) > 0$ and, for each $i$ and $j$, $\beta_n(i)/\beta_n(j) \to 1$ as $n \to \infty$.

Proof: The result follows from (3.3.2) and (3.3.4).

In a moment we shall need the following lemma:

Lemma 3.3.4 Suppose that $(a_n)_{n \ge 0}$ and $(b_n)_{n \ge 0}$ are two sequences with $a_n > 0$ and $b_n > 0$ for sufficiently large $n$ and that $a_n/b_n \to 1$ as $n \to \infty$; then if $(\lambda_n)_{n \ge 0}$ is a strictly decreasing positive sequence with $\sum_{n \ge 0} e^{-\lambda_n t} a_n$ and $\sum_{n \ge 0} e^{-\lambda_n t} b_n$ convergent for $t > 0$, we have that $\sum_{n \ge 0} e^{-\lambda_n t} a_n \big/ \sum_{n \ge 0} e^{-\lambda_n t} b_n \to 1$ as $t \to \infty$.
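One way to see why the ratio of the two exponential sums tends to 1 is to split each sum at an index $N$ beyond which the terms are positive and comparable. The sketch below is a variant of the argument under an assumption of our own choosing: $T$ is taken large enough that the head of each sum is at most $\epsilon$ times its $N$th term, which is possible because $\lambda_n > \lambda_N$ for $n < N$.

```latex
% Assumptions (for this sketch): 0 < epsilon < 1; a_n, b_n > 0 and
% a_n < (1+epsilon) b_n for n >= N; and for t >= T,
%   sum_{n<N} e^{-lambda_n t}|a_n| < epsilon e^{-lambda_N t} a_N  (likewise for b).
\begin{align*}
A(t) &:= \sum_{n \ge 0} e^{-\lambda_n t} a_n, \qquad
A_N(t) := \sum_{n \ge N} e^{-\lambda_n t} a_n \;\ge\; e^{-\lambda_N t} a_N > 0, \\
B(t) &:= \sum_{n \ge 0} e^{-\lambda_n t} b_n, \qquad
B_N(t) := \sum_{n \ge N} e^{-\lambda_n t} b_n \;\ge\; e^{-\lambda_N t} b_N > 0, \\
A(t) &\le (1+\epsilon)\, A_N(t) \le (1+\epsilon)^2\, B_N(t), \qquad
B(t) \ge (1-\epsilon)\, B_N(t), \\
\frac{A(t)}{B(t)} &\le \frac{(1+\epsilon)^2}{1-\epsilon} \qquad (t \ge T).
\end{align*}
```

Exchanging the roles of the two sequences bounds the reciprocal ratio in the same way, and letting $\epsilon \to 0$ gives the limit.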
5.1 and (3.4.3). To conclude the proof of Theorem 3.1.1 we simply apply Lemma 2.3.1.

Lemma 4.4.3
The collection L of weak limits of the ν t is given by (4.3.1).

Lemma 4.5.2
The limit points of the FDDs are precisely the corresponding FDDs of the $\mathbb{P}^\alpha$s.

Proof: We may prove that the $\mathbb{P}^\alpha_t$s are consistent just as in Lemma 3.4.1. Since $h_t(i) = \frac{s_i(t)}{s_0(t)} = \int \kappa_{i,\cdot}\, d\nu_t$ and $\kappa_{i,\cdot}$ is bounded and continuous (in $M$), it follows from Lemma 4.4.3 that the set of possible pointwise limits of $h_t(\cdot)$ is $\{\alpha\kappa(\cdot, 0) + (1-\alpha)\kappa(\cdot, 1) : \alpha \in [0,1]\}$, which corresponds precisely to the $\mathbb{P}^\alpha$s. The fact that the corresponding FDDs are in fact limit points follows from the form of the Martin topology as in the proof of Lemma 3.4.2.

4.6 Identifying the limit points of the conditioned laws

We may now complete the proof of Theorem 4.1.1.
Lemma 2.2.1 Suppose that $(h_n)_{n \ge 0}$ is a sequence of superharmonic functions on $E$, with $h_n(i_0) = 1$ for all $n$ and $\bar{E} = E$. Then a necessary and sufficient condition for pointwise convergence of the $(h_n)$ is the weak convergence (in the Martin topology) of the representing measures $\nu_{h_n}$.

Proof: Notice that $h_n \in S$ for all $n$, so the $\nu_{h_n}$ exist. Suppose $h_n \to h$. Since $E$ is compact by assumption, the $\nu_{h_n}$ are tight and so, given any subsequence $\nu_{h_{n_k}}$, there is a further subsequence $\nu_{h_{n_{k_l}}}$ which converges weakly to a limit $\nu$. Thus $h = \int_E \kappa(\cdot, j)\, d\nu(j)$, and it follows from the uniqueness of the representation that $\nu$ is independent of the subsequence. Weak convergence of the $\nu_{h_n}$ then follows from the subsequential characterisation of weak convergence.
Limit points of the representing measure

We propose to show now that:

4.1 that, for large $n$, $\nu_{t_{2n}}(2\mathbb{Z}_+) \ge 1 - \epsilon$ and $\nu_{t_{2n+1}}(2\mathbb{Z}_+) \le \epsilon$ so, since $\nu_t(2\mathbb{Z}_+)$ is continuous in $t$ (this follows from the open and closed sets characterisations of weak convergence and by virtue of the fact that $2\mathbb{Z}_+$ is both open and closed in $M$), it follows from the intermediate value theorem that there exists $s_n \in (t_{2n}, t_{2n+1})$ with $\nu_{s_n}(2\mathbb{Z}_+) = \alpha$.

4.5 Limit laws for the finite dimensional distributions

We shall show first:

So far we have used 0 as the reference state for the normalised Green's function $\kappa$ and for the corresponding representing measures $\nu_t$. Irreducibility means that we could have used any state $l$. The corresponding representing measures
$$\nu(\xi) \propto \Gamma_{l,\xi}\big(\delta_0 P_{0,\xi}(t) + \delta_1 P_{1,\xi}(t)\big). \qquad (4.5.2)$$
It follows from this and the fact that $\Gamma_{l,\cdot}$ is bounded and bounded away from zero that (just as in the proof of Lemma 4.4.1) ν t