Geometric Ergodicity and Perfect Simulation

This note extends the work of Foss and Tweedie (1997), who showed that availability of the classic Coupling from The Past algorithm of Propp and Wilson (1996) is essentially equivalent to uniform ergodicity for a Markov chain (see also HobertRobert, 2004). In this note we show that all geometrically ergodic chains possess dominated Coupling from The Past algorithms (not necessarily practical!) which are rather closely connected to Foster-Lyapunov criteria.


Introduction
Throughout this paper X will denote an aperiodic Harris-recurrent Markov chain on a measurable state space X which is a Polish space (the Polish condition is required in order to ensure existence of regular conditional probabilities). Recall that X is said to be geometrically ergodic if it converges in total variation and at geometric rate to statistical equilibrium , with multiplicative constant depending on the starting point: for some function V :X ! [ 1;1 )and some rate 2 (0;1). The chain X is said to be uniformly ergodic if the function V can be chosen to be constant. We also recall the notion of a small set:

Definition 1 A subset C
X is a small set (of order k) for the Markov chain X if there is a minorization condition: for 2 (0;1), and probability measure , Results are often stated in terms of the more general notion of petite sets; however for -irreducible aperiodic chains the two notions are equivalent (Meyn and Tweedie 1993, Theorem 5.5.7). Foss and Tweedie (1998) use small set theory to show that the condition of uniform ergodicity for such X is equivalent to the existence of a Coupling from the Past algorithm in the sense of Propp and Wilson (1996). This classic CFTP algorithm delivers a perfect sample from the equilibrium distribution of X . The key to the Foss and Tweedie argument is to remark that in case of uniform ergodicity the entire state space is small. Sub-sampling the process X if necessary (to reduce the order of the small set to 1), one can then devise a classic CFTP algorithm which is actually of the form introduced by Murdoch and Green (1998) as the multigamma coupler. Hobert and Robert (2004) develop the Foss and Tweedie argument to produce approximations to deal with burn-in (time till approximate equilibrium) in the geometrically ergodic case.
The Foss and Tweedie result might be thought to delimit and constrain the possible range of applicability of CFTP. However it is also possible to sample perfectly from the equilibrium of some strictly geometrically ergodic chains using a generalization: namely dominated CFTP (domCFTP) as introduced in Kendall (1998), Kendall and Møller (2000), Cai and Kendall (2002). In this note we show that this is generic: geometric ergodicity implies the existence of a special form of domCFTP algorithm adapted to the geometric ergodicity in question. Recent expositions of quantitative convergence rate estimation depend heavily on small sets and their relatives (see for example Rosenthal 2002), so this piece of CFTP theory connects to quantitative convergence theory in a rather satisfying way.
To describe this special form of domCFTP, we must first introduce the notion of a Foster-Lyapunov condition. Geometric ergodicity for our X is equivalent to a geometric Foster-Lyapunov condition involving recurrence on small sets (this can be extracted from Meyn and Tweedie 1993, Theorem 16.0.1): for some 2 (0;1)and b > 0, some small set C , and a function :X ! [ 1;1 ) which is bounded on C . Note that + b 1 is required, as is j C c 1 , since we impose 1. Now Condition (3) implies that every sub-level set fx 2 X : (x) cg is small (as indeed do weaker conditions; Meyn and Tweedie 1993, Theorem 14.2.3). This is a key fact for our argument so we sketch a coupling proof.
First note that without loss of generality we can employ sub-sampling to ensure that the small set C in Condition (3) is of order 1. Super-martingale arguments show that we can choose n such that P [ X hits C before n jX 0 = x]can be bounded away from zero uniformly in x for (x) c. Let the hitting probability lower bound be 0 . We can use the Minorization Condition (2) to realize X as a split-chain in the sense of Nummelin (1978), regenerating with probability whenever X 2 C . Couple chains from different starting points according to the time when X first regenerates in C , yielding a family of realizations X x of the Markov chain, with X x 0 = x, such that with positive probability 0 all realizations fX x : (x) cg coalesce into a set of at most n trajectories by time n (divided according to the time of first regeneration). Now apply a renewaltheoretic argument to the subsequent regenerations of this finite set of trajectories, which are allowed to evolve independently, except that whenever two trajectories regenerate at the same time they are forced to coalesce. Straightforward analysis shows that we can choose m such that with positive probability 1 < 0 all trajectories starting from fx 2 X : (x) cg have coalesced to just one trajectory by time n + m . Hence fx 2 X : (x) cg is a small set of order n + m , with minorization probability 1 . It is convenient to isolate the notion of a scale function such as in Equation (3).
Definition 2 A (Foster-Lyapunov) scale function for a Markov chain state space X is a measurable function :X ! [ 1;1 ) such that sub-level sets fx 2 X : (x) g are small for all 1.
Now we can define the special form of domCFTP which we require, which is adapted to a specified Foster-Lyapunov scale function.

Definition 3
Suppose that is a scale function for an Harris-recurrent Markov chain X . We say the stationary ergodic random process Y on [ 1;1 )is a dominating process for X based on the scale function (with threshold h and coalescence probability ") if it is coupled co-adaptively to realizations of X x; t (the Markov chain X begun at x at time t) as follows: (a) for all x 2 X , n > 0, and t 0, almost surely (c) and finally, P [ Y n h]must be positive.
Suppose Y is a dominating process for X based on the scale . The following domCFTP algorithm then yields a draw from the equilibrium distribution of X .

Algorithm 4
Simulate Y backwards in equilibrium till the most recent T < 0 for which Y T h; while coalescence does not occur at time T : extend Y backwards till the most recent S < T for which Y S h; set T S; simulate the coupled X forwards from time T + 1, starting with the unique state produced by the coalescence event at time T ; return X 0 as a perfect draw from equilibrium.
Practical implementation considerations are: (1) can one draw from the equilibrium of Y ? (2) can one simulate Y backwards in equilibrium? (3) can one couple the dominated target processes X x; t with Y so as to ensure the possibility of regeneration? (4) can one determine when this regeneration has occurred? and, of course, (5) will the algorithm not run too slowly?
The simplest kind of ordinary small-set CFTP, as in Murdoch and Green (1998), is recovered from this Algorithm by taking Y h, and requiring the whole state-space to be small. In actual constructions, care must be taken to ensure that Y dominates a coupled collection of X for which coalescence is possible as specified in Definition 3(b) (see the treatment of CFTP for Harris chains in Corcoran and Tweedie 2001).
The proof that this algorithm returns a perfect draw from the equilibrium distribution of X is an easy variation on the usual domCFTP argument, found at varying levels of generality in Kendall 1998;Kendall and Møller 2000;Cai and Kendall 2002. The key is to observe that Algorithm 4 reconstructs a coalesced trajectory which may be viewed as produced by the Markov chain begun at time 1 at some specified state x such that (x) h: the proof is then an exercise in making this heuristic precise.
The Foss and Tweedie (1998) argument, and the fact that the geometric Foster-Lyapunov condition (3) would certainly produce a dominating process if the expectation inequality was replaced by a stochastic domination, suggests our main result, which will be proved in Section 2: Theorem 5 If X is a geometrically ergodic Markov chain, and is a scale function for X which is derived from some geometric Foster-Lyapunov condition, then there exists a domCFTP algorithm for X (possible subject to sub-sampling) using a dominating process based on the scale , as in Algorithm 4.
As in the case of the Foss and Tweedie (1998) result, this algorithm need not be at all practical!

Geometric ergodicity implies domCFTP
We begin with a lemma concerning the effect of sub-sampling on the geometric Foster-Lyapunov condition.
Lemma 6 Suppose X satisfies a geometric Foster-Lyapunov condition: for some < 1, some scale function , and small set C = fx 2 X : (x) cg.
We are able to choose b 0 , c0 , b 0 0 , c 0 0 not to depend on c because we have allowed generous sub-sampling (i.e.: k-sub-sampling to change to k 1 ). Proof: Iterating Equation (5),

Proof (of Theorem 5):
We first construct the dominating process. Consider Markov's inequality applied to the geometric Foster-Lyapunov inequality (3). Any dominating process Y must satisfy the stochastic domination (4) described in Definition 3. Consequently, in default of further distributional information about P [ (X n+ 1 )j X n = x] , if Y is to be a dominating process based on the scale then we need Y to be stationary ergodic but also to satisfy Consequently Y is a possible candidate for a dominating process based on the scale if If we define U by Y = (c+ b= )exp(U )(so U is a log-dominating process) then U is the system workload of a D =M =1 queue, sampled at arrivals, with arrivals every l og(1= ) units of time, and service times being independent and of unit Exponential distribution. The process U is a random walk with reflection (of Skorokhod type) at 0: as its jump distribution is Exponential(1) l og(1= )we may deduce it is positive-recurrent if and only if < e 1 .
In case e 1 < < 1, U and Y = (c + b= )exp(U ) fail to be positiverecurrent. However the same construction will work if we use Equation (6) of Lemma 6 to justify sub-sampling X with a sampling period k large enough to ensure a geometric Foster-Lyapunov condition (3) using as scale but with replaced by k 1 < e 1 , and amending bto b 0 , cto c 0 as in Inequality (6).
Thus without loss of generality we may assume < e 1 , and so this Y can be run in statistical equilibrium, and thus qualifies as least partly as a dominating process for the purposes of Theorem 5. In the sequel we assume moreover that further sub-sampling has been carried out based on Equation (7), to ensure that the following small set is of order 1: Here the level h c+ b= is fixed so as to ensure h = c 0 0 + b 0 0 =(1 )with b 0 0 , c 0 0 given as in Equation (7); thus h supplies a stable threshold for geometric Foster-Lyapunov conditions, even allowing for further sub-sampling if required. Note in particular that Y = (c+ b= )exp(U )is able to sink below h, since h c+ b= and the system workload U can reach zero.
To fulfil the requirements on a dominating process given in Definition 3, we need to construct a coupling between Y and the target process X expressed in terms of a random flow of independent maps F t+ n+ 1 :X ! X : satisfying the distributional requirement that X x; t should evolve as the Markov chain X , the domination requirement expressed by the implication (4), and also the regeneration requirement that with probability " the set fF n (u) : such that (u) hg should be a singleton set. The well-known link between stochastic domination and coupling can be applied together with the arguments preceding Equation (9) to show that we can couple the various X x; t with Y co-adaptively in this manner so that the implication (4) holds: note that here and here alone we use the Polish space nature of X , which allows us to complete the couplings by constructing regular conditional probability distributions for the various X x; t conditioned on the (X x; t ). Thus all that is required is to show that this stochastic domination coupling can be modified to allow for regeneration. The small set condition for fx 2 X : (x) hg means there is a probability measure and a scalar 2 (0;1) such that for all Borel sets B [ 1;1 ), whenever (x) h, Moreover the stochastic domination which has been arranged in the course of defining Y means that for all real u, whenever (x) y, We can couple in order to arrange for regeneration if we can identify a probability measure e , defined solely in terms of and the dominating jump distribution P [ Y u jY = y] , such that for all real u P [ (X n+ 1 )> u jX n = x] ((u;1 )) P [ Y > u jY = y] e ((u;1 )) ((u;1 )) e ((u;1 )) and moreover P [ Y n+ 1 2 B jY n = y] e (B ): For then at each step we may determine whether or not regeneration has occurred (with probability ); under regeneration we use stochastic domination to couple to e ; otherwise we use stochastic domination to couple the residuals.
We state and prove this as an interior lemma, as it may be of wider interest.
Then there is a probability measure stochastically dominating and such that is minorized by L (V ). Moreover depends only on and L (V ).

Proof (of Lemma 7):
Subtract the measure ((u;1 ))from both sides of Inequality (13) representing the stochastic domination L (U ) L (V ). By the minorization condition (14) the resulting left-hand-side is nonnegtive. Thus for all real u is a nonnegative measure (because of the minorization condition (14)). Consequently P [ U > u] ((u;1 ))must be non-increasing in u and so we may reduce the right-hand side by minimizing over w u: where is the potentially signed measure defined by In fact is a probability measure on [ 1;1 ). Both (f1g)= (f1g)and ([ 1;1 ))= 1 follow from considering u = 1, u ! 1 . Now we show is nonnegative: If the first supremum were to be attained at w u then the two suprema would cancel. If the first supremum were to be attained at w 0 2 [ u;u + u 0 ]then ((u;u + u and hence So we can deduce is in fact a nonnegative measure. On the other hand This concludes the proof of Theorem 5: use Lemma 7 to couple L (X n+ 1 jX n = x) to L (Y n+ 1 jY n = y)whenever (x) y in a way which implements stochastic domination and ensures all the X n+ 1 regenerate simultaneously whenever Y h.
Note that the algorithm requires us to be able to draw from the equilibrium distribution of Y and to simulate its time-reversed equilibrium dual. Up to an additive constant l og(Y )is the workload of a D =M =1queue. This queue is amenable to exact calculations, so these simulation tasks are easy to implement (specializing the theory of the G =M =1 queue as discussed in Grimmett and Stirzaker 1992, ch. 11). However in general we do not expect this "universal dominating process" to lead to practical domCFTP algorithms! The difficulty in application will arise in determining whether or not regeneration has occurred as in Algorithm 4. This will be difficult especially if sub-sampling has been applied, since then one will need detailed knowledge of convolutions of the probability kernel for X (potentially a harder problem than sampling from equilibrium!).
Of course, in practice one uses different dominating processes better adapted to the problem at hand. For example an M =D =1 queue serves as a good logdominating process for perpetuity-type problems and gives very rapid domCFTP algorithms indeed, especially when combined with other perfect simulation ideas such as multishift CFTP (Wilson 2000b), read-once CFTP (Wilson 2000a), or one-shot coupling (Roberts and Rosenthal 2002).
Finally note that, in cases when 2 [ e 1 ;1) or when the small set fx 2 X : (x) hg is of order greater than 1, we are forced to work with coupling constructions that are effectively non-co-adapted (sub-sampling means that target transitions X m k to X m k+ 1 depend on sequences Y m k , Y m k+ 1 , . . . , Y m k+ k ). The potential improvements gained by working with non-adapted couplings are already known not only to theory (the non-co-adapted filling couplings of Griffeath 1975;Goldstein 1979; and the efficiency considerations of Burdzy and Kendall 2000) but also to practitioners (Huber 2004 with the property Leb(S i \ (u;v))> 0 for all 0 < u < v < 1 , all i2 f1;2;::: g.

Proof:
Enumerate the rational numbers in [ 0;1)by 0 =q 0 ,q 1 ,q 2 , . . . . Choose < 1=2, and define q n + k;q n + k + 2 n : Then for each k 1 Leb (A 0 \ [ k;k + 1)) 2 : Continue by defining a sequence of nested subsets A r A r 1 by q n + k 2 r ;q n + k 2 r + 4 r 2 n ; Thus the measurable shell B r = A r nA r+ 1 places mass of at least 2 4 r in each interval [ k 2 r ; k+ 1 2 r ). It follows that if S is defined by then Leb(S \ U ) > 0 for every open set U [ 1;1 ). The desired disjoint sequence S 1 , S 2 , . . . is obtained by considering a countably infinite family of disjoint increasing subsequences of the natural numbers.

Lemma 9
There is a Markov chain X satisfying a Foster-Lyapunov condition with scale function , such that any dominating process Y based on will fail to be positive-recurrent.
We define the transition kernel p(x; )of X on [ 1;1 )as follows: p(x;d y) = exp( (y 1))d y for y 1; so that if X n 2 C then X n+ 1 1 has a unit rate Exponential distribution. Then: C is a small set for X of order 1(in fact it will be a regenerative atom!); if X n 2 C then E [ X n+ 1 ]= 2; if X has positive chance of visiting state 1 then the whole state space [ 1;1 )will be maximally Leb-irreducible.
For x > 1 and x 2 S i , set Note that, because we are using the identity scale (x) x, Thus X satisfies a geometric Foster-Lyapunov condition based on scale and small set C , and so is geometrically ergodic. Suppose Y is a dominating process for X based on the identity scale . This means it must be possible to couple Y and X such that, if (X n ) = X n Y n then (X n+ 1 )= X n+ 1 Y n+ 1 . This can be achieved if and only if P [ X n+ 1 z jX n = u] P [ Y n+ 1 z jY n = x] for all z 1, and Lebesgue-almost all u < x. Therefore we require of such Y that P [ Y n+ 1 xy jY n = x] ess sup u< x fP [ X n+ 1 xy jX n = u] g = sup i ess sup q i : 1 < u < x;u 2 S i ;q i u > xy = sup i q i : q i > y = 1 y ; using Markov's inequality, then the construction of the kernel of X , then the measure-density of the S i . So such a Markov chain Y must also (at least when above level 1 ) dominate exp(Z ), where Z is a random walk with jump distribution Exponential(1)+ l og( ). Hence it will fail to be positive-recurrent on the small set C when e 1 .
There may exist some subtle re-ordering to provide domCFTP for such a chain on a different scale; however the above lemma shows that domCFTP must fail for dominating processes for X based on the scale .

Conclusion
We have shown that geometric ergodicity (more strictly, a geometric Foster--Lyapunov condition) implies the existence of a special kind of domCFTP algorithm. The algorithm is not expected to be practical: however it connects perfect simulation firmly with more theoretical convergence results in the spirit of the Foss and Tweedie (1998) equivalence between classic CFTP and uniform ergodicity. Note also that the "universal dominating process", the sub-critical exp(D =M =1)so derived, is itself geometrically ergodic.
It is natural to ask whether other kinds of ergodicity (for example, polynomial ergodicity) can also be related to perfect simulation in this way; this is now being pursued by Stephen Connor as part of his PhD research at Warwick.