On Variance Conditions for Markov Chain CLTs

Abstract
Central limit theorems for Markov chains are considered, and in particular the relationships between various expressions for the asymptotic variance known from the literature. These expressions turn out to be equal under fairly general conditions, although not always. We also investigate the existence of CLTs, and pose some open problems.


1. Introduction
The existence of central limit theorems (CLTs) for Markov chains is well studied, and is particularly important for Markov chain Monte Carlo (MCMC) algorithms; see e.g. [13], [23], [8], [5], [6], [10], [12], and [9]. In particular, the asymptotic variance σ² is very important in applications, and various alternative expressions for it are available in terms of limits, autocovariances, and spectral theory. This paper considers three such expressions, denoted A, B, and C, which are known to "usually" equal σ². These expressions arise in different applications in different ways. For example, it is proved by Kipnis and Varadhan [13] that if C < ∞, then a √n-CLT exists for h, with σ² = C. In a different direction, it is proved by Roberts [17] that Metropolis algorithms satisfying a certain condition must have A = ∞. Such disparate results indicate the importance of sorting out the relationships between A, B, C, σ², and the existence of Markov chain CLTs. In Sections 3 and 6 below, we consider the relationships between the quantities A, B, and C. In Section 4, we consider conditions under which the existence of a √n-CLT does or does not

1 Supported by the Swedish Research Council and by the Göran Gustafsson Foundation for Research in the Natural Sciences and Medicine.
2 Supported in part by NSERC of Canada.
imply the finiteness of these quantities. And, in Section 5, we present a number of questions that appear to be open.

2. Notation and Background
Let {X_n} be a stationary, time-homogeneous Markov chain on the measurable space (X, F), with transition kernel P, reversible with respect to the probability measure π(·), so P[X_n ∈ S] = π(S) for all n ∈ N and S ∈ F. Let P^n(x, S) = P[X_n ∈ S | X_0 = x] be the n-step transitions. Say that P is ergodic if it is φ-irreducible and aperiodic, from which it follows (cf. [23], [21], [19]) that lim_{n→∞} sup_{S∈F} |P^n(x, S) − π(S)| = 0 for π-a.e. x ∈ X. Write π(g) = ∫_X g(x) π(dx), (Pg)(x) = ∫_X g(y) P(x, dy), and ⟨f, g⟩ = ∫_X f(x) g(x) π(dx). By reversibility, ⟨f, Pg⟩ = ⟨Pf, g⟩. Let h : X → R be a fixed, measurable function with π(h) = 0. We are interested in whether a √n-CLT holds for h, i.e. whether

n^{−1/2} Σ_{i=1}^n h(X_i) converges weakly to Normal(0, σ²)    (1)

for some σ² < ∞, where we allow for the degenerate case σ² = 0 corresponding to a point mass at 0.
Remark 1. Below we shall generally assume that the Markov chain is ergodic. However, the convergence (1) does not necessarily require ergodicity; see e.g. Proposition 29 of [19].
Remark 2. The assumption of stationarity is not crucial. For example, it follows from [14] that for Harris recurrent chains, if a CLT holds when started in stationarity, then it holds from all initial distributions.
Remark 3. We note that (1), and the results below, are all specific to the n^{−1/2} normalisation and the Normal limiting distribution. Other normalisations and limiting distributions may sometimes hold, but we do not consider them here.
We shall also require spectral measures. Let E be the spectral decomposition measure (e.g. [22], Theorem 12.23) associated with P, so that f(P) = ∫_{[−1,1]} f(λ) E(dλ) for all bounded analytic functions f : [−1, 1] → R, and E([−1, 1]) = I is the identity operator. (Of course, here f(P) is defined in terms of power series, so that e.g. sin(P) = Σ_{j=0}^∞ (−1)^j P^{2j+1} / (2j+1)!.) Let E_h be the induced spectral measure for h (cf. [5], p. 1753), viz. E_h(S) = ⟨E(S)h, h⟩ for measurable S ⊆ [−1, 1].
There are a number of possible formulae in the literature (e.g. [13], [8], [5]) for the limiting variance σ² in (1), including:

A = lim_{n→∞} (1/n) Var[Σ_{i=1}^n h(X_i)];

B = γ_0 + 2 Σ_{k=1}^∞ γ_k, where γ_k = Cov[h(X_0), h(X_k)] = ⟨h, P^k h⟩;

C = ∫_{[−1,1]} (1 + λ)/(1 − λ) E_h(dλ).

We consider A, B, and C below. Of course, if π(h²) < ∞, then expanding the square gives

(1/n) Var[Σ_{i=1}^n h(X_i)] = γ_0 + 2 Σ_{k=1}^{n−1} ((n − k)/n) γ_k.

We shall also have occasion to consider versions of A and B where the limit is taken over odd integers only:

A′ = lim_{n→∞, n odd} (1/n) Var[Σ_{i=1}^n h(X_i)];  B′ = γ_0 + 2 lim_{K→∞, K odd} Σ_{k=1}^K γ_k.

Obviously, A′ = A and B′ = B provided the limits in A and B exist. But it may be possible that, say, A′ is well-defined even though A is not.
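As a sanity check on these definitions, the three expressions can be evaluated in closed form for a toy example (ours, not from the paper): the two-state reversible chain on {−1, +1} with uniform π that flips state with probability p, taking h to be the identity. Then h is an eigenfunction of P with eigenvalue λ = 1 − 2p, so γ_k = λ^k and E_h is a point mass at λ. A minimal sketch:

```python
# Toy two-state reversible ergodic chain (an illustrative assumption,
# not from the paper): states {-1, +1}, uniform pi, flip probability p.
# With h the identity, gamma_k = <h, P^k h> = lam**k where lam = 1 - 2p.
p = 0.3
lam = 1.0 - 2.0 * p

# B = gamma_0 + 2 sum_{k>=1} gamma_k (a geometric series):
B = 1.0 + 2.0 * lam / (1.0 - lam)

# C = integral (1+lambda)/(1-lambda) E_h(d lambda); E_h is a point mass at lam:
C = (1.0 + lam) / (1.0 - lam)

# A = gamma_0 + 2 sum_{k=1}^{n-1} ((n-k)/n) gamma_k, evaluated at a
# large fixed n (lam**k is negligible beyond k ~ 200):
n = 10 ** 6
A = 1.0 + 2.0 * sum((n - k) / n * lam ** k for k in range(1, 200))

print(A, B, C)   # all approximately equal (to (1-p)/p = 7/3 here)
```

Here all three agree, as Theorem 4 below predicts for reversible ergodic chains with π(h²) < ∞.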

3. Relationships Between Variance Expressions
The following result is implicit in some earlier works (e.g. [13], [8], [5]), though it may not have previously been written down precisely.
Theorem 4. If P is reversible and ergodic, and π(h²) < ∞, then A = B = C (though they may all be infinite).
Theorem 4 is proved in Section 6. We first note that if ergodicity is not assumed, then we may have A ≠ B:

Example 5. Let X = {−1, 1}, with π{−1} = π{1} = 1/2, and P(1, {−1}) = P(−1, {1}) = 1, so P is reversible with respect to π(·). Let h be the identity function. Then the summands h(X_i) alternate in sign, so Σ_{i=1}^n h(X_i) equals X_1 for n odd and 0 for n even, whence A = 0. However, γ_k = (−1)^k, so B is an oscillating sum and thus undefined. So A ≠ B, but Theorem 4 is not violated since the chain is periodic and hence not ergodic. And, a (degenerate) √n-CLT does hold, with σ² = A = 0.

Now, Kipnis and Varadhan [13] proved for reversible chains that if C < ∞, then a √n-CLT exists for h, with σ² = C. Combining this with Theorem 4, we have:

Corollary 6. If P is reversible and ergodic, and π(h²) < ∞, and any one of A, B, and C is finite, then a √n-CLT holds for h, with σ² = A = B = C < ∞. (Furthermore, it is easily seen [20] that C < ∞ whenever π(h²) < ∞ and the spectrum of P is bounded away from 1.)

In a different direction, Roberts [17] considered the quantity r(x) = P(x, {x}), the probability of remaining at x, which is usually positive for Metropolis-Hastings algorithms. He proved that if π(h² r/(1 − r)) = ∞, then A = ∞. Combining this with Theorem 4, we have:

Corollary 7. If P is reversible and ergodic, and π(h²) < ∞, but π(h² r/(1 − r)) = ∞, then A = B = C = ∞.

Remark 8. If the Markov chain is not reversible, then the spectral measure required to define C becomes much more complicated, and we do not pursue that here. However, it is still possible to compare A and B. It follows immediately from the definitions and the dominated convergence theorem (cf. [3], p. 172; [5]) that A = B whenever Σ_{k=1}^∞ |γ_k| < ∞. This summability condition holds for uniformly ergodic chains (see [3]), and for reversible geometrically ergodic chains (since that implies [18] that |γ_k| ≤ ρ^k π(h²) for some ρ < 1), but it does not hold in general. For more about geometric ergodicity and CLTs see e.g. [10], [19], [12], and [9].
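The computations in Example 5 can be checked mechanically. The following sketch (an illustration of ours) evaluates Var[S_n] and the oscillating partial sums defining B exactly:

```python
# The claims of Example 5, evaluated exactly.  Starting in stationarity,
# the summands h(X_i) alternate in sign, so S_n = h(X_1) + ... + h(X_n)
# equals X_1 for n odd and 0 for n even; hence Var[S_n] = n mod 2 and
# A = lim Var[S_n] / n = 0.
print([n % 2 for n in range(1, 8)])         # Var[S_n] for n = 1, ..., 7

# The autocovariances are gamma_k = (-1)^k, so the partial sums defining
# B oscillate forever instead of converging:
def B_partial(K):
    return 1 + 2 * sum((-1) ** k for k in range(1, K + 1))

print([B_partial(K) for K in range(1, 7)])  # -1, 1, -1, 1, -1, 1
```

The partial sums alternate between −1 and 1, so B has no value, while Var[S_n]/n → 0, matching A = 0.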
4. Converse: What Does a CLT Imply about Variance?
The result from [13] raises the question of the converse. Suppose {X_n} is a stationary Markov chain, and n^{−1/2} Σ_{i=1}^n h(X_i) converges weakly to Normal(0, σ²) for some σ² < ∞. Does it necessarily follow that any of A, B, and C are finite? An affirmative answer to this question would, for example, allow a strengthening of Corollary 7 to conclude that no √n-CLT holds for such h, and in particular that a √n-CLT does not hold for the independence sampler examples considered by Roberts [17].

Even in the i.i.d. case (where P(x, S) = π(S) for all x ∈ X and S ∈ F), this question is non-trivial. However, classical results (cf. Sections IX.8 and XVII.5 of Feller [7]; for related results see e.g. [4], [2]) provide an affirmative answer in this case:

Theorem 9. If {X_i} are i.i.d., and n^{−1/2} Σ_{i=1}^n h(X_i) converges weakly to Normal(0, σ²), where 0 < σ < ∞ and π(h) = 0, then A, B, and C are all finite, and equal to σ².

Proof. Let Y_i = h(X_i), and let U(z) = E[Y_1² 1_{|Y_1| ≤ z}] be the truncated second moment. Then since the {Y_i} are i.i.d. with mean 0, Theorem 1a on p. 313 of [7] says that there are positive sequences {a_n} with a_n^{−1}(Y_1 + . . . + Y_n) ⇒ Normal(0, 1) if and only if lim_{z→∞} U(sz)/U(z) = 1 for all s > 0, i.e. U is slowly varying. Furthermore, equation (8.12) on p. 314 of [7] (see also equation (5.23) on p. 579 of [7]) says that in this case,

n U(a_n) / a_n² → 1.    (2)

Now, the hypotheses imply that a_n^{−1}(Y_1 + . . . + Y_n) ⇒ Normal(0, 1) where a_n = c n^{1/2} with c = σ. Thus, from (2), we have lim_{n→∞} c^{−2} U(c n^{1/2}) = 1, i.e. lim_{z→∞} U(z) = σ². Hence Var(Y_1) = σ² < ∞, and the (classical) CLT applies, giving A = B = C = σ².

Remark 10. On the other hand, there are many distributions for the {Y_i} which have infinite variance, but for which the corresponding U is still slowly varying in this sense. Examples include the density function |y|^{−3} 1_{|y|≥1}, and the cumulative distribution function (1 − (1 + y)^{−2}) 1_{y≥0}. The results from [7] say that we cannot have a_n = c n^{1/2} in such cases. (In the |y|^{−3} 1_{|y|≥1} example, we instead have a_n = c (n log n)^{1/2}.)
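For the density |y|^{−3} 1_{|y|≥1} mentioned above, the truncated second moment is available in closed form, U(z) = 2 ∫_1^z y² · y^{−3} dy = 2 log z, and the slow variation is easy to see numerically (a small sketch of ours):

```python
import math

# Truncated second moment U(z) = E[Y^2 1_{|Y| <= z}] for the density
# |y|^{-3} 1_{|y| >= 1}: U(z) = 2 * integral_1^z y^2 * y^{-3} dy
# = 2 * log(z).  U is slowly varying (U(sz)/U(z) -> 1 for fixed s),
# yet Var(Y) = lim_{z -> oo} U(z) = infinity, so no a_n = c * sqrt(n)
# normalisation can work; the (n log n)^{1/2} rate applies instead.
def U(z):
    return 2.0 * math.log(z)

for z in [1e2, 1e4, 1e8]:
    print(z, U(2 * z) / U(z))   # ratios approach 1 as z grows
```

So U satisfies Feller's slow-variation criterion even though the variance is infinite, which is exactly the gap between "some CLT-type normalisation exists" and "a √n-CLT exists".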
If {X_n} is not assumed to be i.i.d., then the question becomes more complicated. Of course, if {n^{−1} (Σ_{i=1}^n h(X_i))²}_{n=1}^∞ is uniformly integrable, then whenever a √n-CLT exists we must have A = σ², which implies by Theorem 4 (assuming reversibility) that σ² = A = B = C < ∞. However, it is not clear when this uniform integrability condition will be satisfied, and we now turn to some counter-examples.
If {X_n} is not even reversible, then it is possible for a √n-CLT to hold even though A is not finite:

Example 11. Let the state space X be the integers, and let h be the identity function.
Consider the Markov chain on X with transition probabilities given (for j ≥ 1) by P(0, 0) = 1/2, P(j, −j) = P(−j, 0) = 1, and P(0, j) = c/j³, where c = 1/(2ζ(3)) and ζ(s) = Σ_{n=1}^∞ n^{−s} is the Riemann zeta function. (That is, whenever the chain leaves 0, it cycles to some positive integer j, then to −j, and then back to 0.) This Markov chain is irreducible and aperiodic, with stationary distribution given by π(0) = 1/2 and π(j) = π(−j) = c′/j³ where c′ = 1/(4ζ(3)). Furthermore, π(h) = 0. Since h(j) + h(−j) = 0 and h(0) = 0, it is easy to see that for n ≥ 2 we have |Σ_{i=1}^n h(X_i)| ≤ |X_1| + |X_n|, and since by stationarity n^{−1/2}(|X_1| + |X_n|) → 0 in probability, n^{−1/2} Σ_{i=1}^n h(X_i) converges in distribution to 0, i.e. to N(0, 0). It also follows that for n ≥ 2, Var[Σ_{i=1}^n h(X_i)] = ∞, so A is certainly not finite.

Remark 12. If we wish, we can modify Example 11 to achieve convergence to N(0, 1) instead of N(0, 0), as follows. Replace the state space X by X × {−1, 1}, let the first coordinate {X_n} evolve as before, let the second coordinate {Y_n} evolve independently of {X_n} such that the Y_n are i.i.d., equal to −1 or 1 with probability 1/2 each, and redefine h as h(x, y) = x + y. Then n^{−1/2} Σ_{i=1}^n h(X_i, Y_i) will converge in distribution to N(0, 1).

Remark 13. If we wish, we can modify Example 11 to make the function h bounded, as follows. Instead of jumping from 0 to a value j, the chain instead jumps from 0 to a deterministic path of length 2j + 1, where the first j states have h = +1, the next j states have h = −1, and then the chain jumps back to 0.
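A quick numerical sanity check for Example 11, assuming the constants c = 1/(2ζ(3)) and c′ = 1/(4ζ(3)) (chosen so that P(0, ·) and π are probability distributions), that π is indeed stationary (a sketch of ours, using truncated zeta sums):

```python
# Truncated-sum check that pi(0) = 1/2, pi(j) = pi(-j) = c'/j^3 is
# stationary for the chain of Example 11, assuming c = 1/(2 zeta(3))
# and c' = 1/(4 zeta(3)).
N = 200_000                                    # truncation level
zeta3 = sum(1.0 / j ** 3 for j in range(1, N))
c = 1.0 / (2.0 * zeta3)
cp = 1.0 / (4.0 * zeta3)
pi0 = 0.5

def pi(j):                                     # pi at a nonzero state j
    return cp / abs(j) ** 3

# Flow into a positive state j comes only from 0, via P(0, j) = c/j^3:
j = 7
flow_into_j = pi0 * c / j ** 3
print(flow_into_j, pi(j))                      # should agree

# Flow into 0: holding at 0 with prob 1/2, plus every -j returns to 0:
flow_into_0 = pi0 * 0.5 + sum(pi(-j) for j in range(1, N))
print(flow_into_0, pi0)                        # should agree
```

The same computation shows π(h²) = 2c′ Σ_j j^{−1} = ∞, which is why A fails to be finite even though a (degenerate) √n-CLT holds.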
We now show that even if {X_n} is reversible, the existence of a √n-CLT does not necessarily imply that A is finite:

Example 14. We again let the state space X be the integers, with h the identity function. Consider the Markov chain on X with transition probabilities given by P(0, 0) = 0, P(0, y) = c|y|^{−4} for y ≠ 0 (where c = 45/π⁴), and, for x ≠ 0,

P(x, −x) = 1 − 1/|x|,  P(x, 0) = 1/|x|.

That is, the chain jumps from 0 to a random site x, and then oscillates between −x and x for a geometric amount of time with mean |x|, before returning to 0. This chain is irreducible and aperiodic, and is reversible with respect to the stationary distribution given by π(x) = c′|x|^{−3} for x ≠ 0 and π(0) = c′/c, where c′ is the appropriate normalising constant.

We prove the existence of a CLT by regeneration analysis (see e.g. [1]). We define each visit to the state 0 as a regeneration time, and write these regeneration times as T_1, T_2, . . .. For convenience, we set T_0 = 0 (even though we usually will not have X_0 = 0, i.e. we do not impose a regeneration at time 0, unlike e.g. [15], [10]). These times break up the Markov chain into a collection of random paths ("tours"), of the form {(X_{T_j+1}, X_{T_j+2}, . . ., X_{T_{j+1}})}_{j=0}^∞. The tours from T_1 onwards each travel from 0 to 0, and are all i.i.d. We note that for j ≥ 1, the sum over a single tour, Σ_{i=T_j+1}^{T_{j+1}} X_i, equals either X_{T_j+1} or 0, according to the parity of the tour length. This implies that Σ_{i=T_j+1}^{T_{j+1}} X_i has finite variance, say V (since Σ_{y≠0} c|y|^{−4} y² < ∞). It then follows from the classical central limit theorem that as J → ∞, J^{−1/2} Σ_{i=T_1+1}^{T_{J+1}} X_i converges in distribution to N(0, V). By the Law of Large Numbers, as J → ∞, T_J/J → τ where τ = E[T_{j+1} − T_j] < ∞. Hence, asymptotically for large n, if we find J with T_J ≤ n < T_{J+1}, then J ≈ n/τ, and a √n-CLT exists for h, with mean 0 and variance V/τ.

We now claim that Var[Σ_{i=0}^n X_i] is infinite for n even. Indeed, in the special case n = 0, Var[X_0] = π(h²) = 2c′ Σ_{x=1}^∞ x^{−1} = ∞. Assume now that n ≥ 2.
Let S_n = Σ_{i=0}^n X_i, and let D_n be the event that X_i = 0 for some 0 ≤ i ≤ n (so that 0 < P(D_n) < 1). Since n is even, we have that S_n = X_0 on the event D_n^C (because of cancellation: off D_n the n + 1 terms alternate X_0, −X_0, X_0, . . .), and so

E[S_n²] ≥ E[S_n² 1_{D_n^C}] = E[X_0² 1_{D_n^C}] ≥ Σ_{|x|≥2} π(x) x² (1 − 1/|x|)^n = ∞.

Hence, Var[S_n] = E[S_n²] = ∞. (In fact, Cov[X_m, X_k] = +∞ for k − m even, and −∞ for k − m odd, but we do not use that here.) This proves the claim that Var[Σ_{i=0}^n X_i] is infinite for all even n. In particular, the limit in the definition of A is either infinite or undefined, so A is certainly not finite.
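As a numerical check on Example 14, assuming the oscillation kernel as written above (P(x, −x) = 1 − 1/|x| and P(x, 0) = 1/|x| for x ≠ 0, one natural choice matching the description of a geometric holding time with mean |x|), detailed balance with π(x) = c′|x|^{−3}, π(0) = c′/c can be verified directly (a sketch of ours):

```python
import math

# Detailed-balance check for Example 14, assuming P(0, y) = c |y|^{-4}
# with c = 45/pi^4, and, for x != 0, P(x, -x) = 1 - 1/|x| and
# P(x, 0) = 1/|x|, with pi(x) = c' |x|^{-3} for x != 0, pi(0) = c'/c.
c = 45.0 / math.pi ** 4
N = 100_000                                    # truncation level for sums
zeta3 = sum(1.0 / j ** 3 for j in range(1, N))
cp = 1.0 / (1.0 / c + 2.0 * zeta3)             # normalising constant c'

def pi(x):
    return cp / c if x == 0 else cp / abs(x) ** 3

# pi(0) P(0, x) should equal pi(x) P(x, 0) for every x != 0; the
# pair (x, -x) balances by the symmetry pi(x) = pi(-x).
max_err = max(abs(pi(0) * c / x ** 4 - pi(x) / x) for x in range(1, 1000))
print("max detailed-balance error:", max_err)

# pi should also be (up to truncation) a probability distribution:
total = pi(0) + 2.0 * sum(pi(x) for x in range(1, N))
print("total mass:", total)
```

The same constants confirm π(h²) = 2c′ Σ_x x^{−1} = ∞, the source of the infinite variance in the claim above.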
Remark 15. We now show that in Example 14, in fact A is undefined rather than infinite. (In particular, it is not true that A = C, thus showing that the condition π(h²) < ∞ cannot be dropped from Theorem 4 above.) To prove this, it suffices to show that Var(n^{−1/2} S_n) remains bounded over all odd n (so that the limit defining A oscillates between bounded and infinite values). But for n odd, we have that S_n = 0 on D_n^C (again because of cancellation), so the claim will follow from showing that n^{−1} Var[S_n 1_{D_n}] remains bounded for all odd n. Let {T_j} be the regeneration times as above, so X_{T_j} = 0, and let Y_j = X_{T_j+1}. Then v ≡ Var(Y_j) = Σ_{y≠0} c|y|^{−4} y² < ∞. Now, on D_n, the sequence (X_{T_1}, . . ., X_n) breaks up into some number 1 ≤ m ≤ (n−1)/2 of complete tours [say, (X_{T_1}, . . ., X_{T_2−1}), (X_{T_2}, . . ., X_{T_3−1}), . . ., (X_{T_m}, . . ., X_{T_{m+1}−1})], plus one possibly-incomplete final tour [say, (X_{T_{m+1}}, . . ., X_n)]. Now, for j = 1, . . ., m + 1, we have that X_{T_j} + . . . + X_{min{T_{j+1}−1, n}} equals either Y_j or 0, and it follows (since these contributions have mean zero and, conditionally on their absolute values, independent symmetric signs, so that all cross terms vanish) that Var[S_n 1_{D_n}] ≤ E[X_0² 1_{D_n}] + ((n + 1)/2) v, whence n^{−1} Var[S_n 1_{D_n}] remains bounded, as required.

5. Open Problems

6. In a different direction, if the chain is reversible and a √n-CLT exists, and π(h²) < ∞, does this imply that A < ∞? (Of course, Example 14 has π(h²) = ∞. Also, Example 11 is not reversible, although as discussed in Remark 13 it is possible to make h bounded and thus π(h²) < ∞ in that case.)

7. Related to this, can the condition π(h²) < ∞ be dropped from Corollaries 6 and 7? (It cannot be dropped from Theorem 4, on account of Remark 15.)

6. Proof of Theorem 4

In this section, we prove Theorem 4. We assume throughout that P is a reversible and ergodic Markov chain, and that π(h²) < ∞. We begin with a lemma (somewhat similar to Theorem 3.1 of [8]).