Merging for time inhomogeneous finite Markov chains

We develop singular value techniques in the context of time inhomogeneous finite Markov chains with the goal of obtaining quantitative results concerning the asymptotic behavior of such chains. We introduce the notion of c-stability which can be viewed as a generalization of the case when a time inhomogeneous chain admits an invariant measure. We describe a number of examples where these techniques yield quantitative results concerning the merging of the distributions of the time inhomogeneous chain started at two arbitrary points.


Introduction
The quantitative study of time inhomogeneous Markov chains is a very broad and challenging task. Time inhomogeneity introduces so much flexibility that a great variety of complex behaviors may occur. For instance, in terms of ergodic properties, time inhomogeneity allows for the construction of Markov chains that very efficiently and exactly attain a target distribution in finite time. An example is the classical algorithm for picking a permutation at random. Thinking of a deck of n cards, one way to describe this algorithm is as follows. At step i mod n, pick a card uniformly at random among the bottom n − i + 1 cards and insert it in position i. After n − 1 steps, the deck is distributed according to the uniform distribution. However, it is not possible to recognize this fact by inspecting the properties of the individual steps. Indeed, changing the order of the steps destroys the neat convergence result mentioned above.
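The shuffle just described can be checked exhaustively for a small deck. The following is a minimal Python sketch, assuming positions are counted from the top of the deck and that "the bottom n − i + 1 cards" are those in positions i, ..., n; the function names are illustrative only. Every sequence of choices has probability 1/n!, so the deck is exactly uniform after n − 1 steps precisely when distinct choice sequences lead to distinct decks.

```python
from itertools import product

def shuffle_with_choices(n, choices):
    """Run the n-1 step shuffle; choices[i-1] is the offset picked at step i."""
    deck = list(range(1, n + 1))             # deck[0] is the top card
    for i in range(1, n):                    # steps i = 1, ..., n-1
        offset = choices[i - 1]              # 0 <= offset <= n - i
        card = deck.pop(i - 1 + offset)      # one of the bottom n-i+1 cards
        deck.insert(i - 1, card)             # insert it in position i
    return tuple(deck)

def all_outcomes(n):
    """Final decks for every possible sequence of choices (n! sequences)."""
    ranges = [range(n - i + 1) for i in range(1, n)]
    return [shuffle_with_choices(n, c) for c in product(*ranges)]

outcomes = all_outcomes(4)
# Each permutation of the 4 cards is produced by exactly one choice sequence.
print(len(outcomes), len(set(outcomes)))
```

Since the step at time i depends on i, this is a time inhomogeneous chain; the check above uses the steps in the prescribed order only.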
In this article, we are interested in studying the ergodic properties of a time inhomogeneous chain through the individual ergodic properties of the one step Markov kernels. The works [4; 23; 25] consider similar problems. To illustrate what we have in mind, consider the following. Given a sequence of irreducible Markov kernels $(K_i)_1^\infty$ on a finite set $V$, let $K_i^n$ be the usual iterated kernel of the chain driven by $K_i$ alone, and let $K_{0,n}(x,\cdot)$ be the distribution of the chain $(X_t)_1^\infty$ driven by the sequence $(K_i)_1^\infty$ with $X_0 = x$. Let $\pi_i$ be the invariant probability measure of the kernel $K_i$. Suppose we understand well the convergence $K_i^n(x,\cdot) \to \pi_i(\cdot)$ $\forall x$ and that this convergence is, in some sense, uniform over $i$. For instance, assume that there exists $\beta \in (0,1)$ and $T > 0$ such that, for all $i$ and $n \ge T + m$, $m > 0$, $\max_{x,y}$ We would like to apply (1) to deduce results concerning the proximity of the measures $K_{0,n}(x,\cdot)$, $K_{0,n}(y,\cdot)$, $x, y \in V$. These are the distributions at time $n$ for the chain started at two distinct points $x, y$. To give a precise version of the types of questions we would like to consider, we present the following open problem.
The very strong conclusion above is known only in the case when $V$ is a group, the kernels $K_i$ are invariant, and $T \ge \alpha^{-1}\log|V|$. See [23, Theorem 4.9].
In view of the difficulties mentioned above, the purpose of this paper (and of the companion papers [25; 26]) is to develop techniques that apply to some instances of Problem 1.1 and some of its variants. Namely, we show how to adapt tools that have been successfully applied to time homogeneous chains to the study of time inhomogeneous chains and provide a variety of examples where these tools apply. The most successful techniques in the quantitative study of (time homogeneous) finite Markov chains include: coupling, strong stationary times, spectral methods, and functional inequalities such as Nash or log-Sobolev inequalities. This article focuses on spectral methods, more precisely, singular value methods. The companion paper [25] develops Nash and log-Sobolev inequality techniques. Two papers that are close in spirit to the present work are [4; 11]. In particular, the techniques developed in [4] are closely related to those we develop here and in [25].
We point out that the singular value and functional inequality techniques discussed here and in [4; 25] have the advantage of leading to results in distances such as $\ell^2$-distance (i.e., chi-square) and relative-sup norm which are stronger than total variation.
The material in this paper is organized as follows. Section 2 introduces our basic notation and the concept of merging (in total variation and relative-sup distances). See Definitions 2.1, 2.8, 2.11. Section 3 shows how singular value decompositions can be used, theoretically, to obtain merging bounds. The main result is Theorem 3.2. An application to time inhomogeneous constant rate birth and death chains is presented. Section 4 introduces the fundamental concept of stability (Definition 4.1), a relaxation of the very restrictive hypothesis used in [23] that the kernels driving the time inhomogeneous chain under investigation all share the same invariant distribution. If the stability hypothesis is satisfied then the singular value analysis becomes much easier to apply in practice. See Theorems 4.10 and 4.11. Section 4.2 offers our first example of stability concerning end-point perturbations of simple random walk on a stick. A general class of birth and death examples where stability holds is studied in Section 5. Further examples of stability are described in [25; 26]. The final section, Section 6, gives a complete analysis of time inhomogeneous chains on the two-point space. We characterize total variation merging and study stability and relative-sup merging in this simple but fundamental case.
We end this introduction with some brief comments regarding the coupling and strong stationary time techniques. Since, typically, time inhomogeneous Markov chains do not converge to a fixed distribution, adapting the technique of strong stationary time poses immediate difficulties. This comment seems to apply also to the recent technique of evolving sets [17], which is somewhat related to strong stationary times. In addition, effective constructions of strong stationary times are usually not very robust and this is likely to pose further difficulties. An example of a strong stationary time argument for a time inhomogeneous chain that admits a stationary measure can be found in [18].
Concerning coupling, as far as theory goes, there are absolutely no difficulties in adapting the coupling technique to time inhomogeneous Markov chains. Indeed, the usual coupling inequality $\|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{TV} \le P(T > n)$ holds true (with the exact same proof) if $T$ is the coupling time of a coupling $(X_n, Y_n)$ adapted to the sequence $(K_i)_1^\infty$ with starting points $X_0 = x$ and $Y_0 = y$. See [11] for practical results in this direction and related techniques. Coupling is certainly useful in the context of time inhomogeneous chains but we would like to point out that time inhomogeneity introduces very serious difficulties in the construction and analysis of couplings for specific examples. This seems related to the lack of robustness of the coupling technique. For instance, in many coupling constructions, it is important that past progress toward coupling is not destroyed at a later stage, yet the necessary adaptation to the changing steps of a time inhomogeneous chain makes this difficult to achieve.

Different notions of merging
Let $V$ be a finite set equipped with a sequence of Markov kernels $(K_n)_1^\infty$, and let $(X_n)_0^\infty$ be an associated Markov chain. The distribution $\mu_n$ of $X_n$ is determined by the initial distribution $\mu_0$ by $\mu_n(y) = \sum_{x \in V} \mu_0(x) K_{0,n}(x,y)$, where $K_{n,m}(x,y)$ is defined inductively for each $n$ and each $m > n$ by $K_{n,m}(x,y) = \sum_{z \in V} K_{n,m-1}(x,z) K_m(z,y)$, with $K_{n,n} = I$ (the identity). If we view the $K_n$'s as matrices then this definition means that $K_{n,m} = K_{n+1} \cdots K_m$. In the case of time homogeneous chains where all $K_i = Q$ are equal, we write $K_{0,n} = Q^n$.
Figure 1. The first frame shows two particular initial distributions, one of which is the binomial. The other frames show the evolution under a time inhomogeneous chain driven by a deterministic sequence involving two kernels from the set $N(Q, \varepsilon)$ of Section 5.2, a set consisting of perturbations of the Ehrenfest chain kernel. In the fourth frame, the distributions have merged. The last two frames illustrate the evolution after merging and the absence of a limiting distribution. Here $N = 30$ and the total number of points is $n = 61$.
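The matrix form of the definition, $K_{n,m} = K_{n+1} \cdots K_m$ with $K_{n,n} = I$, can be sketched as follows. This is an illustrative Python fragment using exact rational arithmetic; the two kernels and the helper names are hypothetical choices made for the example, not objects from the paper.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    """Product of two kernels stored as lists of rows."""
    return [[sum(A[x][z] * B[z][y] for z in range(len(B)))
             for y in range(len(B[0]))] for x in range(len(A))]

def iterated_kernel(kernels, n, m):
    """K_{n,m} = K_{n+1} ... K_m, with K_{n,n} = I; kernels[i-1] holds K_i."""
    size = len(kernels[0])
    result = [[F(int(x == y)) for y in range(size)] for x in range(size)]
    for i in range(n + 1, m + 1):
        result = mat_mul(result, kernels[i - 1])
    return result

def push_forward(mu, K):
    """Row vector times matrix: the measure mu K."""
    return [sum(mu[x] * K[x][y] for x in range(len(mu)))
            for y in range(len(K[0]))]

# Two hypothetical kernels on a two-point space.
K1 = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
K2 = [[F(1, 3), F(2, 3)], [F(1), F(0)]]
kernels = [K1, K2]

K02 = iterated_kernel(kernels, 0, 2)          # K_{0,2} = K_1 K_2
mu0 = [F(1, 2), F(1, 2)]
mu2 = push_forward(mu0, K02)                  # mu_2 = mu_0 K_{0,2}
```

Computing $\mu_2$ one step at a time, as $(\mu_0 K_1) K_2$, gives the same measure, which is the content of the inductive definition.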
Our main interest is understanding mixing type properties of time inhomogeneous Markov chains. However, in general, $\mu_n = \mu_0 K_{0,n}$ does not converge toward a limiting distribution. Instead, the natural notion to consider is that of merging defined below. For a discussion of this property and its variants, see, e.g., [3; 16; 19; 27].
Definition 2.1 (Total variation merging). Fix a sequence $(K_i)_1^\infty$ of Markov kernels on a finite set $V$. We say the sequence $(K_i)_1^\infty$ is merging in total variation if, for any $x, y \in V$, $\lim_{n\to\infty} \|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{TV} = 0$.
A rather trivial example that illustrates merging versus mixing is as follows.
Example 2.2. Fix two probability distributions Then for any sequence 1 is merging then, for any two starting distributions $\mu_0, \nu_0$, the measures $\mu_n = \mu_0 K_{0,n}$ and
Our goal is to develop quantitative results for time inhomogeneous chains in the spirit of the work concerning homogeneous chains of Aldous, Diaconis and others who obtain precise estimates on the mixing time of ergodic chains that depend on the size of the state space in an explicit way. In these works, convergence to stationarity is measured in terms of various distances between measures $\mu, \nu$ such as the total variation distance $\|\mu - \nu\|_{TV} = \sup_{A \subset V}\{\mu(A) - \nu(A)\}$, the chi-square distance w.r.t. $\nu$, $\left(\sum_{y}|\mu(y)/\nu(y) - 1|^2\,\nu(y)\right)^{1/2}$, and the relative-sup distance $\|\mu/\nu - 1\|_\infty$. See, e.g., [1; 5; 6; 12; 21]. Given an irreducible aperiodic chain with kernel $K$ on a finite set $V$, there exists a unique probability measure $\pi > 0$ such that $K^n(x,\cdot) \to \pi(\cdot)$ as $n \to \infty$, for all $x$. This qualitative property can be stated equivalently using total variation, the chi-square distance or relative-sup distance. However, if we do not assume irreducibility, it is possible that there exists a unique probability measure $\pi$ (with perhaps $\pi(y) = 0$ for some $y$) such that, for all $x$, $K^n(x,\cdot) \to \pi(\cdot)$ as $n$ tends to infinity (this happens when there is a unique absorbing class with no periodicity). In such a case, $K^n(x,\cdot)$ does converge to $\pi$ in total variation but the chi-square and relative-sup distances are not well defined (or are equal to $+\infty$). This observation has consequences in the study of time inhomogeneous Markov chains. Since there seems to be no simple natural property that would replace irreducibility in the time inhomogeneous context, one must regard total variation merging and other notions of merging as truly different properties.
Definition 2.4 (Relative-sup merging). Fix a sequence $(K_i)_1^\infty$ of Markov kernels. We say the sequence is merging in relative-sup distance if $\lim_{n\to\infty} \max_{x,y,z \in V} \left|K_{0,n}(x,z)/K_{0,n}(y,z) - 1\right| = 0$.
The techniques discussed in this paper are mostly related to the notion of merging in relative-sup distance. A graphic illustration of merging is given in Figure 1.
Remark 2.5. On the two-point space, consider the reversible irreducible aperiodic kernels Then the sequence $K_1, K_2, K_1, K_2, \ldots$ is merging in total variation but is not merging in relative-sup distance. See Section 6 for details.
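The phenomenon described in this remark can be observed numerically. The sketch below uses two kernels on the two-point space that are an assumed, illustrative choice (they are reversible, irreducible and aperiodic, but they are not taken from the text). For the alternating sequence, the total variation distance between the two rows of $K_{0,n}$ halves at every step, while one row always contains a zero entry where the other does not, so the relative-sup distance stays infinite.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    return [[sum(A[x][z] * B[z][y] for z in range(len(B)))
             for y in range(len(B[0]))] for x in range(len(A))]

def alternating_product(K1, K2, n):
    """K_{0,n} for the sequence K1, K2, K1, K2, ..."""
    K = [[F(1), F(0)], [F(0), F(1)]]
    for i in range(1, n + 1):
        K = mat_mul(K, K1 if i % 2 == 1 else K2)
    return K

def tv_merging_distance(K):
    """max over x, y of the total variation distance between rows x and y."""
    return max(sum(abs(a - b) for a, b in zip(rx, ry)) / 2
               for rx in K for ry in K)

def relative_sup_merging_distance(K):
    """max over x, y, z of |K(x,z)/K(y,z) - 1|; infinite if some ratio is a/0."""
    worst = F(0)
    for rx in K:
        for ry in K:
            for a, b in zip(rx, ry):
                if b == 0:
                    if a != 0:
                        return float("inf")
                    # the 0/0 case is skipped: both rows vanish at z
                else:
                    worst = max(worst, abs(a / b - 1))
    return worst

# Assumed example kernels (hypothetical, chosen for illustration).
K1 = [[F(0), F(1)], [F(1, 2), F(1, 2)]]
K2 = [[F(1, 2), F(1, 2)], [F(1), F(0)]]

tv_values = [tv_merging_distance(alternating_product(K1, K2, n))
             for n in range(1, 7)]
rs_values = [relative_sup_merging_distance(alternating_product(K1, K2, n))
             for n in range(1, 7)]
```

Here `tv_values` decreases geometrically while every entry of `rs_values` is infinite, which is one concrete way total variation merging and relative-sup merging can differ.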
When focusing on the relation between ergodic properties of individual kernels K i and the behavior of an associated time inhomogeneous chain, it is intuitive to look at the K i as a set instead of a sequence.The following definition introduces a notion of merging for sets of kernels.
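Definition 2.6. Let $\mathcal{Q}$ be a set of Markov kernels on a finite state space $V$. We say that $\mathcal{Q}$ is merging in total variation if any sequence $(K_i)_1^\infty$ with $K_i \in \mathcal{Q}$ for all $i \ge 1$ is merging in total variation. We say that $\mathcal{Q}$ is merging in relative-sup if any sequence $(K_i)_1^\infty$ with $K_i \in \mathcal{Q}$ for all $i \ge 1$ is merging in relative-sup distance.
One of the goals of this work is to describe some non-trivial examples of merging families $\mathcal{Q} = \{Q_1, Q_2\}$ where $Q_1$ and $Q_2$ have distinct invariant measures.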

Example 2.7. Many examples (with all $Q \in \mathcal{Q}$ sharing the same invariant distribution) are given in [23], with quantitative bounds. For instance, let $V = G$ be a finite group and $S_1, S_2$ be two symmetric generating sets. Assume that the identity element $e$ belongs to $S_1 \cap S_2$. Assume further that $\max\{\#S_1, \#S_2\} = N$ and that any element of $G$ is the product of at most $D$ elements of $S_i$, $i = 1, 2$. In other words, the Cayley graphs of $G$ associated with $S_1$ and $S_2$ both have diameter at most $D$.

Merging time
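In the quantitative theory of ergodic time homogeneous Markov chains, the notion of mixing time plays a crucial role. For time inhomogeneous chains, we propose to consider the following corresponding definitions.
Definition 2.8. Fix $\varepsilon \in (0,1)$. Given a sequence $(K_i)_1^\infty$ of Markov kernels on a finite set $V$, we call max total variation $\varepsilon$-merging time the quantity $T_{TV}(\varepsilon) = \inf\{n : \max_{x,y \in V} \|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{TV} < \varepsilon\}$.
Definition 2.9. Fix $\varepsilon \in (0,1)$. We say that a set $\mathcal{Q}$ of Markov kernels on $V$ has max total variation $\varepsilon$-merging time at most $T$ if for any sequence $(K_i)_1^\infty$ with $K_i \in \mathcal{Q}$ for all $i$, we have $T_{TV}(\varepsilon) \le T$.
Example 2.10. If $\mathcal{Q} = \{Q_1, Q_2\}$ is as in Example 2.7, the total variation $\varepsilon$-merging time for $\mathcal{Q}$ is at most $(ND)^2(\log|G| + 2\log 1/\varepsilon)$.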
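A merging time of this kind can be computed directly for a concrete sequence. The sketch below assumes the reading that the $\varepsilon$-merging time is the first $n$ at which the largest total variation distance between two rows of $K_{0,n}$ falls strictly below $\varepsilon$; the strict inequality, the two kernels and the function names are assumptions made for illustration.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    return [[sum(A[x][z] * B[z][y] for z in range(len(B)))
             for y in range(len(B[0]))] for x in range(len(A))]

def max_tv_between_rows(K):
    """max over x, y of the total variation distance between rows x and y."""
    return max(sum(abs(a - b) for a, b in zip(rx, ry)) / 2
               for rx in K for ry in K)

def tv_merging_time(kernel_at, eps, max_steps=1000):
    """First n with max_{x,y} TV(K_{0,n}(x,.), K_{0,n}(y,.)) < eps.

    kernel_at(i) must return the kernel K_i for i >= 1.
    Returns None if the threshold is not reached within max_steps.
    """
    K = kernel_at(1)                              # K_{0,1}
    for n in range(1, max_steps + 1):
        if max_tv_between_rows(K) < eps:
            return n
        K = mat_mul(K, kernel_at(n + 1))          # K_{0,n+1}
    return None

# Hypothetical alternating sequence on a two-point space.
K1 = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
K2 = [[F(1, 3), F(2, 3)], [F(1), F(0)]]

def kernel_at(i):
    return K1 if i % 2 == 1 else K2

t_coarse = tv_merging_time(kernel_at, F(1, 5))
t_fine = tv_merging_time(kernel_at, F(1, 20))
```

For this sequence the distances between the two rows are 1/4, 1/6, 1/24, ..., so the threshold 1/5 is met at step 2 and the threshold 1/20 at step 3.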
As noted earlier, merging can be defined and measured in ways other than total variation.One very natural and much stronger notion than total variation distance is relative-sup distance used in Definitions 2.4-2.6 and in the definitions below.
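Definition 2.11. Fix $\varepsilon \in (0,1)$. Given a sequence $(K_i)_1^\infty$ of Markov kernels on a finite set $V$, we call relative-sup $\varepsilon$-merging time the quantity $T_\infty(\varepsilon) = \inf\{n : \max_{x,y,z \in V} |K_{0,n}(x,z)/K_{0,n}(y,z) - 1| < \varepsilon\}$.
Definition 2.12. Fix $\varepsilon \in (0,1)$. We say that a set $\mathcal{Q}$ of Markov kernels on $V$ has relative-sup $\varepsilon$-merging time at most $T$ if for any sequence $(K_i)_1^\infty$ with $K_i \in \mathcal{Q}$ for all $i$, we have $T_\infty(\varepsilon) \le T$.
If the sequence $(K_i)_1^\infty$ is merging in total variation or relative-sup then, for any initial distribution $\mu_0$ the sequence $\mu_n = \mu_0 K_{0,n}$ must merge with the sequence $\nu_n$ where $\nu_n$ is the invariant measure for $K_{0,n}$. In total variation, we have In relative-sup, for $\varepsilon \in (0, 1/2)$, inequality (4) below yields max
The following simple example illustrates how the merging time of a family of kernels may differ significantly from the merging time of a particular sequence
Example 2.15. Let $Q_1$ be the birth and death chain kernel on $V_N = \{0, \ldots, N\}$ with constant rates $p, q$, $p + q = 1$, $p > q$. This means here that $Q_1(x,x+1) = p$, $Q_1(x,x-1) = q$ when these are well defined and $Q_1(0,0) = q$, $Q_1(N,N) = p$. The reversible measure $\pi_1$ of $Q_1$ is proportional to $(p/q)^x$. The chain driven by this kernel is well understood. In particular, the mixing time is of order $N$ starting from the end where $\pi_1$ attains its minimum.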
Let $Q_2$ be the birth and death chain with constant rates $q, p$. Hence, $Q_2(x,x+1) = q$, $Q_2(x,x-1) = p$ when these are well defined and $Q_2(0,0) = p$, $Q_2(N,N) = q$. It is an interesting problem to study the merging property of the set $\mathcal{Q} = \{Q_1, Q_2\}$. Here, we only make a simple observation concerning the behavior of the sequence $Q_1, Q_2, Q_1, Q_2, \ldots$, through the two-step kernel $Q = Q_1 Q_2$. The graph structure of this kernel is a circle. As an example, below we give the graph structure for $N = 10$. Edges are drawn between points $x$ and $y$ if $Q(x,y) > 0$. Note that $Q(x,y) > 0$ if and only if $Q(y,x) > 0$, so that all edges can be traversed in both directions (possibly with different probabilities).
For the Markov chain driven by $Q$, there is equal probability of going from a point $x$ to any of its neighbors as long as $x \ne 0, N$. Using this fact, one can compute the invariant measure $\pi$ of $Q$ and conclude that $\max_V \pi \le (p/q)^2 \min_V \pi$. It follows that $(q/p)^2 \le (N+1)\pi(x) \le (p/q)^2$. This and the comparison techniques of [8] show that the sequence Compare with the fact that each kernel $K_i$ in the sequence has a mixing time of order $N$.
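The two-sided bound on $(N+1)\pi(x)$ can be checked numerically. The sketch below is only an illustration under stated assumptions: $Q$ is taken to be the product $Q_1 Q_2$, both chains hold at the end points with the leftover probability, and the invariant measure is approximated by power iteration rather than computed in closed form. The values $N = 10$, $p = 2/3$ and all function names are hypothetical choices.

```python
from fractions import Fraction as F

def birth_death(N, up, down):
    """Constant rate kernel on {0,...,N}: up w.p. `up`, down w.p. `down`,
    holding at an end point when the move is not available."""
    K = [[F(0)] * (N + 1) for _ in range(N + 1)]
    for x in range(N + 1):
        if x < N:
            K[x][x + 1] = up
        else:
            K[x][x] += up
        if x > 0:
            K[x][x - 1] = down
        else:
            K[x][x] += down
    return K

def mat_mul(A, B):
    return [[sum(A[x][z] * B[z][y] for z in range(len(B)))
             for y in range(len(B[0]))] for x in range(len(A))]

def invariant_measure(K, steps=5000):
    """Approximate the invariant measure by iterating mu -> mu K in floats."""
    n = len(K)
    Kf = [[float(v) for v in row] for row in K]
    mu = [1.0 / n] * n
    for _ in range(steps):
        mu = [sum(mu[x] * Kf[x][y] for x in range(n)) for y in range(n)]
    return mu

N, p, q = 10, F(2, 3), F(1, 3)
Q1 = birth_death(N, p, q)
Q2 = birth_death(N, q, p)
Q = mat_mul(Q1, Q2)                      # assumed two-step kernel

pi = invariant_measure(Q)
scaled = [(N + 1) * v for v in pi]       # (N+1) * pi(x)
lower, upper = float((q / p) ** 2), float((p / q) ** 2)
print(lower, min(scaled), max(scaled), upper)
```

Away from the two end points, every off-diagonal entry of a row of `Q` equals `p * q`, which is the "equal probability of going to any neighbor" property used in the text.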
Singular value analysis

Preliminaries
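We say that a measure $\mu$ is positive if $\forall x$, $\mu(x) > 0$. Given a positive probability measure $\mu$ on $V$ and a Markov kernel $K$, set $\mu' = \mu K$. If $K$ satisfies $\forall y \in V$, $\sum_{x \in V} K(x,y) > 0$ (1) then $\mu'$ is also positive. Obviously, any irreducible kernel $K$ satisfies (1).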
Fix p ∈ [1, ∞] and consider K as a linear operator It is important to note, and easy to check, that for any measure µ, the operator K : 1 of Markov kernels satisfying (1).Fix a positive probability measure µ 0 and set µ n = µ 0 K 0,n .Observe that µ n > 0 and set Further, one easily checks the important fact that It follows that we may control the total variation merging of a sequence To control relative-sup merging we note that if The last inequality follows from the fact that if

Singular value decomposition
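The following material can be developed over the real or complex numbers with little change. Since our operators are Markov operators, we work over the reals. Let $\mathcal{H}$ and $\mathcal{K}$ be (real) Hilbert spaces equipped with inner products $\langle\cdot,\cdot\rangle_{\mathcal{H}}$ and $\langle\cdot,\cdot\rangle_{\mathcal{K}}$ respectively. If $u : \mathcal{H} \times \mathcal{K} \to \mathbb{R}$ is a bounded bilinear form, by the Riesz representation theorem, there are unique operators $A : \mathcal{H} \to \mathcal{K}$ and $B : \mathcal{K} \to \mathcal{H}$ such that $u(h,k) = \langle Ah, k\rangle_{\mathcal{K}} = \langle h, Bk\rangle_{\mathcal{H}}$. (5) If $A : \mathcal{H} \to \mathcal{K}$ is given and we set $u(h,k) = \langle Ah, k\rangle_{\mathcal{K}}$ then the unique operator $B : \mathcal{K} \to \mathcal{H}$ satisfying (5) is called the adjoint of $A$ and is denoted as $B = A^*$. The following classical result can be derived from [20, Theorem 1.9.3].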

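Theorem 3.1 (Singular value decomposition). Let $\mathcal{H}$ and $\mathcal{K}$ be two Hilbert spaces of the same dimension, finite or countable. Let $A : \mathcal{H} \to \mathcal{K}$ be a compact operator. There exist orthonormal bases $(\varphi_i)$ of $\mathcal{H}$ and $(\psi_i)$ of $\mathcal{K}$ and non-negative reals $\sigma_i$ such that $A\varphi_i = \sigma_i\psi_i$ and $A^*\psi_i = \sigma_i\varphi_i$. The non-negative numbers $\sigma_i$ are called the singular values of $A$ and are equal to the square roots of the eigenvalues of the self-adjoint compact operator $A^*A : \mathcal{H} \to \mathcal{H}$ and also of $AA^* : \mathcal{K} \to \mathcal{K}$.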
One important difference between eigenvalues and singular values is that the singular values depend very much on the Hilbert structures carried by $\mathcal{H}$, $\mathcal{K}$. For instance, a Markov operator $K$ on a finite set $V$ may have singular values larger than 1 when viewed as an operator from $\ell^2(\nu)$ to $\ell^2(\mu)$ for arbitrary positive probability measures $\nu, \mu$ (even with $\nu = \mu$).
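A two-point example makes this dependence on the Hilbert structure concrete. In the sketch below (the kernel, the measure and the test function are hypothetical choices), a single function $f$ satisfies $\|Kf\|_{\ell^2(\mu)} > \|f\|_{\ell^2(\mu)}$, so the largest singular value of $K$ on $\ell^2(\mu)$ exceeds 1, whereas measuring $f$ in $\ell^2(\mu K)$ restores the contraction property recalled in Remark 3.4.

```python
from fractions import Fraction as F

def push_forward(mu, K):
    """The measure mu K."""
    return [sum(mu[x] * K[x][y] for x in range(len(mu)))
            for y in range(len(K[0]))]

def apply_kernel(K, f):
    """The function Kf(x) = sum_y K(x,y) f(y)."""
    return [sum(K[x][y] * f[y] for y in range(len(f)))
            for x in range(len(K))]

def norm_sq(f, mu):
    """Squared norm of f in l^2(mu)."""
    return sum(m * v * v for v, m in zip(f, mu))

# Hypothetical irreducible kernel; mu is positive but not invariant.
K = [[F(1, 10), F(9, 10)], [F(1, 10), F(9, 10)]]
mu = [F(1, 2), F(1, 2)]
mu_prime = push_forward(mu, K)            # mu K = (1/10, 9/10)

f = [F(0), F(1)]
Kf = apply_kernel(K, f)
ratio_same_space = norm_sq(Kf, mu) / norm_sq(f, mu)        # l^2(mu) -> l^2(mu)
ratio_pushed = norm_sq(Kf, mu) / norm_sq(f, mu_prime)      # l^2(mu K) -> l^2(mu)

# The contraction from l^2(mu K) to l^2(mu) holds for any test function.
test_functions = [[F(1), F(0)], [F(1), F(-1)], [F(3), F(2)], [F(-5), F(7)]]
contracts = all(norm_sq(apply_kernel(K, g), mu) <= norm_sq(g, mu_prime)
                for g in test_functions)
```

The first ratio equals 81/50, so the operator norm on $\ell^2(\mu)$ is at least $\sqrt{81/50} > 1$; the second ratio equals 9/10.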
We now apply the singular value decomposition above to obtain an expression of the ℓ 2 distance between µ ′ = µK and K(x, •) when K is a Markov kernel satisfying (1) and µ a positive probability measure on V .Consider the operator K = K µ : ℓ 2 (µ ′ ) → ℓ 2 (µ) defined by (2).Then the adjoint By Theorem 3.1, there are eigenbases (ϕ i ) ) and ℓ 2 (µ) respectively such that , the square roots of the eigenvalues of K * µ K µ (and K µ K * µ ) given in non-increasing order, i.e.
and ψ 0 = ϕ 0 ≡ 1.From this it follows that, for any x ∈ V , To see this, write With δy = δ y /µ ′ ( y), we have K(x, y)/µ ′ ( y) = K δy (x).Write δy so we get that Using this equality yields the desired result.This leads to the main result of this section.In what follows we often write K for K µ when the context makes it clear that we are considering K as an operator from ℓ 2 (µ ′ ) to ℓ 2 (µ) for some fixed µ.
1 be a sequence of Markov kernels on a finite set V , all satisfying (1).Fix a positive starting measure µ 0 and set µ i = µ 0 K 0,i .For each i = 0, 1, . . ., let and, for all x ∈ V , Moreover, for all x, y ∈ V , Proof.Apply the discussion prior to Theorem 6 with µ = µ 0 , K = K 0,n and be the orthonormal basis of ℓ 2 (µ 0 ) given by Theorem 3.1 and δx = δ x /µ 0 (x).Then δx Furthermore, Theorem 3.3.4and Corollary 3.3.10 in [15] give the inequality Using this with k = |V | − 1 in (6) yields the first claimed inequality.The second inequality then follows from the fact that σ 1 (K 0,n , µ 0 ) ≥ σ j (K 0,n , µ 0 ) for all j = 1 . . .|V | − 1.The last inequality follows from writing and bounding with reversible measure µ i .Hence The difficulty in applying Theorem 3.2 is that it usually requires some control on the sequence of measures µ i .Indeed, assume that each K i is aperiodic irreducible with invariant probability measure π i .One natural way to put quantitative hypotheses on the ergodic behavior of the individual steps (K i , π i ) is to consider the Markov kernel which is the kernel of the operator K * i K i when K i is understood as an operator acting on ℓ 2 (π i ) (note the difficulty of notation coming from the fact that we are using the same notation K i to denote two operators acting on different Hilbert spaces).For instance, let β i be the second largest eigenvalue of ( P i , π i ).Given the extreme similarity between the definitions of P i and P i , one may hope to bound β i using β i .This however requires some control of Indeed, by a simple comparison argument (see, e.g., [7; 9; 21]), we have One concludes that Remark 3.4.The paper [4] studies certain contraction properties of Markov operators.It contains, in a more general context, the observation made above that a Markov operator is always a contraction from ℓ p (µK) to ℓ p (µ) and that, in the case of ℓ 2 spaces, the operator norm K − µ ′ ℓ 2 (µK)→ℓ 2 (µ) is given by the second largest singular 
value of K µ : ℓ 2 (µK) → ℓ 2 (µ) which is also the square root of the second eigenvalue of the Markov operator P acting on ℓ 2 (µK) where P = K * µ K µ , K * µ : ℓ 2 (µ) → ℓ 2 (µK).This yields a slightly less precise version of the last inequality in Theorem 3.2.Namely, writing and using the contraction property above one gets .
Example 3.5 (Doeblin's condition).Assume that, for each i, there exists α i ∈ (0, 1), and a probability measure π i (which does not have to have full support) such that This is known as a Doeblin type condition.For any positive probability measure µ 0 , the kernel P i defined at (7) is then bounded below by This implies that β 1 (i), the second largest eigenvalue of P i , is bounded by Let us observe that the very classical coupling argument usually employed in relation to Doeblin's condition applies without change in the present context and yields max x, y See [11] for interesting developments in this direction.
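The coupling bound associated with a Doeblin type condition can be illustrated numerically. The sketch below assumes the standard form of the condition, $K_i(x,y) \ge \alpha_i \pi_i(y)$ for all $x, y$, and the standard consequence $\max_{x,y}\|K_{0,n}(x,\cdot) - K_{0,n}(y,\cdot)\|_{TV} \le \prod_{i=1}^n (1 - \alpha_i)$; both are stated here as assumptions, and the kernels are hypothetical mixtures built to satisfy the condition.

```python
from fractions import Fraction as F

def mat_mul(A, B):
    return [[sum(A[x][z] * B[z][y] for z in range(len(B)))
             for y in range(len(B[0]))] for x in range(len(A))]

def max_tv_between_rows(K):
    return max(sum(abs(a - b) for a, b in zip(rx, ry)) / 2
               for rx in K for ry in K)

def doeblin_kernel(alpha, pi, R):
    """K(x,y) = alpha * pi(y) + (1 - alpha) * R(x,y), so K(x,y) >= alpha * pi(y)."""
    n = len(pi)
    return [[alpha * pi[y] + (1 - alpha) * R[x][y] for y in range(n)]
            for x in range(n)]

shift = [[F(0), F(1), F(0)], [F(0), F(0), F(1)], [F(1), F(0), F(0)]]
ident = [[F(1), F(0), F(0)], [F(0), F(1), F(0)], [F(0), F(0), F(1)]]

alphas = [F(1, 2), F(1, 3), F(1, 4)]
pis = [[F(1, 2), F(1, 4), F(1, 4)],
       [F(1), F(0), F(0)],                 # pi_i need not have full support
       [F(1, 3), F(1, 3), F(1, 3)]]
kernels = [doeblin_kernel(a, pi, R)
           for a, pi, R in zip(alphas, pis, [shift, ident, shift])]

K = ident
bound = F(1)
history = []                               # (TV distance, product bound) per step
for Ki, a in zip(kernels, alphas):
    K = mat_mul(K, Ki)
    bound *= (1 - a)
    history.append((max_tv_between_rows(K), bound))
```

Because the residual kernels here are permutation matrices, the bound is attained at every step, which suggests that it cannot be improved without further information on the kernels.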
Example 3.6.On a finite state space V , consider a sequence of edge sets E i ⊂ V × V .For each i, assume that 1.For all x ∈ V , (x, x) ∈ E i .
2. For all x, y ∈ V , there exist k = k(i, x, y) and a sequence (x j ) k 0 of elements of V such that for some ε > 0. We claim that the sequence This easily follows from Example 3.5 after one remarks that the hypotheses imply To prove this, for any fixed m, let

Application to constant rates birth and death chains
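A constant rate birth and death chain $Q$ on $V = V_N = \{0, 1, \ldots, N\}$ is determined by parameters $p, q, r \in [0,1]$, $p + q + r = 1$, and given by $Q(x,x+1) = p$ for $x < N$, $Q(x,x-1) = q$ for $x > 0$, $Q(x,x) = r$ for $0 < x < N$, $Q(0,0) = 1 - p$ and $Q(N,N) = 1 - q$. For any $A \ge a \ge 1$, let $\mathcal{Q}_N^\uparrow(a,A)$ be the collection of all constant rate birth and death chains on $V_N$ with $p/q \in [a,A]$. Let $\mathcal{M}_N^\uparrow(a,A)$ be the set of all probability measures on $V_N$ such that $a\mu(x) \le \mu(x+1) \le A\mu(x)$, $x \in \{0, \ldots, N-1\}$.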
For η ∈ (0, 1/2), let N (η) be the set of all Markov kernels K with K(x, x) ∈ (η, Then, for any initial distribution µ 0 ∈ ↑ N (a, A) and any sequence In particular, the relative-sup ε-merging time of the family Remark 3.9.This is an example where one expects the starting point to have a huge influence on the merging time between K 0,n (x, •) and µ n (•).And indeed, the proof given below based on the last two inequalities in Theorem 3.2 shows that the uniform upper bound given above can be drastically improved if one starts from N (or close to N ).This is because, if starting from N , the factor µ 0 (N ) −1 − 1 is bounded above by 1/(a − 1).Using this, the proof below shows approximate merging after a constant number of steps if starting from N .To obtain the uniform upper bound of Theorem 3.8, we will use the complementary fact that µ 0 (0 Proof.To apply Theorem 3.2, we use Remark 3.3 and compute the kernel of P i = K * i K i given by We will use that (P i , µ i ) is reversible with To obtain this, observe first that Then write The fact that µ i (x) ≥ aµ i (x −1) is equivalent to saying that µ i (x) = z i a h i (x) with h i (x +1)−h i (x) ≥ 1, x ∈ {0, N − 1}.In [10, Proposition 6.1], the Metropolis chain M = M i for any such measure µ = µ i , with base simple random walk, is studied.There, it is proved that the second largest eigenvalue of the Metropolis chain is bounded by The Metropolis chain has Hence, (9) and µ i ∈ ↑ (a, A) give Now, a simple comparison argument yields Remark 3.10.The total variation merging of the chains studied above can be obtained by a coupling argument.Indeed, for any staring points x < y, construct the obvious coupling that have the chains move in parallel, except when one is at an end point.The order is preserved and the two copies couple when the lowest chain hits N .A simple argument bounds the upper tail of this hitting time and shows that order N suffices for total variation merging from any two starting states.For this coupling 
argument (and thus for the total variation merging result) the upper bound p/q ≤ A is irrelevant.

Stability
This section introduces a concept, c-stability, that plays a crucial role in some applications of the singular values techniques used in this paper and in the functional inequalities techniques discussed in [25].
In a sense, this property is a straightforward generalization of the property that a family of kernels share the same invariant measure.We believe that understanding this property is also of independent interest.
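Definition 4.1. Fix $c \ge 1$. A sequence $(K_n)_1^\infty$ of Markov kernels on a finite set $V$ is c-stable if there exists a measure $\mu_0 > 0$ such that, for all $n \ge 0$ and $x \in V$, $c^{-1} \le \mu_n(x)/\mu_0(x) \le c$, where $\mu_n = \mu_0 K_{0,n}$. If this holds, we say that $(K_n)_1^\infty$ is c-stable with respect to the measure $\mu_0$.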

Definition 4.2. A set $\mathcal{Q}$ of Markov kernels is c-stable with respect to a probability measure $\mu_0 > 0$ if any sequence $(K_i)_1^\infty$ such that $K_i \in \mathcal{Q}$ is c-stable with respect to $\mu_0 > 0$.
Remark 4.3. If all $K_i$ share the same invariant distribution $\pi$ then $(K_i)_1^\infty$ is 1-stable with respect to $\pi$.
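The stability constant of a given finite stretch of a sequence is easy to compute. The following sketch assumes that c-stability with respect to $\mu_0$ means $c^{-1}\mu_0 \le \mu_n \le c\,\mu_0$ for all $n$, with $\mu_n = \mu_0 K_{0,n}$; the kernels and names are hypothetical. It illustrates Remark 4.3: two different kernels with the same invariant measure $\pi$ give constant 1 when started from $\pi$, while another starting measure gives a larger constant.

```python
from fractions import Fraction as F

def push_forward(mu, K):
    return [sum(mu[x] * K[x][y] for x in range(len(mu)))
            for y in range(len(K[0]))]

def stability_constant(mu0, kernels):
    """Smallest c with mu0/c <= mu_n <= c*mu0 for n = 0, ..., len(kernels)."""
    c = F(1)
    mu = list(mu0)
    for K in kernels:
        mu = push_forward(mu, K)
        for m, m0 in zip(mu, mu0):
            if m == 0:
                return float("inf")       # mu_n vanishes somewhere: no finite c
            c = max(c, m / m0, m0 / m)
    return c

# Two hypothetical kernels sharing the invariant measure pi = (1/3, 2/3).
K1 = [[F(1, 2), F(1, 2)], [F(1, 4), F(3, 4)]]
K2 = [[F(0), F(1)], [F(1, 2), F(1, 2)]]
pi = [F(1, 3), F(2, 3)]
sequence = [K1, K2] * 5

c_from_pi = stability_constant(pi, sequence)
c_from_uniform = stability_constant([F(1, 2), F(1, 2)], sequence)
```

Only a finite horizon is examined here, so the computed value is a lower bound for the stability constant of an infinite sequence that extends it.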
Remark 4.4. Suppose a set $\mathcal{Q}$ of Markov kernels is c-stable with respect to a measure $\mu_0$. Let $\pi$ be the invariant measure for some irreducible aperiodic $Q \in \mathcal{Q}$. Then we must have (consider the sequence $K_i = Q$ for all $i$, for which $\mu_n = \mu_0 Q^n \to \pi$) $c^{-1}\mu_0 \le \pi \le c\mu_0$. Hence, $\mathcal{Q}$ is also $c^2$-stable with respect to $\pi$ and the invariant measures $\pi, \pi'$ for any two aperiodic irreducible kernels $Q, Q' \in \mathcal{Q}$ must satisfy $c^{-2}\pi \le \pi' \le c^2\pi$. (1)
Remark 4.5. It is not difficult to find two Markov kernels $K_1, K_2$ on a finite state space $V$ that are reversible, irreducible and aperiodic with reversible measures $\pi_1, \pi_2$ satisfying (1) so that $\{K_1, K_2\}$ is not c-stable. See, e.g., Remark 2.5. This shows that the necessary condition (1) for a set of Markov kernels to be c-stable is not a sufficient condition.
Example 4.6.On V N = {0, . . ., N }, consider two birth and death chains For any choice of the parameters p i , q i with 1 < p 1 /q 1 < p 2 /q 2 there are no constants c such that the family N = {Q N ,1 , Q N ,2 } is c-stable with respect to some measure µ N ,0 , uniformly over N because lim However, the sequence This chain is irreducible and aperiodic and thus has an invariant measure µ 0 .Set µ n = µ 0 K 0,n .Then µ 2n = µ 0 and Theorem 4.7.Let V N = {0, . . ., N }.Let (K i ) ∞ 1 be a sequence of birth and death Markov kernels on V N .Assume that K i (x, x), K i (x, x ± 1) ∈ [1/4, 3/4], x ∈ V n , i = 1, . . ., and that (K i ) ∞ 1 is c-stable with respect to the uniform measure on V N , for some constant c ≥ 1 independent of N .Then there exists a constant A = A(c) (in particular, independent of N ) such that the relative-sup ε-merging time for This will be proved later as a consequence of a more general theorem.The estimate can be improved to T ∞ (ε) ≤ AN 2 (1 + log + 1/ε) with the help of the Nash inequality technique of [25].
We close this section by stating an open question that seems worth studying.
Problem 4.8. Let $\mathcal{Q}$ be a set of irreducible aperiodic Markov kernels on a finite set $V$. Assume that there is a constant $a \ge 1$ such that $\min\{K(x,x) : x \in V, K \in \mathcal{Q}\} \ge a^{-1}$, and that for any two kernels $K, K'$ in $\mathcal{Q}$ the associated invariant measures $\pi, \pi'$ satisfy $a^{-1}\pi \le \pi' \le a\pi$. Prove or disprove that this implies that $\mathcal{Q}$ is c-stable (ideally, with a constant $c$ depending only on $a$).
Getting positive results in this direction under strong additional restrictions on the kernels in $\mathcal{Q}$ is of interest. For instance, assume further that the kernels in $\mathcal{Q}$ are all birth and death kernels and that for any two kernels $K, K' \in \mathcal{Q}$, $a^{-1}K(x,y) \le K'(x,y) \le aK(x,y)$. Prove (or disprove) that c-stability holds in this case.
In this general direction, we only have the following (not very practical) simple result.Proposition 4.9.Assume that there is a constant a ≥ 1 such that for any two finite sequences Q 1 , . . ., Q i and Q ′ 1 , . . ., Q ′ j of kernels from the stationary measures π, π ′ of the products Q Then is a 2 -stable with respect to the invariant measure π K of any kernel K ∈ .
1 be a sequence of kernels from .By hypothesis, for any fixed n, if π ′ n denotes the invariant measure of K 0

Singular values and c-stability
Suppose $Q$ has invariant measure $\pi$ and second largest singular value $\sigma_1$. Then See, e.g., [12] or [23, Theorem 3.3]. The following two statements can be viewed as generalizations of this inequality and illustrate the use of c-stability. The first one uses c-stability of the sequence $(K_i)_1^\infty$ whereas the second assumes the c-stability of a set of kernels. In both results, the crucial point is that the unknown singular values $\sigma(K_i, \mu_{i-1})$ are replaced by expressions that depend on singular values that can, in many cases, be estimated. Theorem 4.7 is a simple corollary of the following theorem.
1 be a sequence of irreducible Markov kernels on a finite set V .Assume that (K i ) ∞ 1 is c-stable with respect to a positive probability measure µ 0 .For each i, set µ i 0 = µ 0 K i and let σ 1 (K i , µ 0 ) be the second largest singular value of K i as an operator from ℓ 2 (µ i 0 ) to ℓ 2 (µ 0 ).Then In addition, Proof.First note that since Consider the operator P i of Remark 3.3 and its kernel By assumption where the term in brackets on the right-hand side is the kernel of K * i K i where K i : ℓ 2 (µ i 0 ) → ℓ 2 (µ 0 ).This kernel has second largest eigenvalue σ(K i , µ 0 ) 2 .A simple eigenvalue comparison argument yields ). Together with Theorem 3.2, this gives the stated result.The last inequality in the theorem is simply obtained by replacing µ n by µ 0 using c-stability.Theorem 4.11.Fix c ∈ (1, ∞).Let be a family of irreducible aperiodic Markov kernels on a finite set V .Assume that is c-stable with respect to some positive probability measure µ 0 .Let (K i ) ∞ 1 be a sequence of Markov kernels with K i ∈ for all i.Let π i be the invariant measure of K i .Let σ 1 (K i ) be the second largest singular value of K i as an operator on ℓ 2 (π i ).Then, we have In addition, Proof.Recall that the hypothesis that is c-stable . Consider again the operator P i of Remark 3.3 and its kernel By assumption where the term in brackets on the right-hand side is the kernel of Together with Theorem 3.2, this gives the desired result.
The following result is an immediate corollary of Theorem 4.11.It gives a partial answer to Problem 1.1 stated in the introduction based on the notion of c-stability.
Corollary 4.12.Fix c ∈ (1, ∞) and λ ∈ (0, 1).Let be a family of irreducible aperiodic Markov kernels on a finite set V .Assume that is c-stable with respect to some positive probability measure µ 0 and set µ * 0 = min x {µ 0 (x)}.For any K in , let π K be its invariant measure and σ 1 (K) be the singular value of K on ℓ 2 (π K ).Assume that, ∀ K ∈ , σ 1 (K) ≤ 1 − λ.Then, for any ε > 0, the relative-sup ε-merging time of is bounded above by Remark 4.13.The kernels in Remark 2.5 are two reversible irreducible aperiodic kernels K 1 , K 2 on the 2-point space so that the sequence obtained by alternating K 1 , K 2 is not merging in relative-sup distance.While these two kernels satisfy max{σ 1 (K 1 ), σ 2 (K 2 )} < 1, = {K 1 , K 2 } fails to be c-stable for any c > 0. The family of kernels has relative-sup merging time bounded by 1 but is not c-stable for any c ≥ 1.To see that c-stability fails for , note that we may choose a sequence (K n ) ∞ 1 such that K n = M p n where p n → 0. This shows that the c-stability hypothesis is not a necessary hypothesis for the conclusion to hold for certain .
The following proposition describes a relation of merging in total variation to merging in the relative-sup distance under c-stability. Note that we already noticed that without the hypothesis of c-stability the properties listed below are not equivalent.
Proposition 4.14. Let $V$ be a state space equipped with a finite family $\mathcal{Q}$ of irreducible Markov kernels. Assume that $\mathcal{Q}$ is c-stable and that either of the following conditions holds for each given $K \in \mathcal{Q}$:
(i) $K$ is reversible with respect to some positive probability measure $\pi > 0$.
(ii) There exists $\varepsilon > 0$ such that $K$ satisfies $K(x,x) > \varepsilon$ for all $x$.
Then the following properties are equivalent:
1. Each $K \in \mathcal{Q}$ is irreducible aperiodic (this is automatic under (ii)).
2. $\mathcal{Q}$ is merging in total variation.
3. $\mathcal{Q}$ is merging in relative-sup.
Proof. Clearly the third listed property implies the second which implies the first. We simply need to show that the first property implies the third. Let $(K_n)_1^\infty$ be a sequence with $K_n \in \mathcal{Q}$ for all $n \ge 1$. Let $\pi_n$ be the invariant measure for $K_n$ and $\sigma_1(K_n)$ be its second largest singular value as an operator on $\ell^2(\pi_n)$.

Either of the conditions (i)-(ii) above implies that σ
the second largest and smallest eigenvalues of K n respectively.It is well-known, for a reversible kernel K n , the fact that K n is irreducible aperiodic is equivalent to If (ii) holds, Lemma 2.5 of [8] tells us that where Kn = 1/2(K n + K * n ).If K n is irreducible aperiodic, so is Kn and it follows that β 1 ( Kn ) < 1. Hence σ 1 (K n ) < 1.Since is a finite family, the σ 1 (K n ) can be bounded uniformly away from 1. Theorem 4.11 now yields that is merging in relative-sup.
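The additive symmetrization $\tilde{K} = \frac{1}{2}(K + K^*)$ used under condition (ii) can be illustrated on a small example. The sketch below assumes the standard formula $K^*(x, y) = \pi(y) K(y, x) / \pi(x)$ for the adjoint of $K$ on $\ell^2(\pi)$; the 3-state kernel and all function names are hypothetical choices made for illustration only.

```python
from fractions import Fraction as F

def adjoint(K, pi):
    """Adjoint of K on l^2(pi): K*(x, y) = pi(y) K(y, x) / pi(x)."""
    n = len(K)
    return [[pi[y] * K[y][x] / pi[x] for y in range(n)] for x in range(n)]

def additive_symmetrization(K, pi):
    """K~ = (K + K*) / 2, which is reversible with respect to pi."""
    Ks = adjoint(K, pi)
    n = len(K)
    return [[(K[x][y] + Ks[x][y]) / 2 for y in range(n)] for x in range(n)]

def is_invariant(K, pi):
    """Check pi K = pi exactly."""
    n = len(K)
    return all(sum(pi[x] * K[x][y] for x in range(n)) == pi[y] for y in range(n))

def is_reversible(K, pi):
    """Check detailed balance pi(x) K(x, y) = pi(y) K(y, x)."""
    n = len(K)
    return all(pi[x] * K[x][y] == pi[y] * K[y][x]
               for x in range(n) for y in range(n))

# Hypothetical non-reversible kernel with K(x, x) = 1/2 for all x,
# so that condition (ii) holds for any epsilon < 1/2.
K = [[F(1, 2), F(1, 2), F(0)],
     [F(1, 4), F(1, 2), F(1, 4)],
     [F(1, 2), F(0), F(1, 2)]]
pi = [F(2, 5), F(2, 5), F(1, 5)]  # invariant measure of K (checked below)

K_tilde = additive_symmetrization(K, pi)
print("pi invariant for K:", is_invariant(K, pi))
print("K reversible:", is_reversible(K, pi))
print("K~ reversible:", is_reversible(K_tilde, pi))
```

Exact rational arithmetic is used so that invariance and detailed balance can be tested with equality rather than a numerical tolerance.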

End-point perturbations for random walk on a stick
This section provides the first non-trivial example of c-stability.
This example is interesting in that it seems quite difficult to handle by inspection and algebra.Proving c-stability involves keeping track of huge amounts of cancellations, which appears to be rather difficult.We will use an extremely strong coupling property to obtain the result.
For all $t \ge 0$, let $W^1_t$ be driven by $Q$. Set $W^1_0 = W^2_0$ and construct $W^2_t$ with the property that inductively as follows.
Case 1. Note that in this case

Remark 4.17. In case 2, i.e., when $p = q$, the result stated above can be improved by using the Nash inequality techniques developed in [25]. This leads to bounds on the merging times of order $N^2$ instead of $N^2 \log N$. Using singular values, we can get a hint that this is the case by considering the average $\ell^2$ distance and the first inequality in Theorem 3.2. Indeed, in the case $p = q$, comparison with the simple random walk on the stick yields control not only of the top singular value, but of most singular values. Namely, if $(K_i)_1^\infty$ is a sequence of kernels in the family with parameters $(p, p, r, \varepsilon)$, $p, r \in [\varepsilon, 1 - 2\varepsilon]$ and $\mu_0 = \nu \equiv 1/(N + 1)$, $\mu_n = \nu K_{0,n}$, then the $j$-th singular value $\sigma_j(K_i, \mu_{i-1})$ is bounded by Using this in the first inequality stated in Theorem 3.2 (together with stability) yields

Stability of some inhomogeneous birth and death chains
Recall that a necessary condition for a family of irreducible aperiodic Markov kernels to be $c$-stable is that, for any pair of kernels in the family, the associated stationary measures $\pi, \pi'$ are comparable, with ratio bounded above and below by constants depending only on $c$. In less precise terms, all the stationary measures must have a similar behavior, which we refer to as the stationary measure behavior of the family.
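The necessary condition just recalled can be derived in a few lines. The sketch below assumes that $c$-stability with respect to $\mu_0$ means $c^{-1}\mu_0 \le \mu_0 K_{0,n} \le c\,\mu_0$ for every $n$ and every sequence drawn from the family; the constant $c^2$ is what this argument produces and need not match the constant stated elsewhere in the text.

```latex
% Assumption: c-stability w.r.t. \mu_0 means
%   c^{-1}\mu_0 \le \mu_0 K_{0,n} \le c\,\mu_0 for all n and all sequences in the family.
% Apply it to the constant sequence K_i = K, with K irreducible aperiodic,
% so that \mu_0 K^n \to \pi_K as n \to \infty.
\begin{align*}
  c^{-1}\mu_0(x) \;\le\; \mu_0 K^n(x) \;\le\; c\,\mu_0(x)
  \quad &\Longrightarrow \quad
  c^{-1}\mu_0(x) \;\le\; \pi_K(x) \;\le\; c\,\mu_0(x), \\
  \text{hence, for two kernels } K, K' \text{ in the family:} \quad
  & c^{-2} \;\le\; \frac{\pi_K(x)}{\pi_{K'}(x)} \;\le\; c^{2}
  \quad \text{for all } x \in V .
\end{align*}
```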
The goal of this section is to provide examples of c-stable families of Markov chains that allow for a great variety of stationary measure behaviors.Because we lack techniques to study c-stability, providing such examples is both important and not immediate.The examples presented in this section are sets of "perturbations" of birth and death chains having a center of symmetry.Except for this symmetry, the stationary measure of the birth and death chain that serves as the basis for the family is arbitrary.Hence, this produces examples with a wide variety of behaviors.

Middle edge perturbation for birth and death chains with symmetry
For $N \ge 1$, a general birth and death chain on $[-N, N]$ is described by with $p_N = q_{-N} = 0$ and $1 = p_x + q_x + r_x$ for all $x \in [-N, N]$. We consider the case when for all This immediately implies that $q_0 = p_0$. The kernel $Q$ has reversible stationary distribution,
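A small computation may help make the symmetric setting concrete. The sketch below assumes that the symmetry condition reads $p_x = q_{-x}$ (an assumption, chosen because it yields $q_0 = p_0$ as stated) and computes the reversible measure from the detailed balance relation $\pi(x)\,p_x = \pi(x+1)\,q_{x+1}$. The particular rates and the function names are hypothetical.

```python
from fractions import Fraction as F

def symmetric_rates(N):
    """Hypothetical rates on {-N, ..., N} with the assumed symmetry q[x] = p[-x]."""
    states = range(-N, N + 1)
    p = {x: (F(1, 3) if x % 2 == 0 else F(1, 4)) for x in states}
    p[N] = F(0)                        # no move to the right from N
    q = {x: p[-x] for x in states}     # assumed symmetry; gives q[-N] = 0
    r = {x: 1 - p[x] - q[x] for x in states}
    return p, q, r

def stationary_birth_death(p, q, N):
    """Reversible measure from detailed balance pi(x) p[x] = pi(x+1) q[x+1]."""
    w = {-N: F(1)}
    for x in range(-N, N):
        w[x + 1] = w[x] * p[x] / q[x + 1]
    total = sum(w.values())
    return {x: w[x] / total for x in w}

N = 3
p, q, r = symmetric_rates(N)
pi = stationary_birth_death(p, q, N)
for x in range(-N, N + 1):
    print(x, pi[x])
```

With these assumed rates the computed measure is symmetric about 0, which is the kind of center of symmetry the section refers to.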

Example: the binomial chain
We now illustrate the above construction on a classical example, the (centered) binomial distribution $\pi(x) = 2^{-2N}\binom{2N}{N+x}$ on $V_N = \{-N, \dots, N\}$. The birth and death chain $Q$ given by admits the binomial distribution $\pi$ as reversible measure. It is obviously symmetric with respect to 0. Its second largest singular value (= eigenvalue) is $1 - 2/(2N + 1)$. By Theorem 5.4, the set Here, $q_0 = N/(2N + 1)$. Hence we can apply Theorem 4.11 which yields a constant $A$ (independent of $N$) such that, for any sequence

This is a good example to point out that the present singular value technique is most precise when applied to bound the average $\ell^2$-distance (here $\mu_0 = \pi$, $\mu_n = \pi K_{0,n}$ which, by $c$-stability, is comparable to $\pi$). Indeed, to bound this quantity, Theorem for $n \ge AN(\log N + \log 1/\eta)$. This indicates merging after order $N \log N$ steps whereas the bound (6) requires $N^2$ steps. Singular values alone do not yield a bound of order $N \log N$ for the $\ell^2$-distance from a fixed starting point or for the relative-sup merging time. In [25], such an improved upper bound is obtained by using a logarithmic Sobolev inequality. Figure 1 in the introduction illustrates the merging in time of order $N \log N$ of a time inhomogeneous chain of this type driven by a sequence and $\delta_2 = -N/(4(2N + 1))$.
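The stated properties of the binomial chain can be checked exactly for small $N$. The rates used below, $p_x = (N - x)/(2N + 1)$, $q_x = (N + x)/(2N + 1)$, $r_x = 1/(2N + 1)$, are an assumption: they are chosen to be consistent with the stated value $q_0 = N/(2N + 1)$ and with the stated eigenvalue, and may differ from the kernel intended in the text. The function names are hypothetical.

```python
from fractions import Fraction as F
from math import comb

def binomial_chain(N):
    """Assumed rates: p(x) = (N-x)/(2N+1), q(x) = (N+x)/(2N+1), r(x) = 1/(2N+1)."""
    d = 2 * N + 1
    states = range(-N, N + 1)
    p = {x: F(N - x, d) for x in states}
    q = {x: F(N + x, d) for x in states}
    r = {x: F(1, d) for x in states}
    return p, q, r

def binomial_measure(N):
    """Centered binomial pi(x) = 2^(-2N) * C(2N, N + x)."""
    return {x: F(comb(2 * N, N + x), 4 ** N) for x in range(-N, N + 1)}

def apply_kernel(p, q, r, f, N):
    """(Qf)(x) = p(x) f(x+1) + q(x) f(x-1) + r(x) f(x)."""
    return {x: p[x] * f.get(x + 1, 0) + q[x] * f.get(x - 1, 0) + r[x] * f[x]
            for x in range(-N, N + 1)}

N = 5
p, q, r = binomial_chain(N)
pi = binomial_measure(N)

# Detailed balance with respect to the binomial measure.
balanced = all(pi[x] * p[x] == pi[x + 1] * q[x + 1] for x in range(-N, N))

# f(x) = x is an eigenfunction with eigenvalue 1 - 2/(2N+1).
f = {x: F(x) for x in range(-N, N + 1)}
Qf = apply_kernel(p, q, r, f, N)
lam = 1 - F(2, 2 * N + 1)
is_eigen = all(Qf[x] == lam * f[x] for x in f)
print(balanced, is_eigen, lam)
```

This only confirms that $1 - 2/(2N + 1)$ is an eigenvalue of the assumed kernel, with eigenfunction $f(x) = x$; it does not by itself show that it is the second largest singular value.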

Two-point inhomogeneous Markov chains
This final section examines time inhomogeneous Markov chains on the two-point space.We characterize total variation merging and discuss stability and relative-sup merging.
Proof. This can be shown by induction. Note that K 0 Using the two step formula above for $K_{0,n+1} = K_{0,n} K_{n+1}$, we have To see that $\alpha_{0,n+1}$ and $p_{0,n+1}$ can be written in the forms of (3) and (4), we use the induction hypothesis along with the equality

To study the merging properties of $(K_i)_1^\infty$, observe that, for any two initial probability measures $\mu_0$ and $\nu_0$ on $V$, we have This, together with elementary considerations, yields the following proposition.
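On the two-point space, the difference of two distributions is always a multiple of $(1, -1)$, and one step of a kernel with off-diagonal entries $a$ and $b$ multiplies it by $1 - a - b$. The sketch below checks this contraction along a short sequence of kernels. It assumes that the parameter $\alpha_i$ of the text plays the role of $1 - a_i - b_i$ (the second eigenvalue of $K_i$); the kernels and function names are hypothetical.

```python
from fractions import Fraction as F

def two_point_kernel(a, b):
    """Kernel on {0, 1} with K(0, 1) = a and K(1, 0) = b."""
    return [[1 - a, a], [b, 1 - b]]

def step(mu, K):
    """One step of the chain: the row vector mu times the matrix K."""
    return [mu[0] * K[0][0] + mu[1] * K[1][0],
            mu[0] * K[0][1] + mu[1] * K[1][1]]

def contraction_factor(K):
    """1 - a - b = K(0, 0) - K(1, 0), the second eigenvalue of K (assumed to be alpha)."""
    return K[0][0] - K[1][0]

# Hypothetical sequence; the second kernel has a negative factor.
kernels = [two_point_kernel(F(1, 3), F(1, 2)),
           two_point_kernel(F(3, 4), F(2, 3)),
           two_point_kernel(F(1, 5), F(1, 10))]

mu, nu = [F(1), F(0)], [F(1, 4), F(3, 4)]
initial_gap = mu[0] - nu[0]
product = F(1)
for K in kernels:
    mu, nu = step(mu, K), step(nu, K)
    product *= contraction_factor(K)

final_gap = mu[0] - nu[0]
total_variation = abs(final_gap)
print(final_gap, product * initial_gap, total_variation)
```

In particular, under this assumption the total variation distance after $n$ steps equals $\prod_{i \le n} |1 - a_i - b_i|$ times the initial distance, so merging in total variation is governed by whether this product tends to 0.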
When the $\alpha_i$ are not all positive it is still possible to show $c$-stability, but the proof is a bit subtle. This illustrates, in this simple case, the intrinsic difficulties related to the notion of stability. In order to prove this proposition, we need the following technical lemma.

Proof. (1) The fact that $p_{0,2} \in [\eta, 1 - \eta]$ follows since $p_{0,2}$ is a convex combination of $p_1$ and $p_2$, and $\alpha_{0,2} \ge 0$ follows by Lemma 6.1.

Note that either $Q_j = I$ or, by Lemma 6.11, we have $\alpha(Q_j) \ge 0$ and $p(Q_j) \in [\eta, 1 - \eta]$. Now we write, For any $j \in [1, m]$, consider the kernel $M_j = K_{i_j} Q_j$. Since

Figure 1: Illustration of merging (both in total variation and relative-sup) based on the binomial example studied in Section 5.2. The first frame shows two particular initial distributions, one of which is the binomial. The other frames show the evolution under a time inhomogeneous chain driven by a deterministic sequence involving two kernels from the set N (Q, ε) of Section 5.2, a set consisting of perturbations of the Ehrenfest chain kernel. In the fourth frame, the distributions have merged. The last two frames illustrate the evolution after merging and the absence of a limiting distribution. Here N = 30 and the total number of points is n = 61. Frame labels: t = (n/3) log n, t = n log n − 2, t = n log n − 1, t = n log n.
is a strictly increasing sequence of sets until, for some $k$, it reaches $\Omega_{m,k}(x) = V$. Of course, the integer $k$ is at most $|V| - 1$. Now, hypothesis (8) implies that $K_{m,m+|V|}(x, y) \ge \varepsilon^{|V|-1}$ as desired. Of course, this line of reasoning can only yield poor quantitative bounds in general.