Averaging principle for two time-scale regime-switching processes

This work studies the averaging principle for a fully coupled two time-scale system whose slow process is a diffusion process and whose fast process is a pure jump process on a countably infinite state space. The ergodicity of the fast process has an important impact on the limit system and on the averaging principle. We show that under a strong ergodicity condition, the limit system admits a unique solution and the slow process converges in the $L^1$-norm to the limit system. Under a certain weaker ergodicity condition, however, the limit system admits a solution that is not necessarily unique, and the slow process can be proved to converge weakly to a solution of the limit system.


Introduction
We study in this work a fully coupled two time-scale stochastic system $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$ in $\mathbb{R}^d \times S$, where $S = \{1, 2, \ldots, N\}$ with $N \le \infty$. The slow process $(X^{\varepsilon,\alpha}_t)$ is described as the solution to a stochastic differential equation (SDE) (1.1) with coefficients $b$ and $\sigma$, and the fast process $(Y^{\varepsilon,\alpha}_t)$ is a jump process on $S$ satisfying
$$\mathbb{P}\big(Y^{\varepsilon,\alpha}_{t+\delta} = j \,\big|\, Y^{\varepsilon,\alpha}_t = i,\ X^{\varepsilon,\alpha}_t = x\big) = \begin{cases} \frac{1}{\alpha}q_{ij}(x)\delta + o(\delta), & i \ne j,\\ 1 + \frac{1}{\alpha}q_{ii}(x)\delta + o(\delta), & i = j, \end{cases} \qquad (1.2)$$
for $\delta > 0$, $i, j \in S$, $x \in \mathbb{R}^d$, where $\varepsilon, \alpha$ are small positive parameters. In the existing literature, the system $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$ is called fully coupled if the diffusion coefficient $\sigma$ of the slow process $(X^{\varepsilon,\alpha}_t)$ depends on the fast process $(Y^{\varepsilon,\alpha}_t)$ and the transition rates $(q_{ij}(x))_{i,j\in S}$ of the fast process depend on $(X^{\varepsilon,\alpha}_t)$ as well. Multi-scale systems arise in many research fields, such as biology [9,20,21,24,32] and mathematical finance [10,11]. Correspondingly, many works are devoted to the averaging principle, central limit theorems, and large deviations for these stochastic models. For two time-scale systems in which both the slow and the fast components are continuous processes given as solutions of SDEs, these problems have been studied extensively; see, e.g., [1,23,24,25,27,28,33,38,39], and [16] for SDEs driven by fractional Brownian motion. The interaction between the fast and the slow components makes a fully coupled two time-scale system much more complicated, as revealed in [26,33,38,39].
The averaging principle asserts that the slow process $(X^{\varepsilon,\alpha}_t)$ converges to some limit process $(\bar X_t)$ as $\varepsilon, \alpha \to 0$. When the fast process $(Y^{\varepsilon,\alpha}_t)$ does not depend on $(X^{\varepsilon,\alpha}_t)$ (usually called an uncoupled system), the averaging principle often holds under quite general conditions. However, when $(Y^{\varepsilon,\alpha}_t)$ depends on $(X^{\varepsilon,\alpha}_t)$, and particularly when $(Y^{\varepsilon,\alpha}_t)$ does not take values in a compact space, it becomes more difficult to establish the averaging principle. In this work we focus on the impact on the limit behavior of $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$ of: 1) the various ergodicity properties of the fast process, which affect the well-posedness of the limit process $(\bar X_t)$; 2) the dependence of the invariant measure of $(Y^{\varepsilon,\alpha}_t)$ on the frozen state of the slow process $(X^{\varepsilon,\alpha}_t)$ when the state space $S$ is countably infinite.
Let us review some known works in settings similar to ours. When $(Y^{\varepsilon,\alpha}_t)$ is a continuous-time Markov chain independent of the slow process $(X^{\varepsilon,\alpha}_t)$, Eizenberg and Freidlin [8] and Freidlin and Lee [13] separately investigated the limit behavior of solutions of PDE systems with Dirichlet boundary conditions associated with $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)_{t\ge0}$, in the cases where the diffusion coefficient of $X^{\varepsilon,\alpha}_t$ does not depend, respectively depends, on $Y^{\varepsilon,\alpha}_t$. These two works reveal that whether or not the diffusion coefficient of $X^{\varepsilon,\alpha}_t$ depends on $Y^{\varepsilon,\alpha}_t$ has an important impact on the method used to study the limit behavior of $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$. To provide a precise estimate of the difference between $(X^{\varepsilon,\alpha}_t)$ and its limit process, a large deviation principle (LDP) was established in [17,18].
For the setting where the fast process $(Y^{\varepsilon,\alpha}_t)$ is a jump process that also depends on the slow process $(X^{\varepsilon,\alpha}_t)$, the averaging principle and the LDP have been studied by Faggionato, Gabrielli, and Crivellari [9] and by Budhiraja, Dupuis, and Ganguly [4]. The work [9] considered a simple case without a diffusion term in the slow component, using the nonlinear semigroup method developed by Feng and Kurtz [12]. In contrast, [4] considered a fully coupled case using the weak convergence method and established a process-level large deviation principle.
All the aforementioned works, whether or not the fast jump process $(Y^{\varepsilon,\alpha}_t)$ depends on $(X^{\varepsilon,\alpha}_t)$, considered only the situation where the state space $S$ of $(Y^{\varepsilon,\alpha}_t)$ is finite, hence compact. However, the countable infiniteness of $S$ has an important impact on the averaging principle and the LDP of $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$. For example, in the simple setting $\alpha \equiv 1$, Bezuidenhout [2] studied the LDP of certain functionals of $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$ with the diffusion coefficient of $(X^{\varepsilon,\alpha}_t)$ independent of $(Y^{\varepsilon,\alpha}_t)$, and showed that the LDP holds when $(Y^{\varepsilon,\alpha}_t)$ lives in a finite state space. Furthermore, a counterexample there shows that the LDP may fail when $(Y^{\varepsilon,\alpha}_t)$ is a Markov chain on an infinite state space. Meanwhile, since the system $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$ studied here is fully coupled, the invariant probability measure $\pi^x = (\pi^x_i)_{i\in S}$ of $(Y^{\varepsilon,\alpha}_t)$ depends on the position $x$ of the slow process $(X^{\varepsilon,\alpha}_t)$. When $S$ is countably infinite, the regularity of $x \mapsto \pi^x$ is much more delicate than when $S$ is finite, and this regularity has an important impact on the characterization of the limit system. Precisely, suppose that $(q_{ij}(x))_{i,j\in S}$ is a conservative, irreducible transition rate matrix for every $x \in \mathbb{R}^d$, which is Lipschitz continuous in $x$ in a certain matrix norm. Let $\mathcal{P}(S)$ be the space of all probability measures on $S$ endowed with the total variation norm, and let $\pi^x \in \mathcal{P}(S)$ be the invariant probability measure associated with $(q_{ij}(x))_{i,j\in S}$, provided it exists. Then, when $S$ is finite, the map $x \mapsto \pi^x$ from $\mathbb{R}^d$ to $\mathcal{P}(S)$ is Lipschitz continuous. This result has been proved in [9] and [4] in different ways. The work [9] used the Perron-Frobenius theorem to express $\pi^x$ in terms of a nonzero left eigenvector of $(q_{ij}(x))_{i,j\in S}$ corresponding to the eigenvalue 0.
In [4], it is proved by expressing $\pi^x$ in terms of polynomials of the transition probabilities, following Freidlin and Wentzell [14]. However, neither method applies when $S$ is infinite. Moreover, when $S$ is countably infinite, $x \mapsto \pi^x$ may fail to be Lipschitz continuous, and may even fail to be Hölder continuous of any exponent in $(0,1)$; see Example 2.1 below.
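For a finite state space, the Lipschitz dependence of $\pi^x$ on $x$ can be checked numerically. The following Python sketch is only an illustration under stated assumptions: it uses numpy, a hypothetical 3-state Q-matrix `Q_of(x)` that is not taken from this work, and the convention $\|\mu-\nu\|_{\mathrm{var}} = \sum_i|\mu_i-\nu_i|$, which equals $\sup_{|f|\le1}|\mu(f)-\nu(f)|$. It computes $\pi^x$ as the normalized solution of $\pi Q(x) = 0$ (a left eigenvector for the eigenvalue 0) and reports the largest difference quotient of $x \mapsto \pi^x$ on a grid. It does not reproduce the Perron-Frobenius argument of [9] or the Freidlin-Wentzell formula used in [4].

```python
import numpy as np

def invariant_measure(Q):
    """Solve pi Q = 0, sum(pi) = 1 for an irreducible finite Q-matrix."""
    n = Q.shape[0]
    # Stack the balance equations Q^T pi = 0 with the normalization row.
    A = np.vstack([Q.T, np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi

def Q_of(x):
    """Hypothetical 3-state Q-matrix whose rates depend smoothly on x."""
    a, b = 1.0 + 0.5 * np.sin(x), 2.0 + np.cos(x)
    Q = np.array([[0.0, a, 1.0],
                  [b, 0.0, 1.0],
                  [1.0, a, 0.0]])
    np.fill_diagonal(Q, -Q.sum(axis=1))  # conservative: rows sum to zero
    return Q

def tv(mu, nu):
    # sup_{|f|<=1} |mu(f) - nu(f)| = sum_i |mu_i - nu_i|
    return np.abs(mu - nu).sum()

xs = np.linspace(-2.0, 2.0, 201)
pis = [invariant_measure(Q_of(x)) for x in xs]
ratios = [tv(pis[k + 1], pis[k]) / (xs[k + 1] - xs[k]) for k in range(len(xs) - 1)]
print(f"max difference quotient on the grid: {max(ratios):.3f}")
```

A bounded difference quotient on the grid is consistent with, but of course does not prove, Lipschitz continuity; for countably infinite $S$ no such finite computation is available, which is the point of Example 2.1.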
To establish the averaging principle when $S$ is countably infinite, our main challenge is to study the regularity of $x \mapsto \pi^x$ from $\mathbb{R}^d$ to $\mathcal{P}(S)$. To overcome this difficulty, the ergodic property of $P^x_t$ plays a crucial role, where $P^x_t$ denotes the semigroup associated with the Markov chain with transition rate matrix $(q_{ij}(x))_{i,j\in S}$. Based on an integration by parts formula for continuous-time Markov chains, we show that $x \mapsto \pi^x$ is Lipschitz continuous if $P^x_t$ is strongly ergodic uniformly w.r.t. $x$. If we only suppose a weaker ergodicity condition on $P^x_t$ (see (A5) below), then $x \mapsto \pi^x$ is shown to be $1/2$-Hölder continuous. To prove this assertion, we develop a coupling method for parameter-dependent Markov chains based on Skorokhod's representation theorem for jump processes. Consequently, under the strong ergodicity condition, the equation characterizing the limit process $(\bar X_t)$ admits a unique solution, and we show that $(X^{\varepsilon,\alpha}_t)$ converges in $L^1$-norm to $(\bar X_t)$ as $\varepsilon, \alpha \to 0$. Under the weaker ergodicity condition, $(X^{\varepsilon,\alpha}_t)$ converges weakly to its limit process, provided that the limit equation has a unique solution. The ratio $\varepsilon/\alpha$ as $\varepsilon, \alpha \to 0$ has no impact on the averaging principle. Nevertheless, the large deviation principle of $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$ will be shown to depend heavily on this ratio in another work of ours.
The remainder of this work is organized as follows. In Section 2, we state the main results of this work, including the regularity of $x \mapsto \pi^x$ under two different ergodicity conditions, and the averaging principle for $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)_{t\ge0}$ in the strong and the weak convergence sense, respectively. Section 3 is devoted to developing the coupling method for parameter-dependent Markov chains, which is not only the basis for studying the regularity of $x \mapsto \pi^x$ under the weak ergodicity condition on $(Y^{\varepsilon,\alpha}_t)$, but also plays an important role in decoupling the close interaction between $(X^{\varepsilon,\alpha}_t)$ and $(Y^{\varepsilon,\alpha}_t)$ to establish the averaging principle. The proofs of the main results are all presented in Section 4.

Statement of main results
This section is devoted to establishing the averaging principle for $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)_{t\ge0}$ as $\varepsilon, \alpha$ tend to zero. Let us begin by introducing three fundamental conditions on the stochastic system $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$, which will be used throughout this work.
(A1) There exist constants
(A2) For each $x \in \mathbb{R}^d$, $(q_{ij}(x))_{i,j\in S}$ is a conservative, irreducible transition rate matrix. Assume
$$\kappa := \sup_{i\in S}\sum_{j\in S,\, j\ne i}\sup_{x\in\mathbb{R}^d} q_{ij}(x) < \infty.$$
(A3) There exists a constant $K_3 > 0$ such that
Under conditions (A1)-(A3), the two time-scale system (1.1), (1.2) admits a unique strong solution for any initial value $X^{\varepsilon,\alpha}_0 = x_0 \in \mathbb{R}^d$ and $Y^{\varepsilon,\alpha}_0 = i_0 \in S$; see, e.g., [41], or [34] under certain more general non-Lipschitz conditions. To focus on the impact of the ergodicity of $(Y^{\varepsilon,\alpha}_t)$ on the averaging principle, we impose the simple condition (A1) on the slow process $(X^{\varepsilon,\alpha}_t)$. We refer the reader to [28] for the technique to relax (A1) to a local Lipschitz condition.
For the fully coupled two time-scale system $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$, in contrast to uncoupled two time-scale systems, the regularity of the invariant probability measure $\pi^x$ associated with the Q-matrix $(q_{ij}(x))_{i,j\in S}$ increases the complexity and difficulty of characterizing the limit system $(\bar X_t)$ of $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$ as $\varepsilon, \alpha \to 0$. As mentioned in the introduction, when $S$ is finite and $(q_{ij}(x))_{i,j\in S}$ is Lipschitz continuous in $x$, the associated invariant probability measure $\pi^x = (\pi^x_i)_{i\in S}$ is also Lipschitz continuous in $x$, as proved in [4,9]. However, when $S$ is countably infinite, it is unclear whether this still holds. Note that the invariant probability measure is a left eigenvector of the Q-matrix for the eigenvalue 0, and perturbations of linear generators can cause significant changes in their eigenvalues and eigenvectors. To see the complexity of this problem, one can refer to the extensive research on the perturbation theory of linear operators; see, for instance, the monograph [22] and references therein.
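Let us recall some notions on the ergodicity of Markov chains (cf. [7,30]). Let $P_t$ denote the semigroup associated with a continuous-time Markov chain on the state space $S$, and suppose that there exists an invariant probability measure $\pi = (\pi_i)_{i\in S}$. The total variation distance between $P_t(i,\cdot)$ and $\pi$ is defined by
$$\|P_t(i,\cdot) - \pi\|_{\mathrm{var}} := \sup_{|f|\le1}|P_tf(i) - \pi(f)|,$$
where $\mu(f) := \sum_{i\in S}\mu_i f(i)$ for any probability measure $\mu$ on $S$. The Markov chain is called ergodic if
$$\lim_{t\to\infty}\|P_t(i,\cdot) - \pi\|_{\mathrm{var}} = 0, \quad i \in S;$$
it is called exponentially ergodic if there exist a constant $\lambda > 0$ and constants $C_i > 0$ such that
$$\|P_t(i,\cdot) - \pi\|_{\mathrm{var}} \le C_ie^{-\lambda t}, \quad t \ge 0,\ i \in S;$$
and it is called strongly ergodic (or uniformly ergodic) if
$$\lim_{t\to\infty}\sup_{i\in S}\|P_t(i,\cdot) - \pi\|_{\mathrm{var}} = 0.$$
It is known that if the chain is strongly ergodic, its convergence rate must be exponential, i.e.
$$\sup_{i\in S}\|P_t(i,\cdot) - \pi\|_{\mathrm{var}} \le Ce^{-\lambda t}, \quad t \ge 0,$$
for some constants $C, \lambda > 0$; see, for example, [29, Lemma 4.1]. Moreover, an ergodic Markov chain on a finite state space is necessarily strongly ergodic, since the supremum over $i$ is then taken over finitely many states. Accordingly, we first generalize the results of [4,9] for Markov chains on a finite state space to the setting of an infinite state space under the strong ergodicity condition.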
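To make these definitions concrete, here is a minimal numpy sketch for a finite conservative Q-matrix (an assumption: the chains in this work may live on an infinite $S$, where no such direct computation is possible). It computes $P_t = e^{tQ}$ by uniformization and evaluates $\sup_i\|P_t(i,\cdot)-\pi\|_{\mathrm{var}}$ with the convention above, for which $\sup_{|f|\le1}|P_tf(i)-\pi(f)| = \sum_j|P_t(i,j)-\pi_j|$. The function names are hypothetical.

```python
import math
import numpy as np

def transition_matrix(Q, t):
    """P_t = exp(tQ) via uniformization (Q: finite conservative Q-matrix)."""
    n = Q.shape[0]
    lam = max(-Q.diagonal().min(), 1e-12)  # uniformization rate >= max total jump rate
    K = np.eye(n) + Q / lam                # stochastic matrix of the uniformized chain
    mean = lam * t
    n_terms = int(mean + 10 * math.sqrt(mean) + 20)
    weight = math.exp(-mean)               # Poisson(mean) probability of 0 jumps
    term = np.eye(n)
    P = weight * term
    for k in range(1, n_terms):
        term = term @ K                    # K^k
        weight *= mean / k                 # Poisson(mean) probability of k jumps
        P += weight * term
    return P

def sup_tv_to_equilibrium(Q, pi, t):
    """sup_i ||P_t(i, .) - pi||_var with ||.||_var = sup_{|f|<=1} |P_t f(i) - pi(f)|."""
    P = transition_matrix(Q, t)
    return np.abs(P - pi).sum(axis=1).max()
```

For the two-state chain with rate $a$ from the first state to the second and rate $b$ back, this quantity equals $\frac{2\max(a,b)}{a+b}e^{-(a+b)t}$, so the exponential rate in the strong ergodicity bound is explicit; the check below uses $a = 1$, $b = 2$.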
Throughout this work, let $P^x_t$ be the semigroup associated with the Q-matrix $(q_{ij}(x))_{i,j\in S}$, and let $\pi^x$ be its invariant probability measure whenever it exists.
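(A4) Suppose that $P^x_t$ is strongly ergodic uniformly in $x$, that is, there exist constants $C, \lambda > 0$ such that
$$\sup_{x\in\mathbb{R}^d}\sup_{i\in S}\|P^x_t(i,\cdot) - \pi^x\|_{\mathrm{var}} \le Ce^{-\lambda t}, \quad t \ge 0.$$
Proposition 2.1 (Strongly ergodic case) Assume that (A2), (A3) and (A4) hold. Then the map $x \mapsto \pi^x$ from $\mathbb{R}^d$ to $\mathcal{P}(S)$ is Lipschitz continuous with respect to the total variation norm.
To keep the presentation transparent, we defer the proof to Section 4. We also refer to [7,29] and references therein for various sufficient conditions for the strong ergodicity of continuous-time Markov chains and diffusion processes.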
We proceed to investigate the regularity of $x \mapsto \pi^x$ under an ergodicity condition weaker than strong ergodicity. Unfortunately, under a weaker ergodicity condition and without uniformity w.r.t. $x$, the Lipschitz continuity of $x \mapsto \pi^x$ in the total variation norm may fail, as the following example illustrates.
Example 2.1 For each $x \in (0,1)$, let $(Y^x_t)_{t\ge0}$ be a birth-death process on $S = \{1, 2, \ldots\}$ with birth rate
It is clear that $q_{ij}(x)$ is Lipschitz continuous in $x$ for all $i, j \in S$. Then:
(i) for each $x \in (0,1)$, the birth-death chain $(Y^x_t)_{t\ge0}$ is exponentially ergodic but not strongly ergodic, satisfying
for some positive constants $C_i(x)$ depending on $i \in S$ and $x \in (0,1)$.
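(ii) Its invariant probability measure $\pi^x = (\pi^x_i)_{i\ge1}$ is given by
and, for any $\beta \in (0,1]$,
$$\sup_{x,y\in(0,1),\,x\ne y}\frac{\|\pi^x - \pi^y\|_{\mathrm{var}}}{|x-y|^{\beta}} = \infty. \qquad (2.3)$$
The proofs of the assertions in Example 2.1 are also deferred to Section 4.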

Now, let us consider the following ergodicity condition, which is weaker than the strong ergodicity condition (A4).
(A5) Assume that there exist a positive function $\theta: S \to (0,\infty)$, a decreasing function
Proposition 2.2 Assume that conditions (A2), (A3) and (A5) hold. Then $x \mapsto \pi^x$ is $1/2$-Hölder continuous, i.e. there is a constant $C > 0$ such that
$$\|\pi^x - \pi^y\|_{\mathrm{var}} \le C|x-y|^{1/2}, \quad x, y \in \mathbb{R}^d. \qquad (2.4)$$
This proposition is proved via an intricate construction, in terms of Skorokhod's representation for jump processes, of a coupling of $(Y^x_t)$ and $(\tilde Y^y_t)$ with Q-matrices $(q_{ij}(x))_{i,j\in S}$ and $(q_{ij}(y))_{i,j\in S}$, respectively; the construction is presented in Section 3. It improves the construction used in [36] to study the stability of regime-switching processes under perturbations of the Q-matrix, and in [37] to study the continuous dependence on initial values of stochastic functional differential equations with state-dependent regime-switching. The key point is the estimate of
Next, we establish the averaging principle for $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$. Define
$$\bar b(x) := \sum_{i\in S} b(x,i)\,\pi^x_i, \quad x \in \mathbb{R}^d;$$
the limit system of $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$ is given by the solution to the ordinary differential equation (ODE)
$$d\bar X_t = \bar b(\bar X_t)\,dt, \quad \bar X_0 = x_0. \qquad (2.6)$$
Under conditions (A1)-(A4), Proposition 2.1 implies that $\bar b$ is Lipschitz continuous, and hence ODE (2.6) admits a unique solution. Under the strong ergodicity condition (A4), we obtain the $L^1$-convergence of $X^{\varepsilon,\alpha}_t$ to $\bar X_t$ as $\varepsilon, \alpha \to 0$.
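The averaged drift $\bar b(x) = \sum_i b(x,i)\pi^x_i$ and the limit ODE (2.6) can be approximated as in the sketch below. It assumes numpy and a hypothetical two-state model with bounded coefficients (states labelled 0 and 1 in the code, drift $-1$ in state 0 and $+1$ in state 1, rates $q_{01}(x) = 1 + \frac12\sin x$, $q_{10}(x) = 1$), none of which comes from this work. The ODE is integrated with an explicit Euler scheme, which is adequate when $\bar b$ is Lipschitz continuous, as under (A4); when $\bar b$ is only Hölder continuous such a scheme need not single out a unique limit.

```python
import numpy as np

def invariant_measure(Q):
    """Normalized solution of pi Q = 0 for a finite irreducible Q-matrix."""
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])
    rhs = np.zeros(n + 1)
    rhs[-1] = 1.0
    return np.linalg.lstsq(A, rhs, rcond=None)[0]

def Q_of(x):
    # hypothetical bounded rates: q_01(x) = 1 + 0.5 sin x, q_10(x) = 1
    q01 = 1.0 + 0.5 * np.sin(x)
    return np.array([[-q01, q01], [1.0, -1.0]])

def b(x, i):
    # hypothetical bounded drift in each regime
    return -1.0 if i == 0 else 1.0

def b_bar(x):
    """Averaged drift b_bar(x) = sum_i b(x, i) * pi^x_i."""
    pi = invariant_measure(Q_of(x))
    return sum(b(x, i) * pi[i] for i in range(len(pi)))

def solve_limit_ode(x0, T, n_steps):
    """Explicit Euler scheme for d Xbar_t = b_bar(Xbar_t) dt, Xbar_0 = x0."""
    h = T / n_steps
    x = x0
    for _ in range(n_steps):
        x += h * b_bar(x)
    return x
```

For this toy model, detailed balance gives $\pi^x_0 = 1/(2 + \frac12\sin x)$, so for instance $\bar b(\pi/2) = 0.6 - 0.4 = 0.2$ and $\bar b(0) = 0$.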
Theorem 2.3 Assume that (A1)-(A4) hold. Let $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$ be the solution to (1.1), (1.2), and $(\bar X_t)$ the solution to (2.6). Then, for every $t \ge 0$,
$$\lim_{\varepsilon,\alpha\to0}\mathbb{E}|X^{\varepsilon,\alpha}_t - \bar X_t| = 0.$$
However, under (A1)-(A3) and (A5), Proposition 2.2 only shows that $\bar b$ is $1/2$-Hölder continuous, like $\pi^x$. In this situation, by Peano's theorem, ODE (2.6) admits a solution, but uniqueness may fail. Consequently, under the weaker ergodicity condition (A5), the limit system $(\bar X_t)$ becomes more complicated, and $(X^{\varepsilon,\alpha}_t)$ can be shown to converge weakly to its limit whenever ODE (2.6) admits a unique solution. The precise result is given in the following theorem.
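A standard one-dimensional example, not taken from this work, shows how uniqueness can fail for a drift that is only $1/2$-Hölder continuous: the map $x \mapsto \sqrt{|x|}$ satisfies $\big|\sqrt{|x|}-\sqrt{|y|}\big| \le |x-y|^{1/2}$, yet the following initial value problem has more than one solution.

```latex
\frac{dx_t}{dt} = \sqrt{|x_t|}, \quad x_0 = 0, \qquad
\text{solutions: } x_t \equiv 0 \quad \text{and} \quad x_t = \frac{t^2}{4}, \quad t \ge 0.
```

Indeed, $\frac{d}{dt}\frac{t^2}{4} = \frac{t}{2} = \sqrt{t^2/4}$ for $t \ge 0$, so Peano's theorem gives existence but nothing more.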

Construction of the coupling processes
In this section we introduce the coupling processes used to study the regularity of $x \mapsto \pi^x$ and to decouple the interaction between the slow process $(X^{\varepsilon,\alpha}_t)$ and the fast process $(Y^{\varepsilon,\alpha}_t)$ in order to establish the averaging principle. This section deals with the technical difficulties caused by the full dependence between $(X^{\varepsilon,\alpha}_t)$ and $(Y^{\varepsilon,\alpha}_t)$. In the spirit of Skorokhod, we express a state-dependent jump process on $S$ in terms of an integral w.r.t. a Poisson random measure. To deal with a countably infinite $S$, we modify the construction of intervals used in Skorokhod's representation theorem; our construction is quite different from the commonly used one (cf. e.g. [15,34,41]).
Consider the solutions $(X^x_t, Y^x_t)$ and $(\tilde X^y_t, \tilde Y^y_t)$ to the following SDEs (3.1) and (3.2), respectively, for $\delta > 0$.
Lemma 3.1 (Key lemma) Suppose that (A1) and (A2) hold and that $f, g$ satisfy (A1) with $b$ and $\sigma$ replaced by $f$ and $g$, respectively. For every $x, y \in \mathbb{R}^d$ with $x \ne y$ and every $i_0 \in S$, there is a coupling process $(X^x_t, Y^x_t)_{t\ge0}$ and $(\tilde X^y_t, \tilde Y^y_t)_{t\ge0}$ satisfying SDEs (3.1) and (3.2), respectively, such that
where
As an application of Lemma 3.1, consider the special case $b = f = 0$, $\sigma = g = 0$. Then $X^x_t \equiv x$, $\tilde X^y_t \equiv y$, and we obtain the following corollary.
Corollary 3.2 Under (A2), for every $x, y \in \mathbb{R}^d$, there is a coupling process $(Y^x_t, \tilde Y^y_t)$ associated respectively with the Q-matrices $(q_{ij}(x))_{i,j\in S}$ and $(q_{ij}(y))_{i,j\in S}$ such that
Argument of Lemma 3.1 We first need to construct suitable intervals related to the transition rate matrix $(q_{ij}(x))_{i,j\in S}$ in order to express the jump processes $(Y^x_t)$ and $(\tilde Y^y_t)$ in terms of a common Poisson random measure. The proof is divided into two steps.
Step 1. The first step is to construct a sequence of intervals associated with the transition rate matrix $(q_{ij}(z))$, $z \in \mathbb{R}^d$. Our construction applies whether $S$ is finite or infinite, and is better suited to the case of infinite $S$ than the construction used in [15,34,41].
Precisely, let $\gamma_n = \sup_{k\ne n}\sup_{z\in\mathbb{R}^d} q_{nk}(z)$ for $n \in S$. By (A2), we get
and
where $\kappa$ is given in (A2). For notational convenience, we put $\Gamma_{ii}(z) = \emptyset$ and define $\Gamma_{ij}(z)$ for $i \ne j$ by
where $m(dx)$ denotes the Lebesgue measure on $\mathbb{R}$.
Secondly, we provide an explicit construction of the Poisson random measure as in [19], which helps to illustrate the calculations below. Let $\xi^k_i$, $\tau^k_i$, $i, k \ge 1$, be non-negative random variables such that $\{\xi^k_i, \tau^k_i\}_{i,k\ge1}$ are all mutually independent. Put
Let
where $\Delta p(s) = p(s) - p(s-)$. Correspondingly, put
As a consequence, we obtain a Poisson point process $(p(t))$ and a Poisson random measure $N_p(dt,dx)$ with intensity $dt\,m(dx)$.
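The idea of building a Poisson random measure with intensity $dt\,m(dx)$ from independent exponential waiting times and uniform marks can be sketched as follows. This is a standard strip-by-strip construction written under our own assumptions (mark space truncated to $[0,M)$, numpy's random generator); it conveys the idea and is not claimed to match the exact variables $\xi^k_i$, $\tau^k_i$ of [19].

```python
import numpy as np

def poisson_random_measure(T, M, rng):
    """Atoms (t, z) of a Poisson random measure on [0, T] x [0, M) with
    intensity dt * m(dz), built strip by strip: on [k-1, k) the jump times
    come from i.i.d. Exp(1) waiting times and the marks are uniform on [k-1, k)."""
    atoms = []
    for k in range(1, M + 1):
        t = rng.exponential(1.0)
        while t <= T:
            atoms.append((t, rng.uniform(k - 1, k)))
            t += rng.exponential(1.0)
    atoms.sort()  # order atoms by time: this is the point process p(t)
    return atoms
```

Because the strips are independent unit-rate Poisson processes, the number of atoms in $[0,T]\times[0,M)$ is Poisson with mean $TM$, matching the intensity $dt\,m(dx)$.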
The desired coupling process is defined as the solution to the following SDEs.
Then,
where $A\Delta B := (A\setminus B)\cup(B\setminus A)$ for Borel sets $A, B \subset \mathbb{R}$. Note that, for $s \le \tau$,
where
Therefore, due to the mutual independence of $N_p(dt,dz)$ and $(W(t))$, and the construction of $\Gamma_{ij}(z)$, we have (3.9).
Now, let us consider $\mathbb{P}(Y^x_{2\delta} \ne \tilde Y^y_{2\delta})$. It is clear that (3.10) holds. Due to (3.7) and (3.8), we have (3.11), where $\tau^\delta_1, \tau^\delta_2$ denote the first and second jump times of $(p(t))$ after time $\delta$. Note also that, given $\mathscr{F}_\delta$, for $s \in [\delta, \tau^\delta_1]$, $X^x_s$ and $\tilde X^y_s$ depend only on $(W_r)_{r\in[\delta,s)}$. Based on the mutual independence of $(W_t)$ and $(p(t))$ and their independent increment property, we get (3.12) from (3.11). Inserting the estimates (3.9) and (3.12) into (3.10), we obtain (3.13). Deducing inductively, we get (3.14).
Denote by $N(t) = \lfloor t/\delta \rfloor$ the integer part of $t/\delta$, and set $t_k = k\delta$ for $k \le N(t)$ and $t_{N(t)+1} = t$ for $t > 0$. It follows from (3.14) that
Letting $\delta \downarrow 0$, since $\delta(N(t)+1) \to t$, this yields
The proof of Lemma 3.1 is complete.
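The mechanism behind Lemma 3.1 and Corollary 3.2, namely running two chains with Q-matrices $(q_{ij}(x))$ and $(q_{ij}(y))$ off one Poisson random measure so that they can only separate when an atom falls into $\Gamma_{ij}(x)\Delta\Gamma_{ij}(y)$, can be simulated for a small finite example. The sketch below uses our own simple interval layout $\Gamma_{ij}(z) = [jR, jR + q_{ij}(z))$ with fixed offsets, where $R$ bounds all rates; this is an assumption for illustration and not the construction with $\gamma_n$ from Step 1. With fixed offsets, $m(\Gamma_{ij}(x)\Delta\Gamma_{ij}(y)) = |q_{ij}(x)-q_{ij}(y)|$. The rates, names and parameters are hypothetical.

```python
import numpy as np

R = 2.0                 # hypothetical uniform bound on every rate q_ij(z)
N_STATES = 3            # hypothetical finite state space {0, 1, 2}
MARK_RANGE = R * N_STATES

def rate(i, j, z):
    # hypothetical rates, always in [0.5, 1.5]
    return 0.0 if i == j else 1.0 + 0.5 * np.sin(z + i - j)

def next_state(i, mark, z):
    """Jump to j if mark lies in Gamma_ij(z) = [j*R, j*R + q_ij(z)); otherwise stay at i."""
    j = int(mark // R)
    if j != i and mark - j * R < rate(i, j, z):
        return j
    return i

def coupled_chains(x, y, i0, T, rng):
    """Y^x and Y^y started at i0, both driven by one Poisson random measure
    on [0, T] x [0, MARK_RANGE) with intensity dt * m(dz)."""
    yx = yy = i0
    t = rng.exponential(1.0 / MARK_RANGE)
    while t <= T:
        mark = rng.uniform(0.0, MARK_RANGE)
        yx, yy = next_state(yx, mark, x), next_state(yy, mark, y)
        t += rng.exponential(1.0 / MARK_RANGE)
    return yx, yy

def mismatch_frequency(x, y, T, n_runs, seed=0):
    """Empirical frequency of the event {Y^x_T != Y^y_T}."""
    rng = np.random.default_rng(seed)
    runs = [coupled_chains(x, y, 0, T, rng) for _ in range(n_runs)]
    return np.mean([a != c for a, c in runs])
```

Each chain on its own jumps from $i$ to $j$ at rate $m(\Gamma_{ij}(z)) = q_{ij}(z)$, so the marginals are correct. With $x = y$ the two chains never separate, and for small $|x-y|$ the empirical frequency of $Y^x_T \ne \tilde Y^y_T$ stays small, in line with the type of bound in Corollary 3.2.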

Arguments of the main results
This section is devoted to the arguments of the results presented in Section 2.
We begin by proving the regularity of $x \mapsto \pi^x$ under the strong ergodicity condition, which is based on the integration by parts formula for continuous-time Markov chains. Using the total variation norm and taking the supremum over the initial state $i \in S$ play an important role in the argument.

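Argument of Proposition 2.1 By the integration by parts formula for continuous-time Markov chains, for $x, y \in \mathbb{R}^d$ and $t \ge 0$,
$$P^x_t - P^y_t = \int_0^t P^x_s\big(Q(x) - Q(y)\big)P^y_{t-s}\,ds, \qquad (4.1)$$
where $Q(z) := (q_{ij}(z))_{i,j\in S}$ for $z \in \mathbb{R}^d$.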
For any $h: S \to \mathbb{R}$ with $|h| \le 1$ and any $0 \le s \le t$,
where, due to conditions (A2) and (A3), the operator norm
Combining this estimate with (A4), we get from (4.1) that
and further, by the arbitrariness of $h$ in (4.2),
For any $h: S \to \mathbb{R}$ with $|h| \le 1$, it holds that
(4.4)
By (A4), it holds that
(4.5)
Inserting (4.3) and (4.5) into (4.4), we get
Letting $t \to \infty$ and taking the supremum over $h$ with $|h| \le 1$, we obtain
which is the desired conclusion. The proof of Proposition 2.1 is complete.
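The integration by parts formula behind this proof, $P^x_t - P^y_t = \int_0^t P^x_s\,(Q(x)-Q(y))\,P^y_{t-s}\,ds$ with $Q(z) = (q_{ij}(z))_{i,j\in S}$, follows from differentiating $s \mapsto P^x_sP^y_{t-s}$, and it can be sanity-checked numerically for finite Q-matrices. The sketch below is an illustration only: it assumes numpy, uses a hand-rolled scaling-and-squaring Taylor approximation of the matrix exponential and a midpoint rule for the integral, and the matrices in the check are hypothetical.

```python
import numpy as np

def expm(A, order=20):
    """Matrix exponential via scaling and squaring with a truncated Taylor series."""
    norm = np.abs(A).sum(axis=1).max()
    s = max(0, int(np.ceil(np.log2(norm))) + 1) if norm > 0 else 0
    B = A / 2.0**s                      # now ||B|| <= 1/2
    E = np.eye(A.shape[0])
    term = np.eye(A.shape[0])
    for k in range(1, order + 1):
        term = term @ B / k             # B^k / k!
        E = E + term
    for _ in range(s):
        E = E @ E                       # undo the scaling: exp(A) = exp(B)^(2^s)
    return E

def integration_by_parts_rhs(Qx, Qy, t, n=400):
    """Midpoint rule for int_0^t exp(s Qx) (Qx - Qy) exp((t - s) Qy) ds."""
    h = t / n
    total = np.zeros_like(Qx, dtype=float)
    for k in range(n):
        s = (k + 0.5) * h
        total += expm(s * Qx) @ (Qx - Qy) @ expm((t - s) * Qy)
    return h * total
```

On an infinite $S$ the same identity requires the boundedness provided by (A2), which is why the operator norm of $Q(x)-Q(y)$ enters the estimate.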
As a direct application of Proposition 2.1, the Lipschitz continuity of $b$ implies that $\bar b$ is also Lipschitz continuous. In fact,
Argument of Example 2.1
Then,
and
According to the ergodicity criterion for birth-death processes (cf. [7, Chapter 1]), the birth-death process $(Y^x_t)_{t\ge0}$ is ergodic for every $x \in (0,1)$. Its invariant probability measure $\pi^x$ is given by
and hence $(Y^x_t)_{t\ge0}$ is exponentially ergodic. However, by virtue of [29, Theorem 3.1], the birth-death process $(Y^x_t)_{t\ge0}$ is not strongly ergodic since
For the birth-death process $(Y^x_t)_{t\ge0}$, the rate of exponential ergodicity coincides with the exponential $L^2$-convergence rate; see [5, Theorem 5.3]. Exponential $L^2$-convergence of Markov processes is closely related to the extensively studied Poincaré inequality and the spectral gap of the infinitesimal generator, and many works are devoted to estimating the exponential $L^2$-convergence rate. Applying [5, Example 5.7], the exponential convergence rate of $(Y^x_t)_{t\ge0}$ is given by
(4.7)
At last, we show that
(4.8)
which yields (2.3), so $x \mapsto \pi^x$ is not Hölder continuous with any exponent $\beta \in (0,1)$.
Indeed, it suffices to consider the case $x > y$ in (4.8). Due to the expression of
Therefore, when $n > x$ for $m \ge 2$, then $1 > x > y > 0$ and $n x = m$. For any $\beta \in (0,1]$, due to (4.9),
All assertions in Example 2.1 have been proved.
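The invariant measure of an ergodic birth-death chain follows from the detailed balance relation $\pi_n r^+_n = \pi_{n+1}r^-_{n+1}$, where $r^+_n$ and $r^-_n$ denote the birth and death rates at state $n$. Since the specific rates of Example 2.1 are not reproduced here, the following numpy sketch takes the rate functions as hypothetical inputs and truncates the state space at `n_max`, which is only an approximation of the chain on $\{1, 2, \ldots\}$.

```python
import numpy as np

def birth_death_invariant(birth, death, n_max):
    """Invariant measure of a birth-death chain on {1, ..., n_max} (truncated),
    from detailed balance pi_n * birth(n) = pi_{n+1} * death(n + 1).
    Array index k corresponds to state k + 1."""
    mu = [1.0]
    for n in range(1, n_max):
        mu.append(mu[-1] * birth(n) / death(n + 1))
    mu = np.array(mu)
    return mu / mu.sum()
```

With constant rates $r^+_n = 1$ and $r^-_n = 2$, for example, the weights are geometric, $\pi_n \propto 2^{-(n-1)}$, and $\pi_1 \approx 1/2$ once `n_max` is large.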
Before presenting the proofs of Theorem 2.3 and Theorem 2.4, let us explain the main challenges in the proofs. Firstly, we must deal with the difficulty caused by the full dependence in the two time-scale system $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$. To overcome this difficulty, we use the coupling method developed in Section 3. Secondly, we need to pay attention to the essential difference between the distributions of $(X^{\varepsilon,\alpha}_t)_{t\in[0,T]}$ and those of $(Y^{\varepsilon,\alpha}_t)_{t\in[0,T]}$ for $\varepsilon, \alpha \in (0,1)$ and given $T > 0$. Precisely, for each fixed $T > 0$, let $C([0,T];\mathbb{R}^d)$ be the space of continuous paths from $[0,T]$ to $\mathbb{R}^d$, and $D([0,T];S)$ the Skorokhod space of right-continuous paths with left limits. Then, under condition (A1), the distributions of
for some $\lambda, \mu > 0$. Then, for each $T > 0$, the collection of distributions of $(\Lambda^\alpha_t)_{t\in[0,T]}$ for $\alpha \in (0,1)$ is not tight.
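The lack of tightness comes from the number of jumps on $[0,T]$ blowing up as $\alpha \to 0$. Assuming, as the surviving fragment suggests, that $(\Lambda^\alpha_t)$ is a two-state chain switching at rates $\lambda/\alpha$ and $\mu/\alpha$, the sketch below counts its jumps; the expected count grows like $1/\alpha$. The states are labelled 0 and 1, numpy is assumed, and the function names are hypothetical.

```python
import numpy as np

def count_jumps(alpha, lam, mu, T, rng):
    """Jumps on [0, T] of a two-state chain started at 0 that switches
    0 -> 1 at rate lam / alpha and 1 -> 0 at rate mu / alpha."""
    state, t, jumps = 0, 0.0, 0
    while True:
        rate = (lam if state == 0 else mu) / alpha
        t += rng.exponential(1.0 / rate)  # exponential holding time
        if t > T:
            return jumps
        state, jumps = 1 - state, jumps + 1

def mean_jumps(alpha, n_runs=50, seed=0, lam=1.0, mu=1.0, T=1.0):
    rng = np.random.default_rng(seed)
    return np.mean([count_jumps(alpha, lam, mu, T, rng) for _ in range(n_runs)])
```

With $\lambda = \mu = 1$ the jump count on $[0,1]$ is Poisson with mean $1/\alpha$, so paths oscillate ever faster and no compact subset of $D([0,T];S)$ can carry most of the mass uniformly in $\alpha$.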
Argument of Theorem 2.3 Let $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$ be a solution to SDEs (1.1), (1.2). Based on Skorokhod's representation theorem, and similarly to SDE (3.7), $(X^{\varepsilon,\alpha}_t, Y^{\varepsilon,\alpha}_t)$ can be expressed as a solution to SDEs driven by a Brownian motion and a Poisson random measure. In the following, this representation allows us to use the method of Section 3 to construct the desired coupling process, so as to decouple the interaction between $(Y^{\varepsilon,\alpha}_t)$ and $(X^{\varepsilon,\alpha}_t)$.
Due to the boundedness of $b$ and $\sigma$ in (A1), it follows from (1.1) that
Using the triangle inequality, we divide the estimate of $\mathbb{E}|X^{\varepsilon,\alpha}_t - \bar X_t|$ into five terms:
We shall estimate the right-hand side of (4.11) term by term. By (A1) and
To deal with term (III), we divide the integral over $[0, t(\delta))$ into integrals over the subintervals $[k\delta, (k+1)\delta)$ via the following inequality
For each $k$, there is a process $(\tilde Y^{\varepsilon,\alpha}_t)_{t\ge k\delta}$ constructed as in Lemma 3.1 such that:
(i) $(\tilde Y^{\varepsilon,\alpha}_t)_{t\ge k\delta}$ is a Markov chain in $S$ with transition rate matrix $\big(\frac{1}{\alpha}q_{ij}(X^{\varepsilon,\alpha}_{k\delta})\big)_{i,j\in S}$ and satisfies
(ii) The following estimate holds: for $t > k\delta$,
Noting the scaling $1/\alpha$ in the transition rate matrix of $(\tilde Y^{\varepsilon,\alpha}_t)_{t\ge k\delta}$, we have
for any bounded function $f$ on $S$, where $P^x_t$ denotes the semigroup corresponding to the Q-matrix $(q_{ij}(x))_{i,j\in S}$ as before. By (A4), for any $h \in \mathscr{B}(S)$ with $|h| \le 1$ and $s > k\delta$,
Hence,
where in the second inequality we used (4.13) and (A3), and in the third inequality we used (4.14). Therefore,
Taking $\delta = \alpha^{3/4}$ and invoking (4.12), we obtain (4.17).
Consequently, inserting the above estimates (4.18) and (4.17) into (4.11) with $\delta = \alpha^{3/4}$, we obtain
where
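Applying Gronwall's inequality, we finally bound $\mathbb{E}|X^{\varepsilon,\alpha}_t - \bar X_t|$ by $\varphi(\varepsilon,\alpha)$ times an exponential factor in $t$, which tends to $0$ as $\varepsilon, \alpha \to 0$. This completes the proof of the theorem.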
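The averaging principle can be observed in a crude simulation. The sketch below rests on several assumptions that are not taken from this work: the slow equation is taken in the form $dX_t = b(X_t,Y_t)\,dt + \sqrt{\varepsilon}\,\sigma(X_t,Y_t)\,dW_t$, the model is a hypothetical two-state example with bounded coefficients, the fast chain is approximated by switching with probability $q(x)\,dt/\alpha$ per Euler step, and numpy supplies the randomness. It compares the empirical $L^1$ error $\mathbb{E}|X^{\varepsilon,\alpha}_T - \bar X_T|$ with the Euler solution of the limit ODE.

```python
import numpy as np

# Hypothetical 2-state example (not from this work); all coefficients are bounded.
def b(x, i):
    return -1.0 if i == 0 else 1.0

def sigma(x, i):
    return 1.0 + 0.5 * i

def leave_rate(x, i):
    # q_01(x) = 1 + 0.5 sin x, q_10(x) = 1
    return 1.0 + 0.5 * np.sin(x) if i == 0 else 1.0

def b_bar(x):
    # pi^x from detailed balance: pi_0 q_01(x) = pi_1 q_10(x) with q_10 = 1
    pi0 = 1.0 / (1.0 + leave_rate(x, 0))
    return pi0 * b(x, 0) + (1.0 - pi0) * b(x, 1)

def limit_ode(x0, T, dt):
    """Explicit Euler scheme for the averaged ODE."""
    x = x0
    for _ in range(int(round(T / dt))):
        x += b_bar(x) * dt
    return x

def simulate_slow(x0, i0, eps, alpha, T, dt, rng):
    """Euler-Maruyama for X (assumed sqrt(eps) noise scaling) coupled to a fast chain."""
    x, i = x0, i0
    for _ in range(int(round(T / dt))):
        x += b(x, i) * dt + np.sqrt(eps * dt) * sigma(x, i) * rng.standard_normal()
        # fast component: switch with probability approx q(x) dt / alpha
        if rng.random() < leave_rate(x, i) * dt / alpha:
            i = 1 - i
    return x
```

The time step has to resolve the fast scale ($dt \ll \alpha$), which is exactly what makes direct simulation expensive and the averaged ODE attractive; the check below uses $\varepsilon = 10^{-3}$, $\alpha = 10^{-2}$ and $dt = 2\cdot10^{-4}$.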
Argument of Proposition 2.2 For any bounded function $h$ on $S$ with $|h| \le 1$, take some $i_0 \in S$; then it holds that
where we have used Lemma 3.1 and (A5), which ensures that $\int_0^\infty \eta_s\,ds < \infty$. Then, by a suitable choice of $t$, we arrive at
By the arbitrariness of $h$ and (A3), we obtain
and further the desired estimate (2.4) by taking the infimum of $\theta(i_0)$ over $i_0 \in S$.
Analogously to the deduction of (4.6), under conditions (A1)-(A3) and (A5), $\bar b$ is $1/2$-Hölder continuous by virtue of Proposition 2.2. According to Peano's theorem, ODE (2.6) admits a solution, although uniqueness may fail. Moreover, in contrast to the $L^1$-convergence in Theorem 2.3 under the strong ergodicity condition, here we can only prove the weak convergence of $(X^{\varepsilon,\alpha}_t)$ to $(\bar X_t)$.
Proof of Theorem 2.4 Denote by $\mathscr{L}^{\varepsilon,\alpha}$ the generator of $(X^{\varepsilon,\alpha}_t)$ given by
Here, for a matrix $A$, $A^*$ denotes its transpose and $\mathrm{tr}(A)$ its trace. Let $T > 0$ be fixed, and endow $C([0,T];\mathbb{R}^d)$ with the uniform norm, i.e.
Due to the boundedness of $b$ and $\sigma$ in (A1), it is standard to show that
where $x_0 = X^{\varepsilon,\alpha}_0$ and $C(T, x_0, p)$ is a constant depending on $T$, $x_0$ and $p$. By Itô's formula, for $0 \le s \le t \le T$,
for some constant $C > 0$. Combining this with $X^{\varepsilon,\alpha}_0 = x_0$, the collection of laws $\mathcal{L}_{X^{\varepsilon,\alpha}}$, $\varepsilon, \alpha > 0$, on the space $C([0,T];\mathbb{R}^d)$ is tight by virtue of [3, Theorem 12.3]. As a consequence, there are a subsequence $\{\mathcal{L}_{X^{\varepsilon',\alpha'}};\ \varepsilon', \alpha' > 0\}$ and a limit law $\mathcal{L}_{\bar X}$ on $C([0,T];\mathbb{R}^d)$ such that $\mathcal{L}_{X^{\varepsilon',\alpha'}}$ converges weakly to $\mathcal{L}_{\bar X}$ as $\varepsilon', \alpha' \to 0$. By Skorokhod's representation theorem, with a slight abuse of notation, we may assume that
In order to characterize the limit, we shall show the following for any $f \in C^2_c(\mathbb{R}^d)$, the space of functions with compact support and continuous second-order derivatives.
This means that $(\bar X_t)$ is a solution to ODE (2.6).
To this end, it suffices to show that, for any $0 \le s < t \le T$ and any bounded $\mathscr{F}_s$-measurable function $\Phi$,

(4.24)
As a solution to SDE (1.1), $(X^{\varepsilon',\alpha'}_t)$ satisfies
By the dominated convergence theorem, it is clear that
Hence, to derive (4.24) from (4.25), we only need to show
According to the expressions (4.21) and (4.23) of $\mathscr{L}^{\varepsilon',\alpha'}$ and $\mathscr{L}$, and the boundedness of $\sigma$, it suffices to show
Similarly to the treatment of (4.11), we shall use the time discretization method and the coupling method to show (4.26).
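As noted before the argument of Theorem 2.3, the family of distributions of $(Y^{\varepsilon,\alpha}_t)_{t\in[0,T]}$ on $D([0,T];S)$ is not tight, which can be seen from the simple and instructive example given in [40, Example 7.3, p. 172].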