Averaging in the case of multiple invariant measures for the fast system

We consider the averaging principle for deterministic or stochastic systems with a fast stochastic component (a family of continuous-time Markov chains depending on the state of the system as a parameter). We show that, due to bifurcations in the simplex of invariant probability measures of the chains, the limiting system should be considered on a graph or on an open book, with certain gluing conditions at the vertices of the graph (or on the bifurcation surface).


Introduction
Consider the d-dimensional continuous stochastic process z ε t satisfying the equation We assume that v is sufficiently smooth, and ξ ε t = ξ t/ε , where ξ t is a stationary process with sufficiently good mixing properties, such as a non-degenerate diffusion on a compact manifold or a continuous time Markov chain on a finite state space (we consider the latter case in this paper). The Wiener process W t is independent of ξ ε t . The coefficient κ is non-negative.
Put $\bar v(z) = \mathrm{E}\, v(\xi_t, z)$. Then (see, for example, [6], Section 7.2)
$$ z^\varepsilon_t \to \bar z_t, \quad \text{as } \varepsilon \downarrow 0 \qquad (2) $$
(convergence, in distribution, of the processes), where $\bar z_t$ is the solution of the averaged equation
$$ d\bar z_t = \bar v(\bar z_t)\,dt + \kappa\, dW_t \qquad (3) $$
with the same initial condition as $z^\varepsilon_t$. The convergence of $z^\varepsilon_t$ to $\bar z_t$ is preserved if the process $\xi_t$ is not stationary but converges with probability one to a stationary ergodic process $\bar\xi_t$. In this case, $\bar v(z) = \mathrm{E}\, v(z, \bar\xi_t)$. Moreover, the fast component $\xi^\varepsilon_t$ in (1) can depend on the slow component. To illustrate this point, let us focus on the case when the fast motion is governed by a continuous-time Markov chain $\Xi^z_t$ on the finite state space $\{1, \dots, n\}$. The transition rates of the chain $\Xi^z_t$, which depends on the parameter $z \in \mathbb{R}^d$, will be denoted by $q_{ij}(z) \ge 0$, $1 \le i, j \le n$, $i \ne j$. Intuitively, the slow motion $z^\varepsilon_t$ is governed, at short time scales, by (1) with $\xi^\varepsilon_t = \xi_{t/\varepsilon}$ replaced by $\Xi^z_{t/\varepsilon}$. Yet we cannot simply say that $\Xi^z_{t/\varepsilon}$ is the fast component of the process, since $z$ itself evolves (although slowly) in time. The fast-slow system $X^\varepsilon_t = (\xi^\varepsilon_t, z^\varepsilon_t)$ can be defined constructively (as in Section 2) or by describing its generator. Namely, for $1 \le i \le n$, consider the operators
$$ L_i u(z) = \frac{\kappa^2}{2} \Delta u(z) + v(i, z) \cdot \nabla u(z), $$
where $u$ is a function defined on $\mathbb{R}^d$. These operators would govern the evolution of the slow component for the fixed value $i$ of the fast component in the absence of the fast motion. The second-order term, the Laplacian in our case, could also be a more general operator, allowing for more general diffusion in the slow variable. To account for the fast component, we define the operator
$$ A^\varepsilon f(i, z) = L_i f(i, z) + \frac{1}{\varepsilon} \sum_{j \ne i} q_{ij}(z) \big( f(j, z) - f(i, z) \big), $$
where $f$ is a function on $\{1, \dots, n\} \times \mathbb{R}^d$. This operator, with a properly specified domain, is the generator of the process $X^\varepsilon_t = (\xi^\varepsilon_t, z^\varepsilon_t)$. If $q_{ij}(z) > 0$ whenever $i \ne j$, then the process $\Xi^z_t$ has a unique invariant distribution $\mu(z) = (\mu_1(z), \dots, \mu_n(z))$, and (2) holds with $\bar v(z) = \sum_{i=1}^n \mu_i(z)\, v(i, z)$.
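In the ergodic case, the averaged drift $\bar v(z) = \sum_{i=1}^n \mu_i(z)\, v(i, z)$ is computable numerically: one solves $\mu Q(z) = 0$, $\sum_i \mu_i = 1$ for the invariant distribution and averages the drift values. A minimal sketch in Python (the function names and the two-state example are ours, not from the paper):

```python
import numpy as np

def invariant_distribution(Q):
    """Invariant distribution mu of a continuous-time chain with generator Q.

    Q[i, j], i != j, are the transition rates q_ij; rows of Q sum to zero.
    Solves mu Q = 0 together with sum(mu) = 1 as a least-squares system.
    """
    n = Q.shape[0]
    A = np.vstack([Q.T, np.ones(n)])   # mu Q = 0  <=>  Q^T mu^T = 0
    b = np.zeros(n + 1)
    b[-1] = 1.0                        # normalization row: sum(mu) = 1
    mu, *_ = np.linalg.lstsq(A, b, rcond=None)
    return mu

def averaged_drift(mu, v):
    """bar v = sum_i mu_i v(i) for a vector of drift values v(i)."""
    return float(mu @ v)

# Two-state example: rates q_12 = 1, q_21 = 2, so mu = (2/3, 1/3).
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
mu = invariant_distribution(Q)
vbar = averaged_drift(mu, np.array([3.0, 0.0]))   # 2/3 * 3 + 1/3 * 0 = 2.0
```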
Assume now that there is a closed domain $G$ with a smooth boundary such that the chain $\Xi^z_t$ is ergodic for $z \notin G$ and has, say, two ergodic components $R_1 = \{1, \dots, m\}$ and $R_2 = \{m+1, \dots, n\}$ for $z \in G$. Thus transitions between $R_1$ and $R_2$ are impossible while $z^\varepsilon_t \in G$. Then one can expect that, as long as $z^\varepsilon_t$ remains in $G$, it converges, as $\varepsilon \downarrow 0$, to the solution of (3) either with $\bar v(z) = \sum_{i \in R_1} \mu^1_i(z)\, v(i, z)$ or with $\bar v(z) = \sum_{i \in R_2} \mu^2_i(z)\, v(i, z)$ (where $\mu^1(z)$ and $\mu^2(z)$ are the invariant distributions of $\Xi^z_t$ on $R_1$ and $R_2$, respectively), depending on whether the fast component evolves in $R_1$ or $R_2$. Note that, while the invariant distribution is not determined uniquely for $z \in G$, the above expressions for $\bar v$ are.
The process $z^\varepsilon_t$ can go from $G$ to $\mathbb{R}^d \setminus G$ and vice versa in finite time. Therefore, in order to define the limiting process, one should describe the behavior of the process in an infinitesimal neighborhood of $\partial G$. The novelty of the current work is that, in the presence of multiple invariant measures for the fast process, the limiting motion of the slow component is (and needs to be) considered on a graph or an open book (if $d > 1$), i.e., a more sophisticated space than in the case of a non-degenerate fast component, where the limiting process lives on the Euclidean space. For simplicity, we'll consider the one-dimensional case, where the structure of the simplex of invariant probability measures is already non-trivial.
Solutions of the Cauchy problem and of various initial-boundary problems for PDE systems related to the operator A ε can be written as expectations of certain functionals of the process X ε t = (ξ ε t , z ε t ). This allows one to calculate the asymptotics of solutions to those PDE problems using the results for the process X ε t and vice versa. One can also apply the probability results to certain non-linear PDE problems related to the process. For example, certain problems for reaction-diffusion systems can be considered in this way (compare with [3], Chapters 5-7).
Finally, we note that the problem considered in this paper can be viewed as a problem concerning the long-time influence of small perturbations: the process $\tilde X^\varepsilon_t = X^\varepsilon_{\varepsilon t}$ starting at $(i, z)$ can be viewed as a small perturbation of the process $\tilde X_t$ whose first component is $\Xi^z_t$ starting at $i$ and whose second component $z \in \mathbb{R}^d$ does not evolve in time.
A general approach to the study of the long-time influence of perturbations (see [4], [5]) is to consider the projection of $X^\varepsilon_t = \tilde X^\varepsilon_{t/\varepsilon}$ onto the simplex of invariant probability measures of the unperturbed process. In the case when the unperturbed process is $\tilde X_t$, the set $M_{\mathrm{erg}}$ of the extreme points of the simplex (ergodic invariant measures) consists of the measures of the form $\mu(z) \times \delta_z$ (where $z \notin G$ and $\mu(z)$ is the invariant measure for $\Xi^z_t$) and of the measures of the form $\mu^1(z) \times \delta_z$ and $\mu^2(z) \times \delta_z$ (where $z \in G$ and $\mu^1(z)$, $\mu^2(z)$ are invariant for $\Xi^z_t$ on $R_1$ and $R_2$, respectively). The projection of a point $(i, z)$ is $\mu(z) \times \delta_z$ if $z \notin G$, and $\mu^1(z) \times \delta_z$ or $\mu^2(z) \times \delta_z$ if $z \in G$, depending on whether $i \in R_1$ or $i \in R_2$. Note that $M_{\mathrm{erg}}$ can be parametrized by the set of pairs $(l, z)$, $z \in \mathbb{R}^d$, with $l \in \{1, 2\}$ if $z \in G$ and $l = 0$ if $z \notin G$. The main result of the paper is that the projection of $X^\varepsilon_t$ onto $M_{\mathrm{erg}}$ converges to a Markov process on $M_{\mathrm{erg}}$.

The fast-slow system
In this section, we'll introduce the fast-slow system $X^\varepsilon_t = (\xi^\varepsilon_t, z^\varepsilon_t)$. (Sometimes we'll write $X^{x,\varepsilon}_t$ to indicate the dependence on the initial position $x$.) The fast component $\xi^\varepsilon_t$ evolves as a Markov chain whose transition rates depend on the slow variable. The slow component $z^\varepsilon_t$ solves an ODE or an SDE with a right-hand side that depends on the fast variable. Namely, let $q_{ij}(z) \ge 0$, $1 \le i, j \le n$, $i \ne j$, be a family of transition rates for a Markov chain $\Xi^z_t$ that depends on the parameter $z \in \mathbb{R}$. Each of the functions $q_{ij}(z)$ is assumed to be continuous.
Moreover, we assume that the differences $q_{ij}(z) - q_{ij}(0)$, $i \ne j$, degenerate at the same rate as $z \uparrow 0$; namely, there are positive constants $\bar q_{ij}$, a function $\varphi : (-\infty, 0) \to (0, \infty)$ with $\lim_{z \uparrow 0} \varphi(z) = 0$, and functions $\beta_{ij} : (-\infty, 0) \to \mathbb{R}$ with $\lim_{z \uparrow 0} \beta_{ij}(z) = 0$ such that
$$ q_{ij}(z) - q_{ij}(0) = \bar q_{ij}\, \varphi(z) \big( 1 + \beta_{ij}(z) \big), \quad z < 0, \ i \ne j. $$
Let $\mu_i(z)$, $1 \le i \le n$, $z \in \mathbb{R}$, be the invariant distribution of the Markov chain $\Xi^z_t$. It is not determined uniquely for $z \ge 0$, since there are two ergodic classes for the Markov chain. However, under the above assumptions, there are limits $\pi_i = \lim_{z \uparrow 0} \mu_i(z)$, and, for $z \ge 0$, we select the unique invariant distribution such that the $\mu_i(z)$ are continuous functions on $\mathbb{R}$. Define
$$ \bar v(z) = \sum_{i=1}^n \mu_i(z)\, v(i, z). $$
We'll assume that $v(i, z) > 0$ for each $(i, z)$ (this assumption is not required if there is diffusion in the slow variable, the case $\kappa = 1$ below). Let us also make a simplifying assumption about the behavior of the coefficients at infinity; namely, we will assume that there is $z_0 > 0$ such that $q_{ij}(z)$ and $v(i, z)$ do not depend on $z$ for $|z| \ge z_0$. These assumptions can be relaxed significantly; however, this will not concern us, since we would like to focus on the behavior of the process near $z = 0$. The slow component $z^\varepsilon_t$ is assumed to be continuous and to satisfy
$$ dz^\varepsilon_t = v(\xi^\varepsilon_t, z^\varepsilon_t)\, dt + \kappa\, dW_t $$
at the points of continuity of $\xi^\varepsilon_t$. Here $\kappa = 0$ or $\kappa = 1$ (we'll consider the two cases, resulting in two different types of limiting behavior). The fast component, intuitively, evolves as the Markov chain $\Xi^z_t$ (with $z = z^\varepsilon_t$), sped up by the factor $1/\varepsilon$. However, since $z$ itself evolves in time, we need a more formal definition of the process $X^{x,\varepsilon}_t = (\xi^{x,\varepsilon}_t, z^{x,\varepsilon}_t)$. Namely, the process starts at $x = (i, z) \in M = \{1, \dots, n\} \times \mathbb{R}$ and moves along the $z$-axis (the slow component evolving as above with the fast component frozen at $i$) during a random time interval $[0, \sigma)$. At the random time $\sigma$, the process jumps to a random location $(j, z_\sigma)$. The distribution of $\sigma$ is determined as follows: conditionally on the trajectory of the slow component,
$$ \mathrm{P}(\sigma > t) = \exp\Big( -\frac{1}{\varepsilon} \int_0^t \sum_{j \ne i} q_{ij}(z^{x,\varepsilon}_s)\, ds \Big), $$
and, at time $\sigma$, the process jumps to $(j, z_\sigma)$, $j \ne i$, with probability $q_{ij}(z_\sigma) / \sum_{j' \ne i} q_{ij'}(z_\sigma)$.
Having identified the location of the process at time $\sigma$, we treat it as a new starting point and select a new (random) time interval for the jump-free motion of the process, independently of the past. The construction then continues inductively. It is clear that the process just described is an RCLL (right-continuous with left limits) Markov process.
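The inductive construction above translates directly into an event-driven simulation. Below is a sketch under the simplifying assumption that, between jumps, the jump rate is frozen at the current value of the slow variable (exact when the rates do not depend on $z$, and a small-$\varepsilon$ approximation otherwise); all names and the two-state example are illustrative:

```python
import random

def simulate_fast_slow(z0, i0, v, q, n_states, eps, T, seed=0):
    """One trajectory of (xi^eps_t, z^eps_t) up to time T (kappa = 0).

    v(i, z): slow drift; q(i, j, z): transition rates of the fast chain,
    sped up by 1/eps.  Between jumps z is moved with the drift of the
    current state; the waiting time is exponential with the rate frozen
    at the current z (an approximation unless the rates are constant).
    """
    rng = random.Random(seed)
    t, z, i = 0.0, z0, i0
    while t < T:
        total = sum(q(i, j, z) for j in range(n_states) if j != i)
        sigma = rng.expovariate(total / eps) if total > 0 else T - t
        dt = min(sigma, T - t)
        z += v(i, z) * dt           # jump-free motion along the z-axis
        t += dt
        if dt == sigma and total > 0:
            u, acc = rng.random() * total, 0.0
            for j in range(n_states):   # jump to j w.p. q_ij / total
                if j != i:
                    acc += q(i, j, z)
                    if u <= acc:
                        i = j
                        break
    return z, i

# two states with drifts 1 and 2 and unit switching rates:
z_T, _ = simulate_fast_slow(z0=0.0, i0=0,
                            v=lambda i, z: [1.0, 2.0][i],
                            q=lambda i, j, z: 1.0,
                            n_states=2, eps=1e-3, T=1.0)
# averaging predicts z_T close to bar v * T = 1.5 for small eps
```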
The process $X^{x,\varepsilon}_t = (\xi^{x,\varepsilon}_t, z^{x,\varepsilon}_t)$ could be defined, equivalently, through its generator, using the Hille-Yosida theorem. We discuss the Hille-Yosida theorem and the generator of $X^{x,\varepsilon}_t$ next since, in any case, a similar construction will be used to define the limiting process when $\kappa = 1$.
Let $M$ be a separable locally compact metric space and $C_0(M)$ be the space of continuous functions on $M$ that tend to zero at infinity (i.e., can be made arbitrarily close to zero outside a sufficiently large compact). The space $C_0(M)$ is endowed with the supremum norm. Let $P(t, x, B)$ be a Markov transition function (a priori not assumed to be conservative) on $M$. For $f \in C_0(M)$, let $T_t f(x) = \int_M f(x')\, P(t, x, dx')$; we say that the transition function satisfies condition $C_0$ if $T_t$ maps $C_0(M)$ into itself and $\lim_{t \downarrow 0} \|T_t f - f\| = 0$ for each $f \in C_0(M)$ (see [ ], page 365). Suppose that a linear operator $A$ on $C_0(M)$, with a dense domain $D(A)$, has the following properties: $A$ satisfies the positive maximum principle, and the range of $\lambda - A$ is dense in $C_0(M)$ for some $\lambda > 0$. Then the operator $A$ is the infinitesimal generator of a semigroup $T_t$, $t \ge 0$, on $C_0(M)$ that is defined by a stochastically continuous Markov transition function satisfying condition $C_0$. The transition function with such properties is determined uniquely.
In the case when $\kappa = 1$, the domain of $A^\varepsilon$ consists of all functions $f \in C_0(M)$ such that $f(i, \cdot)$ is twice continuously differentiable for each $i$ and $A^\varepsilon f \in C_0(M)$. In both cases, it is possible to show that the conditions of the Hille-Yosida theorem are satisfied. (We skip the details since, in any case, the process was already defined constructively.) Let $P^\varepsilon(t, x, dx')$ be the corresponding Markov transition function, and $T^\varepsilon_t$, $t \ge 0$, be the corresponding semigroup on $C_0(M)$. Take a sequence of functions $f_n \in D(A^\varepsilon)$ with values in $[0, 1]$ and with compact support such that $f_n(i, z) = 1$ for $|z| \le n$ and $\|A^\varepsilon f_n\|_{C_0} \le 1/n$. The existence of such a sequence is easily justified once we recall that the coefficients of $A^\varepsilon$ are constant for sufficiently large $|z|$.
Since $A^\varepsilon$ is the infinitesimal generator of the semigroup $T^\varepsilon_t$, we have (see Theorem I.1 of [8]), for $f \in D(A^\varepsilon)$,
$$ T^\varepsilon_t f - f = \int_0^t T^\varepsilon_s A^\varepsilon f\, ds. \qquad (4) $$
Rewrite (4) as follows: the process
$$ f(X^{x,\varepsilon}_t) - f(x) - \int_0^t A^\varepsilon f(X^{x,\varepsilon}_s)\, ds $$
is an RCLL martingale, and, for each stopping time $\tau$ with $\mathrm{E}\tau < \infty$, we get
$$ \mathrm{E} f(X^{x,\varepsilon}_\tau) - f(x) = \mathrm{E} \int_0^\tau A^\varepsilon f(X^{x,\varepsilon}_s)\, ds. \qquad (5) $$
Recall that we earlier defined the process $X^{x,\varepsilon}_t$ constructively, without referring to the Hille-Yosida theorem. It is easily verified directly that the generator of this process coincides with $A^\varepsilon$ on $D(A^\varepsilon)$. The Markov transition function of the process is stochastically continuous and satisfies condition $C_0$. At the same time, by (4), the semigroup is defined uniquely by the values of the generator on a dense set, and thus the generator of the constructively defined process is $A^\varepsilon$ (rather than a non-trivial extension).
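Identity (4) can be sanity-checked numerically in the simplest setting: a chain with no slow component, where the generator reduces to the rate matrix $Q$ and $\mathrm{E}_x f(X_t) = (e^{tQ} f)_x$. The two-state example below is ours:

```python
import numpy as np

# Check  e^{tQ} f - f = \int_0^t e^{sQ} Q f ds  for a 2-state generator Q.
Q = np.array([[-1.0, 1.0],
              [2.0, -2.0]])
f = np.array([1.0, 5.0])
t, n = 0.7, 2001

# e^{sQ} via the eigendecomposition (Q is diagonalizable here)
w, V = np.linalg.eig(Q)
Vinv = np.linalg.inv(V)
expm = lambda s: ((V * np.exp(w * s)) @ Vinv).real

lhs = expm(t) @ f - f
s_grid = np.linspace(0.0, t, n)
ys = np.array([expm(s) @ (Q @ f) for s in s_grid])
h = t / (n - 1)
rhs = h * (ys[0] / 2 + ys[1:-1].sum(axis=0) + ys[-1] / 2)  # trapezoid rule
```

The two sides agree up to the quadrature error of the trapezoid rule.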

The limiting process
Let us describe the appropriate space, and the limiting process on it, for the fast-slow system $X^{x,\varepsilon}_t$. Consider the half-line $I_0 = \{(0, z) : z \le 0\}$ and two copies $I_1 = \{(1, z) : z \ge 0\}$ and $I_2 = \{(2, z) : z \ge 0\}$ of the positive half-line. These are three half-lines, with $I_1$ and $I_2$ distinguished by a label. We'll identify the ends of $I_0$, $I_1$, and $I_2$, thus obtaining a graph, denoted by $S$, with three semi-infinite edges with a common vertex, which will be denoted $O$. Each point $y = (l, z) \in S$ is determined by the label of the edge $l \in \{0, 1, 2\}$ and the coordinate $z$, where $z \in (-\infty, 0]$ for $l = 0$ and $z \in [0, \infty)$ for $l = 1, 2$.
First, consider the case when there is no diffusion in the slow variable ($\kappa = 0$). The process $Y^y_t$ starting at $y = (l, z) \in S$ will move deterministically with the variable speed $v_0$ on $I_0$, $v_1$ on $I_1$, and $v_2$ on $I_2$. For $y \in I_0$, we still need to describe the behavior of $Y^y_t$ once the process reaches $O$. The behavior at $O$ is random: the process proceeds to $I_1$ and $I_2$ with probabilities $p_1$ and $p_2 = 1 - p_1$, respectively. Next, consider the case with diffusion ($\kappa = 1$). The process $Y^y_t$ is a diffusion inside each of the edges. However, a gluing condition is needed to describe the behavior of the process once it reaches the vertex. Thus, it is most convenient to define the process via its generator $A$. The domain of $A$, denoted by $D(A)$, consists of all functions $f \in C_0(S)$ such that: (a) $\frac{1}{2} f''(l, \cdot) + v_l(\cdot) f'(l, \cdot) \in C_0(S)$, i.e., the differential operator can be applied to $f$ inside each of the edges, and the resulting function can be extended to the vertex $O$ so that it becomes an element of $C_0(S)$.
(b) There are one-sided derivatives $f'(l, 0)$, $l = 0, 1, 2$, at the vertex, satisfying the gluing condition: a prescribed linear combination of the three one-sided derivatives vanishes at $O$. It is not difficult to verify that the conditions of the Hille-Yosida theorem are satisfied and that the resulting Markov transition function, denoted by $P(t, x, B)$, is a probability measure as a function of $B$. Let $Y^y_t$, $y \in S$, be the corresponding Markov family and $T_t$ be the corresponding semigroup. In order to show that a modification with continuous trajectories exists, it is enough to check that $\lim_{t \downarrow 0} P(t, x, B)/t = 0$ for each closed set $B$ that doesn't contain $x$ (Theorem I.5 of [8]; see also [1]). Let $f \in D(A)$ be a non-negative function that is equal to one on $B$ and whose support doesn't contain $x$. Then
$$ \limsup_{t \downarrow 0} \frac{P(t, x, B)}{t} \le \lim_{t \downarrow 0} \frac{T_t f(x) - f(x)}{t} = A f(x) = 0, $$
as required. Thus $Y^y_t$ can be assumed to have continuous trajectories.
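For $\kappa = 0$ the process on $S$ is straightforward to simulate: deterministic flow inside each edge plus a coin flip on reaching $O$ from $I_0$. A toy sketch (the speeds, the value of $p_1$, and all names are illustrative placeholders, not quantities computed in the paper):

```python
import random

def run_on_graph(l0, z0, speed, T, p1, dt=1e-3, seed=0):
    """Motion Y_t on the three-edge graph S for kappa = 0.

    State (l, z): edge label l in {0, 1, 2} and coordinate z (z <= 0 on
    I_0, z >= 0 on I_1 and I_2).  Inside an edge the motion is the Euler
    scheme for dz/dt = speed[l](z); on reaching the vertex O from I_0 the
    process enters I_1 with probability p1 and I_2 otherwise.
    """
    rng = random.Random(seed)
    l, z, t = l0, z0, 0.0
    while t < T:
        z += speed[l](z) * dt
        t += dt
        if l == 0 and z >= 0.0:    # reached the vertex O
            l = 1 if rng.random() < p1 else 2
            z = max(z, 0.0)
    return l, z

# start on I_0 at z = -1 with unit speeds on every edge:
l, z = run_on_graph(0, -1.0,
                    {0: lambda z: 1.0, 1: lambda z: 1.0, 2: lambda z: 1.0},
                    T=2.0, p1=0.5)
# after time 2 the process is on I_1 or I_2, near coordinate 1
```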

A lemma on convergence of processes
The next lemma can be used to show convergence of families of parameter-dependent processes. We formulate it in a general setting. Consider a metric space $M$ and a Markov family $X^{x,\varepsilon}_t$, $x \in M$, of processes that depend on a parameter $\varepsilon > 0$. We also consider a continuous mapping $h : M \to S$ to a locally compact separable metric space $S$ and define the processes $Y^{x,\varepsilon}_t = h(X^{x,\varepsilon}_t)$, $x \in M$, $\varepsilon > 0$. The motivation for introducing the latter family comes from our desire to study the limiting behavior of $X^{x,\varepsilon}_t$ as $\varepsilon \downarrow 0$. However, the space $M$ is too large for this purpose: the natural state space for the limiting process consists of equivalence classes in $M$ rather than of individual points. Thus $Y^{x,\varepsilon}_t$ captures the reduced dynamics, for which meaningful limiting behavior can be observed.
Note that while convergence to Markov processes on $S$ as $\varepsilon \downarrow 0$ will be established, the processes $Y^{x,\varepsilon}_t$ need not be Markov for fixed $\varepsilon > 0$. The main point of the lemma is that, in order to demonstrate the convergence of $Y^{x,\varepsilon}_t$ to a limiting process, it is sufficient to check that for small $\varepsilon$ the processes nearly satisfy relation (7), which is similar to the martingale problem but with the ordinary expectation rather than the conditional expectation.
Lemma 4.1. Let $h : M \to S$ be a continuous mapping from a metric space $M$ to a locally compact separable metric space $S$. Let $X^{x,\varepsilon}_t$, $x \in M$, be a Markov family on $M$ that depends on a parameter $\varepsilon > 0$. Suppose that the processes $Y^{x,\varepsilon}_t = h(X^{x,\varepsilon}_t)$, $x \in M$, $\varepsilon > 0$, have continuous trajectories. Let $Y^y_t$, $y \in S$, be a Markov family on $S$ with continuous trajectories whose semigroup $T_t$, $t \ge 0$, preserves the space $C_0(S)$. (This, together with the continuity of trajectories, implies that $T_t$ is a Feller semigroup.) Let $A$ be the generator of $T_t$, and let $D \subseteq D(A)$ and $\Psi \subseteq C_0(S)$, with $\Psi$ dense in $C_0(S)$, be such that:
(1) There is $\lambda > 0$ such that for each $f \in \Psi$ the equation $\lambda F - AF = f$ has a solution $F \in D$.
(2) For each $T > 0$, each $f \in D$, and each compact $K \subseteq S$,
$$ \mathrm{E} f(Y^{x,\varepsilon}_T) - f(h(x)) - \mathrm{E} \int_0^T A f(Y^{x,\varepsilon}_s)\, ds \to 0, \quad \text{as } \varepsilon \downarrow 0, \qquad (7) $$
uniformly in $x \in h^{-1}(K)$. Suppose that the family of measures on $C([0, \infty), S)$ induced by the processes $Y^{x,\varepsilon}_t$, $\varepsilon > 0$, is tight for each $x \in M$. Then, for each $x \in M$, the measures induced by the processes $Y^{x,\varepsilon}_t$ converge weakly, as $\varepsilon \downarrow 0$, to the measure induced by the process $Y^{h(x)}_t$.
Proof. Fix $x \in M$. Since the family of measures on $C([0, \infty), S)$ induced by the processes $Y^{x,\varepsilon}_t$, $\varepsilon > 0$, is tight, we can find a process $Z^x_t$ with continuous trajectories and a sequence $\varepsilon_n \downarrow 0$ such that $Y^{x,\varepsilon_n}_t$ converge to $Z^x_t$ in distribution as $n \to \infty$. The desired result will immediately follow if we demonstrate that the distribution of $Z^x_t$ coincides with the distribution of $Y^{h(x)}_t$ (and thus does not depend on the choice of the sequence $\varepsilon_n$). We will show that $Z^x_t$ is a solution of the martingale problem for $(A|_D, h(x))$, i.e., for each $T_2 > T_1 \ge 0$, each $f \in D$, each $0 \le s_1 \le \dots \le s_k \le T_1$, and all bounded continuous functions $g_1, \dots, g_k$ on $S$,
$$ \mathrm{E} \Big[ \Big( f(Z^x_{T_2}) - f(Z^x_{T_1}) - \int_{T_1}^{T_2} A f(Z^x_s)\, ds \Big) \prod_{j=1}^k g_j(Z^x_{s_j}) \Big] = 0. \qquad (8) $$
First, however, let us discuss the uniqueness for solutions of the martingale problem. We claim that:
(a) $D$ is dense in $C_0(S)$;
(b) the range of $\lambda - A|_D$ is dense in $C_0(S)$;
(c) $A|_D$ satisfies the positive maximum principle.
To demonstrate (a), take an arbitrary $\delta > 0$ and $F_0 \in D(A)$. Let $g_0 = \lambda F_0 - A F_0$, and take $g' \in \Psi$ such that $\|g' - g_0\| \le \lambda \delta$. Let $F' \in D$ be such that $\lambda F' - A F' = g'$. Then, since $A$ is the generator of a strongly continuous contraction semigroup on $C_0(S)$, it follows from the Hille-Yosida theorem that $\|F' - F_0\| \le \|g' - g_0\|/\lambda \le \delta$. This implies (a) since $D(A)$ is dense in $C_0(S)$. Note that (b) follows from the existence of a solution $F \in D$ of $\lambda F - AF = f$ for each $f \in \Psi$ and the density of $\Psi$, while (c) is obvious. The validity of (a)-(c) is enough to conclude that the distribution on $C([0, \infty), S)$ of a process with continuous paths satisfying (8) is uniquely determined (Theorem 4.1, Chapter 4 of [2]).
Note that (8) holds with $Z^x_t$ replaced by $Y^{h(x)}_t$ since $D \subseteq D(A)$ and $A$ is the generator of the family $Y^y_t$, $y \in S$. Therefore, $Z^x_t$ and $Y^{h(x)}_t$ have the same distribution if (8) holds. It remains to prove (8).
Note that, by the weak convergence of $Y^{x,\varepsilon_n}_t$ to $Z^x_t$ and the boundedness and continuity of the functionals involved, $Z^x_t$ satisfies (8) if the corresponding expression with $Z^x$ replaced by $Y^{x,\varepsilon_n}$ tends to zero as $n \to \infty$. By the Markov property of the family $X^{x,\varepsilon_n}$, the expectation in question can be written, conditioning on the process up to time $T_1$, through an expression of the form appearing in (7) evaluated at the random starting point $X^{x,\varepsilon_n}_{T_1}$, which tends to zero in distribution, as follows from (7) and from the tightness of the sequence of random variables $X^{x,\varepsilon_n}_{T_1}$. Therefore, using the boundedness of $f$, $Af$, and $g_1, \dots, g_k$, we conclude that (8) holds. This completes the proof of the lemma.

Convergence of the fast-slow process

The case with no diffusion
Consider first a simplified version of the problem: assume that the fast-slow system $X^{x,\varepsilon}_t = (\xi^{x,\varepsilon}_t, z^{x,\varepsilon}_t)$ is defined as in Section 2, but $q_{ij}(z) > 0$ for $i \ne j$ (and thus $\Xi^z_t$ is ergodic) for each $z \in \mathbb{R}$. In this case, the fast Markov chain has a unique invariant distribution, which will be denoted by $\mu_i(z)$, $1 \le i \le n$, for each $z \in \mathbb{R}$. Define $Y^y_t$, $y \in \mathbb{R}$, to be the deterministic motion on the real line with the velocity $\bar v(y) = \sum_{i=1}^n \mu_i(y)\, v(i, y)$. The following theorem is a standard averaging result.
Theorem 5.1. For each $x = (i, z) \in M$, the measures induced by the processes $z^{x,\varepsilon}_t$ converge weakly, as $\varepsilon \downarrow 0$, to the measure induced by the deterministic process $Y^z_t$.
Proof. We apply Lemma 4.1 with $S = \mathbb{R}$, $\Psi = D = D(A)$, and $h(i, z) = z$. Thus we need to justify (7) for $f \in D(A)$. Define $\bar f(i, z) = f(z)$, $1 \le i \le n$. Using (5) (which is still valid in this simplified case) applied to $\bar f$ with $\tau = T$, we can write
$$ \mathrm{E} f(z^{x,\varepsilon}_T) - f(z) - \mathrm{E} \int_0^T \bar v(z^{x,\varepsilon}_s)\, f'(z^{x,\varepsilon}_s)\, ds = \mathrm{E} \int_0^T \big( v(\xi^{x,\varepsilon}_s, z^{x,\varepsilon}_s) - \bar v(z^{x,\varepsilon}_s) \big) f'(z^{x,\varepsilon}_s)\, ds. $$
It easily follows from the explicit construction of $X^{x,\varepsilon}_t$ (Section 2) that the expression on the right-hand side tends to zero uniformly in $x$.
Now let us consider the original situation, with two ergodic classes for the Markov chain when $z \ge 0$. Recall that $S$ is now a graph with three semi-infinite edges, $I_0$, $I_1$, and $I_2$, with the common vertex $O$. The process $Y^y_t$ on $S$ was defined in Section 3 (the case $\kappa = 0$). The motion is deterministic on each of the edges, while the behavior at $O$ is random: the process proceeds to $I_1$ or $I_2$ with the prescribed probabilities $p_1$ and $p_2$, respectively.
Let $h$ be the mapping of $M = \{1, \dots, n\} \times \mathbb{R}$ to $S$ defined as follows: $h(i, z) = (0, z)$ for $z \le 0$, while, for $z > 0$, $h(i, z) = (1, z)$ if $i \in R_1$ and $h(i, z) = (2, z)$ if $i \in R_2$.
Theorem 5.2. Suppose that $\kappa = 0$ and that the assumptions made in Section 2 are satisfied (in particular, the Markov chain $\Xi^z_t$ has two ergodic classes for each $z \ge 0$). For each $x \in M$, the measures induced by the processes $Y^{x,\varepsilon}_t = h(X^{x,\varepsilon}_t)$ on $S$ converge weakly, as $\varepsilon \downarrow 0$, to the measure induced by the process $Y^{h(x)}_t$.
Proof. It is sufficient to show that for each $\eta > 0$ there is $\delta_0 > 0$ such that for each $\delta \in (0, \delta_0]$ there is $\varepsilon_0 > 0$ such that for $\varepsilon \in (0, \varepsilon_0]$ we have, whenever $x = (i, -\delta)$:
(10) with probability at least $1 - \eta$, the slow component $z^{x,\varepsilon}_t$ reaches $\delta$ before time $\eta$;
(11) the probability that the fast component belongs to $R_1$ at the time when $z^{x,\varepsilon}_t$ reaches $\delta$ differs from $p_1$ by at most $\eta$.
From the explicit construction of $X^{x,\varepsilon}_t$ (Section 2), it is clear that $z^{x,\varepsilon}_t$ increases, while in $[-\delta, \delta]$, with a speed that is bounded from below by $\inf_{i,\, z \in [-\delta, \delta]} v(i, z) > 0$. This implies (10). To prove (11), we define $f^\varepsilon(i, z)$, $z \in [-\delta, \delta]$, as the solution of the system of ODEs
$$ v(i, z)\, \frac{d f^\varepsilon}{dz}(i, z) + \frac{1}{\varepsilon} \sum_{j \ne i} q_{ij}(z) \big( f^\varepsilon(j, z) - f^\varepsilon(i, z) \big) = 0, \quad 1 \le i \le n, $$
with the terminal condition $f^\varepsilon(i, \delta) = 1$ for $i \in R_1$ and $f^\varepsilon(i, \delta) = 0$ for $i \in R_2$. We extend $f^\varepsilon$ to be defined on $M$ so that $f^\varepsilon \in D(A^\varepsilon)$. Observe that, by construction, $f^\varepsilon(i, -\delta)$ is the probability that the fast component belongs to $R_1$ at the time when the slow component reaches $\delta$, for the process starting at $(i, -\delta)$. Thus it remains to analyze the asymptotics of the solution to the ODE. Let $N(z)$ be the matrix whose diagonal elements are $N_{ii}(z) = -(v(i, z))^{-1} Q_i(z)$, where $Q_i(z) = \sum_{j \ne i} q_{ij}(z)$, and whose off-diagonal elements are $N_{ij}(z) = (v(i, z))^{-1} q_{ij}(z)$. Let $N_\delta = \frac{1}{2\delta} \int_{-\delta}^{\delta} N(z)\, dz$. Solving the linear ODE (disregarding, for brevity of notation, the time-ordering in the matrix exponential), we get
$$ f^\varepsilon(\cdot, -\delta) = \exp\big( (2\delta/\varepsilon) N_\delta \big)\, f^\varepsilon(\cdot, \delta). $$
When $\delta$ is small, $N_\delta$ is a small perturbation of the matrix $N(0)$. Namely, let $H_\delta = N_\delta - N(0)$. All the entries of $H_\delta$ tend to zero when $\delta \downarrow 0$. Observe that all the off-diagonal entries of $N_\delta$ are positive for each $\delta$, and the sum of the elements in each row is equal to zero. Therefore, zero is a simple eigenvalue of $N_\delta$ with the right eigenvector equal to $e = (1, \dots, 1)^T$, and the real parts of the other eigenvalues are negative. Let $e^1$ denote the terminal-condition vector (the indicator vector of $R_1$), and let $\Pi^\delta_e(e^1)$ be the projection of $e^1$ onto $e$ along the space spanned by the remaining eigenvectors (and generalized eigenvectors) of the matrix $N_\delta$. Then
$$ \lim_{\varepsilon \downarrow 0} f^\varepsilon(i, -\delta) = \big( \Pi^\delta_e(e^1) \big)_i $$
for each $i$, and it remains to show that $(\Pi^\delta_e(e^1))_i$ (which does not depend on $i$) is close to $p_1$ for small $\delta$.
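The spectral facts used here (zero a simple top eigenvalue of $N_\delta$, a second eigenvalue $\lambda^\delta_1$ of the order of the coupling that merges with zero as $\delta \downarrow 0$, and a second eigenvector nearly constant on each class with opposite signs) can be observed numerically on a toy analogue of $N_\delta$; the 4-state matrix and all names below are ours, not from the paper:

```python
import numpy as np

def make_N(delta):
    """Generator-type matrix with two weakly coupled classes R1 = {0, 1}
    and R2 = {2, 3}: within-class rates 1, cross-class rates delta
    (the analogue of the vanishing rates near z = 0).  Rows sum to zero."""
    N = np.zeros((4, 4))
    N[0, 1] = N[1, 0] = N[2, 3] = N[3, 2] = 1.0    # within R1, within R2
    N[0, 2] = N[2, 0] = N[1, 3] = N[3, 1] = delta  # weak coupling
    np.fill_diagonal(N, -N.sum(axis=1))
    return N

for delta in (1e-2, 1e-4):
    w, V = np.linalg.eig(make_N(delta))
    order = np.argsort(w.real)[::-1]
    w = w.real[order]
    # zero is a simple top eigenvalue; the next one is of order delta
    assert abs(w[0]) < 1e-10 and -w[1] < 10 * delta and w[2] < -1.0
    # its eigenvector is nearly constant on each class, with opposite
    # signs on R1 and R2 (the structure described in Lemma 5.3)
    g = V[:, order[1]].real
    assert abs(g[0] - g[1]) < 0.05 * abs(g[0]) and g[0] * g[2] < 0
```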
Observe that zero is the top eigenvalue of $N(0)$ with two linearly independent right eigenvectors $e^1$ and $e^2$ (the indicator vectors of $R_1$ and $R_2$, so that $e = e^1 + e^2$) and two linearly independent left eigenvectors $\tilde\pi^1$ and $\tilde\pi^2$, supported on $R_1$ and $R_2$ and with components proportional to $v_i \pi^1_i$, $i \in R_1$, and $v_i \pi^2_i$, $i \in R_2$, respectively, where $v_i = v(i, 0)$. Let $\lambda^\delta_1 < 0$ be the eigenvalue of $N_\delta$ with the second-largest real part (the top eigenvalue is zero). It is determined uniquely for small $\delta$. Let $g^\delta$ be the corresponding right eigenvector (determined up to a constant factor).
Lemma 5.3. The vector $g^\delta$ can be represented as
$$ g^\delta = \alpha_\delta e^1 + \beta_\delta e^2 + \tilde g^\delta, \qquad (12) $$
where $\tilde g^\delta$ belongs to the space spanned by the eigenvectors (and generalized eigenvectors) of $N(0)$ other than $e^1$ and $e^2$. The coefficients $\alpha_\delta$ and $\beta_\delta$ are bounded away from zero and infinity and have opposite signs, and $\tilde g^\delta$ tends to zero when $\delta \downarrow 0$.
Proof. Let $\bar i(\delta)$ be such that $|g^\delta_{\bar i(\delta)}| = \max_{1 \le i \le n} |g^\delta_i|$. Assume, for now, that $\bar i(\delta) \in R_1$ for all sufficiently small $\delta$. Then, since $N_\delta$ is a small perturbation of $N(0)$ and $\lambda^\delta_1 \to 0$ as $\delta \downarrow 0$, the relation $N_\delta g^\delta = \lambda^\delta_1 g^\delta$ easily implies that $g^\delta_i / g^\delta_{\bar i(\delta)} \to 1$ as $\delta \downarrow 0$ for all $i \in R_1$. Let $\tilde\pi^\delta$ be the normalized left eigenvector of $N_\delta$ with eigenvalue zero. From $\tilde\pi^\delta N_\delta = 0$ and $N_\delta g^\delta = \lambda^\delta_1 g^\delta$ it follows that $\langle g^\delta, \tilde\pi^\delta \rangle = 0$. Let $\tilde i(\delta)$ be such that $|g^\delta_{\tilde i(\delta)}| = \max_{i \in R_2} |g^\delta_i|$. Observe that the components of $\tilde\pi^\delta$ on $R_1$ and on $R_2$ converge, as $\delta \downarrow 0$, to positive multiples of $\tilde\pi^1$ and $\tilde\pi^2$, respectively. Therefore,
$$ c_1 \le |g^\delta_{\tilde i(\delta)}| / |g^\delta_{\bar i(\delta)}| \le c_2 \qquad (13) $$
for some positive constants $c_1$ and $c_2$. As above, $g^\delta_i / g^\delta_{\tilde i(\delta)} \to 1$ as $\delta \downarrow 0$ for all $i \in R_2$. From the facts that $\langle g^\delta, \tilde\pi^\delta \rangle = 0$ and that the components of $\tilde\pi^\delta$ are positive, it follows that $g^\delta_i$, $i \in R_1$, are of the opposite sign from $g^\delta_i$, $i \in R_2$. The vector $g^\delta$ can be represented as a sum of three components, $g^\delta = a^\delta + b^\delta + c^\delta$, where $a^\delta$ is a multiple of $e^1$, $b^\delta$ is a multiple of $e^2$, and $c^\delta$ lies in the space spanned by the eigenvectors (and generalized eigenvectors) of $N(0)$ other than $e^1$ and $e^2$. Observe that $\|c^\delta\| / \|g^\delta\| \to 0$ as $\delta \downarrow 0$ since $e^1$ and $e^2$ span the eigenspace corresponding to the top eigenvalue of $N(0)$ and $g^\delta$ belongs to a small perturbation of that space. Moreover, from (13) and the fact that $g^\delta_i$, $i \in R_1$, and $g^\delta_i$, $i \in R_2$, are of opposite signs, it follows that $\|a^\delta\| / \|b^\delta\|$ is bounded from above and below. Therefore, (12) holds with $\alpha_\delta$ bounded away from zero and infinity.
Finally, it remains to note that the assumption $\bar i(\delta) \in R_1$ does not lead to any loss of generality, since otherwise we can swap the roles of $R_1$ and $R_2$.
From the definition of the stopping times and of the process $X^{x,\varepsilon}_t$, we obtain the following property: if $|\bar z(t)| \le \delta$ for $t \le t_0$ and if there is $\delta_0 > 0$ such that $\lambda(t \le t_0 : \bar z(t) \le -\delta_0) \to \infty$ as $t_0 \to \infty$, then the distribution of $\bar\Xi^{\bar z}_{t_0}$ asymptotically does not depend on the initial point of the chain, where $\lambda$ is the Lebesgue measure on the real line and $\bar\Xi^{\bar z}_t$ is a time-inhomogeneous Markov process with transition rates at time $t$ given by $q_{ij}(\bar z(t))$.
To complete the proof of Lemma 5.6 in the case when there is no drift term, we condition the evolution of the fast component on the realization of the Brownian motion and observe that the above argument applies to almost every realization of the Brownian motion (after rescaling time by $1/\varepsilon$).
As we discussed above, Lemma 5.6 completes the proof of the theorem.