Large deviations for the empirical measure and empirical flow of Markov renewal processes with a countable state space

Here we propose the Donsker-Varadhan-type compactness conditions and prove the joint large deviation principle for the empirical measure and empirical flow of Markov renewal processes (semi-Markov processes) with a countable state space, generalizing the relevant results for continuous-time Markov chains with a countable state space obtained in [Ann. Inst. H. Poincar\'{e} Probab. Statist. 51, 867-900 (2015)] and [Stoch. Proc. Appl. 125, 2786-2819 (2015)], as well as the relevant results for Markov renewal processes with a finite state space obtained in [Adv. Appl. Probab. 48, 648-671 (2016)]. In particular, our results hold when the flow space is endowed with either the bounded weak* topology or the strong $L^1$ topology. Even for continuous-time Markov chains, our compactness conditions are weaker than the ones proposed in previous papers. Furthermore, under some stronger conditions, we obtain the explicit expression of the marginal rate function of the empirical flow.


Introduction
Semi-Markov processes, which can be viewed as a direct extension of discrete-time and continuous-time Markov chains, are one of the most important classes of non-Markov processes. They have attracted considerable attention in recent years and have found wide applications in physics, chemistry, biology, finance, and engineering [1-3]. The embedded chain of a semi-Markov process is a discrete-time Markov chain, while the waiting times may not be exponentially distributed. Such non-exponential waiting time distributions have been found in many scientific problems such as molecular motors [4,5], enzyme kinetics [6,7], gene networks [8,9], and cell cycle dynamics [10,11]. The representation of a semi-Markov process in terms of its embedded chain and waiting times is also called a Markov renewal process.
The mathematical theory of large deviations was initiated by Cramér [12], and was later developed by many mathematicians and physicists. In the pioneering work [13-16], Donsker and Varadhan established the large deviation principle (LDP) for the empirical measure and for the empirical process associated with a large class of discrete-time and continuous-time Markov processes. The large deviations for the sample mean, empirical measure, and empirical process are usually said to be at level 1, level 2, and level 3, respectively [17]. For discrete-time and continuous-time Markov chains, the number of jumps along each oriented edge of the transition graph per unit time is called the empirical flow. The large deviations for the empirical flow are often said to be at level 2.5 [18], since they lie between level 2 and level 3. For a discrete-time Markov chain $(X_n)_{n \ge 0}$, the large deviations for the empirical flow can be obtained directly from those for the empirical measure since the pair process $(X_n, X_{n+1})_{n \ge 0}$ is also a discrete-time Markov chain. However, things become much more complicated in the continuous-time case. For continuous-time Markov chains, Fortelle [19] proved a weak joint LDP for the empirical measure and empirical flow and obtained the corresponding rate function. Subsequently, Bertini et al. [20,21] proposed some Donsker-Varadhan-type compactness conditions and proved the full joint LDP for the empirical measure and empirical flow.
Since semi-Markov processes are direct generalizations of discrete-time and continuous-time Markov chains, a natural question is whether the large deviations at different levels can be extended to semi-Markov processes. Along this line, Mariani and Zambotti [22] proved the joint LDP for the empirical measure and empirical flow of semi-Markov processes with a finite state space and expressed the corresponding rate function in terms of relative entropy.
The large deviations for the empirical flow of Markov and semi-Markov processes have also been applied in statistical mechanics to study the fluctuation relations of thermodynamical systems far from equilibrium [21,23-31]. Until now, it has remained unclear whether these finite state space results can be generalized to semi-Markov processes with a countable state space. The aim of the present paper is to fill this gap.
Here we investigate the joint large deviations for the empirical measure and empirical flow of semi-Markov processes with a countable state space. When the state space has an infinite number of states, the choice of the topology of the flow space, i.e. the value space of all empirical flows, will become very important. Following [20,21], we consider two types of topology of the flow space: the bounded weak* topology and the strong topology. Specifically, we propose two different Donsker-Varadhan-type compactness conditions for the two types of topology, and prove the corresponding LDP for the empirical measure and empirical flow. This is the first main contribution of this paper. In the special case of continuous-time Markov chains, our compactness condition reduces to the one proposed in [20] when the flow space is endowed with the bounded weak* topology. However, when the flow space is endowed with the strong topology, our compactness condition is even weaker than the one proposed in [21].
In [22], Mariani and Zambotti also provided the explicit expression of the marginal rate function for the empirical measure, but they have not given the explicit expression of the marginal rate function for the empirical flow. Here we propose two strong compactness conditions, one for the embedded chain and one for the waiting time distributions, and find that the marginal rate function for the empirical flow can be written down explicitly under these compactness conditions. The relationship between these compactness conditions and the geometric ergodicity of the embedded chain is also clarified. This is the second main contribution of this paper.
The present paper is organized as follows. In Section 2, we recall the definitions of Markov renewal processes and semi-Markov processes, as well as the definitions of the empirical measure and empirical flow. In Section 3, we propose two different compactness conditions when the flow space is endowed with the bounded weak* topology and the strong topology. Moreover, we also state the main results including the joint and marginal LDPs for the empirical measure and empirical flow. The proof of the LDP for the two types of topology will be given in Sections 4 and 5, respectively. In Section 6, we derive the marginal rate function for the empirical flow.

Markov renewal processes and semi-Markov processes
Let V be a countable set endowed with the discrete topology, so that the associated Borel σ-algebra is the collection of all subsets of V. The set [0, ∞] is equipped with the topology that is compatible with the natural topology on [0, ∞) so that (s, ∞] is open for any s ≥ 0. For any Polish space X, let P(X) denote the collection of Borel probability measures on X. For any µ ∈ P(X) and f ∈ $L^1(\mu)$, let ⟨µ, f⟩ or µ(f) denote the integral of f with respect to µ. The set P(X) is equipped with the topology of weak convergence and the associated Borel σ-algebra.
We first recall the definition of Markov renewal processes, also called (J-X)-processes [3]. Here, we adopt the definition in [22] and [2,Chapter VII.4].
Definition 2.1. The process $(X, \tau) = \{(X_k)_{k \ge 0}, (\tau_k)_{k \ge 1}\}$ defined on a probability space (Ω, F, P) is called a Markov renewal process if
(a) $X = (X_k)_{k \ge 0}$ is a discrete-time time-homogeneous Markov chain with countable state space V and transition probability matrix $P = (p_{xy})_{x,y \in V}$.
(b) $\tau = (\tau_k)_{k \ge 1}$ is a sequence of positive and finite random variables such that conditioned on $(X_k)_{k \ge 0}$, the random variables $(\tau_k)_{k \ge 1}$ are independent and have distribution
$$P(\tau_{k+1} \in \cdot \mid (X_k)_{k \ge 0}) = \psi_{X_k X_{k+1}}(\cdot), \qquad k \ge 0,$$
where $\psi_{xy} \in P(0, \infty)$ for any x, y ∈ V. The matrix $\Psi = (\psi_{xy})_{x,y \in V}$ is called the waiting time matrix. Note that it is a matrix of probability measures. The pair (P, Ψ) is called the transition kernel.
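We next recall the definition of semi-Markov processes.
Definition 2.2. Let (X, τ) be a Markov renewal process. For any n ≥ 0 and t ≥ 0, let
$$S_n := \sum_{k=1}^{n} \tau_k, \qquad N_t := \sum_{n=1}^{\infty} 1_{\{S_n \le t\}} = \inf\{n \ge 0 : S_{n+1} > t\},$$
where inf ∅ := ∞ and $1_A$ is the indicator function of the set A. The process $\xi = (\xi_t)_{t \ge 0}$ with $\xi_t := X_{N_t}$ is called the semi-Markov process associated with the Markov renewal process (X, τ). Clearly, ξ is a jump process whose trajectories are right-continuous on the state space V.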
Clearly, $S_n$ represents the nth jump time of the semi-Markov process ξ, $N_t$ represents the number of jumps of ξ up to time t, and $X = (X_k)_{k \ge 0}$ is the embedded chain of ξ. It is easy to see that $N_t = n$ if and only if $S_n \le t < S_{n+1}$. In particular, if all the waiting times are equal to 1, i.e. $\psi_{xy} = \delta_1$ for any x, y ∈ V with $\delta_1$ being the point mass at 1, then ξ reduces to a discrete-time Markov chain. If all the waiting times are exponentially distributed, i.e. $\psi_{xy}(dt) = q_x e^{-q_x t}\,dt$ for any x, y ∈ V, then ξ reduces to a continuous-time Markov chain. In the following, we do not distinguish between the Markov renewal process (X, τ) and the associated semi-Markov process ξ since they are equivalent.
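The construction of the jump times, the jump counter, and the semi-Markov process from an embedded chain and its waiting times can be illustrated with a minimal Python sketch. The function names, the dictionary representation of the transition matrix, and the `sample_wait` callback are illustrative assumptions and do not come from the paper; the sketch also assumes that enough jumps have been simulated so that the last jump time exceeds the query time t.

```python
import bisect
import random


def simulate_markov_renewal(P, sample_wait, x0, n_jumps, rng):
    """Simulate X_0..X_n and the jump times S_0..S_n (with S_0 = 0).

    P           : dict mapping a state x to a dict {y: p_xy} (hypothetical layout)
    sample_wait : callable (x, y, rng) -> positive waiting time drawn from psi_xy
    """
    X = [x0]
    S = [0.0]
    for _ in range(n_jumps):
        x = X[-1]
        states = list(P[x])
        weights = [P[x][y] for y in states]
        y = rng.choices(states, weights=weights)[0]  # next state of the embedded chain
        tau = sample_wait(x, y, rng)                 # waiting time spent in x before the jump
        X.append(y)
        S.append(S[-1] + tau)
    return X, S


def jump_count(S, t):
    """N_t, characterized by S_n <= t < S_{n+1}; only valid when S[-1] > t."""
    return bisect.bisect_right(S, t) - 1


def semi_markov_state(X, S, t):
    """xi_t = X_{N_t}."""
    return X[jump_count(S, t)]
```

With unit waiting times, `sample_wait = lambda x, y, rng: 1.0`, the jump times are 0, 1, 2, ... and `semi_markov_state` returns the embedded chain evaluated at the integer part of t, which is the discrete-time special case mentioned above. Passing `lambda x, y, rng: rng.expovariate(q[x])` for some hypothetical rate table `q` gives the continuous-time Markov chain case.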
The transition diagram of the embedded chain X is a directed graph (V, E), where the edge set is composed of all directed edges with positive transition probabilities. Throughout this paper, we impose the following basic assumptions on the Markov renewal process (X, τ ).
Assumption 1. The embedded chain X is irreducible.
Assumption 2. The embedded chain X is recurrent.
Assumption 3. The waiting time distribution only depends on the current state, i.e. ψ xy = ψ x for any x, y ∈ V .
Assumption 4. For any x ∈ V , the number of incoming edges into x of the graph (V, E) and the number of outgoing edges from x are both finite.
Assumptions 1 and 2 are standard in the literature [2]. Assumption 3 is also common in previous papers [22]; it guarantees that the large deviation rate function of the empirical measure and empirical flow has the form of relative entropy. Note that if Assumption 3 does not hold for (X, τ), then the process $(Y, \tau) = \{(Y_k)_{k \ge 0}, (\tau_k)_{k \ge 1}\}$ with $Y_k := (X_k, X_{k+1})$ is also a Markov renewal process which satisfies Assumption 3, since the conditional distribution $\psi_{X_k X_{k+1}}$ of $\tau_{k+1}$ depends only on the current state $Y_k$ of the new chain. Moreover, we emphasize that Assumption 4 is not needed if ξ is a continuous-time Markov chain [20,21]. Here we need this assumption in order to prove Lemma 4.10 below. In [20,21], the counterpart of this lemma is proved by using the classical level 3 large deviation results of Donsker and Varadhan [16] and the contraction principle. However, since there are no level 3 large deviation results for Markov renewal processes, we need to impose the above assumption to overcome some technical difficulties.
Recall that the semi-Markov process ξ is called non-explosive if the explosion time $S_\infty := \lim_{n \to \infty} S_n$ satisfies $P_x(S_\infty = \infty) = 1$ for all x ∈ V, where $P_x(\cdot) = P(\cdot \mid X_0 = x)$. In fact, Assumptions 1 and 2 ensure that ξ is non-explosive. The proof is similar to the one given in [32] for continuous-time Markov chains and thus is omitted.

Empirical measure and empirical flow
Next we introduce the definitions of the empirical measure and empirical flow for Markov renewal processes [22].
For any t > 0, the empirical measure µ t : Ω → P(V × (0, ∞]) of (X, τ ) is defined by where δ. denotes the Dirac delta measure. In other words, for any x ∈ V and A ⊂ (0, ∞], we have Then µ t is a random probability measure such that for any Borel measurable function f on V × (0, ∞], Moreover, for any t > 0, the empirical flow Q t : Ω → [0, ∞] E of (X, τ ) is defined by In other words, for any x, y ∈ V , we have Intuitively, Q t (x, y) represents the number of times that ξ transitions from x to y per unit time. For any n ≥ 0, let be a filtration. Then N t + 1 is an {F n }-stopping time. It is easy to see that µ t , f and Q t (x, y) are F Nt+1 -measurable random variables.

Remark 2.3.
For the semi-Markov process ξ, a more natural definition of the empirical measure $\pi_t : \Omega \to P(V)$ is given by
$$\pi_t = \frac{1}{t} \int_0^t \delta_{\xi_s}\, ds. \tag{2.4}$$
Comparing (2.2) and (2.4), we can see that $\pi_t$ only focuses on the spatial variable, whereas $\mu_t$ focuses on both the spatial and temporal variables. The reason why we use $\mu_t$ rather than $\pi_t$ in the study of the joint LDP is that only by using $\mu_t$ can we obtain a concise expression of the rate function. It is easy to verify that $(X_k, \tau_{k+1})_{k \ge 0}$ is a Markov process and hence $(X_{N_t}, \tau_{N_t+1})_{t \ge 0}$ is also a semi-Markov process. In fact, the empirical measure $\mu_t$ for the process $\xi_t = X_{N_t}$ is exactly the empirical measure $\pi_t$ for the process $(X_{N_t}, \tau_{N_t+1})_{t \ge 0}$.
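To make the spatial empirical measure and the empirical flow concrete, the following Python sketch computes both from one simulated trajectory. It assumes that the spatial empirical measure is the fraction of time spent in each state up to time t and that the empirical flow is the number of jumps along each edge per unit time. It counts only the jumps completed by time t; the paper's definition may treat the jump in progress at time t differently, which changes each entry by at most 1/t. The function name and the list-based path representation are illustrative assumptions.

```python
from collections import Counter


def empirical_measure_and_flow(X, S, t):
    """Return (pi_t, Q_t) for a path with embedded chain X and jump times S.

    X : list of states X_0, X_1, ...
    S : list of jump times with S[0] = 0 and S[-1] > t (same length as X)
    """
    if t <= 0 or not S[-1] > t:
        raise ValueError("need t > 0 and a path simulated beyond time t")
    occupation = Counter()  # total time spent in each state up to t
    jumps = Counter()       # number of completed jumps along each edge up to t
    k = 0
    while S[k + 1] <= t:
        occupation[X[k]] += S[k + 1] - S[k]
        jumps[(X[k], X[k + 1])] += 1
        k += 1
    occupation[X[k]] += t - S[k]  # here k = N_t: time spent in the current state
    pi_t = {x: d / t for x, d in occupation.items()}
    Q_t = {e: c / t for e, c in jumps.items()}
    return pi_t, Q_t
```

For the path with states 0, 1, 0, 1 and jump times 0, 1, 3, 4, evaluated at t = 3.5, the process spends 1.5 time units in state 0 and 2 time units in state 1, and each of the edges (0, 1) and (1, 0) is crossed once.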
Let $L^1(E)$ denote the set of absolutely summable functions on E and let $\|\cdot\|$ denote the associated $L^1$-norm. The set of nonnegative elements of $L^1(E)$ is denoted by $L^1_+(E)$, which is called the flow space. An element in $L^1_+(E)$ is called a flow. Since (X, τ) is non-explosive, it is easy to see that $Q_t \in L^1_+(E)$ for any t > 0. For any flow $Q \in L^1_+(E)$, let the exit-current $Q^+ : V \to R$ and entrance-current $Q^- : V \to R$ be defined by
$$Q^+(x) := \sum_{y : (x,y) \in E} Q(x,y), \qquad Q^-(x) := \sum_{y : (y,x) \in E} Q(y,x).$$
Intuitively, the exit-current and entrance-current at x are the flows exiting from x and entering into x, respectively. If $Q^+(x) = Q^-(x)$ for any x ∈ V, then Q is called a divergence-free flow and we define
$$Q_x := Q^+(x) = Q^-(x). \tag{2.6}$$
Note that both currents map $L^1_+(E)$ into $L^1_+(V)$. Here $L^1_+(V)$ is defined in the same way as $L^1_+(E)$ and $\|\cdot\|$ again denotes the associated $L^1$-norm.
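The exit-current, the entrance-current, and the divergence-free property can be checked numerically for a flow supported on finitely many edges. The following Python sketch stores a flow as a dictionary from directed edges to nonnegative numbers; this representation and the function names are illustrative assumptions.

```python
def exit_current(Q, x):
    """Total flow leaving x: sum of Q(x, y) over edges starting at x."""
    return sum(q for (a, _), q in Q.items() if a == x)


def entrance_current(Q, x):
    """Total flow entering x: sum of Q(y, x) over edges ending at x."""
    return sum(q for (_, b), q in Q.items() if b == x)


def l1_norm(Q):
    """L^1-norm of a flow with finitely many nonzero entries."""
    return sum(abs(q) for q in Q.values())


def is_divergence_free(Q, tol=1e-12):
    """Check that the exit- and entrance-currents agree at every vertex."""
    vertices = {v for edge in Q for v in edge}
    return all(
        abs(exit_current(Q, x) - entrance_current(Q, x)) <= tol
        for x in vertices
    )
```

A flow that circulates the same amount around a directed cycle, such as `{(0, 1): 0.5, (1, 2): 0.5, (2, 0): 0.5}`, is divergence-free, while a flow supported on a single edge is not. For a nonnegative flow, summing the exit-current over all vertices recovers the $L^1$-norm of the flow.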
In this paper, we will consider two types of topology on $L^1(E)$: the strong topology generated by the $L^1$-norm and the bounded weak* topology, which is defined as follows [20]. Let $C_0(E)$ denote the collection of continuous functions f : E → R vanishing at infinity, i.e. for any ε > 0, there exists a finite set K ⊂ E such that $|f(e)| < \epsilon$ for any $e \in E \setminus K$, and it is endowed with the $L^\infty$-norm. It is well-known that the dual space of $C_0(E)$ is $L^1(E)$ endowed with the strong topology. For any ℓ > 0, let $B_\ell := \{Q \in L^1(E) : \|Q\| \le \ell\}$ denote the closed ball of radius ℓ in $L^1(E)$. In view of the separability of $C_0(E)$ and the Banach-Alaoglu theorem, the closed ball $B_\ell$ endowed with the weak* topology is a compact Polish space. The bounded weak* topology on $L^1(E)$ is then defined by declaring the set $A \subset L^1(E)$ to be open if and only if $A \cap B_\ell$ is open in the weak* topology of $B_\ell$ for any ℓ > 0. In fact, the bounded weak* topology is stronger than the weak* topology and is weaker than the strong topology. Moreover, for each ℓ > 0, the closed ball $B_\ell$ is compact with respect to the bounded weak* topology. In particular, the three types of topology on $L^1(E)$ coincide only when E is finite. The proof of the above statements can be found in [33,Section 2.7].
For both the strong topology and the bounded weak* topology, we regard $L^1_+(E)$ as a closed subset of $L^1(E)$ and endow it with the relative topology and the associated Borel σ-algebra. The product space V × (0, ∞] is equipped with the product topology so that V × (0, ∞] is a Polish space. The set P(V × (0, ∞]) of Borel probability measures on V × (0, ∞] is endowed with the topology of weak convergence. Moreover, the product space $\Lambda = P(V \times (0, \infty]) \times L^1_+(E)$ is endowed with the product topology. Then for any t > 0, the pair $(\mu_t, Q_t)$, where $\mu_t$ is the empirical measure and $Q_t$ is the empirical flow, can be viewed as a measurable map from Ω to Λ.

Compactness conditions
The aim of the present paper is to establish the LDP for the empirical measure and empirical flow of semi-Markov processes. Similarly to the LDP of Markov processes [16], we need some compactness conditions to control the convergence rate at infinity when the state space V has an infinite number of states. For Markov processes, the infinitesimal generator is often used to establish the compactness conditions [16]. In the semi-Markov case, the transition kernel (P, Ψ) plays the role of the generator.
(c) For any t > 0 and x ∈ V , we have
Proof. For any x ∈ V , let Θ x (λ) = ψ x (e λτ ) be a function on R. By the dominated convergence theorem, it is easy to see that Θ x ∈ C((−∞, ζ(x))) is a strictly increasing function. Moreover, it is clear that Θ x (0) = 1 and Since θ x is the inverse function of Θ x for 0 < t < Θ x (ζ(x)), we immediately obtain (a)-(c).
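The function $\Theta_x(\lambda) = \psi_x(e^{\lambda\tau})$ and its inverse $\theta_x$ used in the proof above can be written in closed form for exponential waiting times. The Python sketch below is an illustration under the assumption that $\psi_x$ is exponential with rate q, in which case $\Theta(\lambda) = q/(q - \lambda)$ for λ < q and is infinite otherwise, and the inverse is $\theta(t) = q(1 - 1/t)$ for t > 0. The function names are hypothetical, and the Monte Carlo estimator is included only as an independent sanity check.

```python
import math
import random


def Theta_exp(lam, q):
    """Theta(lam) = E[exp(lam * tau)] for tau ~ Exponential(rate q)."""
    if lam >= q:
        return math.inf  # the exponential moment diverges for lam >= q
    return q / (q - lam)


def theta_exp(t, q):
    """Inverse of Theta_exp on (0, inf): the solution lam of q / (q - lam) = t."""
    if t <= 0:
        raise ValueError("t must be positive")
    return q * (1.0 - 1.0 / t)


def Theta_monte_carlo(lam, q, n, rng):
    """Monte Carlo estimate of E[exp(lam * tau)]; reliable only for lam well below q."""
    return sum(math.exp(lam * rng.expovariate(q)) for _ in range(n)) / n
```

In this special case the properties used in the proof are visible directly: `Theta_exp(0, q)` equals 1, the function is strictly increasing on (−∞, q), and `Theta_exp(theta_exp(t, q), q)` returns t for every t > 0.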
In [15,16], Donsker and Varadhan proposed a compactness condition for the generator which was then used to prove the LDP for the empirical measure of Markov processes. In [20,21], Bertini et al. proposed Donsker-Varadhan-type compactness conditions for the joint LDP for the empirical measure and empirical flow of continuous-time Markov chains. In what follows, we will provide two Donsker-Varadhan-type conditions which are needed for the joint LDP for the empirical measure and empirical flow of semi-Markov processes when the flow space $L^1_+(E)$ is endowed with the bounded weak* topology and the strong topology. The following compactness condition is needed for the joint LDP when $L^1_+(E)$ is endowed with the bounded weak* topology.

Condition 1.
There exists a sequence of functions u n : V → (0, ∞) such that
(a) For any x ∈ V and n ≥ 0, we have P u n (x) < ∞;
(b) There exists a constant c > 0 such that u n (x) ≥ c for any x ∈ V and n ≥ 0;
(c) For any x ∈ V , there exists a constant C x such that u n (x) ≤ C x for any n ≥ 0;
(d) The functions u n /P u n converge pointwise to some û : V → (0, ∞);
(e) For each ℓ ∈ R, the level set {x ∈ V : Lû(x) ≤ ℓ} is finite;
(f) There exist σ, C > 0, η ∈ (0, 1), and a finite set

Remark 3.2.
For continuous-time Markov chains, the term C 1 K in item (f) can be replaced by a constant C and the condition û(x) < ψ x (e ζ(x)τ ) can be removed since ψ x (e ζ(x)τ ) = ∞ for any x ∈ V . In this case, Condition 1 reduces to the compactness condition proposed in [20]. Moreover, it follows from Lemma 3.1 that for any x ∈ V . Hence item (e) implies that the level sets of ζ are also finite.
Since the strong topology is stronger than the bounded weak* topology, we need to impose a stronger compactness condition, which is essentially the Donsker-Varadhan-type condition for discrete-time Markov chains [15]. Both the following condition and Condition 1 are needed for the joint LDP when L 1 + (E) is endowed with the strong topology.

Condition 2.
There exists a sequence of functions u n : V → (0, ∞) such that
(a) For any x ∈ V and n ≥ 0, we have P u n (x) < ∞;
(b) There exists a constant c > 0 such that u n (x) ≥ c for any x ∈ V and n ≥ 0;
(c) For any x ∈ V , there exists a constant C x such that u n (x) ≤ C x for any n ≥ 0;
(d) The functions u n /P u n converge pointwise to some û : V → (0, ∞);
(e) For each ℓ ∈ R, the level set {x ∈ V : log û(x) ≤ ℓ} is finite.
It is clear that Condition 2 only depends on the embedded chain X of the semi-Markov process and is independent of the waiting time distributions. When L 1 + (E) is endowed with the strong topology, Bertini et al. [21] have proposed another compactness condition for continuous-time Markov chains (see Condition 5 below, which is rewritten for semi-Markov process). However, that condition is more complicated than Condition 2 and more difficult to verify. In fact, when X is irreducible, Condition 2 is not only easier to verify, but also even weaker than Condition 5. The proof of this fact can be found in Appendix A. This explains why we impose Condition 2 rather than Condition 5 here. Note that item (e) in Condition 2 implies that the set K = {x ∈ V : logû(x) ≤ − log η} is finite for any η ∈ (0, 1).
Hence if we take σ = 1, then we can always find C > 0 such that log û ≥ −σ log η − C 1 K . Furthermore, for discrete-time Markov chains, it is easy to check that ψ x (e ζ(x)τ ) = e ζ(x) = ∞ for any x ∈ V . This shows that if Condition 2 holds, then items (e) and (f) in Condition 1 are automatically satisfied. Hence for discrete-time Markov chains, Condition 1 is equivalent to Condition 2.

Joint LDP for the empirical measure and empirical flow
Let µ and ν be two probability measures on a measurable space (X , F ). Recall that the relative entropy of µ with respect to ν has the following variational expression [16]:
$$H(\mu \mid \nu) = \sup_{\phi \in B_b(\mathcal{X})} \big\{ \langle \mu, \phi \rangle - \log \langle \nu, e^{\phi} \rangle \big\}, \tag{3.3}$$
where B b (X ) denotes the space of bounded measurable functions on X . Moreover, if X is a Polish space and F is the associated Borel σ-field, then (3.3) still holds when B b (X ) is replaced by C b (X ), the space of bounded continuous functions on X [16]. If we set ϕ′ = ϕ − log⟨ν, e ϕ ⟩, then ⟨ν, e ϕ′ ⟩ = 1 and ⟨µ, ϕ′⟩ = ⟨µ, ϕ⟩ − log⟨ν, e ϕ ⟩, and it is easy to see that the relative entropy H(µ | ν) can be represented as
$$H(\mu \mid \nu) = \sup \big\{ \langle \mu, \phi \rangle : \langle \nu, e^{\phi} \rangle = 1 \big\}.$$

where Q + and Q − are the exit-current and entrance-current of the flow Q, respectively. For each (µ, Q) ∈ D, we introduce the transition probabilities (Q xy ) x,y∈V and the waiting time distributions (μ x ) x∈V as where Q x = Q + (x) = Q − (x) is defined in (2.6) and we set Q xy = p xy and μ be a function defined by ∞, otherwise. (3.7)

We are now in a position to state the main results of the present paper. The following theorem, whose proof can be found in Section 4, gives the joint LDP for the empirical measure and empirical flow when $L^1_+(E)$ is endowed with the bounded weak* topology. (3.8) The following theorem, whose proof can be found in Section 5, gives the joint LDP when $L^1_+(E)$ is endowed with the strong topology.

Remark 3.6.
In the above two theorems, we have established the joint LDP when the semi-Markov process starts from a fixed initial state x ∈ V . For any initial distribution γ ∈ P(V ), the joint LDP still holds after some slight changes to the compactness conditions. In fact, if items (c) in Conditions 1 and 2 are both replaced by
(c*) There exists a constant C γ such that $\sum_{x \in V} \gamma(x) u_n(x) \le C_\gamma$ for any n ≥ 0,
then the conclusions of Theorems 3.4 and 3.5 remain valid under P γ , where $P_\gamma(\cdot) = \sum_{x \in V} \gamma(x) P_x(\cdot)$ is the probability measure under initial distribution γ.
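The variational expression for the relative entropy recalled in this section can be checked numerically on a finite space. The Python sketch below assumes that µ and ν are strictly positive probability vectors of the same length; the function names are illustrative. It evaluates the functional ⟨µ, ϕ⟩ − log⟨ν, e^ϕ⟩ at the candidate maximizer ϕ = log(µ/ν), where it equals the relative entropy, and at other test functions, where it cannot exceed it.

```python
import math


def relative_entropy(mu, nu):
    """H(mu | nu) = sum_i mu_i * log(mu_i / nu_i) on a finite space."""
    return sum(m * math.log(m / n) for m, n in zip(mu, nu) if m > 0)


def variational_functional(phi, mu, nu):
    """<mu, phi> - log <nu, exp(phi)> for a test function phi."""
    mean_phi = sum(m * p for m, p in zip(mu, phi))
    log_mgf = math.log(sum(n * math.exp(p) for n, p in zip(nu, phi)))
    return mean_phi - log_mgf


def optimal_test_function(mu, nu):
    """phi = log(mu / nu), which satisfies <nu, exp(phi)> = 1 and attains the supremum."""
    return [math.log(m / n) for m, n in zip(mu, nu)]
```

Adding a constant to ϕ leaves the functional unchanged, which is the normalization step used above to restrict the supremum to test functions with ⟨ν, e^ϕ⟩ = 1.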
The above two theorems can be applied to obtain the joint LDP for the empirical measure and empirical flow of

We have seen that Condition 1 is crucial for the joint LDP of the empirical measure and empirical flow, no matter whether the flow space is endowed with the bounded weak* topology or the strong topology. In general, Condition 1 is difficult to verify because we need to find a sequence of functions u n satisfying both items (a)-(d), which are related to the embedded chain, and items (e)-(f), which are related to the waiting time distributions. In other words, the conditions imposed on the embedded chain and the conditions imposed on the waiting time distributions are intertwined with each other. Next, we provide some novel compactness conditions which can be verified much more easily. In the novel compactness conditions, the ones imposed on the embedded chain and the ones imposed on the waiting time distributions are decoupled from each other.
The conditions imposed on the embedded chain are as follows.  (c) For each ℓ ∈ R, the level set {x ∈ V : q x ≤ ℓ} is finite. (c) Dirac distribution: ψ x (dt) = δ 1/qx (dt); In particular, items (a) and (b) in Condition 4 automatically hold for continuous-time Markov chains.
The following theorem shows that the joint LDP also holds under the new compactness conditions given above.  By item (b) in Condition 4, it is easy to see that the function θ x defined in (3.2) and the function θ defined above are related by θ x (s) = q x θ(s). Lemma 3.1 implies that θ is an increasing function. It follows from item (e) in Condition 3 and item (c) in Condition 4 that the set K := {x ∈ V : logû(x) ≤ ℓ} is finite and the set {x ∈ V : q x ≤ s} is also finite for any s ∈ R. For any ℓ ′ > 0, we have Then we have .
This implies item (e) in Condition 1.
(b) Note that Condition 2 implies Condition 3. Then the proof of (b) follows directly from Theorem 3.5.
Note that Condition 4 is easy to verify and we have given several examples for it to hold in Remark 3.8. Next we will give some criteria for Condition 3 to hold. Before doing this, we recall the following geometric ergodic theorem for discrete-time Markov chains, whose proof can be found in [34,Chapter 15].
Lemma 3.10 (geometric ergodic theorem). Suppose that the embedded chain X is irreducible and aperiodic. Then the following three conditions are equivalent:
(a) There exist a finite set K ⊂ V and constants ν K > 0, ρ K < 1, and M K < ∞ such that
(b) There exist a finite set K ⊂ V and a constant κ > 1 such that where T K is the first hitting time of X on K (see (4.12)).
(c) There exist a finite set K ⊂ V , constants b < ∞, λ < 1, and c > 0, and a function u :
The following corollary gives a simple criterion for Condition 3 to hold.
Corollary 3.11. If item (c) in Lemma 3.10 holds, then Condition 3 also holds. If the embedded chain X is irreducible and aperiodic, then any one of the three conditions in Lemma 3.10 implies Condition 3.
Proof. Let the set K, the constants λ, c, and the function u be as in (c) in Lemma 3.10. We next check Condition 3 for the sequence of functions u n ≡ u. Obviously, items (a)-(d) are trivial. Letû = u/P u and ℓ = −(log λ)/2 > 0. It is easy to see that This completes the proof of this corollary.
The above corollary shows that geometric ergodicity of the embedded chain X implies Condition 3. Hence this condition can also be easily verified by using the classical ergodic theory of Markov chains [34][35][36][37].

Marginal LDP for the empirical measure and empirical flow
Thus far, we have established the joint LDP for the empirical measure $\mu_t$ and empirical flow $Q_t$. However, as discussed in Remark 2.3, a more natural definition of the empirical measure $\pi_t : \Omega \to P(V)$ is given by
$$\pi_t = \frac{1}{t} \int_0^t \delta_{\xi_s}\, ds.$$
The reason why we use $\mu_t$ rather than $\pi_t$ in the study of the joint LDP is that only by using $\mu_t$ can we obtain a concise expression of the rate function.
Next we focus on the marginal LDP for the empirical measure $\pi_t$ and for the empirical flow $Q_t$. By the contraction principle, the rate function of the marginal LDP can be obtained from the rate function I : Λ → [0, ∞] of the joint LDP as defined in (3.8). In [22], the authors gave a variational expression of the rate function $I_1 : P(V) \to [0, \infty]$ for the empirical measure $\pi_t$. Here we will give the variational expression of the rate function $I_2 : L^1_+(E) \to [0, \infty]$ for the empirical flow $Q_t$. More importantly, we will also give the explicit expression of $I_2$ when the waiting time distributions satisfy some additional constraints.
Before stating our results, we introduce some notation. Recall that the rate function I DV : P(V ) → [0, ∞] for the empirical measure of the embedded chain X is the Donsker-Varadhan functional [13] I DV (ν) = sup Let G x (λ) = log(ψ x (e λτ )) be a function on R and let (3.10) The following proposition, whose proof can be found in Section 6, gives the marginal LDP for the empirical measure and for the empirical flow.
(a) Under P x , the law of π t satisfies an LDP with good and convex rate function
(b) Let $L^1_+(E)$ be endowed with the bounded weak* topology. Then under P x , the law of Q t satisfies an LDP with good and convex rate function
(c) Let $L^1_+(E)$ be endowed with the strong topology. If Condition 2 is also satisfied, then under P x , the law of Q t satisfies an LDP with good and convex rate function
Moreover, if Condition 4 is also satisfied, then the rate function I 2 has the following explicit expression: (3.11)

Remark 3.13.
This proposition shows that the rate function for the empirical flow has the explicit expression (3.11) when Condition 4 is satisfied. In fact, (3.11) may still be valid when Condition 4 fails. For example, if V is finite and ψ x (e ζ(x)τ ) = ∞ for any x ∈ V , then similarly to the proof of Proposition 3.12, it can be shown that the rate function I 2 is also given by (3.11).

Examples
Our abstract theorems can be applied to many specific Markov renewal processes. We next focus on two specific examples: birth and death processes and random walks with confining potential and external force. These two examples can be viewed as direct generalizations of the ones studied in [20,21]. Here we will apply Theorems 3.4 and 3.5 to birth and death processes and will apply Theorem 3.9 to random walks with confining potential and external force.

Birth and death processes
Consider a birth and death Markov renewal process on the set of nonnegative integers N = {0, 1, 2 · · · } with transition kernel (P = (p xy ) x,y∈N , Ψ = (ψ x ) x∈N ), where In fact, Assumptions 1, 3, and 4 are trivial. In addition, it is well-known that Assumption 2 holds if and only if [38] ∞ k=1 Applying Theorems 3.4 and 3.5 to the above model, we obtain the following proposition.
(b) Let L 1 + (E) be endowed with the strong topology. Let p ′ x = sup k≥x p k for each x. Suppose that p = 0 and suppose that there exist constants σ > 0 and η ∈ (0, 1) such that (3.14) Then under P x , the law of (µ t , Q t ) satisfies an LDP with good and convex rate function Proof. (a) Since lim x→∞ p x < 1/2, it is easy to see that (3.12) holds. For any x ∈ N and n ≥ 0, let u n ( where a > 1 is a constant to be chosen later. By Theorem 3.4, we only need to check Condition 1 for the sequence of functions u n . Since u n do not depend on n, items (a)-(d) in Condition 1 are automatically satisfied. Moreover, it is This implies item (e) in Condition 1.
On the other hand, û(x) < ∞ = ψ x (e ζ(x)τ ) for any x ∈ K c . These imply item (f) in Condition 1. Thus far, we have checked all items in Condition 1 and thus the desired result follows from Theorem 3.4.
Similarly, we only need to check item (e) in Condition 2 and items (e) and (f) in Condition 1 for such u n . Note that q ′ x−1 ≥ 1/2 for sufficiently large x. Then we havê The rest of the proof is similar to the proof of (a).
We emphasize that if ψ x is chosen to be an exponential distribution for each x, then Proposition 3.14 reduces to the results obtained in [20,21]. Moreover, if ψ x is chosen to be the gamma distribution for each x, where q x , α x are parameters depending on x, then we can give the following more specific characterizations of conditions (3.13) and (3.14).
and suppose that the parameters α x have a uniform positive lower bound, i.e. α x ≥ c for some constant c > 0 and any x ∈ N. Then condition (3.13) holds.
and suppose that the parameters α x have a uniform positive lower bound. Then condition (3.14) holds.
Proof. Here we only give the proof of (a); the proof of (b) is similar. Taking κ = κ 0 in Proposition 3.14, straightforward computations show that This shows that This completes the proof of this lemma.

Random walks with confining potential and external force
We now apply our main theorems to a nearest neighbor random walk on Z d with confining potential and external force, whose transition kernel (P = (p xy ) x,y∈Z d , Ψ = (ψ x ) x∈Z d ) has the form of represents the external force, and are normalization constants. It is clear that U has compact level sets, i.e. for any ℓ ∈ R, the set {x ∈ Z d : U (x) ≤ ℓ} is finite. We emphasize that if ψ x (dt) = C x e −C x t dt is chosen to be an exponential distribution for each x, then the above model reduces to the Markov chain model studied in [21,Section 10.2]. We next focus on general waiting time distributions. Applying Theorem 3.9 to the above model, we obtain the following proposition.
Proof. Here we only give the proof of (a); the proof of (b) is similar. We first check Condition 3. Let u n (x) = e U(x)/2 for any x ∈ Z d and n ≥ 0. Since u n does not depend on n, items (a)-(d) in Condition 2 are automatically satisfied. Note Then lim |x|→∞ r(x) = ∞ if and only if lim |x|→∞ C x = ∞. Hence our compactness conditions in item (a) exactly coincide with the ones proposed in [21] when $L^1_+(E)$ is endowed with the bounded weak* topology. However, since we do not need to verify Condition 5, our compactness conditions in item (b) are weaker than the ones proposed in [21] when $L^1_+(E)$ is endowed with the strong topology.
whereŷ = y/|y|, y, W (y) = 0, and ·, · is the standard inner product in R d . We say that the potential U ∈ C 1 (R d ) has diverging radial variation which dominates the transversal variation [21] if there exist α ∈ [0, 1) and In fact, it follows from [21,Lemma 10.3] that if U ∈ C 1 (R d ) has diverging radial variation which dominates the transversal variation, then lim |x|→∞ r(x) = ∞.

Proof of Theorem 3.4
Note that $L^1(E)$ endowed with the bounded weak* topology is a locally convex, complete linear topological space and a completely regular space, i.e. for every closed set $C \subset L^1(E)$ and every element $Q \in L^1(E) \setminus C$, there exists a continuous function $f : L^1(E) \to [0, 1]$ such that $f(Q) = 1$ and $f(Q') = 0$ for any $Q' \in C$. It is a well-known result that if an exponentially tight family of probability measures satisfies a weak LDP with rate function I, then I is good and the (full) LDP holds [17,Lemma 1.2.18]. Hence to prove Theorem 3.4, we will first prove the exponential tightness for the empirical measure and empirical flow under Condition 1, and then prove the weak joint LDP without any compactness conditions. We will directly consider the case where ξ starts from a general initial distribution γ (see Remark 3.6).

Exponential local martingales
We start by considering the change of probability measures for Markov renewal processes. Let Γ be the set of For any (F, h) ∈ Γ, let g F,h : V → R be a function given by To proceed, we define a new transition kernel (P F , Ψ h ) as and let P F,h x be the probability measure under which (X, τ ) is a Markov renewal process with transition kernel (P F , Ψ h ) and initial state X 0 = x. Note that the semi-Markov process ξ may be explosive under P F,h x . As a result, we need to consider P F,h x and P x restricted to the set {N t < ∞}, i.e.
Moreover, we denote by P F,h x,t | FN t +1 and P x,t | FN t +1 the restrictions of P F,h x,t and P x,t to F Nt+1 , respectively. It is easy to verify that P F,h x,t | FN t +1 is absolutely continuous with respect to P x,t | FN t +1 and has the Radon-Nikodym derivative For convenience, for any Borel measurable function f on V × (0, ∞], let Then as an immediate consequence of the Radon-Nikodym derivative (4.4), we obtain the following result. : Ω → (0, ∞) be a function defined by In fact, it is easy to check that under P x , the process M F,h is a positive local martingale and a supermartingale with respect to (F Nt+1 ) t≥0 . The next statement can be deduced from Lemma 4.1 by choosing specific F and h.
Therefore, by item (c) in Lemma 3.1, we have (0,∞) e sh(x,s) ψ x (ds) = u(x)/P u(x). This implies that (F, h) ∈ Γ and g F,h ≡ 0. On the other hand, it is easy to check that Q t , F = u(X Nt+1 )/u(X 0 ). This completes the proof of this lemma.

Exponential tightness
We next prove the exponential tightness of the empirical measure and empirical flow under Condition 1, with item (c) replaced by item (c*) in Remark 3.6.
Let the functionû, the sequence of functions u n , the set K, and the constants c, C γ , σ, η be as in Condition 1 and where we have used the condition that u n /P u n converges pointwise toû.
Proof. By Fatou's lemma, we have For any x ∈ K c , it follows from items (d) and (f) in Condition 1 that there exists n x ∈ N such that u n (x)/P u n (x) < ψ x (e ζ(x)τ ) for any n ≥ n x . By Lemma 4.2 and item (b) in Condition 1, we have Combining (4.9), (4.10), and item (c*) in Remark 3.6, we have Moreover, by item (f) in Condition 1, we obtain Since N > 0 and Lη(x) < 0 for any x ∈ K c , we have This implies the second inequality in (4.8).
For any B ⊂ V , let (T k B ) k≥0 be a sequence of stopping times defined by Clearly, T k B is the kth hitting time of (X n ) n≥0 on the set B. The first hitting time T 1 B is always abbreviated as T B in what follows. If we only focus on the behavior of (X, τ ) in the set B, we obtain a new process (X,τ ) = (b) (τ k ) k≥2 is a sequence of positive and finite random variables such that conditioned on (X k ) k≥1 the random variables (τ k ) k≥2 are independent and the waiting time matrix (ψ xy ) x,y∈B is given bȳ where the definition of the waiting time matrix can be found in Definition 2.1.
Proof. (a) Since (X k ) k≥0 is irreducible and recurrent, we have T k B < ∞, P x -a.s. for any x ∈ V and k ≥ 0. For any n ≥ 0 and x 0 , x 1 , · · · , x n+1 ∈ B, where the last step follows from the strong Markov property. In fact, we can obtain the recurrence of (X k ) k≥1 directly from the recurrence of (X k ) k≥0 . For any x, y ∈ B, since (X k ) k≥0 is irreducible, there exist a positive integer n ≥ 1 and a sequence of states x 0 , x 1 , · · · , x n ∈ V with x 0 = x and x n = y such that p x0,x1 p x1,x2 · · · p xn−1,xn > 0. Select all states in {x k } 0≤k≤n in the set B and write them as x i1 , · · · , x is with 0 = i 1 < i 2 < · · · < i s = n. Then This implies that (X k ) k≥1 is irreducible.
s. for any x ∈ V and k ≥ 0, it is easy to see that (τ k+1 ) k≥0 is a sequence of positive and finite random variables. Note that conditioned on (X k ) k≥0 the random variables (τ k ) k≥1 are independent and have where we use the fact that (T k B ) k≥0 is σ((X k ) k≥0 )-measurable. For any n ≥ 2 and f 1 , · · · , f n ∈ B b (0, ∞), similarly to (4.14), we have Note thatX k = X T k B . By the strong Markov property of (X k ) k≥0 , it is clear that conditioned on Moreover, for any x, y ∈ B, we have This completes the proof of this proposition. Remark 4.5. In the above proposition, we have proved that conditioned on (X k ) k≥0 , the random variables (τ k ) k≥1 are independent. This implies that (X, τ ) = {(X k ) k≥0 , (τ k ) k≥1 } is a delayed Markov renewal process, i.e. one whose transition probability matrix and waiting time matrix for the first step differ from those for the remaining steps [1,Chapter 4.12].
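The idea of observing the embedded chain only at its visits to B can be illustrated on a toy example. The following is a minimal sketch, not the construction used in the proof: it assumes a three-state chain with a single state outside B, and the matrix `P`, the set `B`, and the helper `trace_chain` are hypothetical names chosen for this illustration. With one outside state o, a step from x ∈ B to y ∈ B either happens directly or after a geometric number of loops at o, which gives the closed form used below.

```python
def trace_chain(P, B):
    """Transition matrix of the chain watched only at its visits to B.

    Sketch restricted to the case of exactly one state o outside B:
    p_bar[x][y] = p[x][y] + p[x][o] * p[o][y] / (1 - p[o][o]).
    """
    outside = [s for s in range(len(P)) if s not in B]
    assert len(outside) == 1, "this sketch handles a single outside state only"
    o = outside[0]
    assert P[o][o] < 1.0, "the outside state must be left with positive probability"
    return {
        x: {y: P[x][y] + P[x][o] * P[o][y] / (1.0 - P[o][o]) for y in B}
        for x in B
    }

# Hypothetical irreducible chain on {0, 1, 2}; we watch it on B = {0, 1}.
P = [
    [0.20, 0.30, 0.50],
    [0.40, 0.10, 0.50],
    [0.25, 0.25, 0.50],
]
B = [0, 1]
P_bar = trace_chain(P, B)
row_sums = {x: sum(P_bar[x].values()) for x in B}
```

In this example every entry of `P_bar` is positive and each row sums to one, in line with the claim that the chain observed on B is again an irreducible Markov chain.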
Lemma 4.6. Let K be a finite subset of V . Then for any ℓ ∈ N, there exists a real sequence A ℓ ↑ ∞ such that Proof. Let (X,τ ) be defined as in (4.13) with B = K. By Remark 4.5, we have seen that (X,τ ) is a delayed Markov renewal process. Similarly to Markov renewal processes, we can also define the nth jump timeS n , the numberN t of jumps up to time t, and the empirical flowQ t for (X,τ ). Note that is the number of times that ξ jumps into the set K in the time interval (0, t]. It is easy to see that By the exponential Chebyshev inequality, we have where ⌊A ℓ t⌋ denotes the integer part of A ℓ t. By Proposition 4.4 and Remark 4.5, it is clear that conditioned on (X k ) k≥0 , the random variables (τ k ) k≥1 are independent. Then we obtain whereψ xy (e −τ ) = (0,∞) e −sψ xy (ds). Since K is finite, it is clear that sup x,y∈Kψ xy (e −τ ) < 1. Combining (4.17) and (4.18), we have

This completes the proof by choosing
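The mechanism behind this estimate is the exponential Chebyshev inequality applied to the jump times: if S n is the sum of the first n waiting times, then P(N t ≥ n) = P(S n ≤ t) ≤ e^t E[e^{−S n }]. The following is a minimal numerical sketch under a simplifying assumption that is not made in the lemma: the waiting times are i.i.d. exponential with a common rate, so that N t is Poisson and E[e^{−τ}] = rate/(rate + 1) < 1. The function names and parameter values are illustrative only.

```python
import math

def poisson_tail(mean, n):
    """P(N >= n) for N ~ Poisson(mean), computed from the finite sum of the pmf."""
    term = math.exp(-mean)  # pmf at k = 0
    cdf = 0.0
    for k in range(n):
        cdf += term
        term *= mean / (k + 1)  # pmf at k + 1
    return max(0.0, 1.0 - cdf)

def chebyshev_bound(rate, t, n):
    """Bound e^t * (E e^{-tau})^n for tau ~ Exp(rate), where E e^{-tau} = rate / (rate + 1)."""
    return math.exp(t) * (rate / (rate + 1.0)) ** n

rate, t = 2.0, 5.0  # illustrative values
checks = [
    (n, poisson_tail(rate * t, n), chebyshev_bound(rate, t, n))
    for n in (10, 20, 40)
]
```

Since the bound decays geometrically in n while t enters only through the factor e^t, taking n = ⌊A t⌋ with A large makes the exponential decay rate in t as large as desired, which is the role played by the sequence A ℓ ↑ ∞.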
The following proposition states the exponential tightness of the empirical measure and empirical flow. and a real sequence A ℓ ↑ ∞ such that for any ℓ ∈ N lim lim In particular, the empirical measure and empirical flow are exponentially tight.
Proof. We first prove (4.20). We consider the exponential local martingale in Lemma 4.1 by choosing F ≡ 0 and where N is as in Lemma 4.3. By Lemma 3.1, it is easy to check that (F, h) ∈ Γ and g F,h = (N/σ)1 K + log η1 K c . It then follows from (4.5) that Combining (4.11) and the first inequality in Lemma 4.3, we have By the exponential Chebyshev inequality, we obtain On the other hand, by Lemma 4.6, there exists a real sequence A ′ ℓ ↑ ∞ such that Hence it is enough to show that there exists a sequence A ℓ ↑ ∞ such that for any t > 0 and ℓ ∈ N, Since log η < 0, we have where we have used Lemma 4.1 in the last step. The proof of (4.22) is now completed by choosing Recall that the closed ball in L 1 + (E) is compact with respect to the bounded weak* topology. Then the exponential tightness of the empirical flow follows from (4.20).
We next prove (4.19). For a sequence of constants a m ↑ ∞ to be chosen later, set W m = {x ∈ V : Lû(x) ≤ a m } ∪ K. In view of items (e) and (f) in Condition 1, it is clear that W m is a finite subset of V . Now set where µ, 1/s = x∈V (0,∞] µ(x, ds)/s. Then for any µ ∈ K ℓ = K 1 ℓ ∩ K 2 ℓ , we have Since W m × [ǫ, ∞] is a compact subset of V × (0, ∞], it follows from Prokhorov's theorem that K ℓ is a compact subset we only need to prove lim t→∞ 1 t log P γ µ t ∈ K 1 ℓ ≤ −ℓ and lim Then we can obtain the second inequality in (4.23) from (4.20), i.e.
We next prove the first inequality in (4.23). By item (f) in Condition 1, it is easy to see that Lû(x) ≥ −σLη(x) ≥ 0 for any x ∈ K c . Recall the definition of W m , hû, and N . It is easy to see that Note that By the exponential Chebyshev inequality and Lemma 4.3, we obtain

Upper bound
We For any (F, h) ∈ Γ 0 , let I F,h : Λ → R be a functional defined by Based on the proof of Lemma 4.10, it is easy to see that I F,h is a lower semicontinuous function on Λ. For any δ > 0,  Proof. We first prove that for any measurable B ⊂ Λ, Since c ≥ 0 and ϕ ∈ C c (V × (0, ∞)), we have Hence we have proved (4.30). It is easy to see that P γ ((µ t , Q t ) ∈ C δ ) = 1 for any t ≥ 1/δ. Finally, taking B = O ∩ C δ in (4.30), we obtain This completes the proof of this lemma.
By Assumption 4, the graph (V, E) is locally finite. It is easy to check that f 1 This implies that C δ is a closed subset of Λ. Moreover, it is easy to see that C δ is convex. Thus we only need to prove that I F,h is a convex and lower semicontinuous function on Λ.
Since I F,h is a linear functional, it must be convex. Since F ∈ C c (E) ⊂ C 0 (E), the map Q → Q, F is continuous. On the other hand, since ϕ ∈ C c (V × (0, ∞)), there exists a finite set K ⊂ V such that ϕ x ≡ 0 for x ∈ K c and ϕ x ∈ C c (0, ∞) for x ∈ K. Then h ϕ,c,M is a bounded lower semicontinuous function. For any µ ∈ P(V × (0, ∞]), let µ n ∈ P(V × (0, ∞]) be a sequence of probability measures such that µ n weakly converges    Without loss of generality, we assume that µ(x, 1/τ ) < Q x . For any C > 0, (y, z) ∈ E, and (y, s) ∈ V × (0, ∞], set It is easy to check that g FC ,hC ≡ 0 and Here (F C , h c ) / ∈ Γ 0 , but we still define I FC ,hC (µ, Q) as in (4.26). As C → ∞, we have I FC ,hC (µ, Q) → ∞. Now for any C > 0, we construct a sequence (F n , h n ) ∈ Γ 0 such that lim n→∞ I Fn,hn (µ, Q) = I FC ,hC (µ, Q). For any y ∈ V , it is easy to see that there exists a non-negative function g y ∈ C c (0, ∞) such that ψ y ({t ∈ (0, ∞) : where f n : (0, ∞] → R is a sequence of continuous functions satisfying f n (s) = 0 if s ∈ (0, 1/(n + 1)) ∪ (n + 1, ∞] and f n (s) = −1 if s ∈ (1/n, n), c n is a sequence of constants to be chosen later such that g Fn,hn ≡ 0, and K n is a sequence of finite sets such that K n ↑ V and Since lim n→∞ ψ x , e Cfn = −C, such K n must exist. It is easy to check that g Fn,hn (y) = 0 for any y = x and Since g Fn,hn (x) is continuous with respect to c n , by the intermediate value theorem, there exists a sequence of constants c n such that g Fn,hn (x) ≡ 0, which implies that (F n , h n ) ∈ Γ 0 . Taking n → ∞ on both sides of (4.36), it is easy to see that c n → 0. Thus in the sense of pointwise convergence, we have (F n , h n ) → (F C , h C ). By the dominated convergence theorem, we obtain (4.34). This further implies (4.33).
Case 2: (µ, Q) ∈ D. Then for any x ∈ V , we have Recall the definition ofQ xy andμ x in (3.6). Since V and (0, ∞) are both Polish spaces, it follows from (3.4) that for Recall the definition of I in (3.7). Then for any F ∈ C b (E), ϕ ∈ C b (V × (0, ∞)) satisfying y∈V p xy e F (x,y) = 1 and ψ x , e ϕx = 1 for any x ∈ V , and c : V → [0, ∞) satisfying 0 ≤ c ≤ ζ and c(x) < ζ(x) for any x ∈ V with ζ(x) > 0, we only need to find a sequence (F n , h n ) ∈ Γ 0 such that where ϕ n x : (0, ∞] → R is a sequence of continuous functions satisfying ϕ n x (s) = 0 if s ∈ (0, 1/(n + 1)) ∪ (n + 1, ∞) and ϕ n x (s) = ϕ x (s) if s ∈ (1/n, n), a n x and b n x are two sequences of constants satisfying y∈V p xy e Fn(x,y) = 1 as well as ψ x , e shn(x,s) = 1, g x is the function used in (4.35), and K n is a sequence of finite sets such that K n ↑ V .
For any x ∈ V and sufficiently large n that may depend on x, straightforward computations show that a n x = log y∈Kn p xy e F (x,y) Moreover, it is easy to see that such a sequence of constants b n x exists, and a n x → 0 and b n x → 0 as n → ∞. For all x ∈ V , we have Then (4.37) follows from Fatou's lemma. This completes the proof of this lemma.
We are now in a position to prove the upper bound of the LDP.  Moreover, if Condition 1 is satisfied, then the above equation also holds for any closed set K ⊂ Λ.
Proof. We first prove that I is convex and lower semicontinuous. Since we have proved I F,h,δ is convex and lower semicontinuous in Lemma 4.10, we only need to verify that Case 1: we have Case 2: Q + = Q − . By Lemma 4.11, we have Recall the definition of C δ in (4.27) and the definition of I F,h,δ in (4.28). It is easy to check that for any δ > 0, we have (µ, Q) ∈ C δ and I F,h (µ, Q) = I F,h,δ (µ, Q). Optimizing over (F, h) ∈ Γ 0 and δ > 0, we obtain We next prove (4.38). Minimizing (4.29) over (F, h) ∈ Γ 0 and δ > 0, we obtain Note that I F,h,δ is lower semicontinuous for any (F, h) ∈ Γ 0 and δ > 0. Finally, under Condition 1, it follows from Proposition 4.7 that (µ t , Q t ) is exponentially tight. Hence the upper bound in (4.38) also holds for any closed set K ⊂ Λ.

Lower bound
We next prove the lower bound of the LDP. Before giving the proof, we recall the following lemma, which can be found in [20,Lemma 5.2].  In order to prove the lower bound, we will apply Lemma 4.13 for completely regular topological space Λ. Recall the definitions of D andQ in (3.5) and (3.6), respectively. Let D 2 ⊂ D 1 ⊂ D be defined by (Q xy ) x,y∈V defines an irreducible transition matrix on V , (4.40)  In the following lemma, we will construct a family of probability measures {P x t } on Λ and prove (4.39) with the upper bound given by (4.42). The proof is in the spirit of that given in [22,Proposition 5.1], but some details are supplemented.
Proof. By the definition of J in (4.42), we may restrict the proof to (µ, Q) ∈ D 2 . When (µ, Q) ∈ D 2 , we have and it is easy to check that It then follows from [16, Theorem 2.1 and the remark after Theorem 2.1] thatQ x,· ≪ p x,· ,μ x ≪ ψ x and (x,y)∈E In this way, the transition probability kernel (P F , Ψ h ) defined in (4.3) are given by Let P F,h γ be the probability measure under which (X, τ ) is a Markov renewal process with transition kernel (P F , Ψ h ) and initial distribution γ. Since (µ, Q) ∈ D 2 , it is easy to see that p F =Q is irreducible. It then follows from (3.5) where ν F x = Q x / Q for any x ∈ V . This means that ν F is the unique invariant distribution for P F . Since µ(x, {∞}) = 0 for any x ∈ V , we have By the strong law of large numbers for semi-Markov processes, for any We now construct the family of probability measures {P (µ,Q) t }. For any ǫ > 0 and t ≥ 0, let T t = ⌊ Q (1 + ǫ)t⌋ and let P γ,t,ǫ be the probability measure under which the law of the process (X, τ ) = {(X k ) k≥0 , (τ k ) k≥1 } satisfies the following requirements: } is a Markov renewal process with transition kernel (P F , Ψ h ) and initial distribution γ.
Intuitively, under P γ,t,ǫ , the process (X, τ ) has the transition kernel (P F , Ψ h ) before time T t and has the transition kernel (P, Ψ) after time T t . SetP t,ǫ = P γ,t,ǫ •(µ t , Q t ) −1 and let ǫ(t) ↓ 0 to be chosen later such thatP  For any ǫ > 0, it follows from (4.46) and the strong law of large numbers for Markov renewal processes [3, Theorem For any t > 0, let where F Tt = σ((X k , τ k ) 0≤k≤Tt ). It is easy to see that P F,h γ | FT t = P γ,t,ǫ | FT t . Then we have  Before determining ǫ(t), we give an estimation of the relative entropy. We observe that for any ǫ > 0,

(4.51)
Indeed, the first inequality follows from the variational characterization of the relative entropy [16, Section 2] and the last equality is a straightforward computation of the Radon-Nikodym density (similarly to (4.4)).
Combining (4.43), (4.44), and (4.45), we have Q, |F | < ∞ and µ, |h| < ∞. Note that (X k , X k+1 , τ k+1 ) k≥0 is a Markov process. By the ergodic theorem of Markov processes, we have (4.52) We next construct ǫ(t) ↓ 0. Let ǫ(t) = 1/n for any t n−1 < t ≤ t n be a step function, where t n is an increasing sequence such that It follows from (4.50), (4.51), and (4.52) that such t n ↑ ∞ exist. Then we have Finally, we prove (4.49). Note that It thus follows from the first equality of (4.53) that On the other hand, since D t,ǫ(t) = {N t + 1 ≤ T t }, we have G(µ t , Q t )1 D t,ǫ(t) ∈ F Tt . Note that P(V × (0, ∞]) is endowed with the weak convergence topology and L 1 + (E) is endowed with the bounded weak* topology. It is a direct consequence of (4.47) and (4.48) that Moreover, it follows from the first equality of (4.53) that Then we obtain where the convergence in (4.54) follows from the dominated convergence theorem.
To prove the lower bound of the LDP, we only need to prove that the rate function I coincides with sc − J. Before giving the desired results, we need the following two lemmas.
Lemma 4.15. Suppose that Assumption 1 is satisfied. Then for any µ = (µ x ) x∈V ∈ P(V ) satisfying µ x > 0 for any x ∈ V , there exist a constant C > 0 and an irreducible, positive recurrent Markov chain (X k ) k≥0 with state space V , transition probability matrixP = (p xy ) x,y∈V , and invariant distributionν = (ν x ) x∈V such thatÊ ⊂ E and Proof. We first construct a sequence of finite subsets V n ↑ V by induction. Without loss of generality, we assume that Since Assumption 1 holds, it is clear that (V, E) is connected. Thus, there exists a self-avoiding cycle (x 1 (:= z 1 ), x 2 · · · , x k0 ) of elements of V such that (x i , x i+1 ) ∈ E when i = 1, · · · , k 0 and the sum in the indices is Suppose that we have constructed V n and z 1 , · · · , z q ∈ V n , z q+1 / ∈ V n . Since (V, E) is connected, it is easy to see that there exist x r ∈ V n for some 1 ≤ r ≤ k n = |V n | and a sequence of distinct states w 1 , · · · , w m1−1 ∈ V \ V n such that (w i , w i+1 ) ∈ E when i = 0, · · · , m 1 − 1, where w 0 := x r and w m1 := z q+1 . Similarly, there exist x l ∈ V n and a sequence of distinct states w m1+1 , · · · , w m2 ∈ V \ V n such that (w i , w i+1 ) ∈ E when i = m 1 , · · · , m 2 , where If {w 1 , · · · , w m1 } ∩ {w m1+1 , · · · , w m2 } = ∅, let m = m 2 and y i = w i for i = 1, · · · , m.
We have thus constructed a sequence of subsets {V n } n≥0 such that |V n | < ∞ and V n ⊂ V n+1 . Moreover, by the above construction, it is easy to see that for any z q ∈ V , there exists V n such that z q ∈ V n . This implies that We next construct a sequence of matrices P n = (p n xy ) x,y∈V based on the previous construction of {V n } n≥0 . Let Suppose that we have constructed P n = (p n xy ) x,y∈V . Let p n+1 xr,y = (1 − α n+1 )p n xr ,y , y ∈ V n , p n+1 xy = p n xy , otherwise, where x r , x l ∈ V n are defined as in the above construction of {V n } n≥0 and α n+1 is a positive constant to be determined. Then the sequence of matrices P n is constructed by induction. It is easy to see that P n can be considered as a transition probability matrix with state space V n , which corresponds to an irreducible Markov chain.
We let ν n ∈ P(V ) be the invariant distribution for P n . We now choose, by induction, {α n } n≥0 and C such that ν n x < Cµ x for any n ∈ N and x ∈ V . Let C = 1 + (max x∈V0 (1/µ x ))/k 0 and α 0 = 0. It is clear that Suppose that we have chosen (α k ) 0≤k≤n such that ν n x < Cµ x for any x ∈ V . Note that ν n+1 is continuous with respect to α n+1 . In other words, Since V is countable, the strong convergence of ν n+1 to ν n in P(V ) is equivalent to the pointwise convergence of ν n+1 x to ν n x for any x ∈ V . This implies lim αn+1↓0 ν n+1 − ν n = 0. Hence, there exists 0 < α n+1 ≤ 1/2 n+1 such that Finally, we constructP . For any x ∈ V , since V n ↑ V , there exists N such that x ∈ V N . It follows from (4.55) This shows thatP can be defined asp xy = lim n→∞ p n xy ≥ 0. By Fatou's lemma, we have which implies y∈Vp xy = 1 for any x ∈ V . In other words,P is a transition probability matrix. Note that if there exists n such that p n xy > 0, then (1 − α k ) p n xy ≥ 1 − 1 2 n p n xy > 0.
Since P n corresponds to an irreducible Markov chain with state space V n and V n ↑ V , it is easy to see thatP is irreducible.
On the other hand, it follows from (4.56) that we can set $\hat\nu = \lim_{n\to\infty} \nu^n$. Moreover, it is clear that $\hat\nu \in \mathcal{P}(V)$ and $\hat\nu_x \le C\mu_x$ for any $x \in V$. It follows from (4.57) that for any y ∈ V and N ∈ N, Taking n → ∞ and then N → ∞ on both sides of the above inequality, we have $\hat\nu_y = \lim_{n\to\infty} \nu^n_y = \lim_{n\to\infty} \sum_{x\in V} \nu^n_x p^n_{xy} = \sum_{x\in V} \hat\nu_x \hat p_{xy}$ for any $y \in V$.
Obviously, we haveÊ ⊂ E. This completes the proof of this lemma. Proof. We first construct a transition kernel (P = (p xy ) x,y∈V ,Ψ = (ψ x ) x∈V ) satisfying the following four requirements: (a)P is an irreducible transition probability matrix with a unique invariant probability measureν.
Without loss of generality, we assume that V = N is the set of nonnegative integers. Let m x = sup{c ≥ 0 : For any x ∈ N, let C x = − y∈N log p xy . By Assumption 4, it is clear that Finally, let Z = x∈Vν x (0,∞) sψ x (ds) and It is easy to check that (µ 0 , Q 0 ) ∈ D 1 . This completes the proof of this lemma.
We are now in a position to finish the proof of the lower bound of the LDP.
Proposition 4.17. Suppose that Assumptions 1-4 are satisfied. Let L 1 + (E) be endowed with the bounded weak* topology. Then under P γ , the law of (µ t , Q t ) ∈ Λ satisfies an LDP lower bound with convex rate function I. Moreover, if Condition 1 is satisfied, then I is a good rate function.
Proof. Let J be the functional defined in (4.42). By Lemmas 4.13 and 4.14, we only need to prove I = sc − J. By Proposition 4.38, I is convex and lower semicontinuous on Λ. It is then easy to see that sc − J ≥ I. We next prove the converse inequality. In fact, we only need to prove that for any (µ, Q) ∈ Λ with I(µ, Q) < ∞, there exists a sequence (µ n , Q n ) n≥0 in D 2 such that (µ n , Q n ) → (µ, Q) in Λ and lim n→∞ I(µ n , Q n ) ≤ I(µ, Q). (4.61) Here, we only prove that there exists a sequence (µ n , Q n ) n≥0 in D 1 such that (µ n , Q n ) → (µ, Q) in Λ and (4.61) holds. The rest of the proof is similar to [22,Lemma 2.5].
By Lemma 4.16, there exists (µ 0 , Q 0 ) ∈ D 1 . For any (µ, Q) ∈ D with I(µ, Q) < ∞, let Obviously, (µ n , Q n ) → (µ, Q) in Λ in the sense of the strong topology. Since I is convex on Λ, we have Then it is easy to see that (µ n , Q n ) ∈ D 1 and (4.61) holds. This completes the proof of the lower bound.  The proof is similar to that given in [22,Lemma 2.5].

Proof of Theorem 3.5
Next we will prove the joint LDP for the empirical measure and empirical flow when L 1 + (E) is endowed with the strong topology. Note that the bounded weak* topology is weaker than the strong topology [33,Theorem 2.7.2]. In other words, any open (closed) subset of L 1 + (E) under the bounded weak* topology is also open (closed) under the strong topology. Since we have established the joint LDP when L 1 + (E) is endowed with the bounded weak* topology in Section 4, we only need to prove the exponential tightness of the empirical flow when L 1 + (E) is endowed with the strong topology [17,Corollary 4.2.6]. Before proving the exponential tightness, we introduce some notation.
Recall the definition of exit-current and entrance-current in (2.5). For any t > 0, we define the associate empirical It is clear that Q + t , Q − t ∈ L 1 + (V ), P γ -a.s. For any J ∈ L 1 + (V ) and f : V → R, we set J, f = x∈V J(x)f (x). The exponential tightness of the empirical currents is stated in the following proposition. Here we also consider the case where the semi-Markov process starts from the general initial distribution γ (see Remark 3.6).
Proposition 5.1. Assume Conditions 1 and 2 to hold. Then there exists a sequence {K ℓ } of compact sets in L 1 + (V ) such that for any ℓ ∈ N, where L 1 + (V ) is endowed with the strong topology. In particular, the empirical exit-current is exponentially tight.
Proof. We first construct the compact sets in L 1 + (V ) under the strong topology. Let W m ↑ V be an invading sequence of finite subsets of V . For any positive integer ℓ, we let where A ℓ is defined as in Proposition 4.7. Similarly to the proof in [21,Theorem 5.2], it is easy to prove that is compact under the strong topology. Now we prove (5.2). It is easy to see that It then follows from (4.20) and (5.1) that On the other hand, let the functionû, the sequence of functions u n , and the constants c, C γ be as in Condition 2 and item (c*) in Remark 3.6. Taking A = V in Lemma 4.2, we obtain the local martingale Note that u n /P u n converges pointwise toû. By Fatou's lemma, we have where the second inequality follows from item (b) in Condition 2 and the last inequality follows from item (c*) in Remark 3.6.
Let {a m } m≥0 be a sequence of constants with a m ↑ ∞ to be chosen later and let W m = {x ∈ V : logû(x) ≤ a m } be an invading sequence of V . In view of item (e) in Condition 2, W m are finite sets. Let It is easy to see that logû ≥ a m 1 W c m − C. By the exponential Chebyshev inequality, we obtain By choosing a m = 2m 2 + 2CmA m , the proof is now easily concluded from (5.3).
Corollary 5.2. Assume Conditions 1 and 2 to hold. Then the empirical entrance-current Q − t with L 1 + (V ) endowed with the strong topology is exponentially tight.
Proof. For any η > 0, it is easy to see that The result of this corollary now immediately follows from [41,Lemma 3.13].
We are now in a position to prove Theorem 3.5.
Proof of Theorem 3.5. Let (Z, τ ) = {(Z k ) k≥0 , (τ k ) k≥1 } be a Markov renewal process with the initial distribution γ Z , where Z k = (X k−1 , X k ) and X −1 can be any random variables such that γ Z ∈ P(E). Note that the empirical entrance-current for (Z, τ ) is exactly the empirical flow for (X, τ ), i.e.
Next we will apply Corollary 5.2 to (Z, τ ). It is easy to verify that (Z, τ ) satisfies Assumptions 1-4. In order to apply Corollary 5.2, we need to prove that (Z, τ ) satisfies Conditions 1 and 2. Here we only verify Condition 1 for (Z, τ ) and the proof of Condition 2 is similar.
Let the functionsû, the sequence of functions u n , the set K, and the constants c, σ, C, η, C γ be as in Condition 1 and item (c*) in Remark 3.6. By choosing u Z n (x, y) = u n (y), we immediately obtain item (b). Note that Z has the transition probability p Z (x,y),(z,w) = δ y (z)p zw . Then for any (x, y) ∈ E and n ≥ 0, we have which implies item (a). For any n ≥ 0, we have By choosing C Z γ = C γ , we obtain item (c). Note that By choosingû Z (x, y) =û(y), we obtain item (d). It is easy to check that L ZûZ (x, y) = Lû(y). Then for each ℓ ∈ R, Since (V, E) is locally finite, item (e) also holds. Note that ψ Z (x,y) = ψ y and ζ Z (x, y) = ζ(y). By choosing σ Z = σ, C Z = C, η Z = η, and K Z = (V × K) ∩ E, we obtainû Z (x, y) < ψ Z (x,y) (e ζ Z (x,y)τ ) for any (x, y) ∈ (K Z ) c and This completes the proof of item (f).
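The lifting from X to the edge process Z k = (X k−1 , X k ) can be checked on a small example. The following is a minimal sketch assuming a hypothetical two-state chain; the matrix `P`, its invariant distribution `nu`, and the helper `edge_kernel` are illustrative names that do not appear in the paper. It builds the kernel p Z (x,y),(z,w) = δ y (z) p zw stated above, and verifies that it is stochastic and that π(x, y) = ν x p xy is invariant for it.

```python
# Hypothetical two-state embedded chain and its invariant distribution
# (nu solves nu P = nu: nu_0 * 0.5 = nu_1 * 0.25).
P = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.25, 1: 0.75}}
nu = {0: 1.0 / 3.0, 1: 2.0 / 3.0}

# Edge set E = {(x, y) : p_xy > 0}, the state space of Z.
E = [(x, y) for x in P for y in P[x] if P[x][y] > 0]

def edge_kernel(P, E):
    """Kernel of Z_k = (X_{k-1}, X_k): p^Z_{(x,y),(z,w)} = delta_y(z) * p_zw."""
    return {
        (x, y): {(z, w): (P[z][w] if z == y else 0.0) for (z, w) in E}
        for (x, y) in E
    }

PZ = edge_kernel(P, E)

# Candidate invariant law of Z: pi(x, y) = nu_x * p_xy.
pi = {(x, y): nu[x] * P[x][y] for (x, y) in E}
pi_next = {e2: sum(pi[e1] * PZ[e1][e2] for e1 in E) for e2 in E}
row_sums = {e: sum(PZ[e].values()) for e in E}
```

Because the kernel of Z depends on (x, y) only through y, functions of the form u Z (x, y) = u(y) satisfy P Z u Z (x, y) = P u(y), which is why the choices u Z n (x, y) = u n (y) and û Z (x, y) = û(y) transfer the conditions from X to Z.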

Proof of Proposition 3.12
Here we consider the marginal LDP for the empirical flow Q t . Before giving the proof of Proposition 3.12, we need some notation and lemmas. Let µ and ν be two positive σ-finite measures on a measurable space (X , F ). For any sequence of non-negative measurable functions (f i ) i≥1 on X and any sequence of non-negative constants (b i ) i≥0 s., let the generalized relative entropy between µ and ν be defined by Similarly to the maximum entropy principle [42, Theorem 12.1.1], we have the following lemma.
Lemma 6.1. Suppose that there exists a positive σ-finite measure µ * satisfying µ * ≪ ν, ν ≪ µ * , and where the constants (λ i ) i≥0 are chosen so that µ * satisfies the following constraints: Then µ * uniquely minimizes H g (·|ν) over all positive σ-finite probability measures satisfying (6.2). Moreover, the minimum is given by Proof. Let µ be a positive σ-finite measure satisfying (6.2). Note that X gdµ = X gdµ * = ∞ i=0 a i b i < ∞. Then we have where the last inequality follows from the nonnegativity of the relative entropy. It then follows from (6.1) that Hence we have proved that H g (µ|ν) ≥ H g (µ * |ν). Here the equality holds if and only if dµ/dµ * = 1. This shows that except for a set of measure zero, µ * is unique.
Recall the definition of ζ(x) in (3.1). We also need the following lemma.
We next prove that F ∈ C 1 (−∞, ζ) is increasing. The proof of differentiability is similar to the above. Direct where the last inequality follows from the Cauchy-Schwarz inequality. Moreover, the above equality holds if and only if ψ is a Dirac measure.
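The monotonicity argument can be written out as follows. This is a sketch under the assumption that, for a single waiting time distribution ψ, G(λ) = log ψ(e^{λτ}) and F = dG/dλ on (−∞, ζ), consistent with the notation F Q = dG Q /dλ and G(λ) = log ψ(e^{λτ}) used in this section; it is not a reconstruction of the displayed computation in the proof.

```latex
% Assumed definitions: G(\lambda) = \log \psi(e^{\lambda\tau}), \quad F = G'.
F(\lambda) = \frac{\psi(\tau e^{\lambda\tau})}{\psi(e^{\lambda\tau})},
\qquad
F'(\lambda) = \frac{\psi(\tau^2 e^{\lambda\tau})\,\psi(e^{\lambda\tau}) - \psi(\tau e^{\lambda\tau})^2}{\psi(e^{\lambda\tau})^2} \ge 0 .

% The inequality is Cauchy-Schwarz applied to \tau e^{\lambda\tau/2} and e^{\lambda\tau/2}:
\psi(\tau e^{\lambda\tau})^2
= \psi\bigl(\tau e^{\lambda\tau/2}\cdot e^{\lambda\tau/2}\bigr)^2
\le \psi(\tau^2 e^{\lambda\tau})\,\psi(e^{\lambda\tau}).
```

Equality in the Cauchy-Schwarz step forces τ to be ψ-almost surely constant, which is the Dirac case singled out above. Equivalently, F′(λ) is the variance of τ under the tilted law proportional to e^{λs} ψ(ds).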
By Lemma 6.2, we immediately obtain the following corollary.
is a strictly increasing function, F Q = dG Q /dλ is an increasing function, and where ζ Q = sup{λ ≥ 0 : G Q (λ) < ∞}. Moreover, if m Q < M Q , then F Q is strictly increasing.
Proof. We only prove the second equality of (6.11). The proof of the first equality is similar.
Case 1: ζ Q = ∞. By the monotone convergence theorem, we immediately obtain the desired result.
then follows from the monotone convergence theorem that On the other hand, it is easy to see that Since G Q (ζ Q ) = ∞, we have F Q (ζ Q ) = ∞. This completes the proof of this corollary.
For any Q ∈ L 1 + (E) satisfying Q + = Q − and V + = ∅, let X = V + × (0, ∞) be endowed with the product topology and let F be the associated Borel σ-field. Let ν(x, ds) = sQ x ψ x (ds) be a σ-finite measure on (X , F ). Here we take the sequence of functions (f x ) x∈V+ and the sequence of constants Then the associated function g is given by g(y, s) = b 0 + x∈V+ b x f x (y, s) = 1/s and the associated generalized relative entropy is given by µ(x, ds).
Lemma 6.4. Let a > 0 be a constant and let Q ∈ L 1 + (E) be a flow satisfying Q + = Q − and V + = ∅. If G Q (ζ Q ) = ∞, then we have where µ in the infimum ranges over all positive σ-finite measures on (X , F ) satisfying Proof. We will prove this lemma in three different cases.
Case 1: m Q < a < M Q . Let λ 0 = λ * and λ x = −G x (λ * ). Let µ * (x, ds) = sQ x e λ * s+λx ψ x (ds) be a σ-finite measure on (X , F ). Then (6.1) holds and Moreover, it is easy to check that µ * satisfies (6.12). By Lemma 6.1, we have On the other hand, it follows from Corollary 6. The above equality holds if and only if µ(x, ds) = Q x m x δ mx (ds) for any x ∈ V + . Thus we have On the other hand, direct computations show that .
By the dominated convergence theorem, it is easy to see that for any x ∈ V + . This implies that Case 3: a < m Q or a > M Q . Here we only consider the case of a < m Q . The proof in the case of a > M Q is similar. Note that there is no σ-finite measure µ satisfying (6.12). Then we have inf µ H g (µ|ν) = ∞. On the other hand, take a sequence of constants a x such that 0 < a x < m x for any x ∈ V + and a = x∈V+ Q x a x . Straightforward where the last equality follows from the dominated convergence theorem.
The following lemma gives the properties of G * Q .
Lemma 6.5. Let Q ∈ L 1 + (E) be a flow satisfying Q + = Q − and V + = ∅. If m Q < M Q and G Q (ζ Q ) = ∞, then G * Q ∈ C 2 (m Q , M Q ), λ * = dG * Q /da is a strictly increasing function, and Proof. It follows from Lemma 6.3 that F Q ∈ C 1 (−∞, ζ Q ) is strictly increasing. Hence λ * = dG * Q /da is the inverse function of F Q . Moreover, λ * ∈ C 1 (m Q , M Q ) is strictly increasing and (6.13) follows from (6.11). By Lemma 6.4, it is easy to see that G * Q ∈ C 2 (m Q , M Q ) and This completes the proof of this lemma.
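The relation between G * Q , λ * , and F Q can be checked numerically in the simplest case. The following is a minimal sketch under several assumptions that are not taken from the text: G * is taken to be the Legendre transform G * (a) = sup λ {λa − G(λ)}, there is a single state with Q x = 1, and the waiting time is exponential with rate `r`, so that G(λ) = log(r/(r − λ)) for λ < r and G(ζ) = ∞ at ζ = r. In this case λ * (a) = r − 1/a and G * (a) = ra − 1 − log(ra) in closed form. All function names are hypothetical.

```python
import math

r = 2.0  # illustrative rate of the exponential waiting time; zeta = r

def G(lam):
    """Log-moment generating function of Exp(r), finite for lam < r."""
    return math.log(r / (r - lam))

def F(lam):
    """F = dG/dlam = 1 / (r - lam), strictly increasing on (-inf, r)."""
    return 1.0 / (r - lam)

def lam_star(a, lo=-1e6, hi=r - 1e-12, iters=200):
    """Solve F(lam) = a by bisection, i.e. invert the increasing function F."""
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if F(mid) < a:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def G_star(a):
    """Legendre transform evaluated at the maximizer lam_star(a)."""
    lam = lam_star(a)
    return lam * a - G(lam)

def dG_star(a, h=1e-4):
    """Central finite-difference approximation of dG*/da."""
    return (G_star(a + h) - G_star(a - h)) / (2.0 * h)

points = (0.1, 0.5, 1.0, 3.0)
results = [(a, lam_star(a), G_star(a), dG_star(a)) for a in points]
```

In this example the finite-difference derivative of `G_star` agrees with `lam_star`, and `lam_star` is increasing in a, which mirrors the statement that λ * = dG * Q /da is the strictly increasing inverse of F Q .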
Finally, we also need the following lemma to ensure G Q (ζ Q ) = ∞.
Proof. Let ψ, ζ, and q x be as in Condition 4. It is easy to see that where G(λ) = log ψ(e λτ ). By item (c) in Condition 4, it is easy to see that there exists y ∈ V + such that q y = inf x∈V+ q x > 0. By Lemma 6.2, it is clear that G is increasing. Then for any λ 0 < inf x∈V+ ζ(x) = q y ζ, we have By the definition of ζ Q , it is easy to see that ζ Q ≥ λ 0 . Since λ 0 < inf x∈V+ ζ(x) is arbitrary, we have ζ Q ≥ inf x∈V+ ζ(x). On the other hand, we have This completes the proof of this lemma.
We are now in a position to prove Proposition 3.12.
(i) The complement E \ W is finite; (ii) If (y, z) ∈ W , then R(y) < a; (iii) For each path exiting from x, the number of edges inÊ ∩ W is at least λ-times the total number of edges in W . In other words, for any path x 0 , x 1 , . . . , x k with x 0 = x and (x i , x i+1 ) ∈ E, we have Obviously, we have λ < 1.
Note that both Conditions 2 and 5 depend only on the embedded chain X of the semi-Markov process. The following proposition shows that Condition 2 is weaker than Condition 5. For this reason, we impose the former rather than the latter in the main text.
Proposition A.1. Suppose that Assumption 1 and Condition 5 are satisfied. Then Condition 2 holds.
Proof. Let the constants a 0 , λ > 0 be as in item (c) of Condition 5. For any x 0 ∈ V and m ∈ N satisfying 1/m < a 0 , we will next construct the sequence of functions u n , n ≥ m such that Condition 2 holds. We first construct auxiliary functions F n : E → (0, ∞) by induction. Let W n ⊂ E be a sequence of sets defined as It is easy to see that W n also satisfies item (c) in Condition 5 for each n ≥ m. Set Obviously, F n (x, y) ≤ n 1−λ for any (x, y) ∈ E. For any x ∈ V , let G x be the collection of all paths in (V, E) with initial state x 0 and terminal state x, i.e.
It then follows from Assumption 1 that $(V, E)$ is connected. This implies that $G_x \neq \emptyset$ for any $x \in V$. For each $n \ge m$, let $u_n : V \to (0, \infty)$ be a function defined by
Now we verify Condition 2 for the sequence of functions $u_n$.
(b) Recall the definition of the auxiliary functions $F_n$ given above. For any path $\mathbf{x} = (x_0, x_1, \ldots, x_k) \in G_x$, it follows from item (c)-(iii) in Condition 5 that
$n \ge m + 1$. This implies that $u_n$ is an increasing sequence of functions and $u_n(x) \ge 1$ for any $x \in V$ and $n \ge m$.
(c) For any $y \in V$, it follows from item (a) in Condition 5 that $R(y) > 0$. For any $x \in V$, since $G_x \neq \emptyset$, there exists $\mathbf{x} = (x_0, x_1, \ldots, x_k) \in G_x$. Hence, we can let $N = m \vee (1 + \max_{0 \le i \le k} \lfloor 1/R(x_i) \rfloor)$, so that $R(x_i) > 1/N$ for every $0 \le i \le k$. By item (c)-(ii) in Condition 5, it is easy to see that $(x_{i-1}, x_i) \notin W_N$ for any $1 \le i \le k$. Then for any $n \ge N$, we have
Obviously, $N$ and $k$ only depend on $x$.
(d) For any $x \in V$, let $\mathbf{x}$ and $N$ be defined as in (c). Since $R(x) > 1/N$, it is easy to see that $(x, y) \notin W_N$ for any $(x, y) \in E$. Then for any $n \ge N$, we have
Note that $u_n$ is an increasing sequence of functions with an upper bound. Then we define $u^*(x) = \lim_{n \to \infty} u_n(x)$. It follows from (A.5) and the monotone convergence theorem that $P u^*(x) = \lim_{n \to \infty} P u_n(x) \le N^{1-\lambda} u^*(x)$.
(e) Let $V_n$ and $V'_n$ be two sequences of subsets of $V$ defined by $V_n = \{x \in V : (x, y) \in E \text{ implies } (x, y) \in W_n \text{ for any } y \in V\}$ and $V'_n = \{x \in V : R(x) < 1/n\}$, $n \ge m$.
Obviously, we have $V_n \subset V'_n$. For any
If $(x, y) \in E$, it is easy to see that $(x, y) \notin W_n$ for $n \ge i + 1$. This implies that $F_n(x, y) = F_i(x, y)$ for $n \ge i$. Then we have $u^*(y) = \lim_{n \to \infty} u_n(y) \le \lim_{n \to \infty} u_n(x) F_n(x, y) = u^*(x) F_i(x, y)$.
Note that for any $(x, y) \in E$, we have $(x, y) \in W_j$. Then we have
$$\frac{P u^*(x)}{u^*(x)} \le \sum_{y : (x,y) \in E} p_{xy} F_i(x, y) = \sum_{y : (x,y) \in \hat{E}} p_{xy} F_i(x, y) + \sum_{y : (x,y) \in \hat{E}^c} p_{xy} F_i(x, y)$$
This means that $\log \hat{u}(x) \ge \log(j^\lambda/2)$ for any $x \in V_j$. By item (c)-(i) in Condition 5, it is easy to see that
is a finite set. For any $\ell > 0$, select $j$ such that $\log(j^\lambda/2) > \ell$. Then $\{x \in V : \log \hat{u}(x) \le \ell\} \subset V \setminus V_j$ is a finite set.
Proof. The results follow directly from Theorem 3.5 and Proposition A.1.
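The choice of the index $N$ in step (c) of the proof of Proposition A.1 is simple arithmetic, and a small numerical sketch may make it concrete. The values of $R$ along the path below are hypothetical and chosen only for illustration, and the helper name `threshold_index` is not from the paper. The sketch computes $N = m \vee (1 + \max_i \lfloor 1/R(x_i) \rfloor)$ with exact rational arithmetic and checks that $R(x_i) > 1/N$ at every state of the path, which is the property that steps (c) and (d) rely on.

```python
from fractions import Fraction
from math import floor

def threshold_index(m, rates):
    # N = m v (1 + max_i floor(1 / R(x_i))), as in step (c) of the proof
    return max(m, 1 + max(floor(1 / r) for r in rates))

# Hypothetical values of R(x_0), ..., R(x_k) along a path (not from the paper).
path_rates = [Fraction(1, 3), Fraction(2, 7), Fraction(5, 2), Fraction(1, 10)]
m = 4

N = threshold_index(m, path_rates)

# Every state on the path satisfies R(x_i) > 1/N.
all_above = all(r > Fraction(1, N) for r in path_rates)
print(N, all_above)
```

Exact fractions are used instead of floats so that the floor and the strict inequality are not affected by rounding.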