Differential equation approximations for Markov chains

We formulate some simple conditions under which a Markov chain may be approximated by the solution to a differential equation, with quantifiable error probabilities. The role of a choice of coordinate functions for the Markov chain is emphasised. The general theory is illustrated in three examples: the classical stochastic epidemic, a population process model with fast and slow variables, and core-finding algorithms for large random hypergraphs.


Introduction
Differential equations and Markov chains are the basic models of dynamical systems in a deterministic and a probabilistic context, respectively.Since the analysis of differential equations is often more feasible and efficient, both from a mathematical and a computational point of view, it is of interest to understand in some generality when the sample paths of a Markov chain can be guaranteed to lie, with high probability, close to the solution of a differential equation.
We shall obtain a number of estimates, given explicitly in terms of the Markov transition rates, for the probability that a Markov chain deviates further than a given distance from the solution to a suitably chosen differential equation.The basic method is simply a combination of Gronwall's lemma with martingale inequalities.The intended contribution of this paper is to set out in a convenient form some estimates that can be deduced in this way, along with some illustrations of their use.Although it is widely understood how to arrive at a suitable differential equation, the justification of an approximation statement can be more challenging, particularly if one has cause to push beyond the scope of classical weak convergence results.We have found the use of explicit estimates effective, for example, when the Markov chain terminates abruptly on leaving some domain [4], or when convergence is needed over a long time interval [23], or for processes having a large number of components with very different scales [14].
The first step in our approach is a choice of coordinate functions for the given Markov chain: these are used to rescale the process, whose values might typically form a vector of non-negative integers, to one which may lie close to a continuously evolving path.The choice of coordinate functions may also be used to forget some components of the Markov chain which do not behave suitably and further, as is sometimes necessary, to correct the values of the remaining components to take account of the values of the forgotten components.This is illustrated in the examples in Sections 6 and 7.The behaviour of forgotten components can sometimes be approximated by a random process having relatively simple characteristics, which are determined by the differential equation.This is illustrated in the example in Section 5, where it is used to show the asymptotic independence of individuals in a large population.
We have been motivated by two main areas of application.The first is to population processes, encompassing epidemic models, queueing and network models, and models for chemical reactions.It is often found in models of interest that certain variables oscillate rapidly and randomly while others, suitably rescaled, are close to deterministic.It was a primary motivation to find an extension of our quantitative estimates which was useful in such a context.The example in Section 6, which is drawn from [2], shows that this is possible.The second area of application is the analysis of randomized algorithms and combinatorial structures.Here, the use of differential equation approximations has become an important tool.The example in Section 7 gives an alternative treatment and generalization of the k-core asymptotics discovered in [19].
The martingale estimates we need are derived from scratch in the Appendix, using a general procedure for the identification of martingales associated to a Markov chain.We have taken the opportunity to give a justification of this procedure, starting from a presentation of the chain in terms of its jump chain and holding times.We found it interesting to do this without passing through the characterization of Markov chains in terms of semigroups and generators.
The authors are grateful to Perla Sousi and to a referee for a careful reading of an earlier version of this paper, which has helped to clarify the present work.

Survey of related literature
There is a well-developed body of literature devoted to the general question of the convergence of Markov processes, which includes as a special case the question we address in this paper.This special case arises under fluid limit or law of large numbers scaling, where, for large N , jumps whose size is of order 1/N occur at a rate of order N .This is to be distinguished from diffusive or central limit scaling, where jumps of mean zero and of size of order 1/ √ N occur at a rate of order N .Just as in the classical central limit theorem, a Gaussian diffusive limit can be used to describe to first order the fluctuations of a process around its fluid limit.
Both sorts of limit are presented in the books by Ethier and Kurtz [6, Section 7.4], Jacod and Shiryaev [8, Section IX.4b], and Kallenberg [9].These works develop conditions on the transition operators or rate kernels of a sequence of Markov chains which are sufficient to imply the weak convergence of the corresponding processes.Trotter's paper [22] was one of the first to take this point of view.
The fluid limit is more elementary and often allows, with advantage, a more direct approach.One identifies a limiting drift b of the processes, which we shall suppose to be a Lipschitz vector field, and then the limit is the deterministic path obtained by solving the differential equation ẋ = b(x).Kurtz [12] describes some sufficient conditions for weak convergence of processes in this context.Since the limit is continuous in time, weak convergence is here simply convergence in probability to 0 of the maximal deviation from the limit path over any given compact time interval.Later, exponential martingale estimates, were used to prove decay of error probabilities at an exponential rate.See the book of Shwartz and Weiss [21].This is the direction also of the present paper.Differential equation approximations for stochastic systems with small noise have been studied for many sorts of process other than Markov processes.See the book of Kushner and Yin [13].
Applications of fluid limits for Markov chains are scattered across many fields.See [3] on epidemiology and [21] on communications and computer networks.Much has been achieved by the identification of deterministic limit behaviour when randomized algorithms are applied to large combinatorial problems, or deterministic algorithms are applied to large random combinatorial structures.Examples include Karp and Sipser's seminal paper [10] on maximal matchings, Hajek's analysis [7] of communications protocols, Mitzenmacher's [16] balanced allocations, and the analysis of Boolean satisfiability by Achlioptas [1] and Semerjian and Monasson [18].A general framework for this sort of application was developed by Wormald and others, see [19], [24].
Finally, the emergence of deterministic macroscopic evolutions from microsopic behaviour, often assumed stochastic, is a more general phenomenon than addressed in the literature mentioned above.We have only considered scaling the sizes of the components of a Markov chain.In random models where each component counts the number of particles at a given spatial location, it is natural to scale also these locations, leading sometimes to macroscopic laws governed by partial rather than ordinary differential equations.This is the field of hydrodynamic limits -see, for example, Kipnis and Landim [11], for an introduction.

Some simple motivating examples
We now give a series of examples of Markov processes, each of which takes many small jumps at a fast rate.The drift is the product of the average jump by the total rate, which may vary from state to state.In cases where there are a number of different types of jump, one can compute the drift as a sum over types of the size of the jump multiplied by its rate.We write down the drift and hence obtain a differential equation.In the rest of the paper, we give conditions under which the Markov chain will be well approximated by solutions of this equation.In each of the examples there is a parameter N which quantifies the smallness of the jumps and the compensating largeness of the jump rates.The approximations will be good when N is large.

Poisson process
Take (X t ) t 0 to be a Poisson process of rate λN , and set X t = X t /N .Note that X takes jumps of size 1/N at rate λN .The drift is then λ and the differential equation is ẋt = λ.
If we take as initial state X 0 = x 0 = 0, then we may expect that X t stay close to the solution x t = λt.This is a law of large numbers for the Poisson process.
Consider a queue with arrivals at rate N , exponential service times of mean 1, and infinitely many servers.Write X t for the number of customers present at time t.Set X t = X t /N , then X is a Markov chain, which jumps by 1/N at rate N , and jumps by −1/N at rate N X t .The drift is then 1 − x and the differential equation is ẋt The solution of this equation is given by x t = 1 + x 0 e −t , so we may expect that, for large N the queue size stabilizes near N , at exponential rate 1.

Chemical reaction A + B ↔ C
In a reversible reaction, pairs of molecules of types A and B become a single molecule of type C at rate λ/N , and molecules of type C become a pair of molecules of types A and B at rate µ.Write A t , B t , C t for the numbers of molecules of each type present at time t.Set then X is a Markov chain, which makes jumps of (−1, −1, 1)/N at rate (λ/N )(N X 1 t )(N X 2 t ), and makes jumps of (1, 1, −1)/N at rate µ(N X 3 t ).The drift is then (µx 3 − λx 1 x 2 , µx 3 − λx 1 x 2 , λx 1 x 2 − µx 3 ) and the differential equation is, in components, Any vector (x 1 , x 2 , x 3 ) with µx 3 = λx 1 x 2 is a fixed point of this equation and may be expected to correspond to an equilibrium state of the system.

Gunfight
Two gangs of gunmen fire at each other.On each side, each surviving gunman hits one of the opposing gang randomly, at rate α for gang A and at rate β for gang B. Write A t and B t for the numbers still firing on each side at time t.Set X t = (X1 t , X 2 t ) = (A t , B t )/N , then X is a Markov chain and jumps by (0, −1)/N at rate αN X 1 t , and by (−1, 0)/N at rate βN X 2 t .The drift is then (−βx 2 , −αx 1 ) and the differential equation is, in components, Note that, in this case the parameter N does not enter the description of the model.However the theory will give an informative approximation only for initial conditions of the type (A t , B t ) = N (a 0 , b 0 ).The reader may like to solve the equation and see who wins the fight.

Continuous time branching processes
Each individual in a population lives for an exponentially distributed time of mean 1/N , whereupon it is replaced by a random number Z of identical offspring, where Z has finite mean µ.Distinct individuals behave independently.Write X t for the number of individuals present at time t.Set X t = X t /N , then X is a Markov chain, which jumps by (k − 1)/N at rate This equation gives a first order approximation for the evolution of the population size -in particular, it is clear that the cases where µ < 1, µ = 1, µ > 1 should show very different long-time behaviour.

Derivation of the estimates
Let X = (X t ) t 0 be a continuous-time Markov chain with countable 1 statespace S. Assume that in every state ξ ∈ S the total jump rate q(ξ) is finite, and write q(ξ, ξ ′ ) for the jump rate from ξ to ξ ′ , for each pair of distinct states ξ and ξ ′ .We assume that X does not explode: a simple sufficient condition for this is that the jump rates are bounded, another is that X is recurrent.We make a choice of coordinate functions x i : S → R, for i = 1, . . ., d, and write x = (x 1 , . . ., x d ) : S → R d .Consider the R d -valued process X = (X t ) t 0 given by X t = (X 1 t , . . ., X d t ) = x(X t ).Define, for each ξ ∈ S, the drift vector where we set β(ξ) = ∞ if this sum fails to converge absolutely.
Our main goal is the derivation of explicit estimates which may allow the approximation of X by the solution of a differential equation.We shall also discuss how the computation of certain associated probabilities can be simplified when such an approximation is possible.
Let U be a subset of R d and let x 0 ∈ U .Let b : U → R d be a Lipschitz vector field.The differential equation ẋt = b(x t ) has a unique maximal solution (x t ) t ζ , starting from x 0 , with x t ∈ U for all t < ζ.Maximal here refers to ζ and means that there is no solution in U defined on a longer time interval.Our analysis is based on a comparison of the equations where T 1 = inf{t 0 : β(X t ) = ∞} and where the first equation serves to define the process (M t ) 0 t T1 .

L 2 -estimates
The simplest estimate we shall give is obtained by a combination of Doob's L2 -inequality and Gronwall's lemma.Doob's L 2 -inequality states that, for any martingale (M t ) t t0 , Gronwall's lemma states that, for any real-valued integrable function f on the interval [0, t 0 ], the inequality implies that f (t 0 ) Ce Dt0 .Write, for now, K for the Lipschitz constant of b on U with respect to the Euclidean norm |.|.Fix t 0 < ζ and ε > 0 and assume that 2 , for all ξ ∈ S and t t 0 , |x(ξ) − Set δ = εe −Kt0 /3 and fix A > 0. For our estimate to be useful it will be necessary that A be small compared to ε 2 .Set T = inf{t 0 : The unit square (0, 1) 2 is a possible choice of the set U .The inner and outer solid curves bound a tube around the deterministic solution (red) of the ordinary differential equation, which starts inside U .The realization shown of the Markov chain trajectory does not leave the tube before exit from U .This is a realization of the stochastic epidemic, which will be discussed in more detail in Section 5.
Consider the events3 and Consider the random function ε, which implies T > t 0 and hence f (t 0 ) ε. Consider now the stopping time By Cauchy-Schwarz, we have |β(ξ)| 2 q(ξ)α(ξ) for all ξ ∈ S, so T T 1 .By a standard argument using Doob's L 2 -inequality, which is recalled in Proposition 8.7, we have On Ω 2 , we have T = T ∧ t 0 , so Ω 2 \ Ω ′ 2 ⊆ {sup t T |M t | > δ} and so, by Chebyshev's inequality, P(Ω 2 \ Ω ′ 2 ) 4At 0 /δ 2 .We have proved the following result, which can sometimes enable us to show that the situation illustrated in Figure 1 occurs with high probability.

Exponential estimates
It is clear that the preceding argument could be applied for any norm on R d with obvious modifications.We shall do this for the supremum norm x = max i |x i |, making at the same time a second variation in replacing the use of Doob's L 2inequality with an exponential martingale inequality.This leads to the version of the result which we prefer for the applications we have considered.It will be necessary to modify some assumptions and notation introduced in the preceding subsection.We shall stick to these modifications from now on.We assume now that, ε > 0 and t 0 are chosen so that, for all ξ ∈ S and t t 0 , Write now K for the Lipschitz constant of b with respect to the supremum norm.Set δ = εe −Kt0 /3.Fix A > 0 and set θ = δ/(At 0 ).Define and set Consider the events For the second inequality, we used a standard exponential martingale inequality, which is recalled in Proposition 8.8.Since the same argument applies also to −M and for all i, we thus obtain P(Ω 2 \Ω ′ 2 ) 2de −δ2 /(2At0) .We have proved the following estimate, which is often stronger than Theorem 4.1.In an asymptotic regime where the sizes of jumps in X are of order 1/N but their rates are of order N , the estimate will often allow us to prove decay of error probabilities in the differential equation approximation at a rate exponential in N .The price to be paid for this improvement is the necessity to deal with the event Ω 2 just defined rather than its more straightforward counterpart in the preceding subsection4 .Theorem 4.2.Under the above conditions,

Convergence of terminal values
In cases where the solution of the differential equation leaves U in a finite time, so that ζ < ∞, we can adapt the argument to obtain estimates on the time T that X leaves U and on the terminal value X T .The vector field b can be extended to the whole of R d with the same Lipschitz constant.Let us choose such an extension, also denoted b, and write now (x t ) t 0 for the unique solution to ẋt = b(x t ) starting from x 0 .Define for ε > 0 Typically we will have ρ(ε) → 0 as ε → 0. Indeed, if U has a smooth boundary at x ζ , and if b(x ζ ) is not tangent to this boundary, then ρ(ε) Cε for some C < ∞, for all sufficiently small ε > 0. However, we leave this step until we consider specific examples.Assume now, in place of (2), that ε and t 0 are chosen so that We have proved the following estimate 6 .Theorem 4.3.Under the above conditions,

Random processes modulated by the fluid limit
We return now to the case where t 0 < ζ and condition (2) holds.Although the results given so far can be interpreted as saying that X is close to deterministic, there are sometimes associated random quantities which we may wish to understand, and whose behaviour can be described, approximately, in a relatively simple way in terms of the deterministic path (x t ) t t0 .To consider this in some generality, suppose there is given a countable set I and a function y : S → I and consider the process Y = (Y t ) t 0 given by Y t = y(X t ).Define, for ξ ∈ S and y ∈ I with y = y(ξ), the jump rates We now give conditions which may allow us to approximate Y by a Markov chain with time-dependent jump rates, which are given in terms of the path (x t ) t t0 and a non-negative function g on U × {(y, y ′ ) ∈ I × I : y = y ′ }.Set g t (y, y ′ ) = g(x t , y, y ′ ) for t t 0 .Fix I 0 ⊆ I and set 5 The function ρ depends on the choice of extension made of b outside U , whereas the distribution of X T − x ζ does not.This is untidy, but it is not simple to optimise over Lipschitz extensions, and in any case, this undesirable dependence of ρ is a second order effect as ε → 0.
6 The same argument can be made using, in place of This refinement can be useful if we wish to start X on the boundary of U .
Set T 0 = inf{t 0 : X t ∈ U or Y t ∈ I 0 }, fix G > 0, and define Theorem 4.4.There exists a time-inhomogeneous Markov chain (y t ) t t0 , with state-space I and jump rates g t (y, y ′ ), such that where τ = inf{t 0 : Proof.We construct the process (X t , y t ) t t0 as a Markov chain, where the rates are chosen to keep the processes (Y t ) t t0 and (y t ) t t0 together for as long as possible.Define for t t 0 , and for ξ, ξ ′ ∈ S and y, y ′ ∈ I, with (ξ, y) = (ξ ′ , y ′ ), first in the case y = y(ξ), then in the case y = y(ξ), Consider the Markov chain (X t , y t ) t t0 on S ×I, starting from (X 0 , y(X 0 )), with jump rates q t (ξ, y; ξ ′ , y ′ ).It is straightforward to check, by calculation of the marginal jump rates, that the components (X t ) t t0 and (y t ) t t0 are themselves Markov chains, having jump rates q(ξ, ξ ′ ) and g t (y, y ′ ) respectively.Set then T0 > 0 and, for t < t 0 , the hazard rate for T0 is given by ρ Thus, there is an exponential random variable E of parameter 1 such that, on , which combines with our earlier estimates to give the desired result.
There are at least two places where the basic argument used throughout this section is wasteful and where, with extra effort, better estimates could be obtained.First, we have treated the coordinate functions symmetrically; it may be that a rescaling of some coordinate functions would have the effect of equalizing the noise in each direction.This will tend to improve the estimates.Second, Gronwall's lemma is a blunt instrument.A better idea of how the perturbations introduced by the noise actually propagate is provided by differentiating the solution flow to the differential equation.Sometimes it is possible to show that, rather than growing exponentially, the effect of perturbations actually decays with time.These refinements are particularly relevant, respectively, when the dimension d is large, and when the time horizon t 0 is large.We do not pursue them further here.

Stochastic epidemic
We discuss this well known model, see for example [3], to show in a simple context how the estimates of the preceding section lead quickly to useful asymptotic results.The stochastic epidemic in a population of size N is a Markov chain X = (X t ) t 0 whose state-space S is the set of pairs ξ = (ξ 1 , ξ 2 ) of non-negative integers with ξ 1 + ξ 2 N .The non-zero jump rates, for distinct ξ, ξ ′ ∈ S, are given by Here λ and µ are positive parameters, having the interpretation of infection and removal rates, respectively.Write X t = (ξ 1 t , ξ 2 t ).Then ξ 1 t represents the number of susceptible individuals at time t and ξ 2 t the number of infective individuals.Suppose that initially a proportion p ∈ (0, 1) of the population is infective, the rest being susceptible.Thus X 0 = (N (1 − p), N p).The choice of jump rates arises from the modelling assumption that each susceptible individual encounters randomly other members of the population, according to a Poisson process and becomes infective on first meeting an infective individual; then infectives are removed at an exponential rate µ.By a linear change of timescale we can reduce to the case µ = 1, so we shall assume that µ = 1 from this point on.

Convergence to a limit differential equation
Define x : S → R 2 by x(ξ) = ξ/N and set X t = x(X t ).Then the drift vector is given by β t , has a unique solution (x t ) t 0 , starting from x 0 , which stays in U for all time.Note that x(S) ⊆ U , so condition (2) holds for any ε > 0 and t 0 .The Lipschitz constant for b on U is given by K = λ + λ ∨ 1. Set A = (1 + λ)e/N and take δ = e −Kt0 ε/3 and θ = δ/(At 0 ), as in Section 4. Let us assume that ε t 0 , then θ N , so σ θ (1/N ) 1  2 (θ/N ) 2 e (as in Footnote 4) and so Hence, in this example, Ω 0 = Ω 1 = Ω 2 = Ω and from Theorem 4.2 we obtain the estimate where C = 18(λ + 1)t 0 e 2Kt0+1 .Figure 2 illustrates a realization of the process alongside the solution of the differential equation.

Convergence of the terminal value
The estimate just obtained, whilst giving strong control of error probabilities as N becomes large, behaves rather poorly as a function of t 0 .This is because we have used the crude device of Gronwall's lemma rather than paying closer attention to the stability properties of the differential equation ẋt = b(x t ).In particular, the estimate is useless if we want to predict the final size of the epidemic, that is to say, the proportion of the population which is eventually infected, given by X 3 ∞ = lim t→∞ X 3 t , where X 3 t = 1−X 1 t −X 2 t .However, we can obtain an estimate on X 3 ∞ by the following modified approach.Let us change the non-zero jump rates by setting q(ξ, ξ ′ ) = q(ξ, ξ ′ )/ξ 2 , for ξ = (ξ 1 , ξ 2 ), to obtain a new process ( Xt ) t 0 .Since we have changed only the time-scale, the final values X 3 ∞ and X3 ∞ have the same distribution.We can now re-run the analysis, just done for X, to X.Using obvious notation, we have b(x) = (−λx 1 , λx 1 − 1) and φ(ξ, θ) = σ θ (1/N )(λξ 1 /N + 1).We now take U = (0, 1] 2 .The Lipschitz constant K is unchanged.We make the obvious extension of b to R 2 .By explicit solution of the differential equation, we see that (x t ) t 0 leaves U at time τ , with , where τ is the unique root of the equation is not tangent to the boundary, and so ε + ρ(ε) Cε for all ε ∈ (0, 1] for some C < ∞ depending only on λ and p.We can therefore choose t 0 > τ and apply Theorem 4.3 to obtain, for a constant C < ∞ of the same dependence, for all ε ∈ (0, 1] and all N ,

Limiting behaviour of individuals
We finally give an alternative analysis which yields a more detailed picture.Consider a Markov chain X = ( Xt ) t 0 with state-space S consisting of N -vectors η = (η 1 , . . ., η N ) with η j ∈ {1, 2, 3} for all j.Each component of η represents the state of an individual member of the population, state 1 corresponding to susceptible, state 2 to infective, and 3 to removed.The non-zero jump rates, for distinct η, η ′ ∈ S, are given by q(η, η ′ ) = λξ 2 (η)/N, if η ′ = η + e j for some j with η j = 1, 1, if η ′ = η + e j for some j with η j = 2. Here , and e j = e (N ) j = (0, . . ., 1, . . ., 0) is the elementary N -vector with a 1 in the jth position.Set X t = ξ( Xt ).Then X = (X t ) t 0 is the stochastic epidemic considered above.Define x : S → R 2 by x(η) = x(ξ(η)).Then x( Xt ) = x(X t ) = X t , which we already know to remain close to x t with high probability when N is large.
We can now describe the limiting behaviour of individual members of the population.Fix k ∈ {1, . . ., N } and set I = {1, 2, 3} k .Define y : S → I by y(η) = (η 1 , . . ., η k ) and set Y t = y( Xt ).We seek to apply Theorem 4.4.Define for x ∈ U and n, n ′ ∈ {1, 2, 3} if n = 2 and n ′ = 3, 0, otherwise, and, for y, y ′ ∈ I, set g(x, y, y ′ ) = k j=1 g 0 (x, y j , y ′ j ).Then the jump rates for Y are given by γ(η, y) = g(x(η), y(η), y), so we can take G = 0 and Ω 3 = Ω, and it is straightforward to check that, if I 0 = I, then κ = kλε.Hence there is a time-inhomogeneous Markov chain (y t ) t 0 with state-space I and jump rates g t (y, y ′ ) = g(x t , y, y ′ ), y, y ′ ∈ I, such that A roughly optimal choice of ε is C log N/N , giving a constant C ′ < ∞, depending only on λ and t 0 , such that for all sufficiently large N .Note that the components of (y t ) t 0 are independent.

Population processes
The modelling of population dynamics, involving a number of interacting species, is an important application of Markov chains.A simple example of this was already discussed in Section 5. We propose now to consider another example, of a model which has been used for the growth of a virus in a cell.Our primary aim here is to show how to deal with a Markov chain where some components, the slow variables, can be approximated by the solution to a differential equation but others, the fast variables, instead oscillate rapidly and randomly.Specifically, by a non-standard choice of coordinate functions, we can obtain an approximation for the slow variables, with computable error probabilities.
A population process is a Markov chain X = (X t ) t 0 , where the state X t = (ξ 1 t , . . ., ξ n t ) describes the number of individuals in each of n species at time t; the dynamics are specified by a choice of rates λ ε,ε ′ for each of the possible reactions (ε, ε ′ ), where ε, ε ′ ∈ (Z + ) n ; then, independently over reactions, X makes jumps of size ε ′ − ε at rate The sort of analysis done below can be adapted to many other population processes.

Analysis of a model for viral replication and growth
We learned of this model from the paper [2], which contains further references on the scientific background.There are three species G, T and P which represent, respectively, the genome, template, and structural protein of a virus.We denote by ξ 1 , ξ 2 , ξ 3 the respective numbers of molecules of each type.There are six reactions, forming a process which may lead from a single virus genome to a sustained population of all three species and to the production of the virus.We write the reactions as follows: Here, α > 1, R 1, N 1 and λ, µ, ν > 0 are given parameters and, for example, the third reaction corresponds, in the general notation used above, to the case ε = (0, 1, 0) and ε ′ = (1, 1, 0), with λ ε,ε ′ = R, whereas the final reaction, which causes jumps of size (−1, 0, −1), occurs at a total rate of νξ 1 ξ 3 /N .We have omitted some scientific details which are irrelevant to the mathematics, and have written ∅ when the reaction produces none of the three species in the model.In fact it is the final reaction G + P which gives rise to the virus itself.In the case of scientific interest, α, λ, µ, ν are of order 1, but R, N are large.We therefore seek an approximation which is good in this regime.
As a first step to understanding this process, we note that, for as long as the number of templates ξ 2 t remains of order 1, the rate of production of genomes is of order R. On the other hand, for as long as the number of genomes ξ 1 t is bounded by xR, for some x > 0, the number of templates can be dominated7 by a M/M/∞ queue, (Y t ) t 0 with arrival rate λxR and service rate R/α.The stationary distribution for (Y t ) t 0 is Poisson of parameter λxα, which suggests that, for reasonable initial conditions at least, ξ 2 t does remain of order 1, but oscillates rapidly, on a time-scale of order 1/R.The number of proteins ξ 3 t evolves as an M/M/∞ queue, with time-dependent arrival rate RN ξ 2 t and service rate R/µ+νξ 1 t /N .This suggests that ξ 3 t /N will track closely a function of the rapidly oscillating process ξ 2 t .The only hope for a differential equation approximation would thus appear to be the genome process (ξ 1 t ) t 0 .The obvious choice of coordinate map x(ξ) = ξ 1 /R gives as drift which we cannot approximate by a function of x(ξ) unless the second and third terms become negligible.In fact they do not, so this choice fails.The problem is that the drift of ξ 1 t is significantly dependent on the fast variables ξ 2 t and ξ 3 t .To overcome this, we can attempt to compensate the coordinate process so that it takes account of this effect.We seek to find a function x on the state-space S = (Z + ) 3 of the form where χ is a small correction, chosen so that the drift vector β(ξ) has the form where again ∆(ξ)/R is small when R is large.Small here refers to a typical evaluation on the process, where we recall that we expect ξ 1 t /R, ξ 2 t and ξ 3 t /N to be of order 1.It is reasonable to search for a function χ which is affine in ξ 1 and linear in (ξ 2 , ξ 3 ).After some straightforward calculations, we find that has the desired property, with The limit differential equation ẋt = λ(α − 1)x t − λαµνx 2 t has a unique positive fixed point x ∞ = (α − 1)/(αµν).Fix x 0 ∈ [0, x ∞ ] and take as initial state X 0 = (Rx 0 , 0, 0).Theorem 6.1.For all t 0 ∈ [1, ∞), there is a constant C < ∞, depending only on α, λ, µ, ν, t 0 with the following property.For all ε ∈ (0, 1] there is a constant R 0 < ∞, depending only on α, λ, µ, ν, t 0 and ε such that, for all R R 0 and N R, we have Proof.We shall write C for a finite constant depending only on α, λ, µ, ν, t 0 , whose value may vary from line to line, adding a subscript when we wish to refer to a particular value at a later point.Fix constants a 1, γ > 0, Γ 1, with (α + 1)(µν + 1)γ 1/2, to be determined later, and set A = a/R.Take U = [0, x ∞ + 1].As in Section 4, let us write K for the Lipschitz constant of b on U , and set X t = x(X t ), δ = εe −Kt0 /3, θ = δ/(At 0 ) and T = inf{t 0 : X t ∈ U }. Since 0 x t x ∞ for all t and since ε 1, condition (2) holds.
From equation ( 4) we obtain |ξ 1 t /R − X t | C 3 γ.Let us choose then a, γ, Γ and R so that C 0 γ δ, C 1 (γΓ + 1/R) δ, a C 2 Γe and C 3 γ ε.Then On the other hand, Ω 4 ∩ Ω 5 ⊆ Ω 0 ∩ Ω 1 ∩ Ω 2 and, by Theorem 4.2, we have , where C = 18t 0 e 2Kt0 , we can now complete the proof by showing that, for suitable a, γ, Γ and R 0 , for all R R 0 , we have P(Ω c 4 ) e −R and P(Ω c 5 ) e −R .We can dominate the processes (ξ 2 t ) t 0 and (ξ 2 t ) t 0 , up to T , by a pair of processes Y = (Y t ) t 0 and Z = (Z t ) t 0 , respectively, where Y is an M/M/∞ queue with arrival rate 2λ(x ∞ + 1)R and service rate R/α, starting from ξ 2 0 = 0, and where, conditional on Y , Z is an M/M/∞ queue with arrival rate RN Y t and service rate R/µ, starting from ξ 3 0 = 0. We now use the estimates ( 5) and ( 6), to be derived in the next subsection.For Γ sufficiently large, using the estimate (6), we have P(Ω c 5 ) e −R for all sufficiently large R. Fix such a Γ and choose a, R sufficiently large and γ sufficiently small to satisfy the above constraints.Finally, using the estimate (5), P(Ω c 4 ) e −R , for all sufficiently large R. The initial state (Rx 0 , 0, 0) was chosen to simplify the presentation and is not realistic.However, an examination of the proof shows that, for some constant γ > 0, depending only on α, λ, µ, ν, ε, the same conclusion can be drawn for any initial state (Rx 0 , ξ 2 0 , ξ 3 0 ) with x 0 x ∞ , ξ 2 0 Rγ and ξ 3 0 RN γ.Since typical values of the fast variables ξ 2 t and ξ 3 t are of order 1 and N respectively, this is more realistic.Although we are free to take an initial state (1, 0, 0), the action of interest in this case occurs at a time of order log R, so is not covered by our result.Instead, there is a branching process approximation for the number of genomes, valid until it reaches Rx 0 , for small x 0 .Our estimate can be applied to the evolution of the process from that time on.See [2] for more details of the branching process approximation.

Some estimates for the M/M/∞ queue
We now derive the fast variable bounds used in the proof of Theorem 6.1.They are based on the following two estimates for the M/M/∞ queue.Proposition 6.2.Let (X t ) t 0 be an M/M/∞ queue starting from x 0 , with arrival rate λ and service rate µ.Then, for all t 1/µ and all a 3λe 2 /µ, P sup 0 s t X s x 0 + log(µt) + a exp −a log µa 3λe .
Proof.By rescaling time, we reduce to the case where µ = 1.Also, we have X t x 0 +Y t , where (Y t ) t 0 is an M/M/∞ queue starting from 0, with the same arrival rate and service rate.Thus we are reduced to the case where x 0 = 0.
Choose n ∈ N so that δ = t/n ∈ [1, 2) and note that, for k = 0, 1, . . ., n − 1, we have sup where A k+1 is the number of arrivals in (kδ, (k + 1)δ] and where Y k is a Poisson random variable of parameter λ, independent of A k+1 .By the usual Poisson tail estimate 8 , for all x 0, Hence, for t 1 and a 3λe where S n is the service time of the nth customer present at time 0. The result follows by an elementary calculation.
Consider next the case where x 0 = 0. We can express X t in terms of a Poisson random measure m on [0, ∞) × [0, 1] of intensity λ t dtdu, thus Then, by Fubini, The result for x 0 = 0 follows.The general case now follows by independence.Now let Y = (Y t ) t 0 be an M/M/∞ queue with arrival rate λR and service rate R/α, starting from Ry, and, conditional on Y , let (Z t ) t 0 be an M/M/∞ queue with arrival rate RN Y t and service rate R/µ, starting from RN z.By Proposition 6.2, for any t 0 0 and γ > 0, we can find R 0 < ∞, depending only on α, γ, λ, µ and t 0 , such that, for all R 0 R and N 1, On the other hand, using Proposition 6.3, for θα 1/2, for a constant C depending on α, λ, t 0 .Then, by conditioning first on Y , we obtain, for all θ(µ + 1)α 1/2, where C depends on α, λ, µ, t 0 .Thus, we obtain constants Γ, R 0 < ∞, depending only on α, λ, µ, t 0 , such that, for all R R 0 and N 1,

Hypergraph cores
The approximation of Markov chains by differential equations is a powerful tool in probabilistic combinatorics, and in particular in the asymptotic analysis of structures within large random graphs and hypergraphs.It is sometimes possible to find an algorithm, whose progress can be described in terms of a Markov chain, and whose terminal value gives information about the structure of interest.If this Markov chain can be approximated by a differential equation, then this may provide an effective means of computation.We shall describe in detail an implementation of this approach which yields a quantitative description of the k-core for a general class of random hypergraphs.Here k 2 is an integer, which will remain fixed throughout.

Specification of the problem
Let V and E be finite sets.A hypergraph with vertex set V and edge-label set E is a subset γ of V × E. Given a hypergraph γ, define, for v ∈ V and e ∈ E, sets γ(v) = γ ∩ ({v} × E) and γ(e) = γ ∩ (V × {e}).The sets γ(e) are the (hyper)edges of the hypergraph γ.
Thus, if we call a sub-hypergraph of γ any hypergraph obtained by deleting edges from γ, then γ is the largest sub-hypergraph of γ in which every vertex of non-zero degree has degree at least k.It is not hard to see that any algorithm which deletes recursively edges containing at least one vertex of degree less than k terminates at the k-core γ.The k-core is of interest because it is a measure of the strength of connectivity present in γ; see [17], [19], [20].
A frequency vector is a vector n = (n d : The frequency vectors of a hypergraph γ are then the pair p(γ), q(γ), where p(γ) = n(d γ ) and q(γ) = n(w γ ).Note that m(p(γ)) is simply the cardinality of γ, as of course is m(q(γ)).
The datum for our model is a pair of non-zero frequency vectors p, q with m(p) = m(q) = m < ∞.Note that there exists an integer L 2 such that p d = q w = 0 for all d, w L + 1.We assume also that p 0 = q 0 = 0.This will result in no essential loss of generality.Fix an integer N 1.We shall be interested in the limit as N → ∞.Choose sets V and E and functions d : V → Z + and w : E → Z + such that n(d) = N p and n(w) = N q.In particular, this implies that |V | = N d p d and |E| = N w q w .Denote by G(d, w) the set of hypergraphs on V × E with degree function d and weight function w.Thus and, in particular, all elements of G(d, w) have cardinality N m.This set is known to be non-empty for N sufficiently large.Its elements can also be thought of as bipartite graphs on V ∪ E with given degrees.We shall be interested in the distribution of the k-core Γ when Γ is a hypergraph chosen uniformly at random from G(d, w).We write Γ ∼ U (d, w) for short.Set D = dΓ and W = wΓ.These are the degree and weight functions of the k-core.Define for d, d ′ , w 0 (7) Note that, given ( Pd,d ′ : k d d ′ ) and ( Qw : w 1), we can recover the other independent and, with the exception of ṽ, having distribution λ.Here λ and σ are given by λ d = (d + 1)p d+1 /m, σ w = (w + 1)q w+1 /m, d, w 0. ( For i = 1, . . ., D, write Si for the number of offspring of ẽi and, for j = 1, . . ., Si , write Li,j for the number of offspring of the jth offspring of ẽi .Then, conditional on D = d, the random variables S1 , . . ., Sd are independent, of distribution σ, and, further conditioning on Si = s i for i = 1, . . ., d, the random variables Li,j , i = 1, . . ., d, j = 1, . . ., s i , are independent, of distribution λ. It is known (see [15] or, for a more explicit statement, [5]) that there is a function ψ 0 : N → [0, 1], depending only on L, with ψ 0 (m) → 0 as m → ∞, such that, for all degree and weight functions d, w L with m(d) = m(w) = m, we have P(A) 1−ψ 0 (m) and there is a coupling of Γ and T such that D = D and, with probability exceeding 1 − ψ 0 (m), we have S i = Si for all i and L i,j = Li,j for all i, j.The left picture shows a hypergraph with eight vertices, three 2-edges, and three 3edges.An incidence is selected at random, shown by the enlarged vertex, and chosen as root of a branching process, shown as the bottom vertex on the right.The root has two hyperedge offspring, shown as grey squares.One of these has two vertex offspring, and so on.
The following paragraph presents a heuristic argument which leads quickly to a prediction for the asymptotic frequencies of core degrees and weights, which we shall later verify rigorously, subject to an additional condition.The convergence of Γ to T , near a randomly chosen vertex, which we expressed in terms of the function ψ 0 for the first two steps, in fact holds in a similar sense for any given numbers of steps.The algorithm of deleting, recursively, all edges in Γ containing any vertex of degree less than k terminates at the k-core Γ.Consider the following analogous algorithm on the branching process: we remove in the first step all individuals of type E having some offspring with fewer than k − 1 offspring of its own; then repeat this step infinitely often.Set g 0 = 1.For n 0, write s n for the probability that, after n steps, a given individual of type E remains in the population, and write g n+1 for the probability that, after n steps, a given individual of type V (distinct from ṽ) has at least k − 1 offspring remaining.Then, by a standard type of branching process argument, We write here σ, and below λ, for the probability generating functions So g n+1 = φ(g n ), where Note that, in the case k = 2, we have the simple formula Since φ maps [0, 1] continuously to [0, 1) and is increasing, as may be verified by differentiation, the equation φ(g) = g has a root in [0, 1) and g n converges to the largest such root g * as n → ∞.See Figure 4. Suppose that we accept the branching process as a suitable approximation to the hypergraph for the calculation of the core.Then we are led to the following values for the limiting core frequencies: We have not justified the interchange of limits which would be required to turn this into a rigorous argument.This seems unlikely to be straightforward.For, by analogy with Theorem 2.2 in [4], in the critical case when φ(g) g in a neighbourhood of g * , we would expect that, asymptotically, the core frequencies would take values corresponding to smaller roots of φ(g) = g with probability 1/2.Thus, in this case, when also the crossing condition of Theorem 7.1 fails, the branching process heuristic would lead to an incorrect conclusion.However, for certain random graphs, this sort of approach was made to work in [20].

Statement of result
We return to the framework described in Subsection 7.1.Thus now n(d) = N p and n(w) = N q for our given frequency vectors p and q.Define the distributions λ and σ by the equations ( 8) 10 .The normalized core frequencies Pd,d ′ and Qw were defined at (7) and the limiting core frequencies pd,d ′ and qw were defined at (11).The following result will be proved in Subsection 7.7 using a differential equation approximation to a suitably chosen Markov chain.
Theorem 7.1.Assume that either g * = 0 or the following crossing condition holds: Then, for all ν ∈ (0, 1], there is a constant C < ∞, depending only on p, q and ν, such that, for all N 1,

Splitting property
A uniform random hypergraph Γ ∼ U (d, w) has a useful splitting property, which we now describe.Given a partition V = V ′ ∪ V ′′ , we can identify a hypergraph h on V × E with the pair of hypergraphs h ′ , h ′′ on V ′ × E, V ′′ × E respectively, obtained by intersection.Consider the partition where d ′ , d ′′ are the restrictions of d to V ′ , V ′′ respectively, and where w ′ , w ′′ range over all weight functions on E subject to the given constraint.We deduce that, conditional on {W ′ = w ′ , W ′′ = w ′′ }, the hypergraphs Γ ′ and Γ ′′ are independent, with Γ ′ ∼ U (d ′ , w ′ ) and Γ ′′ ∼ U (d ′′ , w ′′ ).By symmetry, an analogous splitting property holds in respect of any partition of E. In particular, if v ∈ V and e ∈ E are chosen independently of Γ, then Γ \ Γ(v) and Γ \ Γ(e) are also uniformly distributed given their vertex degrees and edge weights.

Analysis of a core-finding algorithm
Given a hypergraph γ on V × E, set γ 0 = γ and define recursively a sequence of hypergraphs (γ n ) n 0 as follows: for n 0, given γ n , choose if possible, uniformly at random, a vertex v n+1 ∈ V such that d = d γn (v n+1 ) ∈ {1, . . ., k − 1} and set where γ n (v n+1 ) = {(v n+1 , e 1 ), . . ., (v n+1 , e d )}; if there is no such vertex, set γ n+1 = γ n .Thus we remove from γ n all edges containing the chosen vertex v n+1 .The sequence terminates at the k-core γ.Take Γ ∼ U (d, w) and consider the corresponding sequence (Γ n ) n 0 .We continue to write v n for the random vertices chosen in the algorithm.Set D n = d Γn and W n = w Γn .In the sequel we shall use the symbols j, k, l, d, d ′ to denote elements of Z + × {V }, while w will denote an element of Z + × {E}.This is just a formal device which will allow us to refer to two different sets of coordinates by ξ d and ξ w , and, to lighten the notation, we shall identify both these sets with Z + where convenient.For 0 d d ′ and w 0, set ).Note that the process (ξ n ) n 0 is adapted to the filtration (F n ) n 0 given by F n = σ(D r , W r : r = 0, 1, . . ., n).Proposition 7.2.For all n 0, conditional on F n , we have Proof.The claim is true for n = 0 by assumption.Suppose inductively that the claim holds for n.The algorithm terminates on the F n -measurable event so on this event the claim holds also for n + 1. Suppose then that the algorithm does not terminate at n. Conditional on F n , v n+1 and Γ n are independent.Hence, by splitting, Γ n \ Γ n (v n+1 ) is uniform given its vertex degrees and edge weights.Then, by a further splitting, we can delete each of the edges Γ n (e) with (v n+1 , e) ∈ Γ n , still preserving this uniform property, to obtain Γ n+1 .Hence the claim holds for n + 1 and the induction proceeds.
Note that the conditional distribution of v n+1 given F n depends only on D n and that (D n+1 , W n+1 ) is a function of Γ n+1 , and hence is a function of (v n+1 , Γ n ).It follows that (D n , W n ) n 0 is a Markov chain and hence, by symmetry, (ξ n ) n 0 is also a Markov chain.It will be convenient to denote the state-space by S, to define for ξ ∈ S, w(w − 1)ξ w , and to set q(ξ) = m(ξ)n(ξ)/l(ξ).Thus, ξ d is the number of vertices of degree d, and n(ξ) is the number of light vertices, that is, those of degree less than k; l(ξ) is the total degree of the light vertices, and h(ξ) is the total degree of the heavy vertices; m(ξ) is the total weight, and p(ξ) is the number of ordered pairs of elements of ξ having the same edge label.Note that, for all ξ ∈ S, m(ξ) N m and n(ξ) l(ξ), so q(ξ) N m.We obtain a continuous-time Markov chain (X t ) t 0 by taking (ξ n ) n 0 as jump chain and making jumps at rate q(X t ).As we saw in Subsection 5.2, in the study of terminal values, we are free to choose a convenient jump rate, which should, in particular ensure that the terminal time remains tight in the limit of interest.Our present choice will have this property.However, it has been chosen also so that the limiting differential equation has a simple form.Define now coordinate functions x d,d ′ and x w on S, for k d d ′ and w 1, by We consider X as a process in R D , where We shall use h(x), m(x) and p(x) to denote functions of defined as for ξ ∈ S, but replacing ξ d,d ′ and ξ w by x d,d ′ and x w respectively.Note that the jumps of X are bounded in supremum norm by (k − 1)(L − 1)/N .Note also that h(X t ) m(X t ) for all t and that the algorithm terminates at T 0 = inf{t 0 : h(X t ) = m(X t )}.Hence X T0 is the desired vector of core frequencies: Pd Recall that m = m(p) = m(q) is a given constant.We also write m(x) for the function on R D just defined.Thus m = m(X 0 ).Let where x d+1,d = 0 for k d L, and Define, as in Section 4, the drift vector β on S by

Solving the differential equation
Consider the limiting differential equation ẋt = b(x t ) in U 0 , with starting point x 0 = X 0 given by In components, the equation is written There is a unique solution (x t ) t 0 in U 0 and, clearly, x w t = e −tw q w .Then m(x t ) = me −t σ(e −t ) and p(x t ) = me −2t σ ′ (e −t ).Hence, if (τ t ) t 0 is defined by τt = p(x t )/m(x t ), τ 0 = 0, then e −τ = σ(e −t ).A straightforward computation now shows that the remaining components of the solution are given by and that h(x t ) = mφ(e −t )σ(e −t ).Note that (m− h)(x t ) = σ(e −t )(e −t − φ(e −t )), so g * = e −ζ0 , where ζ 0 = inf{t 0 : m(x t ) = h(x t )}.

Proof of Theorem 7.1
Recall that the core frequencies are found at the termination of the core-finding algorithm, see (12).A suitably chosen vector of frequencies evolves under this algorithm as a Markov chain, which we can approximate using the differential equation whose solution we have just obtained.The accuracy of this approximation is good so long as the hypergraph remains large.Consider first the case where g * = 0, when we have m(x t ) > h(x t ) for all t 0.Here the hypergraph may become small as the algorithm approaches termination, so we run close to termination and then use a monotonicity argument.Fix ν ∈ (0, 1], set µ = ν/3 and choose t 0 such that m(x t0 ) = 2µ.Define as in Section 4. Since m(x t ) is decreasing in t, we have x t ∈ U for all t t 0 .Hence there exists ε ∈ (0, ν/(3L)), depending only on p, q and ν such that for all ξ ∈ S and t t 0 , x(ξ) It is straightforward to check, by bounding the first derivative, that b is Lipschitz on U with constant K (L − 1)L 3 m/µ.Set δ = εe −Kt0 /3, as in Section 4. We have X 0 = x 0 , so Ω 0 = Ω.By Proposition 7.3, and using the fact that m(X t ) does not increase, we have so Ω 1 = Ω provided N is large enough that ψ(N µ)t 0 δ.The total jump rate q(ξ) is bounded by Q = N m for all ξ ∈ S. The norm of the largest jump is bounded by J = (k − 1)(L − 1)/N .Take A = QJ 2 e = (k − 1) 2 (L − 1) 2 me/N and note that δJ/(At 0 ) δ/((k − 1)(L − 1)met 0 ) 1, so A QJ 2 exp{δJ/(At 0 )}, and so Ω 2 = Ω as in Subsection 4.2, Footnote 4. On the event {sup t t0 X t − x t ε}, we have T 0 > t 0 , so Hence, by Theorem 4.2, we obtain for a constant C ∈ [1, ∞) depending only on p, q and ν, which is the conclusion of the theorem in the case g * = 0. We turn to the case where g * > 0 and g * = sup{g ∈ [0, 1) : φ(g) > g}.Set now µ = 1 2 m(x ζ0 ) and choose t 0 > ζ 0 .Define U , ζ and T as in the preceding paragraph, noting that ζ = ζ 0 .We seek to apply the refinement of Theorem 4.3 described in Footnote 6, and refer to Subsection 4.3 for the definition of ρ.By the crossing condition, φ(g) > g immediately below g * , so (m − h)(x t ) = σ(e −t )(e −t − φ(e −t )) < 0 immediately after ζ = − log g * .We have for a constant C < ∞ depending only on L. So, given ν > 0, we can choose ε > 0, depending only on p, q and ν, such that ε + ρ(ε) ν and C(ε ) and hence that T = T 0 .Define δ and A as in the preceding paragraph.Then, by a similar argument, provided N is sufficiently large that ψ(N µ)t 0 δ, we have Ω 0 = Ω 1 = Ω 2 = Ω.Hence, by Theorem 4.3, for a constant C ∈ [1, ∞) depending only on p, q and ν, as required.

Appendix: Identification of martingales for a Markov chain
We discuss in this appendix the identification of martingales associated with a continuous-time Markov chain X = (X t ) t 0 with finite jump rates.In keeping with the rest of the paper, we assume that X has a countable state-space, here denoted E, and write Q = (q(x, y) : x, y ∈ E) for the associated generator matrix.An extension to the case of a general measurable state-space is possible and requires only cosmetic changes.A convenient and elementary construction of such a process X may be given in terms of its jump chain (Y n ) n 0 and holding times (S n ) n 1 .We shall deduce, directly from this construction, a method to identify the martingales associated with X, which proceeds by expressing them in terms of a certain integer-valued random measure µ.There is a close analogy between this method and the common use of Itô's formula in the case of diffusion processes.The method is well known to specialists but we believe there is value in this direct derivation from the elementary construction.Our arguments in this section involve more measure theory than the rest of the paper; we do not however need the theory of Markov semigroups.

The jump-chain and holding-time construction
The jump chain is a sequence (Y n ) n 0 of random variables in E, and the holding times (S n ) n 1 are non-negative random variables which may sometimes take the value ∞.We specify the distributions of these random variables in terms of the jump matrix Π = (π(x, y) : x, y ∈ E) and the jump rates (q(x) : x ∈ E), given by π(x, y) = q(x, y)/q(x), y = x and q(x) = 0, 0, y = x and q(x) = 0, π(x, x) = 0, q(x) = 0, 1, q(x) = 0, and q(x) = −q(x, x) = y =x q(x, y).
Take Y = (Y n ) n 0 to be a discrete-time Markov chain with transition matrix Π.Thus, for all n 0, and all x 0 , x 1 , . . ., x n ∈ E, where λ(x) = P(Y 0 = x).Take (T n ) n 1 to be a sequence of independent exponential random variable of parameter 1, independent of Y .Set and construct X by where ∂ is some cemetery state, which we adjoin to E. We are now using ζ for the explosion time of the Markov chain X, at variance with the rest of the paper.For t 0, define These are, respectively, the time and destination of the first jump of X starting from time t.Consider the natural filtration (F X t ) t 0 , given by F X t = σ(X s : s t).Write E for the set of subsets of E and set q(∂) = 0 and π(x, B) = y∈B π(x, y) for B ∈ E. Proposition 8.1.For all s, t 0 and all B ∈ E, we have, almost surely, )e −q(Xt)s .Before proving the proposition, we need a lemma, which expresses in precise terms that, if X has made exactly n jumps by time t, then all we know at that time are the states Y 0 , . . ., Y n , the times J 1 , . . ., J n and the fact that the next jump happens later.Lemma 8.2.Define G n = σ(Y m , J m : m n).For all A ∈ F X t and all n 0, there exists Ãn ∈ G n such that Proof.Denote by A t the set of all sets A ∈ F X t for which the desired property holds.Then A t is a σ-algebra.For any s t, we can write Proof of Proposition 8.1.The argument relies on the memoryless property of the exponential distribution, in the following conditional form: for s, t 0 and n 0, almost surely, on {J n t}, Then for B ∈ E and A ∈ F X t , we have On summing all the above equations we obtain as required.

Markov chains in a given filtration
For many purposes, the construction of a process X which we have just given serves as a good definition of a continuous-time Markov chain with generator Q.However, from now on, we adopt a more general definition, which has the merit of expressing a proper relationship between X and a general given filtration (F t ) t 0 .Assume that X is constant on the right, that is to say, for all t 0, there exists ε > 0 such that X s = X t whenever t s < t + ε.Set J 0 = 0 and define for n 0, For t 0, define J 1 (t) and Y 1 (t) as above.Assume that X is minimal, so that X t = ∂ for all t ζ, where ζ = sup n J n .Assume finally that X is adapted to (F t ) t 0 .Then we say that X is a continuous-time (F t ) t 0 -Markov chain with generator Q if, for all s, t 0, and all B ∈ E, we have, almost surely, The process constructed above from jump chain and holding times is constant on the right and minimal and we do recover the jump chain and holding times using (13); moreover by Proposition 8.1, such a process is then a continuous-time Markov chain in its natural filtration.The defining property of a continuoustime Markov chain extends to stopping times.
Proposition 8.3.Let X be an (F ) t 0 -Markov chain with generator Q and let T be a stopping time.Then, for all s 0 and B ∈ E, on {T < ∞}, almost surely, P(J 1 (T ) > T + s, Y 1 (T ) ∈ B|F T ) = π(X T , B)e −q(XT )s .
Proof.Consider the stopping times and, summing over k, Letting m → ∞, we can replace, by bounded convergence, T m by T , thus proving the proposition.

The jump measure and its compensator
The jump measure µ of X and its compensator ν are random measures on (0, ∞) × E, given by Recall that the previsible σ-algebra P on Ω × (0, ∞) is the σ-algebra generated by all left-continuous adapted processes.Extend this notion in calling a function defined on Ω × (0, ∞) × E previsible if it is P ⊗ E-measurable.
Theorem 8.4.Let H be previsible and assume that, for all t 0, Then the following process is a well-defined martingale We shall show that μ = ν.Once this is done, the proof of Theorem 8.4 will be straightforward.For n 0, define measures μn and νn on P ⊗ E by μn νn , so it will suffice to show the following lemma.Lemma 8.5.For all n 0, we have μn = νn .
Proof.The proof rests on the following basic identity for an exponential random variable V of parameter q: Let T be a stopping time and let S be a non-negative F T -measurable random variable.Set U = (T +S)∧J 1 (T ).By Proposition 8.3, we know that, conditional on F T , J 1 (T ) and Y 1 (T ) are independent, J 1 (T )−T has exponential distribution of parameter q(X T ) and Y 1 (T ) has distribution π(X T , .).From the basic identity, we obtain The set of such sets D forms a π-system, which generates the σ-algebra P ⊗ E. We shall show that μn (D) = νn (D) 1.By taking A = Ω, B = E, t = 0 and letting u → ∞, this shows also that μn and νn have the same finite total mass.The lemma will then follow by uniqueness of extension.
Take T = J n ∧ t ∨ J n+1 and S = (u − T ) + 1 {T <Jn+1} .Then H(r, y)ν(dr, dy) < ∞ for all t 0, then M t is well-defined and integrable and, now with general s t and A ∈ F s , E((M t − M s )1 A ) = 0.
The result extends to general previsible functions H by taking differences.

Martingale estimates
Theorem 8.4 makes it possible to identify martingales associated with X in a manner analogous to Itô's formula.We illustrate this by deriving three martingales M, N and Z associated with a given function f : E → R. The processes M and N depend, respectively, linearly and quadratically on f , whereas Z is an exponential martingale.We use N and Z to obtain quadratic and exponential martingale inequalities for M , which are used in the main part of the paper.We emphasise that f can be any function.In the main part of the paper, we work with the martingales associated with several choices of f at once.In this subsection we do not burden the notation by registering further the dependence of everything on f .The discussion that follows has a computational aspect and an analytic aspect.The reader may wish to check the basic computations before considering in detail the analytic part.We note for orientation that, in the simple case where the maximum jump rate is bounded and where f also is bounded, then there is no explosion and M, N and Z, as defined below, are all martingales, without any need for reduction by stopping times.For simplicity, we make an assumption in this subsection that X does not explode.A reduction to this case is always possible by an adapted random time-change -this can allow the identification of martingales in the explosive case by applying the results given below and then inverting the time-change.We omit further details.For all t ∈ [0, ∞), we have J n t < J n+1 for some n 0. Then Define, as usual, for stopping times T , the stopped process M T t = M T ∧t .Proposition 8.6.For all stopping times T ζ The first sentence of the statement now follows easily from Theorem 8.4.For the second, it suffices to note that, for the stopping times T n = inf{t 0 : τ (X t ) > n} ∧ n, we have T n ↑ ζ 1 as n → ∞ and Tn 0 τ (X t )dt n 2 , for all n, almost surely.
We turn now to L 2 estimates, in the process identifying the martingale decomposition of M 2 .Note first the following identity: for t ∈ [0, ∞) with t ζ 1 , This may be established by verifying that the jumps of left and right hand sides agree, and that their derivatives agree between jump times.Define α(x) = x ′ =x {f (x ′ ) − f (x)} 2 q(x, x ′ ), and set ζ 2 = inf{t 0 : α(X t ) = ∞}.By Cauchy-Schwarz, we have τ (x) 2 α(x)q(x) for all x, so ζ  This identity may be verified in the same way as (14).Consider for n 0 the stopping time U n = inf{t 0 : φ(X t ) + τ (X t ) > n} and note that U n ↑ ζ * as n → ∞.
Fig 1.The unit square (0, 1) 2 is a possible choice of the set U .The inner and outer solid curves bound a tube around the deterministic solution (red) of the ordinary differential equation, which starts inside U .The realization shown of the Markov chain trajectory does not leave the tube before exit from U .This is a realization of the stochastic epidemic, which will be discussed in more detail in Section 5.

Fig 2 .
Fig 2. The graphic shows the proportions of susceptible and infective individuals in a population of 1000, of which initially 900 are susceptible and 100 are infective.The parameter values are λ = 5 and µ = 1.One realization of the Markov chain, and the solution of the differential equation, are shown at 1 : 1000 scale.

Figure 3
gives two pictorial representations of a small hypergraph.The degree and weight functions d γ : V → Z + and w γ : E → Z + of γ are given by d γ (v) = |γ(v)| and w γ (e) = |γ(e)|.The k-core γ of γ is the largest subset γ of γ such that, for all v ∈ V and e ∈ E,

Fig 3 .
Fig 3.The left picture shows a hypergraph with eight vertices, three 2-edges, and three 3edges.An incidence is selected at random, shown by the enlarged vertex, and chosen as root of a branching process, shown as the bottom vertex on the right.The root has two hyperedge offspring, shown as grey squares.One of these has two vertex offspring, and so on.

Proposition 8 . 8 .
For all stopping times T ζ 1 ,E exp M T − T 0 φ(X t )dt 1,and, for allA, B ∈ [0, ∞), t )dt A e A−B .Moreover, Z ζ * is a local martingale and a supermartingale, and Z Un is a martingale for all n.