General epidemiological models: Law of large numbers and contact tracing

We study a class of individual-based, fixed-population size epidemic models under general assumptions, e.g., heterogeneous contact rates encapsulating changes in behavior and/or enforcement of control measures. We show that the large-population dynamics are deterministic and relate to the Kermack-McKendrick PDE. Our assumptions are minimalistic in the sense that the only important requirement is that the basic reproduction number of the epidemic $R_0$ be finite, and allow us to tackle both Markovian and non-Markovian dynamics. The novelty of our approach is to study the "infection graph" of the population. We show local convergence of this random graph to a Poisson (Galton-Watson) marked tree, recovering Markovian backward-in-time dynamics in the limit as we trace back the transmission chain leading to a focal infection. This effectively models the process of contact tracing in a large population. It is expressed in terms of the Doob $h$-transform of a certain renewal process encoding the time of infection along the chain. Our results provide a mathematical formulation relating a fundamental epidemiological quantity, the generation time distribution, to the successive infection times along this transmission chain.


General individual-based epidemic model
In the present article, we study an extension of the general epidemiological framework introduced in [18] to model the COVID-19 epidemic. Let us briefly recall the main features of this model.
At time t = 0, we consider a population made of susceptible individuals, that have never encountered the disease, and infected individuals. Each infected individual is supposed to belong to one compartment, that models the stage of the disease of this individual. Classical examples of compartments are the exposed compartment (E), where the individual is infected but not yet infectious, the infectious compartment (I), and the recovered compartment (R), once the individual has become immunized. In the case of the COVID-19 epidemic, it might be relevant to add a hospitalized (H) and an intensive care unit (U ) compartment, as monitoring the number of individuals in these states is typically important for policy-making. See Figure 1 for an example of compartmental model used for the COVID-19 epidemic. We denote by S the set of all compartments. For the sake of simplicity, we will also assume that S is finite.
We encode the compartment to which individual $x$ belongs as a stochastic process $(X_x(a); a \ge 0)$ valued in $S$, which we call the life-cycle process. The random variable $X_x(a)$ gives the compartment to which $x$ belongs at age of infection $a$, that is, $a$ units of time after its infection. Moreover, individual $x$ is endowed with a point measure $P_x$ on $\mathbb{R}_+$ that we call the infection point process, which is assumed to be simple. The atoms of $P_x$ encode the ages at which $x$ makes infectious contacts with the rest of the population. We think of the pair $(P_x, X_x)$ as describing the course of the infection of individual $x$. We make the fundamental assumption that the pairs $(P_x, X_x)$ are i.i.d. for distinct individuals in the population.
In [18] we assumed that susceptibles are in excess, and that any infectious contact leads to a new infection. The resulting population is then distributed as a Crump-Mode-Jagers (CMJ) process. In the current work, we consider an extension of this model that takes into account the saturation due to the finite pool of susceptibles in the population. More precisely, we consider a population of finite fixed size $N$. Each infectious contact is made with an individual uniformly chosen in this population, and it results in a new infection only if the targeted individual is susceptible. Finally, we model the impact of control measures, such as school closures or a national lockdown, with a contact rate function $(c(t); t \ge 0)$. This contact rate is such that an infection occurring at time $t$ is only effective with probability $c(t) \in [0, 1]$. With probability $1 - c(t)$, the infection is removed. A formal description of this model is provided in Section 2.

Law of large numbers for the age structure
A standard way to study compartmental models is to consider the dynamics of the number of individuals in each compartment. If the underlying probabilistic model is Markovian, this typically gives rise to systems of ODEs of the SIR type in the large population size limit, see [10] for a recent account. Here, we will not keep track of the count of individuals in the various compartments, but we will rather be interested in the age structure of the population. Our main result is a law of large numbers for the age structure of the population, which is the equivalent of Theorem 7 of [18] for our non-linear extension of the model.
We anticipate the notation of Section 2 and denote the empirical measure of ages and compartments in the population at time $t$ as
$$\mu^N_t := \sum_{x} \delta_{(t - \sigma^N_x,\, X_x(t - \sigma^N_x))},$$
where $\sigma^N_x$ is the infection time of individual $x$, and the sum runs over all infected individuals at time $t$. (Note that $t - \sigma^N_x$ is just the age of $x$ at time $t$.) The measure $\mu^N_t$ encodes the ages and compartments of infected individuals at time $t$. Let us also denote by
$$\forall t \ge 0,\ \forall i \in S, \quad Y^N_t(i) = \#\{\text{individuals in } i \text{ at time } t\} = \mu^N_t\big([0, \infty) \times \{i\}\big).$$
The limiting distribution of $\mu^N_t$ will depend on the following two quantities:
• The intensity measure of the infection point process, defined as $\tau(da) := E[P(da)]$. We assume that $\tau$ has a density w.r.t. the Lebesgue measure that we denote by $\tau(a)$ with a slight abuse of notation, and that $R_0 := \tau([0, \infty)) < \infty$. We also assume that there exists $\alpha \in \mathbb{R}$, called the Malthusian parameter, such that
$$\int_0^\infty e^{-\alpha a} \tau(a)\, da = 1. \qquad (2)$$
• The one-dimensional marginals of the life-cycle process, denoted by
$$\forall i \in S,\ \forall a \ge 0, \quad p(a, i) := P\big(X(a) = i\big).$$
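As an illustration of the two quantities above, the following sketch computes $R_0$ and the Malthusian parameter numerically. It assumes a hypothetical intensity $\tau(a) = \beta e^{-\gamma a}$ (an SIR-type choice made only for this example, for which $R_0 = \beta/\gamma$ and $\alpha = \beta - \gamma$ are known in closed form). The helper names, the truncation of the integrals, and the bisection bracket are all illustrative assumptions.

```python
import math

def integrate(f, upper, n=20_000):
    """Midpoint rule for the integral of f over [0, upper]."""
    h = upper / n
    return h * sum(f((k + 0.5) * h) for k in range(n))

def tau(a, beta=1.5, gamma=0.5):
    """Hypothetical intensity density: contacts at rate beta until an Exp(gamma) recovery."""
    return beta * math.exp(-gamma * a)

def reproduction_number(tau, upper=80.0):
    """R_0 = total mass of tau, truncated at `upper` (assumes a negligible tail)."""
    return integrate(tau, upper)

def malthusian_parameter(tau, lo=-0.4, hi=5.0, upper=80.0, iters=50):
    """Solve int_0^infty exp(-alpha * a) tau(a) da = 1 by bisection.

    The left-hand side is decreasing in alpha; the bracket [lo, hi] is an
    assumption suited to this example, not a general-purpose choice.
    """
    def laplace(alpha):
        return integrate(lambda a: math.exp(-alpha * a) * tau(a), upper)

    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if laplace(mid) > 1.0:
            lo = mid   # transform still too large: alpha must increase
        else:
            hi = mid
    return 0.5 * (lo + hi)

R0 = reproduction_number(tau)
alpha = malthusian_parameter(tau)
print(f"R0 ~ {R0:.4f}, alpha ~ {alpha:.4f}")
```

For the parameters chosen here the closed-form values are $R_0 = 3$ and $\alpha = 1$, which gives a quick sanity check of the numerical scheme.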
Let I 0 ∈ (0, 1). At time t = 0, we assume that every individual is independently infected with probability I 0 and that its age of infection is chosen independently according to a probability density g on R + . In the following, we define I N 0 ⊆ [N ] as the set of infected individuals at t = 0. We will also use the notation S 0 = 1−I 0 .
We make the simplifying assumption that the underlying compartmental model is acyclic: we assume that for any two compartments i, j ∈ S, if j can be accessed from i with positive probability, that is, if the event that we can find s ≤ t such that X(s) = i and X(t) = j has positive probability, then i cannot be accessed from j. In other words, the directed graph on S composed of all edges i → j such that j is accessible from i is a directed acyclic graph. This assumption is not very restrictive, most natural compartmental models enjoy this acyclic property. See Figure 1. It is only needed to reinforce the finite-dimensional convergence to a Skorohod one in the following theorem.
This theorem is proved at the end of Section 7, as a consequence of the convergence of the so-called historical process (Theorem 28). From Theorem 1 we immediately deduce the following result.

Corollary 2.
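For any $i \in S$, we have
$$\Big(\frac{Y^N_t(i)}{N};\ t \ge 0\Big) \longrightarrow \Big(\int_0^\infty n(t, a)\, p(a, i)\, da;\ t \ge 0\Big) \qquad (4)$$
as $N \to \infty$, in probability in the Skorohod topology.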
Remark 3. Theorem 1 can be easily extended to weaker assumptions on the initial condition. For instance, it is not hard to see from our proof that Theorem 1 holds true if we simply assume that where the convergence holds in probability for the weak topology.
Remark (Infinite reproduction number). The dynamics of the epidemic until some fixed time t does not depend on the potential infections occurring after time t.
In particular we can remove all atoms after age $t$ of the point processes of the individuals that are initially susceptible without affecting the process at time $t$. Similarly, we can remove all atoms after time $t$ of the point processes of the initially infected individuals. Anticipating the notation $\bar\tau$ in (8) for the intensity measure of the initially infected individuals, the convergence of the process in Theorem 1 until time $t$ only requires that $\int_0^t \tau(a)\,da < \infty$ and $\int_0^t \bar\tau(a)\,da < \infty$. Thus, Theorem 1 holds under the weaker assumption that $\tau$ is a locally finite measure, and that $g$ decays fast enough so that $\bar\tau$ is also locally finite. In particular one could have that $R_0 = \infty$.
This result will follow from the more general Theorem 28, and is proved in Section 7. The definition of a solution to Equation (3) is provided in Section 3. Theorem 1 proves that the age structure of the population converges to a limiting non-linear PDE of the Kermack-McKendrick type [30]. It also entails that the number of individuals in each compartment can be recovered by integrating the one-dimensional marginals $p(a, i)$ against the age structure.
It is interesting to note that the limit of our model is universal. The limiting expression in Equation (4) does not depend on the entire distribution of the pair $(P, (X(a); a \ge 0))$, but only on:
• the mean number of secondary infections $\tau(a)$ induced by an infected individual with age $a$;
• the one-dimensional marginals $p(a, i)$ of the life-cycle process.
These are the only individual characteristics that need to be assessed to forecast the dynamics of the epidemic at a large scale. By further writing $\tau = R_0 \nu$ with $\nu(a) := \tau(a)/R_0$, we see that $\tau$ only depends on two well-known epidemiological quantities:
• the basic reproduction number, $R_0$, which is the mean number of secondary infections induced by a single individual in an entirely susceptible population;
• the distribution of the generation time, $\nu$, which gives the typical time between the infection of a source individual and that of the recipient individual in an infection pair [19].
Further interesting modeling consequences of Theorem 1 are discussed in our earlier work [18].

Contact tracing: the historical process
We already argued that our approach allows us to identify the individual characteristics that impact the large population size dynamics. We identified those parameters as $R_0$, the distribution of the generation time $\nu$, and the one-dimensional marginals of the life-cycle process. The estimation of those parameters is obviously of paramount importance. One possible approach to estimate the generation time distribution consists in observing the generation times backwards in time using contact tracing [15,20], i.e., the time between the infection time of an individual (the infectee) and that of his/her infector (rather than the infection time of the individuals he/she infects). In [11], the authors addressed this specific question in a simplified setting. More specifically, they assumed that $c \equiv 1$ and that the susceptibles are in excess so that our microscopic model can be approximated by a Crump-Mode-Jagers process as in our earlier work [18]. They showed that the observation of backward generation times raises two serious issues:
(i) First, observations of past infections induce a strong observational bias: the backward generation time distribution differs from the actual generation time distribution. In the supercritical case (i.e., when $R_0 > 1$), the backward generation time has density
$$e^{-\alpha a} \tau(a), \qquad (6)$$
where $\alpha > 0$ is the Malthusian parameter of the model defined in (2). As a consequence, observations of backward infection times tend to be biased towards lower values.
(ii) Infection times are difficult to observe. Instead, the onset of symptoms is generally observed. For this reason, the serial interval, which is defined as the time between symptom onsets in the two individuals mentioned above, is often used as a surrogate for the generation time. As discussed in [11], this can induce a second source of significant observational bias.
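The bias described in point (i) can be checked numerically on a toy example. The sketch below assumes, as in the supercritical branching regime, that the backward generation time has density $e^{-\alpha a}\tau(a)$, and it assumes a hypothetical Gamma-shaped generation time $\nu(a) = r^2 a e^{-ra}$ with $R_0 = 4$ and $r = 1$. For this choice the Malthusian parameter is $\alpha = r(\sqrt{R_0} - 1)$. All names and parameter values are illustrative.

```python
import math

R0, rate = 4.0, 1.0                      # assumed values for the example
alpha = rate * (math.sqrt(R0) - 1.0)     # solves R0 * (rate / (rate + alpha))**2 = 1

def nu(a):
    """Assumed forward generation time density: Gamma(shape=2, rate=rate)."""
    return rate ** 2 * a * math.exp(-rate * a)

def backward_density(a):
    """Assumed backward density exp(-alpha * a) * tau(a), with tau = R0 * nu."""
    return math.exp(-alpha * a) * R0 * nu(a)

def integrate(f, upper=60.0, n=20_000):
    """Midpoint rule on [0, upper]; the truncation assumes a negligible tail."""
    h = upper / n
    return h * sum(f((k + 0.5) * h) for k in range(n))

backward_mass = integrate(backward_density)                  # should be close to 1
forward_mean = integrate(lambda a: a * nu(a))                # mean of nu
backward_mean = integrate(lambda a: a * backward_density(a)) # mean seen by tracing

print(f"mass={backward_mass:.4f}, forward mean={forward_mean:.4f}, "
      f"backward mean={backward_mean:.4f}")
```

With these parameters the forward mean generation time is 2 while the backward mean is 1, so a naive estimate based on traced infector-infectee pairs would underestimate the generation time by a factor of two in this example.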
As already mentioned above, the authors in [11] address the previous bias in the case where $c \equiv 1$ and when susceptibles are in excess. In the present article, we will show that if we (1) take into account the saturation effect (i.e., when the population is out of the branching process regime), and (2) assume some heterogeneity in the contact rate, then those two components of the dynamics can induce a third source of bias. In order to provide some intuition for the upcoming results, consider a newly infected individual at time $t$. Trace backward in time the chain of infection up to time $t = 0$. (The first individual along the chain is the infector of the focal individual, the second is the infector of the primary infector and so on.) Finally, along the chain, report the successive times of infection, see Figure 2. When susceptibles are in excess (branching approximation), Jagers and Nerman [32] showed under mild assumptions that as $t \to \infty$, the successive infection times are well approximated by the values of a renewal process where the $\xi_i$'s are i.i.d. and distributed according to (6). In the presence of saturation, we show that the chain of infection is given by an $h$-transform of the renewal process $R^{(t)}$. Intuitively, the $h$-transform tends to favor infection at times where there is a large fraction of susceptibles and a high contact rate. When the initial age structure of the population coincides with the "equilibrium" measure of the branching approximation, i.e., the h-transformed process can be reformulated in a simple manner. In Proposition 24 we show that it is identical in law to the original renewal process conditioned on survival, assuming that at each step $k$ the process is killed with probability $1 - c(R^{(t)}(k)) S(R^{(t)}(k))$.
In Section 7, we introduce the historical process. Loosely speaking, the historical process is the empirical measure reporting the chain of infections for every individual infected at time $t = 0$ or who eventually gets infected in the future. It is constructed by reporting the successive ages of infection along the chain, but also the stages of the life-cycle process for every "ancestor" along the chain, e.g. onset of the symptoms, latency period, etc. In Theorem 28, we show that the historical process converges to a deterministic probability law. Loosely speaking, our result shows that the chains of infection for a finite sample of infected individuals are asymptotically independent. Furthermore, for each sampled individual, its chain of infection is distributed in such a way that
• the successive times of infection are expressed in terms of the $h$-transformed renewal process mentioned above;
• the life cycle of individuals along the chain is biased, and the bias can be expressed as a Palm modification of the original life cycle. This will be made more formal in Section 5.
Going back to the two epidemiological questions (i) and (ii), our results decipher how the epidemiological parameters R 0 and the generation time distribution ν relate (in a non-trivial way) to observables which can be directly collected from contact-tracing data.

A genealogical dual to the delay equation
The Kermack-McKendrick equation (3) can be reformulated in terms of a nonlinear delay equation. To ease the exposition, let us consider the case $c \equiv 1$. The general case $c \not\equiv 1$ will be treated in Section 5. If $(n(t, a); t, a \ge 0)$ denotes the solution to Equation (3) with $c \equiv 1$, let us define the number of infections between time 0 and $t$ as $B(t) := \int_0^t n(s, 0)\, ds$. Then we will derive in Section 3 that $B$ solves the following non-linear delay equation: where we recall that $S_0 = 1 - I_0$ is the initial fraction of susceptibles.
Our proof of Theorem 1 uses a genealogical approach, where we look backwards in time at the set of potential infectors of a focal individual. This approach leads to a genealogical dual to the delay equation that we think to be of independent interest. The dual is built out of the following branching process.
Recall that $R_0$ stands for the total mass of $\tau$ and $\nu = \tau / R_0$. We define the intensity $\bar\tau$ so that the measure $\bar\tau(u)\, du$ is the intensity measure of the infection point process of an individual with initial age distributed as $g$. Let us further set The branching process is constructed as follows. Let us assume that individuals in the branching process are either infected (I) or susceptible (S). Suppose that the population starts from a single (S) individual. Then, at each generation, an (S) individual produces:
• a Poisson S 0 R 0 distributed number of (S) individuals;
• a Poisson (1 − S 0 )R 0 distributed number of (I) individuals.
Individuals of type (I) have no offspring. Draw an oriented edge from each individual towards its parent. Assign a length independently to each edge, such that the length of an edge originating from an (S) individual is distributed as $\nu$, and that of an edge coming from an (I) individual is distributed as $\bar\nu$. The previous branching process corresponds to the large population size limit of the set of potential infectors of a fixed individual. Type (I) individuals correspond to individuals that were initially infected. Each edge corresponds to an infectious contact in the population, and the length of that edge is the age of the infector when this contact occurs.
The corresponding object is a rooted geometric tree, where edges are endowed with a length. We define the infection path at the root as the (a.s. unique) geodesic connecting the set of infectious individuals (I) to the root. Finally, the time of infection σ ∞ is defined as the length of the geodesic. The following result connects the distribution of σ ∞ to the delay equation.
In Section 5, we will derive a similar dual for the delay equation with $c \not\equiv 1$; see Proposition 19.

Link with literature
Age-structured models in epidemiology. The idea of considering an infection through its age structure dates back to at least the pioneering work of Kermack and McKendrick [30]. They introduced a general epidemic model where the infectiousness of an individual can depend on its age of infection, which was formulated as the solution to a delay equation equivalent to (3). In the same article [30], the authors noticed that if the infectiousness and the recovery rate are assumed to be constant, Equation (3) reduces to a set of non-linear ODEs now known as the SIR model. Even if the work of [30] was primarily devoted to the more general age-structured model, subsequent work on epidemic modeling has mostly focused on extensions of the ODE special case. Nonetheless, the original age-structured model is now receiving renewed attention both in the mathematical literature [13,23,37] and in applications [8,9,12,46].
In the probabilistic literature, it is only quite recently that it was proved in [5] that Equation (3) describes the large population size limit of a stochastic epidemic model similar to ours. The setting of the main result in [5] is slightly different from that considered in the current work: the process is assumed to be supercritical (R 0 > 1) and to start from a single infected individual. After an appropriate time-shift so as to skip the long initial branching phase when there are few infected individuals, [5] prove that the fraction of susceptibles in the population converges to a limiting function (S(t); t ∈ R). This limit corresponds to the number of susceptible individuals in an extension of (3) to the whole real line [13,40,41]. Although the law of large numbers considered in [5] is quite similar to our Theorem 1, let us outline some important differences.
From a purely technical point of view, [5] work under quite restrictive assumptions on the point process P, see Assumption 2 on the top of page 7. For instance the simple Markovian SEIR model [10] does not fulfill these assumptions. Also, the model in [5] does not explicitly account for compartments through (X(a); a ≥ 0), nor for the contact rate (c(t); t ≥ 0). These are two key modeling ingredients in the context of the COVID-19 epidemic. While incorporating compartments would be a direct extension of the proofs in [5], taking into account the contact rate would raise the same serious technical difficulties as in our work, since their proofs rely on a backward-in-time approach similar to ours. Finally, the description of the chain of transmission events leading to a focal infection, which is one of the main contributions of our work, is not considered in [5].
Other age-structured models. There exists a rich literature on age-dependent population processes, not necessarily related to epidemic models. Let us first mention the Crump-Mode-Jagers processes (CMJ), also known as general branching processes [25,39], from which the formalism of our model is borrowed. In these processes, the birth time of children can depend in a very general way on the age of the parent, but individuals reproduce independently of each other. These processes are good approximations of the early dynamics of an epidemic when susceptible individuals are in excess. We have considered such an approximation in an earlier work [18] and proved a law of large numbers similar to Theorem 1 in this context.
Further models have relaxed the assumption that individuals reproduce independently by allowing the birth and death rates to depend on the whole age distribution of the population [22,26,27,43]. Under a large population size limit, the age structure of these models converges to an extension of the McKendrick-von Foerster equation that generalizes (3) [22,43]. Moreover, several central limit theorems have been derived for this age structure [14,42]. Although these models allow for a very general dependence between births and the state of the population, they require the age distribution to be a Markov process, and our results are not trivially implied by those in the above works. The techniques used to study these models are also quite different from the backward-in-time approach developed here. They require viewing the age structure as the solution to a stochastic equation driven by a Poisson measure, or using martingale techniques which could not be extended to our setting.
Other non-Markovian epidemic models. Finally, there is a recent series of works that considers epidemic models that are non-Markovian [16,33,34,35,36], but not structured by the age of infection. They derived laws of large numbers and central limit theorems for extensions of the model considered in [34] that can incorporate spatial heterogeneity [36], varying or random infectivity [16,33,35], and applied these models to the COVID-19 epidemic in France [17]. The limiting equations that describe the dynamics of the density of individuals in each compartment are systems of so-called Volterra integral equations. These equations are tightly linked to our PDE representation using the Kermack-McKendrick equation (3), as is acknowledged explicitly in [35], Proposition 2.1. All the models with nonvanishing immunity that they consider (SIR, SEIR) can be formulated within our framework. The infection point process $P$ is obtained by starting either a homogeneous Poisson point process [34], an inhomogeneous Poisson point process [16,33], or an inhomogeneous Poisson point process with random intensity [35], at age $a = 0$ in the SIR case, or after an exposed phase in the SEIR case. Moreover, the proof techniques they use rely heavily on a representation of their model as the solution to a stochastic equation driven by a Poisson measure, which does not hold in our more general setting. Nevertheless, let us acknowledge that their techniques allow one to derive central limit theorems for their models.

Further discussion
Let us discuss some practical implications and limitations of our results for epidemiological applications, as well as some avenues for future work.

Impact of the initial condition.
A major limitation to the practical interest of Theorem 1 is that the age structure of the initial population should converge to a known limit, for which a positive fraction of individuals are infected. This means that our result could only be applied once the epidemic has been spreading for a long enough time, and that the initial age structure of the population needs to be prescribed. In practice this age structure can hardly be estimated.
In applications, the large number of individuals observed at $t = 0$ results from the growth of the epidemic out of a few initial individuals. It is thus natural to try to derive a law of large numbers similar to Theorem 1 but started from a few infected individuals. Such a result was already derived in [5], for $c \equiv 1$ and under some additional technical assumptions. It was shown that the age structure then converges to an extension of (3) to the whole real line $t \in \mathbb{R}$. This extension is unique up to a shift [13], and does not depend on any initial age structure, solving the above issue. We expect that a similar result holds in our setting with $c \not\equiv 1$. However, in this case, the solutions to (3) on the real line are neither unique nor shift-invariant. It is a more delicate issue to describe these solutions, and to understand which one of them is selected by the initial randomness of the stochastic process. This relates to existing work on dynamical systems perturbed by a small noise and started near an unstable equilibrium [4,6].
Speed of convergence and deviations. Theorem 1 provides a rigorous justification for the use of deterministic age-structured epidemic models, as limits of a large class of stochastic individual-based models. For the purpose of applications, it would be desirable to understand quantitatively how accurate this approximation is, that is, to derive a speed of convergence of the stochastic model to its deterministic limit. An even more important question for statistical applications would be to characterize the deviation of the stochastic system from the limit. We have derived our law of large numbers under a minimal first moment assumption (1) on the infection point process. We expect that a central limit theorem similar to those obtained in [14,34,42] should hold under a second moment assumption on $P$. This would entail that the fraction of individuals in the various compartments remains at a distance of order $1/\sqrt{N}$ from the deterministic limit, and would provide a natural limiting expression for the likelihood of the process. Note that, since we do not assume that $P$ is a Poisson or a Cox process, the correlation structure of the limiting Gaussian process should be different from that in [14,34,42] and more similar to the covariance structure of a Crump-Mode-Jagers process, described for instance in [28].

Outline
The rest of this paper is organized as follows. A formal description of the model is provided in Section 2, and the Kermack-McKendrick PDE is studied in Section 3.
Section 4 contains the graph construction of our model, as well as a rigorous definition of the ancestral process mentioned in Section 1.3. Our proofs rely on showing the local weak convergence of the graph of potential infectors to a limiting Poisson tree. Section 5 describes this limiting tree, and provides a characterization of the transmission chain leading to the infection of an individual in terms of the $h$-transform of a renewal process. Finally, the convergence to the Poisson tree is carried out in Section 6 and our law of large numbers is proved in Section 7.

Description of the model
In the following, we will consider an epidemic model in which individuals' life trajectories are represented by independent stochastic processes. We distinguish between two types of individuals:
• Susceptible individuals that have never been infected before.
• Infected individuals that have been infected in the past. We emphasize that the meaning of infected is a bit broader than usual. For instance, a recovered or dead individual is considered as infected. To each infected individual, we associate an age. The age is the time elapsed since the beginning of the infection.
There are $N$ individuals in the population. To each individual $x \in [N]$, we associate a pair of processes $(P_x, X_x)$ describing respectively the process of secondary infections and the successive stages of the disease experienced by the focal individual $x$. More precisely:
• The life-cycle process, denoted by $(X_x(a); a \ge 0)$, is a random process valued in $S$ where $X_x(a)$ is the stage of the disease (e.g., exposed, death, etc.) of $x$ at age $a$.
• The infection point process $P_x$ is a point measure describing the ages of potential infections.
Let us denote by X x := (P x , X x ). We will always assume that (X x ; x ∈ [N ]) are i.i.d. and denote by X = (P, X) their common distribution. The state space of X is denoted by X .

Remark 5. Note that we allow for non-trivial dependence between the life-cycle and the infection process. Examples of such dependence can be that a deceased individual is not infectious anymore, a hospitalized individual may have a reduced potential of infection due to quarantine, etc.
We suppose that at $t = 0$, every individual is independently infected with probability $I_0$. Let $I^N_0$ be the set of initially infected individuals. For each $x \in I^N_0$ we need to prescribe an age, or equivalently, an infection time. We assume that, conditional on $I^N_0$, the ages of the initial individuals $(Z_x; x \in I^N_0)$ are i.i.d. with density $g$.
The epidemic now spreads as follows. Suppose that, at some time $t_0$, we have defined a set $I^N_{t_0} \subseteq [N]$ of infected individuals at time $t_0$, and a vector $(\sigma^N_x; x \in I^N_{t_0})$ of infection times. Let $t_1$ be the first atom after $t_0$ of the point measure
$$\sum_{x \in I^N_{t_0}} \sum_{a \in P_x} \delta_{\sigma^N_x + a}.$$
If there is no such atom, the infection stops. Otherwise, let $U$ be uniformly chosen in $[N]$, independent of the rest: it is the first individual that comes in contact with any of the infected individuals after time $t_0$. If $U \in I^N_{t_0}$, then nothing happens, and we carry out the same procedure for the next atom $t_2$. If $U \notin I^N_{t_0}$, then, with probability $1 - c(t_1)$, the infection is ineffective, in which case nothing happens and we consider the next infection time $t_2$. Otherwise, set $I^N_{t_1} = I^N_{t_0} \cup \{U\}$ and $\sigma^N_U = t_1$, and continue the procedure as if starting from time $t_1$ with the initial infected set $I^N_{t_1}$. This inductive procedure will be reformulated in terms of an infection graph in Section 4.1.
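The inductive procedure above can be sketched as a small event-driven simulation. This is only an illustration under strong simplifying assumptions that are not part of the general model: the infection point process is taken to be a Poisson process of rate `beta` stopped at an independent Exp(`gamma`) recovery age, the initial age density $g$ is taken to be Exp(1), and the life-cycle process is not tracked. The function names `sample_infection_ages` and `simulate` are hypothetical. Point processes are sampled when an individual becomes infected rather than in advance, which gives the same law since they are i.i.d.

```python
import heapq
import random

def sample_infection_ages(rng, beta, gamma):
    """Atoms of P_x under the assumed SIR-type course of infection."""
    recovery_age = rng.expovariate(gamma)
    ages, a = [], rng.expovariate(beta)
    while a < recovery_age:
        ages.append(a)
        a += rng.expovariate(beta)
    return ages

def simulate(N, I0, beta, gamma, c, horizon, seed=0):
    """Return {individual: infection time sigma_x} for everyone infected by `horizon`."""
    rng = random.Random(seed)
    infection_time = {}
    contacts = []                          # min-heap of pending infectious contact times
    for x in range(N):
        if rng.random() < I0:              # initially infected with probability I0
            z = rng.expovariate(1.0)       # initial age Z_x, assumed density g = Exp(1)
            infection_time[x] = -z
            for a in sample_infection_ages(rng, beta, gamma):
                if a > z:                  # only contacts made after time 0 matter
                    heapq.heappush(contacts, a - z)
    while contacts:
        t = heapq.heappop(contacts)        # next atom of the contact point measure
        if t > horizon:
            break
        target = rng.randrange(N)          # uniformly chosen individual U
        if target in infection_time:
            continue                       # U already infected: nothing happens
        if rng.random() >= c(t):
            continue                       # ineffective with probability 1 - c(t)
        infection_time[target] = t
        for a in sample_infection_ages(rng, beta, gamma):
            heapq.heappush(contacts, t + a)
    return infection_time

# Hypothetical control measure: contact rate drops to 0.3 after time 5.
result = simulate(N=2000, I0=0.01, beta=1.5, gamma=0.5,
                  c=lambda t: 1.0 if t < 5 else 0.3, horizon=50.0)
no_spread = simulate(N=2000, I0=0.01, beta=1.5, gamma=0.5,
                     c=lambda t: 0.0, horizon=50.0)
print(len(result) / 2000, len(no_spread) / 2000)
```

Using a heap keeps the contact times ordered, so each step of the loop handles exactly the "first atom after $t_0$" of the procedure. Setting `c` identically to zero is a useful check: no infection should then occur after time 0.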

Kermack-McKendrick PDE
In this section we provide our definition of the solution to the Kermack-McKendrick equation (3). We start with a formal resolution of the PDE using the method of characteristics.
Let I 0 be the initial density of infected individuals and g the initial age profile of the population. First, note that if n is solution of the PDE, then for every pair (t, a) of non-negative numbers, with is the number of new infections at time t. Moreover, As a result, since S(0) = S 0 , we have whereτ (s) was defined in (8), so necessarily solves the nonlinear delay equation 2. the function B(t) := t 0 b(s) ds solves the delay equation (11). If a nondecreasing function B satisfies (11), then we have the following inequality: The previous inequality readily entails that B is absolutely continuous, and thus that we can find b such that B(t) = t 0 b(s) ds. Therefore, existence and uniqueness of solutions to (3) reduce to existence and uniqueness of nondecreasing solutions to (11), which is provided by the following result.
Proof. Let us denote by E the set of all nondecreasing, nonnegative, càdlàg functions on [0, ∞). Recall the definition of the Malthusian parameter α from (2). Fix some for γ > α ∨ 0, so that we have Define We endow E γ with the metric which makes (E γ , d γ ) a complete metric space. As any solution to (11) is bounded and continuous, it is sufficient to show existence and uniqueness of the solution in E γ . We introduce the operator Φ : E γ → E γ such that where f (da) denotes the Stieltjes measure associated to f . Note that Φf ∈ E γ , since it is clear that Φf is bounded, continuous, nonnegative and nondecreasing. Let us show that Φ is a contraction. We have, for By (13) we know that ∞ 0 e −γt τ (t) dt < 1, showing that Φ is a contraction. The Banach fixed point theorem therefore shows that there exists a unique B ∈ E γ such that ΦB = B, ending the proof.
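The contraction argument can be illustrated numerically by Picard iteration on a time grid. The sketch below is a rough illustration, not the construction used in the proof. It assumes that, for $c \equiv 1$, the delay equation takes the classical Kermack-McKendrick form $B(t) = S_0\big(1 - \exp(-I_0 \int_0^t \bar\tau(s)\,ds - \int_0^t \tau(a) B(t-a)\,da)\big)$, where $\bar\tau$ denotes the intensity of an initially infected individual. It further assumes $\tau(a) = \beta e^{-\gamma a}$ and $g(a) = e^{-a}$, so that $\bar\tau(t) = \beta e^{-\gamma t}/(1+\gamma)$. All names and parameter values are hypothetical.

```python
import math

beta, gamma = 1.0, 0.5          # assumed: tau(a) = beta * exp(-gamma * a)
I0 = 0.05
S0 = 1.0 - I0
T, n = 20.0, 200                # time horizon and number of grid steps
h = T / n
grid = [k * h for k in range(n + 1)]
tau = [beta * math.exp(-gamma * t) for t in grid]
# I0 * integral of tau_bar over [0, t], in closed form for the assumed g(a) = exp(-a)
forcing = [I0 * beta / (gamma * (1.0 + gamma)) * (1.0 - math.exp(-gamma * t))
           for t in grid]

def phi(B):
    """One application of the (assumed) delay operator, with a trapezoid convolution."""
    out = []
    for k in range(n + 1):
        conv = 0.0
        for j in range(k + 1):
            weight = 0.5 if j in (0, k) else 1.0
            conv += weight * tau[j] * B[k - j]
        out.append(S0 * (1.0 - math.exp(-forcing[k] - h * conv)))
    return out

B = [0.0] * (n + 1)             # start the iteration from the zero function
for iteration in range(1000):
    new_B = phi(B)
    gap = max(abs(x - y) for x, y in zip(new_B, B))
    B = new_B
    if gap < 1e-12:
        break

residual = max(abs(x - y) for x, y in zip(phi(B), B))
print(f"iterations={iteration + 1}, residual={residual:.2e}, B(T)={B[-1]:.4f}")
```

Starting from zero, the iterates increase monotonically and stay below $S_0$, which mirrors the fact that the fixed point is sought among nondecreasing, bounded functions.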

Graph of infection

Infection graph
Recall the infection model defined in Section 2, and the notation (P and that we have defined Intuitively, Z x encodes the age of infection of individual x at time t = 0. Susceptible individuals have age 0, whereas the age of an infected individual is chosen according to the density g. Define the shifted infection measure Note that if x is susceptible (i.e., Z x = 0), we have P x = P x . Vertices with Z x = 0 will be said of type susceptible (S). Vertices with Z x > 0 will be said of type infected (I).
Recall that each atom of a point process P x encodes a potential infectious contact, which is targeted to a uniformly chosen individual in the population. We enrich the infection point processes by adding the information about the label of the target individual. Formally, we consider a sequence of i.i.d. random variables where a 1 < a 2 < · · · in the sum are the atoms of P x listed in increasing order. We now build a graph out of the family ( N x ; x ∈ [N ]) that records the potential infections in the population. and where the second union is a disjoint union, meaning that for each pair (i, j) we allow for multiple edges from i to j in the infection graph. The marks and edge lengths are defined as follows.
1. Each edge $e = (i_e, j_e)$ corresponds to an atom $(a_e, j_e)$ of the point process $N_{i_e}$ defined in (16). The age $a_e$ will be referred to as the length of edge $e$.
2. For each vertex $x \in V_N$, we define the mark at $x$ as $m_x := (Z_x, X_x)$, where the initial infection age $Z_x$ is defined in (14).

Remark 9.
As stated in the theorem, G N is an oriented geometric marked graph. By geometric, we mean that edges have lengths. The orientation is dictated by the direction of potential infections, and the meaning of an edge (i, j) is that individual j is potentially infected by i. Finally, the first coordinate of the marking allows one to distinguish between infected and susceptible individuals at time t = 0.
A path in $G_N$ is a sequence of edges $\pi = (e_1, \dots, e_n)$ such that $j_{e_k} = i_{e_{k+1}}$ for every $k < n$, with the notation $(i_e, j_e)$ for the origin and target vertices of the edge $e$. The length $|\pi|$ of a path is defined as the sum of the lengths of its edges, $|\pi| := \sum_{k=1}^{n} a_{e_k}$. The genealogical (or topological) distance is defined as the number of edges composing the path ($n$ in our specific example). For $k \le n$, we define the $k$-truncation of $\pi$ as $\tau_k \pi := (e_1, \dots, e_k)$.
We say that π is a path from i to j if i e 1 = i and j en = j. A path in G N from i to j corresponds to a potential infection chain between i and j. The length of the path is the time elapsed between the infection of i and j. The genealogical distance is the number of infectors along the chain. It turns out the infection graph that we have constructed corresponds to a directed version of a configuration model. The (undirected) configuration model is a well-studied random graph model where, starting from a prescribed number of half-edges for each of the N vertices (D 1 , . . . , D N ), a random graph is obtained by pairing these half-edges uniformly at random [45, Section 2.2.2]. Let us make this connection explicit.
and, since each out-edge is pointing towards a uniformly chosen vertex, conditional on (D out Suppose that (D out x , D in x ) x are prescribed. We can construct a graph with this given sequence of in and out degrees in the following way: 1. attach to each vertex x ∈ [N ] a number D out x of out half-edges, and a number D in x of in half-edges; and 2. pair each out half-edge to a different in half-edge.
If the pairing in the second step is made uniformly among all possibilities, the resulting random graph is called a directed configuration model with degree sequence (D out x , D in x ) x . In the infection graph it is not hard to see that, again because each out-edge is pointing towards a uniformly chosen vertex, conditional on (D out x , D in x ) x the pairing of in-and out-edges in G N is made uniformly. Furthermore, conditional on (D out x , D in x ) x , the marks (m x ) x are independent, and m x has the distribution of (Z, X ) conditional on | P x | = D out x . We record this connection as a proposition for later use.
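The two-step construction above can be sketched in a few lines of Python. This is only an illustration under our own conventions: the function name `directed_configuration_model` and the toy degree sequences are hypothetical, and the uniform pairing is obtained by shuffling the list of in half-edges.

```python
import random
from collections import Counter

def directed_configuration_model(out_deg, in_deg, rng):
    """Pair out half-edges with in half-edges uniformly at random.

    out_deg[x] and in_deg[x] are the prescribed degrees of vertex x.
    Returns a list of directed edges (i, j); multiple edges and
    self-loops are allowed, as in the infection graph.
    """
    if sum(out_deg) != sum(in_deg):
        raise ValueError("total out-degree must equal total in-degree")
    # One entry per half-edge, labelled by the vertex it is attached to.
    out_stubs = [x for x, d in enumerate(out_deg) for _ in range(d)]
    in_stubs = [x for x, d in enumerate(in_deg) for _ in range(d)]
    # A uniform shuffle of the in half-edges yields a uniform pairing.
    rng.shuffle(in_stubs)
    return list(zip(out_stubs, in_stubs))

# Toy degree sequences (illustrative values only).
rng = random.Random(0)
out_deg = [2, 0, 1, 3]
in_deg = [1, 2, 2, 1]
edges = directed_configuration_model(out_deg, in_deg, rng)
realised_out = Counter(i for i, _ in edges)
realised_in = Counter(j for _, j in edges)
```

Whatever the random pairing, the realised degrees coincide with the prescribed ones; only the identity of the neighbours is random.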

Infection process
Conditional on a realization of the infection graph (V N , E N ), we attach an additional independent random variable s e uniform on [0, 1] to every edge e ∈ E N of the graph. This random variable will encode what we will call the contact intensity of edge e. Roughly speaking, if the contact occurs at time t, this contact translates into an infection iff two conditions are satisfied. First, the contact intensity should be strong enough in the sense that s e ≤ c(t) (see (18) below). Secondly, the target individual should not have been infected before (see (19) below). We make this more precise in the next definition; see also Figure 3.

Figure 3: Assume first that $\pi_1$ is the shortest path, but that $s_{e^1_2} > c(|\pi_1|)$ (the edge is grayed out in the figure). Then $\pi_1$ is not an active path. Now let us assume that $\pi_2$ is active and that $|\tau_1 \pi_2| = a_{e^2_1} < a_{e^3_1} + a_{e^*} = |\tau_2 \pi_4|$. This means that $\pi_4$ cannot be the active geodesic. Finally, if $\pi_2$ and $\pi_3$ are two active paths and $|\pi_2| < |\pi_3|$, then $\pi_2$ (in blue) is the active geodesic from $I^N_0$ and $\sigma^N_x = |\pi_2|$.
Definition 11 (Active geodesic). Let $\pi = (e_1, \dots, e_n)$ be a path with $i_{e_1} \in I^N_0$. The path is said to be active iff
$\forall k \le n, \quad s_{e_k} \le c(|\tau_k \pi|)$. (18)
For every $x \notin I^N_0$, let $\Xi^N(x)$ be the set of active paths from $I^N_0$ to $x$. A path $\pi \in \Xi^N(x)$ is said to be the active geodesic from $I^N_0$ to $x$ iff
$\forall k \le n, \quad \tau_k \pi = \arg\min_{\pi' \in \Xi^N(j_{e_k})} |\pi'|$. (19)
Finally, we define the infection time of $x$, denoted by $\sigma^N_x$, as the length of the active geodesic from $I^N_0$ to $x$, with the convention that $\sigma^N_x = \infty$ if the geodesic does not exist.
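To make the infection process concrete, here is a minimal Python sketch that computes infection times on a small finite graph. It follows the verbal description given before the definition (a contact at time t infects iff its intensity is at most c(t) and the target has not been infected before) and processes individuals in order of infection time, in the manner of Dijkstra's algorithm. The function name `infection_times`, the edge encoding `(i, j, a, s)` and the convention that initially infected individuals have infection time 0 are assumptions made for the example.

```python
import heapq
import math

def infection_times(n, edges, initially_infected, c):
    """Return the list sigma of infection times of vertices 0..n-1.

    edges: list of (i, j, a, s), a potential contact from i to j occurring
           a units of time after the infection of i, with intensity s.
    c:     function t -> contact rate in [0, 1]; a contact made at time t
           is active iff s <= c(t).
    """
    out = [[] for _ in range(n)]
    for i, j, a, s in edges:
        out[i].append((j, a, s))
    sigma = [math.inf] * n
    done = [False] * n
    heap = []
    for x in initially_infected:
        sigma[x] = 0.0  # convention of this sketch
        heapq.heappush(heap, (0.0, x))
    while heap:
        t, i = heapq.heappop(heap)
        if done[i]:
            continue  # stale entry: i was already infected earlier
        done[i] = True
        for j, a, s in out[i]:
            arrival = t + a
            # The contact infects j only if it is active at its own time
            # and no earlier infection of j has been recorded.
            if s <= c(arrival) and arrival < sigma[j]:
                sigma[j] = arrival
                heapq.heappush(heap, (arrival, j))
    return sigma

# Toy graph (illustrative values only).
edges = [(0, 1, 1.0, 0.5), (0, 2, 2.0, 0.9), (1, 2, 0.5, 0.2), (2, 3, 1.0, 0.95)]
full = infection_times(4, edges, [0], lambda t: 1.0)
reduced = infection_times(4, edges, [0], lambda t: 0.8)
```

With c ≡ 1 every edge is active and the infection times are ordinary shortest-path lengths; with c ≡ 0.8 the edges of intensity 0.9 and 0.95 are discarded, so the last vertex is never infected.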

Remark 12.
1. Since τ has a density, there is at most one path satisfying the minimization problem (19).
2. If c ≡ 1, then any path in the infection graph is active, so that our definition coincides with the usual definition of a geodesic on a geometric graph. In particular, (19) just states that if $\pi = (e_1, \dots, e_n)$ is the geodesic from $I^N_0$ to $x$, then the truncated path $\tau_k \pi$ is the geodesic from $I^N_0$ to $j_{e_k}$. Thus, when c ≡ 1, all the information about the infection process is contained in the infection graph and the extra variables $s_e$ do not play any role.

The ancestral path
Definition 13 (Infection and ancestral paths).
• Let us consider x of type (S) such that σ N x < ∞ and write π = (e 1 , . . . , e n ) with e k = (i k , j k ) for the active geodesic from Finally, we define the ancestral process as to be the vector recording the information along the chain of infection (age of infection, infection measure, life-cycle).
• If x is of type

Local weak convergence
We introduce the notion of local weak convergence [1,7]. Intuitively, a sequence of graphs converges in the local weak sense if the local structure around a typical vertex (meaning a uniformly chosen vertex) converges in distribution to a random limit. We make this definition precise.
A pointed oriented geometric marked (pogm) graph $G$ is characterized by five coordinates $G = (V, E, (a_e), (m_x), \emptyset)$, respectively the set of vertices, the set of edges, the lengths $(a_e)_{e \in E}$ of the edges, the marks $(m_x)_{x \in V}$, and the pointed vertex $\emptyset \in V$. We let $\mathcal{H}$ denote the set of pogm graphs, and equip it with a metric $d_{\mathcal{H}}$ so that $(\mathcal{H}, d_{\mathcal{H}})$ is a Polish space. A graph isomorphism φ between two finite pogm graphs $G = (V, E, (a_e), (m_x), \emptyset)$ and $G' = (V', E', (a'_e), (m'_x), \emptyset')$ is a bijection from $V$ to $V'$ such that 2. φ maps the reference vertex of $G$ to the reference vertex of $G'$.
By convention, we set min(∅) = ∞ in the following. Let $G = (V, E, (a_e), (m_x), \emptyset)$ and $G' = (V', E', (a'_e), (m'_x), \emptyset')$ be two elements of $\mathcal{H}$. Define where the minimum is taken over all possible graph isomorphisms between the two graphs (in the sense prescribed above, that is, we only consider the isomorphisms preserving the pointed vertex). If there is no such isomorphism between $G$ and $G'$, we set $d(G, G') = 1$.
For $G \in \mathcal{H}$ and $y \in G$, the topological (or genealogical) distance to the reference vertex $x$ is defined as $\inf\{n : \text{there exists a path } (y = x_0, \dots, x_n = x) \text{ in } G\}$.
For every $r \in \mathbb{N}^*$, we denote by $[G]_r$ the subgraph induced by the vertices at a topological distance to the origin, that is, to the pointed vertex, less than $r$. For two elements $G, G' \in \mathcal{H}$, we define the (pseudo-)distance $d_{\mathcal{H}}$ as follows The metric $d_{\mathcal{H}}$ naturally induces a notion of local convergence on (equivalence classes of) $\mathcal{H}$. Using standard arguments, we can see that $(\mathcal{H}, d_{\mathcal{H}})$ is a Polish space.
Given an oriented geometric marked graph G N of size N , for x ∈ [N ] define G N (x) as the subgraph of G N induced by all vertices y with an oriented valid path from y to x (including x itself), where we call a path (y = y 0 , y 1 , . . . , y k , x) valid if and only if for all 1 ≤ i ≤ k, the node y i is of type (S). G N (x) is therefore the graph that contains all potential chains of infection leading to the infection of node x from an initially infected individual.
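The vertex set of G N (x) can be computed by a backward exploration from x, as in the following Python sketch. The names `backward_vertices` and `is_susceptible` and the toy graph are hypothetical; edge lengths and marks are ignored, since only the validity of paths matters here.

```python
from collections import deque

def backward_vertices(edges, is_susceptible, x):
    """Vertices y with an oriented valid path from y to x (x included).

    edges: list of directed edges (i, j).
    is_susceptible[v]: True iff vertex v is of type (S).
    A path is valid when all its intermediate vertices are of type (S).
    """
    preds = {}
    for i, j in edges:
        preds.setdefault(j, []).append(i)
    seen = {x}
    queue = deque([x])
    while queue:
        v = queue.popleft()
        # A vertex other than x can only be traversed if it is of type (S);
        # an (I) vertex is kept as a possible start of a path but not expanded.
        if v != x and not is_susceptible[v]:
            continue
        for u in preds.get(v, []):
            if u not in seen:
                seen.add(u)
                queue.append(u)
    return seen

# Toy graph: 0 -> 1 -> 3, 2 -> 3, 4 -> 2, 5 -> 4 (illustrative only).
edges = [(0, 1), (1, 3), (2, 3), (4, 2), (5, 4)]
is_susceptible = [False, True, False, True, True, False]
reached = backward_vertices(edges, is_susceptible, 3)
```

Vertex 4 is excluded in this example: its only path to vertex 3 passes through vertex 2, which is of type (I) and was therefore infected before any contact from 4 could matter.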
We treat $G_N(x)$ as an element of $\mathcal{H}$, with $x$ as the reference vertex. We can construct a measure on $\mathcal{H}$ out of the graph $G_N$ by assigning the root $x$ uniformly at random: $P(G_N) := \frac{1}{N} \sum_{x \in [N]} \delta_{G_N(x)}$. If the graph $G_N$ is random, $P(G_N)$ is a random measure. The following definition is taken from Definition 3.6 in [21].

Definition 14 (Local weak convergence). We say that a sequence of random pogm graphs $(G_N)_N$ converges in probability in the local weak sense to a random graph $G$ if $P(G_N) \to L(G)$ in probability for the weak topology on measures on $\mathcal{H}$, where $L(G)$ is the law of $G$.
We end this section with a direct consequence of the various definitions.

Lemma 15. Consider a metric space $E$ and a functional $\Phi : \mathcal{H} \to E$. Suppose that for all $N \ge 1$, $G_N$ is a random pogm graph of size $N$, and that $(G_N)$ converges in probability in the local weak sense to some other pogm graph $G$. If Φ is continuous on a set $A \subseteq \mathcal{H}$ such that $P(G \in A) = 1$, then $\frac{1}{N} \sum_{x \in [N]} \delta_{\Phi(G_N(x))} \to L(\Phi(G))$ in probability for the weak topology on measures on $E$, where $L(\Phi(G))$ is the law of $\Phi(G)$.
Proof. For a probability measure $P$ on $\mathcal{H}$, let $P \circ \Phi^{-1}$ denote the push-forward measure of $P$ by Φ. Clearly, $\frac{1}{N} \sum_{x \in [N]} \delta_{\Phi(G_N(x))} = P(G_N) \circ \Phi^{-1}$. According to the continuous mapping theorem, the result is proved if we can show that the mapping $P \mapsto P \circ \Phi^{-1}$ is continuous at $P = L(G)$ for the weak topology. Let $P_N \to L(G)$ weakly, and let $\tilde{G}_N \sim P_N$. Another application of the continuous mapping theorem shows that $\Phi(\tilde{G}_N) \to \Phi(G)$ in distribution, showing the result.

Palm infection measures
Recall that X x = (P x , X x ) is the pair encoding the infection and the life-cycle process and P is a point process where each atom represents a potential infection event. Define |P| := dP(a) which is interpreted as the total number of potential infections (or contacts) along the course of infection. We define a triplet of random variables (W, P , X ) ≡ (W, X ) valued in R + × M × S ≡ R + × X such that for every bounded continuous function f In words, we first bias the pair X = (P, X) by |P|. Conditional on the resulting biased pair X = (P , X ), the r.v. W is obtained by picking an atom of the infection measure P uniformly at random.

Definition 16 (Campbell and Palm measures). The law of (W, X ) is the Campbell measure associated to X [3]. The Palm measure at a ∈ R * + is defined as the distribution of the random pair X conditioned on the event {W = a}. We will use the notation X (a) for a random variable with the Palm measure at a. See again [3] for a precise definition of this conditioning.
Recall that τ is the intensity measure of P defined in (1), and that we can write it as τ = R 0 ν, where the total mass R 0 and the probability measure ν are defined in (5). The next result is standard from Palm measure theory.

Lemma 17.
The random variable W is distributed according to ν.
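Lemma 17 can be checked by hand on a finite toy example. The Python sketch below assumes, purely for illustration, that the infection measure P is drawn uniformly from a finite list of point processes with integer atoms; in that case the intensity measure is the normalised pooled count of atoms. The function names and the toy data are ours, and exact rational arithmetic is used so that the two laws can be compared without rounding.

```python
from collections import Counter
from fractions import Fraction

def campbell_law_of_W(processes):
    """Exact law of W: size-bias the process by |P|, then pick a uniform atom."""
    total = sum(len(p) for p in processes)
    law = Counter()
    for p in processes:
        if not p:
            continue  # a process without atoms has zero weight after biasing
        # Size-biasing: p is selected with probability |p| / total.
        weight = Fraction(len(p), total)
        for a in p:
            # Given p, each of its |p| atoms is chosen with probability 1/|p|.
            law[a] += weight * Fraction(1, len(p))
    return dict(law)

def normalised_intensity(processes):
    """nu = tau / R_0 for the same finite population of point processes."""
    atoms = [a for p in processes for a in p]
    counts = Counter(atoms)
    return {a: Fraction(k, len(atoms)) for a, k in counts.items()}

# Toy population of four point processes (illustrative values only).
processes = [[1, 2, 3], [2], [], [1, 5]]
law_W = campbell_law_of_W(processes)
nu = normalised_intensity(processes)
```

The two dictionaries coincide: the factor |P| introduced by the size-biasing cancels the factor 1/|P| coming from the uniform choice of an atom.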

Definition of the Poisson tree
Recall that we have defined $\bar\tau$ in (8) by where $g$ is the initial age density of infected individuals, and that we write $\bar\tau = \bar R_0 \bar\nu$ where $\bar R_0$ is the mass of $\bar\tau$ and $\bar\nu$ the renormalized probability measure, see (9). Let us now consider a pair of random variables $(\bar W, Z) \in \mathbb{R}_+^2$ with joint density In particular, the first coordinate is distributed according to $\bar\nu$.
We now construct a Poisson marked random tree H in two consecutive steps. (This extends the construction of Section 1.4 to the case c ≡ 1.) First, the graph structure of H depends on the two positive real parameters $S_0 R_0$ and $I_0 \bar R_0$, and second the random edge lengths and the marks are assigned through two probability distributions $\nu$, $\bar\nu$ and the Palm measures described in the previous section.
Step 1. Graph structure. The graph structure is given by a Poisson Galton-Watson tree with two types:
• Start from a root ∅ of type (S).
• (S) nodes have independent Poisson($S_0 R_0$) and Poisson($I_0 \bar R_0$) numbers of offspring of type (S) and (I) respectively.
• (I) nodes have no offspring.
In the following, let us consider the edges of the tree as being oriented towards the root.
Step 2. Edge lengths and marks.
• Conditional on $(a_e, Z_i)$, the variable $X_i$ has the Palm measure $X^{(a_e + Z_i)}$ evaluated at $a_e + Z_i$.
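The graph structure of Step 1 can be simulated as follows. This Python sketch relies on our reading of the construction, which the later proofs support: each (S) node has independent Poisson numbers of (S) and (I) children with means S 0 R 0 and I 0 R̄ 0, and (I) nodes are leaves. The names `poisson`, `sample_tree`, `s_mean`, `i_mean` and the truncation at `max_depth` (needed because the tree may be infinite) are assumptions of the example; edge lengths and marks are not sampled.

```python
import math
import random

def poisson(mean, rng):
    """Sample a Poisson variable by Knuth's product-of-uniforms method."""
    threshold = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > threshold:
        k += 1
        prod *= rng.random()
    return k

def sample_tree(s_mean, i_mean, max_depth, rng):
    """Return a list of nodes (parent, type, depth); node 0 is the root.

    s_mean plays the role of S_0 R_0 and i_mean that of I_0 R-bar_0.
    """
    nodes = [(None, "S", 0)]
    frontier = [0]  # (S) nodes of the current generation
    for depth in range(1, max_depth + 1):
        next_frontier = []
        for parent in frontier:
            for _ in range(poisson(s_mean, rng)):
                nodes.append((parent, "S", depth))
                next_frontier.append(len(nodes) - 1)
            for _ in range(poisson(i_mean, rng)):
                # (I) children are leaves: they never enter the frontier.
                nodes.append((parent, "I", depth))
        frontier = next_frontier
    return nodes

tree = sample_tree(0.8, 0.5, 6, random.Random(1))
trivial = sample_tree(0.0, 0.0, 5, random.Random(1))
```

Knuth's method is adequate for the small means used here; for large means a library sampler would be preferable.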

Remark 18.
• If e = (i, j) with i ∈ (S) then (a e , X i ) has the Campbell measure introduced in Definition 16.
• If i ∈ (I), then $a_e$ is distributed according to $\bar\nu$.
The random tree H will correspond to the local limit of the pogm graph G N (x) conditioned on {Z x = 0}. Let us now consider the infection process on H introduced in Section 4.2. Conditional on H, we endow each oriented edge e with a uniform random variable s e (the intensity of the contact). As pointed out in Definition 11, these random variables allow one to determine whether a path is active or not, and to determine the active geodesic at the root.
Define σ ∞ as the length of the active geodesic in H from the set of (I) leaves to the root ∅. The following key result connects the distribution of σ ∞ to the delay equation.
Proof. As we have assumed that τ has a density w.r.t. the Lebesgue measure, it is clear that this also holds for the distribution of $\sigma^\infty$. We denote its density by $f$. Let $K$, resp. $\bar K$, be the number of type (S), resp. type (I), children of the root of H. Let $(H_1, \dots, H_K)$ denote the subtrees attached to the root ∅ which are growing out of the type (S) children of the root. Let $(\sigma^\infty_1, \dots, \sigma^\infty_K)$ be the corresponding infection times, $\sigma^\infty_i$ being obtained by determining the length of the active geodesic from the vertices of type (I) to the root in the tree $H_i$. Moreover, let $(W_1, \dots, W_K)$ and $(\bar W_1, \dots, \bar W_{\bar K})$ be the lengths of the edges ending at ∅ and starting from an (S) child and an (I) child respectively. (Recall that the edges of the Poisson tree are directed towards the root.) Finally, with a slight abuse of notation, let $s_i$ be the contact intensity on the edge with length $W_i$. Let $\bar s_i$ be defined analogously. Define By definition of the active geodesic, we have that with the convention 0 × ∞ = 0. Define $G(t) = P(\sigma^\infty > t)$. Let $W$ and $\bar W$ be distributed according to $\nu$ and $\bar\nu$ respectively. By the branching property, Using these expressions, (22) and the branching property we have where, in the last equality, we have used the generating function of a Poisson distribution. It now follows that $B(t) = S_0 (1 - G(t))$ satisfies (11).

The infection path conditioned on its length
Let us consider the infection process on H as described in the previous section. For every realization in {σ ∞ < ∞}, define R ∞ to be the infection path from ∅ to the (I) leaves in H, and let A ∞ be the ancestral process defined analogously to Definition 13. In this section, we ask the following question: conditional on the active geodesic to be of length t, what is the distribution of the vector of infection times along the geodesic? In order to give an answer to this question, we start with some definition. Let us consider R ∞ to be the infection path from ∅ to the (I) leaves in H -see Definition 13. Our aim is to provide a description of R ∞ conditional on {σ ∞ = t}. Define the processR (t) ≡R as the R-valued, nonincreasing Markov chain, started from t and stopped upon reaching (−∞, 0], with transition kernel Q(x, y) defined for all x > 0 by where b is extended to the negative half-line with b(−t) := I 0 g(t). The fact that Q defines a transition kernel follows from the renewal equation for b, which is obtained by differentiating (11) with respect to t: k ≤ 0}. In the next proposition, we slightly abuse notation and identifyR (t) with its finitelength restriction to [L].
Proof. Recall that σ ∞ = R ∞ (0) is a random variable valued in R + ∪ {∞}. By Proposition 19, the density of the random variable σ ∞ on R + is given by S −1 0 b(t). Let F be the joint probability density of the random pair (R ∞ (1), R ∞ (0)−R ∞ (1)), and define ∀t and Since H is a Poisson random tree it is sufficient to understand the first step of the infection path, i.e., we need to show that We use the same notation as in the proof of Proposition 19 and we distinguish between two cases.
Case 1: x > 0. In this case, the first individual along the geodesic is of type (S). Let us work conditional on $(K, \bar K)$ and compute the density $F(x, t - x)$. Fix a child $i \le K$ of type (S) of the root ∅. By construction of the tree H, the length of the active geodesic leading to $i$ and the length of the edge $e_i$ going out of $i$ toward the root are independent. Their joint density at $(x, t - x)$ is $S_0^{-1} b(x)\, \nu(t - x)$ by Proposition 19. For individual $i$ to be part of the active geodesic leading to ∅, the edge $e_i$ needs to be active, which occurs with probability $c(t)$, and the shortest active path going through any of the other children of the root must be longer than $t$. Using the expression (23), the probability of the latter event is Summing over all $K$ children of type (S) yields that where in the second line, we used the fact that $K$ is Poisson($S_0 R_0$) (so that the size-biased version of $K$ is identical in law to $K + 1$). In the third line, we used the relation $\tau(u) = R_0 \nu(u)$. In Proposition 19, we showed that This shows (25).

Case 2: x ≤ 0. On this event the first vertex along the transmission chain is of type (I). We use the same argument as in the case x > 0. Let $i \le \bar K$ be a child of ∅ of type (I). Again, for this individual to be in the active geodesic, all paths going from an (I) individual to the root and not going through $i$ need to be longer than $t$, and the edge from $i$ to ∅ needs to be active. In this case, (21). Thus the density of (R ∞ (1), This together with (23) leads to

Harmonic transform
In this section, we prove that the pathR (t) is the h-transform of a renewal process stopped upon reaching (−∞, 0]. Throughout this section, we assume the existence of a unique Malthusian parameter $\alpha \in \mathbb{R}$ such that $\int_0^\infty e^{-\alpha a}\, \tau(a)\, da = 1$.
Let (Y i ) be a sequence of i.i.d. random variables with probability density r. Let t > 0 and define the renewal process R (t) ≡ R as follows We couple the renewal process R with a random variable Consider the filtration (F k ; k ≥ 0) where and define the reaching time of (−∞, 0] as L := inf{k : R k ≤ 0}.
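As an illustration, a renewal process of this kind can be simulated in a few lines of Python. The sketch assumes that the process starts from t and decreases by the successive steps Y i until it first reaches (−∞, 0], which is our reading of the construction; the exponential step density is only a placeholder for r, and the coupled random variable is not modelled. The names `renewal_path` and `sample_step` are hypothetical.

```python
import random

def renewal_path(t, sample_step, rng):
    """Return [R_0, ..., R_L] with R_0 = t and R_k = R_{k-1} - Y_k.

    The path is stopped at L, the first index k such that R_k <= 0.
    sample_step(rng) must return a positive step Y_k.
    """
    path = [t]
    while path[-1] > 0:
        step = sample_step(rng)
        if step <= 0:
            raise ValueError("steps must be positive")
        path.append(path[-1] - step)
    return path

# Placeholder step density: exponential with rate 1 (an assumption).
rng = random.Random(42)
path = renewal_path(5.0, lambda r: r.expovariate(1.0), rng)
L = len(path) - 1  # reaching time of (-inf, 0]
```

Read backwards in time, the successive values of the path are the candidate infection times along the transmission chain, and the last value, which is nonpositive, falls in the initial condition.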

Lemma 22. Define
The process (M k ; k ≥ 0) is a martingale with respect to the filtration (F k ; k ≥ 0).
Proof. Let us compute the conditional expectation E(M k+1 | F k ) for a realization on the event A k := {R k > 0, K ≥ k}. The martingale property is obviously satisfied for any realization on the complementary event. Using the renewal equation (24) for b, we have

Proposition 23. Let h(s, u) := b(s)e −αs u and consider the h-transform of the two dimensional process (R, χ). Then the processR is the first coordinate of the h-transformed process.
Proof. On the one hand, the previous lemma implies that h is a harmonic function for the bivariate process (R, χ). On the other hand, the transition kernelQ for the h-transformed process can be rewritten explicitly as It is now straightforward to check thatR is identical in law with the first coordinate of the h-transformed process.
Let P be the law of the bivariate path (R, χ) stopped at L = inf{k : R k ≤ 0}. LetP be the law of h-transform (R,χ) stopped atL = inf{k :R k ≤ 0}. Then P P and the Radon-Nikodym derivative is given by This immediately entails the following result.

Proposition 24.
Assume that g(t) = α exp(−αt). ThenP is obtained by conditioning the renewal process R on not being killed before time L, and b(t) can be written: b(t) = αe αt P (R (t) is not killed before time L).

Remark 25. Consider the linearized version of the Kermack-McKendrick equation
$\partial_t n(t, a) + \partial_a n(t, a) = 0$.
We close this section with a brief discussion of the previous result. In [18], we considered a "linearized" version of the present model by making the simplifying assumption that susceptible individuals are always in excess (branching assumption), so that the epidemic is described by a Crump-Mode-Jagers process. When c ≡ 1 and R 0 > 1, the process is supercritical. Starting from a single infected individual, there is a positive probability of non-extinction and, conditional on this event, the number of infected individuals grows exponentially at rate α > 0. Further, it is well known from the seminal work of Jagers and Nerman [32] that under mild assumptions,
1. the age structure of the population converges to the exponential profile g(t) = α exp(−αt) mentioned in Proposition 24;
2. the infection path, interpreted as the ancestral line in the work of Jagers and Nerman, is well described by the renewal process R. More precisely, if one samples an infected individual at a large time t, its infection path converges to the renewal process R.
We can draw two conclusions out of those observations. As a consequence of the first item, the age structure g(t) = α exp(−αt) could be interpreted as the age structure emerging from a single infected individual in the past (provided that the initial fraction of infected individuals in our model is small). The second conclusion is that the effect of the conditioning in Proposition 24 encodes the effect of the saturation and the contact rate c on the genealogy. Recall that in the absence of saturation (branching approximation) and full contact rate (c ≡ 1), the infection path is distributed as the renewal process. When those effects are taken into account, Proposition 24 indicates that the law of the infection path is twisted in such a way that infection paths with infection occurring at low susceptible frequency (i.e. low values of S) and high contact rates c are favored. This is consistent with the intuition that ancestral infections tend to be biased towards periods when many infections occurred.

Convergence of the infection graph
We show in this section that the Poisson random tree H constructed previously corresponds to the local weak limit of (S) vertices in the infection graph G N . This entails that the empirical distribution of any continuous functional of the graph in the local topology converges to the law of the corresponding functional for H.
In particular, we will deduce our two main results, the convergence of the age structure and that of the historical process, by viewing the age of an individual x and its transmission chain as functionals of the active geodesic in G N leading to x. The key result of this section is the following.

Proposition 26.
The sequence of infection graphs (G N ) N converges in probability in the local weak sense to a random pogm tree T such that
• with probability I 0 , T is made of a single (I) vertex ∅, whose mark (Z ∅ , X ∅ ) is distributed as Z ∅ ∼ g(a)da and X ∅ ∼ (P, X);
• with probability S 0 , T is distributed as the random tree H of Section 5.2.
In other words, the tree H constructed in Section 5.2 corresponds to the law of T , conditioned on starting from an (S) vertex.

Lemma 27.
For each $N$, let $(X^N_i;\ i \le N)$ be some exchangeable random variables in some Polish state space, and let $(X_1, X_2)$ be two independent random variables with distribution $L(X)$. Then $\frac{1}{N} \sum_{i=1}^{N} \delta_{X^N_i} \to L(X)$ if and only if $(X^N_1, X^N_2) \to (X_1, X_2)$, where the two convergences are in distribution.
Proof. By exchangeability, for any continuous bounded φ, ψ, If the random measure converges, then by dominated convergence showing that the pair (X N 1 , X N 2 ) converges in distribution to (X 1 , X 2 ). Conversely, using again (26), the convergence of (X N 1 , X N 2 ) entails that These two estimates prove that 1 ], which in turn shows that the measure 1 converges to L(X), see for instance [29,Theorem 4.11].
Proof of Proposition 26. We prove the result in three steps. First, we show the local weak convergence of the graph structure (without the marking) towards a limiting Galton-Watson tree T . We make use of known results on the local weak convergence of configuration models. Then we show that (G N ) N (with the marking) converges to the tree T obtained by marking T appropriately. Finally we prove that the law of the limiting tree T , conditional on starting from an (S) vertex and after removing all edges pointing towards an (I) vertex, is distributed as the Poisson tree H.
Step 1. Recall that the infection graph G N can be constructed as a directed configuration model, see the notation in Proposition 10. We will use the known fact that the local weak limit of a configuration model is a Galton-Watson tree [45, Section 2.2.2]. We make use of a version of this result for directed graphs derived in [21,Proposition 6.2].
The local weak convergence in [21] is derived for a different class of oriented graphs than the pogm graphs introduced in this work. Namely, edges have no lengths and the vertices are marked with their out-degrees. Accordingly, let us denote byG N the oriented marked graph obtained by replacing the marks (m x ) x = (Z x , X x ) x by the mark (m x ) x = (| P x |) x and removing the edge lengths. Recall the notation (D out x ) x and (D in x ) x for the collection of in and out degree inG N . Three conditions need to be checked on this degree sequence to obtain the local weak convergence ofG N , see Condition 6.1 of [21], (a) for any positive bounded function φ, in probability, in probability (note that we have removed a 1/N factor compared to [21] that should not appear); for some random pair (D out , D in ), and where (D out , D in ) is obtained by sizebiasing (D out , D in ) by its first coordinate.
We check condition (a) by computing the second moment of the empirical distribution of degrees. Since the in-degrees follow the multinomial distribution (17), we have that in probability, and combining this with point (b) we have in probability. This shows (c) for our specific choice of φ. For a general φ, up to extracting a subsequence, let us assume that a.s. (27) holds, for all k, k ≥ 0. Scheffé's lemma shows that this pointwise convergence can be reinforced to a convergence in 1 (N × N), which readily entails (c). Therefore, Proposition 6.2 in [21] shows thatG N converges in probability in the local weak sense towards a marked Galton-Watson tree T where each vertex u has: 1. a Poisson(S 0 R 0 + I 0R0 ) number of offspring (with edges oriented from the children towards the parents); and 2. an independent markm u distributed as | P| for the root and as the sizebiasing of | P| for other vertices.
Note that in this tree there is no distinction between (I) and (S) vertices since part of the marking has been removed.
Step 2. We now show that G N (with the full marking) converges to a tree obtained by marking the limiting Galton-Watson tree as in Furthermore the lengths of the edges going out of u are sampled uniformly among the atoms of P u . Now, the first part of the proof and Lemma 27 show that (G N (x),G N (y)) converges in distribution to two independent copies ( T 1 , T 2 ) of the limit Galton-Watson tree. Provided that P(B N r ) → 1, this shows that in distribution where the tree T i is obtained out of T i by adding marks and edge lengths as in (28) and removing edges pointing to an (I) vertex. In turn, Lemma 27 proves that G N converges to T 1 in probability in the local weak sense. It remains to show that P(B N r ) → 1. This result is actually shown as a step in the proof of Proposition 6.2 from [21] that we have used in our Step 1. More precisely, the proof of [21,Lemma 6.4] shows that, with probability going to 1, the balls of radius r of two uniformly chosen vertices in the directed configuration model do not intersect, which is the result we need here. Let us explain heuristically why we expect this result to hold. The ball [G N (x)] r can be constructed by exploring the graph starting from x, following the in-edges in reverse direction, and pairing them with out-edges. Each time a new in-edge is explored, it is paired with an out-edge chosen uniformly from the unpaired out-edges in the graph. Since the total number of edges explored in [G N (x)] r and [G N (y)] r is negligible w.r.t. the total number of edges in G N (and since no vertex in G N has a number of out-edges of order N ) the probability that the same vertex is explored both in [G N (x)] r and [G N (y)] r vanishes as N → ∞. This argument is made rigorous in the proof of [21,Lemma 6.4].
Step 3. Let T be distributed as the local weak limit of G N from the previous step. Our last task is to connect the distribution of T to that of the Poisson tree H from Section 5.2. Let us first take care of the root ∅. By definition of T ,m ∅ ∼ | P| and conditional onm ∅ , m ∅ ∼ (Z, X ) conditioned on | P| =m ∅ . This readily shows that the mark of the root is distributed as (Z, X ), so that in particular it is of type (I) and (S) with probability I 0 and S 0 respectively.
We now turn to some non-root vertex u ∈ T . Recall that its markm u has the size-biased distribution of | P| and that m u = (Z u , X u ) is obtained as in (28). Let A u be the length of its unique out-edge, which is uniformly chosen among the atoms of P u . We have where in the first line we have used (28) and that A u is a uniform atom of P u , in the second line thatm u has the size-biased distribution of P and in the third line the definition of P of (15). The result now follows upon identifying the terms.
The prefactor in each term of the sums corresponds to the probability that Z u = 0 or Z u > 0, that is, that vertex u is of type (S) or (I). Since the total number of offspring in T follows a Poisson distribution with parameter S 0 R 0 + I 0R0 , the number of (S) and (I) offspring are independent Poisson random variables with means S 0 R 0 and I 0R0 respectively. Moreover where (W, X ) has the Campbell measure of Definition 16. Thus any (S) individual in T has an edge length and mark distributed as (W, X ) as in H. Similarly, where X (a) has the Palm distribution of X at a, and G is the probability density defined in (21). In the second line we have used the definition of the Palm measure. Identifying the terms, the mark of an (I) individual is obtained as that defined for H.
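The independence of the numbers of (S) and (I) offspring used above is the thinning property of the Poisson distribution. For completeness, here is the standard computation in generic notation: λ denotes the mean of the total number of offspring, p the probability that a given offspring is of type (S), and K, K̄ the numbers of (S) and (I) offspring (λ and p are our notation).

```latex
% The total number of offspring is Poisson(\lambda) and each offspring is,
% independently, of type (S) with probability p and of type (I) otherwise.
\begin{align*}
\mathbb{P}(K = k,\ \bar K = \ell)
  &= e^{-\lambda}\, \frac{\lambda^{k+\ell}}{(k+\ell)!}\,
     \binom{k+\ell}{k}\, p^{k} (1-p)^{\ell} \\
  &= e^{-\lambda p}\, \frac{(\lambda p)^{k}}{k!}
     \cdot e^{-\lambda (1-p)}\, \frac{(\lambda (1-p))^{\ell}}{\ell!} .
\end{align*}
```

The joint law factorizes, so K and K̄ are independent Poisson variables with means λp and λ(1 − p). Taking λ = S 0 R 0 + I 0 R̄ 0 and p = S 0 R 0 /λ gives the means S 0 R 0 and I 0 R̄ 0 stated in the text.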

Convergence of the historical process
We can now state and prove our main result. Let us introduce the historical process as the following empirical measure We also define the historical process at time t ≥ 0 as the historical process of all individuals infected before time t,

Theorem 28 (Convergence of the historical process). Let A ∞ be the limiting ancestral process in the Poisson tree H and let (σ 0 , X ) denote a pair of independent random variables where −σ 0 is distributed according to the density g.
(i) For any t ≥ 0 we have is the law of the random variable A ∞ conditioned on the event {σ ∞ ≤ t}, and the convergence is in distribution for the weak topology.
(ii) If (c(t); t ≥ 0) converges as t → ∞ we have that in distribution for the weak topology.
The convergence result in (ii) is stronger than that in (i), but requires the mild assumption that the contact rate converges. Point (i) of the previous result is sufficient to derive the limit of the age structure of the epidemic, our Theorem 1. However, it is not sufficient to prove that the total number of individuals infected during the epidemic converges. This is a very well-studied quantity in epidemic modeling, referred to as the final size of the epidemic [2,31], and our motivation for deriving point (ii) is the following corollary.
Proof. By Theorem 28, point (ii), we have that 1 By Proposition 19,

To prove the convergence of the historical process, we see the ancestral process A N x as a functional of the pogm graph G N (x) rooted at x. Provided we can show that the mapping taking a pogm graph to its active geodesic enjoys some appropriate continuity, the convergence of the historical process will follow from the local weak convergence of the infection graph G N .
For a deterministic pogm graph G, we can define an infection process by attaching to each edge e a uniform infection intensity s e which determines if the edge is active or not, as in Section 4.2. It will be convenient to work conditional on (s e ) and to think of these infection intensities as a marking of the edges of the graph. It is straightforward to extend the definitions and results from Section 4.4 to include this marking, and the convergence of the infection graph in Proposition 26 remains valid for this extended marking: the infection graph G N , marked with uniform infection intensities, converges in the local weak sense to the tree T , also marked with uniform infection intensities.
For a pogm graph with fixed infection intensities, G, we can define A(G) as the ancestral process of G, which records the infection times along the active geodesic leading to the pointed vertex, as defined in Section 4.3. We also define σ(G) as the length of the corresponding active geodesic. We can now prove that the ancestral process is a continuous functional of the local graph topology.
Lemma 30. Let f be a continuous bounded functional on the space of ancestral paths. Then for any t > 0 the map
$$G \mapsto f(A(G))\, \mathbf{1}_{\{\sigma(G) < t\}} \tag{30}$$
is continuous at a.e. realization G of the tree T . If the function (c(t); t ≥ 0) converges as t → ∞, then f is continuous at a.e. realization of A(T ).
Proof. The tree T is either made of a single (I) vertex, or is a copy of H. Clearly the result holds in the former case, so it remains to show it for almost every realization of H. For some pogm tree G, if d is the genealogical distance and π v denotes the unique path from v to the root, let
$$M_r(G) = \min\{\, |\pi_v| : v \text{ at genealogical distance } r \text{ from the root} \,\}$$
be the length of the shortest path from a vertex at distance r to the root. We start by showing in Step 1 that, almost surely, either H is finite, or
$$\lim_{r \to \infty} M_r(H) = \infty. \tag{31}$$
Then, in Step 2, we show that (30) is continuous for almost all graphs G verifying this property. Under the additional assumption that (c(t); t ≥ 0) converges, we prove that f is continuous in Step 3 and Step 4.
Step 1. Let (V r ; r ≥ 0) be the process that records the ages of the (S) vertices of the Poisson tree H, defined as Then (V r ) r≥0 is a branching random walk with Poisson(R 0 S 0 ) offspring distribution, and it follows from general results that, conditional on non-extinction, its minimum drifts to ∞, see for instance Theorem 5.12 in [38]. As H is obtained by attaching independently to any unmarked vertex a Poisson(I 0R0 ) distributed number of (I) leaves, this shows that (31) also holds. This completes Step 1.
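The quantity controlled in Step 1 can be explored numerically. The following is a minimal sketch, not the construction used in the paper: it grows a one-type tree with Poisson offspring and, as a purely illustrative assumption, draws independent Exponential(1) edge lengths, then records the shortest root-to-vertex path length in each generation. The names `sample_poisson` and `generation_minima` are hypothetical.

```python
import math
import random


def sample_poisson(rng, mean):
    """Draw a Poisson(mean) variate with Knuth's multiplication method."""
    threshold = math.exp(-mean)
    count, product = 0, rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def generation_minima(mean_offspring, depth, seed=0):
    """Return [M_1, ..., M_r] for one simulated tree, stopping at extinction.

    Assumption (illustration only): edge lengths are i.i.d. Exponential(1).
    """
    rng = random.Random(seed)
    positions = [0.0]  # path lengths to the root for the current generation
    minima = []
    for _ in range(depth):
        children = []
        for pos in positions:
            for _ in range(sample_poisson(rng, mean_offspring)):
                children.append(pos + rng.expovariate(1.0))
        if not children:  # the tree died out before reaching this generation
            break
        positions = children
        minima.append(min(children))
    return minima


if __name__ == "__main__":
    for r, m in enumerate(generation_minima(1.5, 10, seed=1), start=1):
        print(f"generation {r}: shortest path length {m:.3f}")
```

Because edge lengths are nonnegative, the recorded minima are nondecreasing in the generation for every realization; the content of Step 1 is the stronger statement that they diverge on the survival event.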
Step 2. First let us note that the marks (s e ), representing the infection intensities, are independent of the structure of the tree T and the lengths of its edges, so T almost surely satisfies the following property, for all r ∈ N: where A pot r is the (a.s. finite) set of lengths of all paths from (I) vertices at distance at most r from the root to (S) vertices. If a tree G satisfies this property, then it is clear that for any sequence G N → G, for N large enough, the r-neighborhood of the root in G N has the same structure as that of G and in this neighborhood, a path from an (I) vertex to an (S) vertex is open in G N if and only if it is open in G.
Fix some tree G satisfying (32), which is either finite or fulfills (31), and a sequence G N converging to G in H . We need to prove that
$$\lim_{N \to \infty} f(A(G^N))\, \mathbf{1}_{\{\sigma(G^N) < t\}} = f(A(G))\, \mathbf{1}_{\{\sigma(G) < t\}}. \tag{33}$$
It is readily checked that (33) holds if G is a finite tree. Suppose that G is infinite. If σ(G) < ∞, let r be such that M r (G) > σ(G). In particular there is an active path from an (I) vertex in [G] r to the root. The convergence [G N ] r → [G] r entails that, for N large enough, there is also an active path from an (I) vertex in [G N ] r to its root whose length converges to σ(G), and that all other active paths from G N \[G N ] r to the root have a length larger than σ(G) and thus cannot be the active geodesic. We are back to the case of a finite tree where (33) is readily checked. Finally, if σ(G) = ∞, fix r such that M r (G) > t. The convergence [G N ] r → [G] r now entails that, for N large enough, there is no active path from an (I) vertex in [G N ] r to the root, so that σ(G N ) > t. This shows that in all three cases (33) holds and proves the first part of the result. We have also shown that f is continuous at every G that fulfills (31) and has an active geodesic, and that if G N → G where G fulfills (31) and has no active geodesic, necessarily σ(G N ) → ∞.
We now prove the second part of the result and assume that (c(t); t ≥ 0) converges to a limit c * as t → ∞. For a pogm graph G with infection intensities on its edges, we denote by G s the pogm graph obtained by removing from G all edges e with an infection intensity s e > s. We proceed again in two steps. In Step 3 we show that if H c * is infinite, then H has a.s. a geodesic. In Step 4 we consider a pogm graph G such that G c * is finite, and prove that if G N → G then σ(G N )1 {σ(G N )<∞} is bounded. Before moving to the proof of these two claims, let us show that they are sufficient to prove our result. If G is such that (31) holds and has an active geodesic, by Step 2 f is continuous at G. Therefore, by Step 3, f is continuous at a.e. realization G of H such that G c * is infinite, or such that G c * is finite but G has an active path to the root. It remains to consider the case where G c * is finite and G has no active path. If G N → G, by Step 4 σ(G N )1 {σ(G N )<∞} is bounded and by Step 2 σ(G N ) → ∞. Necessarily, σ(G N ) = ∞ and f (A(G N )) = f (∅) = f (A(G)) for N large enough, showing that f is continuous at G. Our only remaining task is now to show the previous two claims.
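The thinning operation G ↦ G s introduced above is easy to make concrete. The sketch below is only an illustration on a finite rooted tree, under the assumption that the relevant object is the component of the root once all edges with infection intensity s e > s have been deleted. The encoding (a `parent` map and an `intensity` map keyed by the child endpoint of each edge) and the function name are hypothetical.

```python
from collections import defaultdict


def thinned_root_component(parent, intensity, root, s):
    """Vertices still connected to `root` after deleting every edge with s_e > s.

    parent[v]    -- parent of vertex v (the root has no entry)
    intensity[v] -- infection intensity of the edge joining v to parent[v]
    """
    children = defaultdict(list)
    for child, par in parent.items():
        children[par].append(child)

    kept, stack = {root}, [root]
    while stack:
        vertex = stack.pop()
        for child in children[vertex]:
            if intensity[child] <= s:  # this edge survives the thinning
                kept.add(child)
                stack.append(child)
    return kept


if __name__ == "__main__":
    # Toy tree: 0 is the root, with edges 1-0, 2-0, 3-1 and 4-3.
    parent = {1: 0, 2: 0, 3: 1, 4: 3}
    intensity = {1: 0.2, 2: 0.9, 3: 0.6, 4: 0.1}
    for level in (0.5, 0.7, 1.0):
        print(level, sorted(thinned_root_component(parent, intensity, 0, level)))
```

On this toy tree the component grows with s, which is the monotonicity used in Step 3: raising the threshold can only add edges.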
Step 3. We show that a.e. realization of H such that H c * is infinite has an active geodesic. There are two trivial cases that we easily exclude: if c * = 0, H c * cannot be infinite, and if σ ∞ has bounded support, our result is trivial because the epidemic stops a.s. after a finite time. Now, for any s, by standard properties of Poisson random variables, the graph H s is again a Galton-Watson tree with Poisson distributed offspring and the graph H is obtained by grafting independently on each (S) vertex of H s :
• a Poisson(S 0 R 0 (1 − s)) distributed number of copies of H; and
• a Poisson(I 0R0 (1 − s)) distributed number of (I) vertices.
Furthermore, each of these trees is connected to H s through a unique edge whose infection intensity is uniform on the interval (s, 1). Note that when s increases, so does the number of edges in H s , therefore we have {H c * −ε is infinite} ⊂ {H c * is infinite} for each ε > 0. Furthermore, by studying the extinction probability of these Galton-Watson trees, we readily see that the probability P(H s is infinite) is a continuous function of s, which implies that
$$\lim_{\varepsilon \to 0} \mathbb{P}\big(H^{c^* - \varepsilon} \text{ is infinite}\big) = \mathbb{P}\big(H^{c^*} \text{ is infinite}\big).$$
In other words, up to a null probability event, we have
$$\bigcup_{\varepsilon > 0} \{H^{c^* - \varepsilon} \text{ is infinite}\} = \{H^{c^*} \text{ is infinite}\},$$
therefore without loss of generality, we can consider a realization of H and an ε > 0 such that H c * −ε is infinite. Now let T be such that |c(t)−c * | < ε/2 for t ≥ T .
On the event that H c * −ε is infinite, a.s. we can find a subtree G of H grafted on H c * −ε such that σ(G) > T and such that the edge connecting G to H c * −ε has an infection intensity in (c * − ε, c * − ε/2). (We have used that σ ∞ has unbounded support.) Let us write e 1 for this edge and let π = (e 1 , . . . , e n ) be the unique path in H leading to the root and extending the active geodesic in G. Since σ(G) > T , for each edge e k of π, we have |τ k (e k )| > T , so that c(|τ k (e k )|) > c * − ε/2 and the edge is open. Therefore there exists a.s. an active path in H if H c * −ε is infinite.
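The continuity in s of the survival probability, which Step 3 relies on, can be checked numerically. The sketch below uses the classical fact that the extinction probability q of a Galton-Watson tree with Poisson(m) offspring is the smallest solution of q = exp(m(q − 1)) in [0, 1]. It assumes, as an illustration read off from the grafting description in Step 3, that the (S) vertices of the thinned tree have Poisson(S 0 R 0 s) retained offspring. The function names and the parameter `s0_r0` are hypothetical.

```python
import math


def survival_probability(mean_offspring, iterations=20000):
    """Survival probability of a Galton-Watson tree with Poisson offspring.

    Iterating the generating function from 0 increases monotonically to the
    smallest fixed point of q = exp(m (q - 1)), i.e. the extinction probability.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(mean_offspring * (q - 1.0))
    return 1.0 - q


def thinned_survival(s, s0_r0):
    """Survival probability of the thinned tree at threshold s.

    Assumption (illustration only): retained offspring are Poisson(s0_r0 * s).
    """
    return survival_probability(s0_r0 * s)


if __name__ == "__main__":
    for s in (0.2, 0.4, 0.6, 0.8, 1.0):
        print(f"s = {s:.1f}: survival probability {thinned_survival(s, 2.5):.4f}")
```

The printed values vanish below the critical threshold s = 1/(S 0 R 0 ) and increase continuously above it. The fixed-point iteration converges slowly exactly at criticality, so values computed very close to that threshold should be read with care.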
Step 4. We now consider a pogm tree G such that G c * is finite and G has no active path. If G N → G, we need to show that for N large enough σ(G N ) = ∞, that is, that G N has no active path. Since G c * is finite, there exists r such that [G c * ] r = G c * . By definition of G c * , all edges e in G pointing to a leaf vertex of G c * have an infection intensity s e > c * . Since [G N ] r+1 → [G] r+1 , the same holds true for G N for N large enough. If π N is an active path leading from an (I) vertex to the root in G N , by Step 2, it has to satisfy |π N | → ∞. However, any path π N = (e N 1 , . . . , e N n ) ending at the root in G N with |π N | → ∞ includes an edge e N k pointing to a leaf vertex of G c * . On the one hand, since |π N | → ∞ we have c(|τ k π N |) → c * . On the other hand, since [G N ] r+1 → [G] r+1 , we have lim inf N →∞ s e N k > c * , so that this path is not active for large enough N , proving that no such active path exists.
in probability for the weak topology (see for instance [29, Theorem 4.11]). The proof of the first point is completed by noting that, with probability I 0 we have A(T ) = (−Z, X ), whereas with probability S 0 we have T = H. The proof of point (ii) is the same as for point (i), but replacing the map G → f (A(G))1 {σ(G)<t} by the map G → f (A(G)) and using the second part of Lemma 30 for the continuity.
We can now prove Theorem 1 using Theorem 28. Recall the notation for the empirical distribution of ages and compartments at time t, and the notation for the number of individuals in compartment i at time t. Note that µ N t = µ N t (da, di) can be written in terms of H N as follows where π = (σ , X ) k =0 = (σ , (P , X )) k =0 denotes a generic ancestral path.
Proof of Theorem 1. By Theorem 28 and (35), we get for fixed t and i ∈ S, where σ ∞ is the length of the active geodesic in H, and X is a life-cycle process. Using Proposition 19, we can further identify S 0 P(t − σ ∞ ∈ da) = n(t, a) da, proving finite-dimensional convergence of (µ N t /N ; t ≥ 0). Because of the expressions of Y N t (i) in terms of µ N t in (34), identification of their limit is trivial. All there is to check is tightness of the processes.
The tightness for (µ N t /N ; t ≥ 0) will follow from that of (Y N t (i)/N ; t ≥ 0). Recall that the compartments of the life-cycle process enjoy an "acyclic orientation" property (see the statement before Theorem 1). Writing i j if j can be accessed from i, the process is nondecreasing in time. Since the finite-dimensional marginals of this nondecreasing process converge to the expression on the RHS of (4), tightness follows provided we can show that this limit is continuous, see for instance Theorem 3.37, Chapter VI of [24]. For the continuity, for t ≥ 0 and h > 0, using Hölder's inequality The first term can be made small using for instance (12), whereas for the second term one can use that translation operators are continuous on L 1 . We proceed in a similar way for h < 0. The tightness of Y N t (i)/N follows by subtracting the processes in (36) in an appropriate way.
Let us turn to the tightness of (µ N t /N ; t ≥ 0). We will use a tightness criterion for measure-valued processes in [44]. This criterion is stated for measures on a compact space, but can be easily adapted to our setting by considering a compactification of R + × S and noting that the limit of our sequence (4) has no mass at infinity. According to Lemma 3.2 in [44], it is sufficient to check tightness of the processes (Y N t (φ, i)/N ; t ≥ 0), for φ : R + → R uniformly continuous and where For s < t, we have Tightness follows from the uniform continuity of φ and from the tightness of the sequence (Y N t (i)/N ; t ≥ 0).