Estimation of the infection parameter of an epidemic modeled by a branching process

: The aim of this paper is to build estimators of the infection pa- rameter in the diﬀerent phases of an epidemic (growth and decay phases). The epidemic is modeled by a Markovian process of order d > 1, and can be written as a multitype branching process. We propose three estimators suitable for the diﬀerent classes of criticality of the process, and conse- quently for diﬀerent phases of the epidemic. We prove their consistency and asymptotic normality when the number of ancestors (resp. number of generations) tends to inﬁnity.


Introduction
The purpose of this paper is to quantify the infection of an epidemic in its different phases (growth and decay) by providing appropriate estimators of the infection parameter for each of these phases. The epidemic is modeled by a Markovian process of order d 1 with Poissonian transitions, which can be seen as a multitype Bienaymé-Galton-Watson (BGW) branching process with d types. This model is suitable for any rare transmissible SEIR disease in a large branching population following a Reed-Frost model for the infection ( [11]). The process corresponds to the incidence of clinical cases (individuals or animals with clinical symptoms), which are assumed to be the only available observations. An epidemic phase is here defined as a phase for which the transmission mechanism of the disease remains unchanged, in particular for which the infection parameter remains constant. In our setting, this results in an increase or a decrease of the clinical cases incidence, and we refer to the corresponding phases as growth or decay phases.
Branching processes are useful models to describe the extinction and growth of populations, and as such have been applied to many biological problems (see e.g. [7]). The estimation of a key parameter such as the mean number of offspring or the Perron's root (largest eigenvalue of the mean matrix), which determines the asymptotic average growth rate of the population and whether or not extinction is certain, is consequently of a very large interest. Since the parameter quantifying the infection of the epidemic is, in our model, an explicit function of the Perron's root (see (3.5)), it is very natural either to build a new estimator specifically designed for the model, or to investigate the existing results in the estimation theory of the Perron's root. As detailed in Section 3, estimators of the Perron's root for general multitype branching processes usually require the knowledge of the whole or partial genealogy of the process (for example individual offspring sizes, or parent-offspring type combination counts), which are data that are mostly non available in the epidemiological context. Asmussen and Keiding, however, introduced in [1] an explicit estimator based only on the total generation sizes, corresponding in our model to the incidence of cases at each time, assumed to be the only available observations. We deduce from this estimator a first estimator of the infection parameter. Despite the potentially large order of the Markovian process that we consider, its Poissonian character ensures many properties which make it easy to derive estimators with interesting characteristics. We thus build two conditional least squares estimators (CLSE) based either on the chosen process or on the process conditioned on non-extinction at each time step, and finally compare these three estimators.
After presenting the epidemic model as well as the underlying general mathematical process in Section 2, we provide in Section 3 the three estimators of the infection parameter, which are all only based on the available observations. We aim at asymptotic results, either as the size of the initial population tends to infinity, or as the number of observations tends to infinity. It could be of a great mathematical interest to study the asymptotic behavior when both the initial size and the number of observations simultaneously tend to infinity, as it is done in [6] for the single-type case, but we choose to focus on asymptotic properties of an immediate practical interest. We first present in Section 3.1 an explicit estimator θ derived from the estimator of the Perron's root introduced in [1], and in Section 3.2 a CLSE θ, which are both consistent and asymptotically normal in the supercritical case, on the set of non-extinction, as the number of observations tends to infinity. These two estimators are thus especially suitable for a long growth phase of the epidemic. We prove moreover in Section 3.2 that the CLSE θ is consistent and asymptotically normal, as the initial population size grows to infinity, no matter the class of criticality. This estimator is thus also appropriate for any phase of the epidemic, provided that the number of clinical cases at the beginning of the phase is large. We then focus in Section 3.3 on a CLSEθ based on the process conditioned on its non-extinction at each time step, which is consistent and asymptotically normal in the subcritical case, as the number of observations tends to infinity. This estimator is thus particularly designed for a long decay phase of the epidemic. We finally proceed in Section 4 to a numerical comparison of these estimators, focusing especially on the comparison of θ against θ in a growth phase, and ofθ against θ in a decay phase. We describe the practical interest of each of these estimators, and illustrate on a simulated multi-phase epidemic which one should be chosen on which type of data.

The epidemic model
Throughout this paper we consider the following Markovian process of order d 1, with deterministic initial values X −d+1 = x −d+1 , . . . , X 0 = x 0 : where the {Y n−k,n,i } i,k are independent given F n−1 := σ({X n−k } k 1 ), and the {Y n−k,n,i } i are independent and identically distributed (i.i.d.) given F n−1 , following a Poisson distribution with some parameter Ψ k > 0 independent of n. Denoting by X n the incidence of infectives at time n, and assuming that an infective can transmit the disease during one time unit at most, then X n also corresponds to the incidence of cases at time n, which are usually the only available observations. In accordance to the usual terminology for branching process, we refer to X n as the size of the generation n. The Markovian process (2.1) describes in a recursive way how one single infective indirectly generates new infectives (so-called "secondary cases") k time units later, for 1 k d. In (2.1), the variable Y n−k,n,i then corresponds to the incidence of secondary cases produced at time n with a delay k (latent period) by individual i infectious at time n − k. We refer to [11] for a justification of this modelling in the context of a rare infectious disease of the SEIR kind.
Let us first point out that in the simple case d = 1, the process is a singletype BGW branching process with a Poisson offspring distribution. For any information on single-type or multitype branching processes, we refer to [2]. Let d 1. We easily derive from (2.1) the conditional law of X n , where D = denotes the equality in distribution. Moreover, X n may be written as a multitype BGW process. For this purpose, we define the d-dimensional process X n := (X n,1 , . . . , X n,d ) such that, for all i = 1 . . . d, X n,i := X n−i+1 , namely X n = (X n , X n−1 , . . . , X n−d+1 ). In particular, the first coordinate of X n corresponds to the value of the one-dimensional process (X n ) n at time n. Then (X n ) n is a d-type BGW process, where the types correspond to the memory of the one-dimensional process.
Let us denote by f = (f 1 , . . . , f d ) its offspring generating function defined (1−r1) . Denoting by M its mean matrix defined by its entries The branching process (X n ) n is nonsingular and positive regular. Moreover, it can be easily shown that it satisfies the X log X assumption, that is for each Let us denote by ρ the Perron's root of the mean matrix M, and let ξ and η be the right and left eigenvectors of M for the eigenvalue ρ, with normalization ξ · 1 = ξ · η = 1, where u · v denotes the scalar product d k=1 u k v k . We compute that for all i = 1 . . . d, By the Perron-Frobenius theorem, the Perron's root ρ provides information about the asymptotic behavior of the powers of M, and thus about the expected incidence of cases at time n, for n large. Given the initial values X −d+1 , . . . , X 0 , the expected number of new cases satisfies The asymptotic growth or decay of the average number of cases thus depends on the sign of ρ − 1. We call the process (X n ) n critical (resp. subcritical, supercritical) if ρ = 1 (resp. < 1, > 1). By the theory of multitype positive regular and nonsingular BGW processes (see e.g. [2], Theorem 5.3.2), the extinction of the process (X n ) n occurs almost surely if and only if ρ 1. In the supercritical case ρ > 1, there exists a real-valued random variable W such that lim n→∞ ρ −n X n a.s.
Moreover, (2.3) implies that P(W > 0) > 0 ( [2], Theorem 5.6.1). As we will see in the next section, the class of criticality of the process can play a role in the choice of the estimation method. This can be problematic if the range of ρ, which has no explicit form, is unknown. However, a computation of det (M − ρI) shows that ρ is solution of the equation which implies that (X n ) n is critical (resp. subcritical, supercritical) if d k=1 Ψ k = 1 (resp. < 1, > 1). If the order of magnitude of the parameters Ψ k is known, we thus obtain an alternative and more handy criticality criteria.

Estimation of the infection parameter
From now on we assume that d is fixed, and that the Ψ k 's affinely depend on some unknown parameter θ 0 , that is for all k = 1 . . . d, where a k > 0 and b k 0 are known. In what follows we will refer to θ 0 as the infection parameter (see Remark 3.1). We shall denote by a, b and Ψ(θ 0 ) the d-dimensional vectors with coordinates a k , b k and Ψ k (θ 0 ) respectively. From Section 2 it immediately follows that (X n ) n is critical (resp. subcritical, supercritical) if Remark 3.1. The assumption (3.1) is for instance relevant in the epidemiological context of rare SEIR diseases in large populations. Indeed, in this case (see [11]), the process of cases incidences (X n ) n is of the form (2.1), where d + 1 is the largest survival age in the whole population, and for each k = 1 . . . d, The unknown parameter in Ψ k would commonly be the horizontal infection parameter θ 0 , which is the mean number of individuals infected by an infective per time unit by horizontal route. The other parameters involved, that is L a,k (probability for an individual aged a at infection to have a latent period equal to k, given his survival), S a (probability to be aged a at the end of the latent period) and p (vertical infection parameter, representing the maternal transmission probability), can be reasonably assumed to be known from previous experiences or from the study of a previous phase of the epidemic. However, the infection parameter θ 0 is often unknown and is likely to vary. Indeed, it strongly depends on the environment which might have been modified by control or sanitary measures during the course of the epidemic. Estimating this parameter then not only provides information about the efficiency of the new measures, but also enables predictions of the future spread of the disease under the same measures.
Assuming (3.1), our aim is to provide estimators of θ 0 based on the observations (X 0 , . . . , X n ), with asymptotic properties corresponding to interesting characteristics in the epidemiological context. We are thus looking for estimators suitable in the subcritical and/or supercritical cases, with asymptotic properties, as the initial population size grows to infinity, or as the number of observations n tends to infinity. We would thus entirely cover the problem of estimating the infection parameter in the growth and extinction phases of the epidemic, offering moreover several alternatives depending on which asymptotic assumption is suitable regarding the available data.
There are numerous results in the literature dedicated to the estimation theory for general branching processes. In its early paper [8] in 1948, Harris provided an estimator for the mean value m 0 of a single-type BGW process X 0 , . . . , X n . It is a maximum likelihood estimator (MLE) based on observed values of the individual offspring size for each individual at each time: which is consistent as n → ∞ in the supercritical case, on the set of nonextinction. Note that m MLE only involves X 0 , . . . , X n . It is actually proved in [5] that it is also the MLE of m 0 based on the observed values of X 0 , . . . , X n only. It is straightforward to show that m MLE is also the weighted conditional least squares estimator (CLSE) based on the process X k / X k−1 , defined as follows Similar estimation problems are considered in the multitype case. In [1], Asmussen and Keiding proposed a maximum likelihood estimator of the Perron's root ρ 0 based on the observations of the whole genealogy of the population (i.e. each offspring vector produced by every individual). It is proved in [10] that this estimator is also the maximum likelihood estimator solely based on the observations at each time of the total number of individuals of type j whose parents were of type i, for every i, j = 1 . . . d. However in epidemiology this kind of variables are generally not observable. For our model this would imply indeed that, considering the number of cases at a given time, we could say how many of them were infected exactly k time-units earlier.
We are thus more interested in estimations based on the number of cases at each time, such as the other estimator presented in [1], where |.| denotes the L 1 -norm |X k | := X k,1 + · · · + X k,d . For d = 1, ρ clearly reduces to the Harris estimator defined in (3.3). Note that relation (2.5) implies that (3.5) Hence an estimation of ρ 0 would provide an estimation of θ 0 (the opposite is not true since ρ 0 cannot in general be expressed as an explicit function of θ 0 ). In the supercritical case, the estimator ρ was shown to be consistent, as n → ∞, on the set of non-extinction, with an explicit asymptotic distribution ( [1]). Due to the Poissonian character of the transitions of the process (X n ) n 0 , it is possible, in our setting, to express the joint probability function of the observations X 0 , . . . , X n , without involving the whole or partial genealogy of the process. The likelihood function is indeed given by two factors, one of which is independent of θ 0 , the logarithm of the other being L (θ 0 ) : This equation has in general no explicit solution, except for simple cases such as the one-dimensional case d = 1, for which the MLE is (3.6) As shown later (see (3.11)), it corresponds in these cases to the CLSE of θ 0 . It is however in general not the case, and we choose to focus on the CLSE. In Section 3.1 we derive from the estimator (3.4) a first estimator of θ 0 , and deduce from [1] asymptotic properties of θ, as n → ∞, in the supercritical case, on the set of non-extinction. In a second instance, we study in Section 3.2 the weighted CLSE which has a closed-form expression given by (3.11). We show its asymptotic properties, on the one hand as n → ∞ in the supercritical case on the set of non-extinction, and on the other hand as |x 0 | → ∞ for any class of criticality, where |x 0 | = x −d+1 + · · · + x 0 is the deterministic initial population size. Finally, since we also aim at finding an estimator with asymptotic properties, as n → ∞, holding in the subcritical case, we consider in Section 3.3 the CLSE associated with a Markov chain ( The practical interest and efficiency of these three estimators will be discussed and compared in Section 4.

A closed-form estimator with asymptotic properties, as n → ∞
The aim of this section is to provide an estimator with asymptotic properties, as the number of observations tends to infinity, in the supercritical case, which in the epidemiological context would correspond to the growth phase of the epidemic. For this purpose, we deduce from relation (3.5) the following explicit estimator of θ 0 , where ρ is the estimator of ρ 0 introduced in [1] and given by (3.4). Note that what follows can be applied to any process of the form (2.1), where the {Y n−k,n,i } i do not necessarily follow a Poisson distribution, but satisfy the relation E θ0 (Y n−k,n,i |F n−1 ) = Ψ k (θ 0 ). Let us denote by W 0 the limiting random variable introduced in (2.4). First, as pointed out by Becker in [3], the estimator ρ is strongly consistent on the set of non-extinction {W 0 > 0}. Second, as proven by Asmussen and Keiding in [1], once adequately normalized, ρ − ρ 0 is asymptotically normal. However, the asymptotic behavior of ρ − ρ 0 depends qualitatively on the relative sizes of ρ 0 and λ 2 , where λ is the modulus of a certain eigenvalue of M(θ 0 ). For the sake of clarity, we need to recall the constants introduced in [1] to study the asymptotic properties of ρ, which play a role in the study of θ as well. Let ..s be the spectrum of M(θ 0 ), and for each i = 1 . . . s, let r i be the algebraic multiplicity of λ i . We denote by B = {u i,j , i = 1 . . . s, j = 1 . . . r i } the base of the Jordan canonical decomposition of M(θ 0 ). Let us define the vector ζ := 1 − |η 0 |ξ 0 , where ξ 0 is the eigenvector of M(θ 0 ) defined in Section 2. We denote (ζ i,j ) i=1...s,j=1...ri its coordinates in B. Then λ is defined as follows, We similarly define λ(x) and γ(x) for any complex vector x ∈ C d . As detailed in [1], ρ − ρ 0 = (S n + T n ) (|X 0 | + · · · + |X n−1 |) −1 , where (to avoid heavy notation, when no confusion is possible, we do not write differently column and row vectors when multiplied by a matrix) It appears that S n and T n are of the same order of magnitude when In particular, , and is null otherwise. In order to deal with the case λ 2 < ρ 0 , we define for all n ∈ N, ν n := 1 + n−1 k=0 M(θ 0 ) k κ, and If λ 2 ρ 0 , then there exist vectors ζ 1 and ζ 2 such that (M(θ 0 ) − ρ 0 I) ζ = (M(θ 0 ) − I) ζ 1 + ζ 2 , with λ(ζ 1 ) = λ, γ(ζ 1 ) = γ and λ(ζ 2 ) 1. If λ 2 = ρ 0 , we set moreover We finally define the constant We recall that (2.3) implies P θ0 (W 0 > 0) > 0. For notational convenience, it is assumed in this section just as in [1] that P θ0 (W 0 = 0) = 0. The following results are thus valid on the set of non-extinction. The proof of Theorem 3.2 is postponed in Appendix A. and has the following asymptotic distribution.
Unlike the strong consistency, the second part of the theorem is seldom of direct practical applicability, in particular because of the differentiation between the three cases λ 2 < ρ 0 , λ 2 = ρ 0 and λ 2 > ρ 0 . We cannot provide here an asymptotic confidence interval solely based on the observations, as we can do for the estimators θ andθ (see Theorem 3.6 and Theorem 3.8).
Remark 3.4. If d > 1, the estimator θ seems to have no interesting asymptotic properties as |x 0 | → ∞, for n fixed, unless we assume that for all i = 1 . . . d, If this holds, then for any class of criticality, lim |x0| θ = θ 0 almost surely. This assumption is however very restrictive. Nevertheless, if d = 1 then θ reduces to the CLSE θ studied in Section 3.2 with asymptotic properties as |x 0 | → ∞.

3.2.
A closed-form CLSE with asymptotic properties, as n → ∞ and as |x 0 | → ∞ In this section, we provide an estimator with asymptotic properties as the number of observations n or as the initial (deterministic) population size |x 0 | tends to infinity. Note that this estimator was already introduced in [11], but only the limit in |x 0 | was investigated. We recall these known results in Theorem 3.6, and present the additional limit results as n tends to infinity in Theorem 3.5, whose proof is postponed in Appendix B. Please observe that the latter theorem is only valid in the supercritical case, while Theorem 3.6 holds for any class of criticality. We thus present here an estimator which is on the one hand suitable for a long growth phase of an epidemic (Theorem 3.5), just as θ, and is on the other hand suitable for any phase with a large initial population size (Theorem 3.6). We consider the weighted CLSE based on the process Y k := X k / a · X k−1 , where Θ := ]θ min , θ max [ with θ max > θ min > 0. We easily derive the following explicit form Note that the normalization of the process X k by a · X k−1 generalizes the normalization X k / aX k−1 in the monotype case, leading to the Harris estimator of m 0 = aθ 0 +b. Moreover, θ also corresponds to the MLE of θ 0 if b = 0 (see (3.6)).
In addition, the conditional variance of the error term in the stochastic regression equation for the normalized process is invariant under multiplication of the whole process, and bounded respectively to (X k ) k 0 .
Theorem 3.5. Let us assume that the process (X k ) k 0 is supercritical. Then, on the set of non-extinction, the estimator θ is strongly consistent: and is asymptotically normally distributed: Remark 3.7. We point out that in the proofs of Theorem 3.5 and Theorem 3.6 we do not use the Poissonian character of the transitions of the process (2.1) to derive the properties of θ, but we simply need its first and second order moments. The results of these theorems can thus be applied to any process of the form (2.1), where the {Y n−k,n,i } i do not necessarily follow a Poisson distribution, but satisfy E θ0 (Y n−k,n,i |F n−1 ) = Ψ k (θ 0 ). For Theorem 3.6, the variance should be either known, or previously estimated, and the process should be normalized accordingly such that the error term in the stochastic regression equation remains bounded.

A CLSE with asymptotic properties, as n → ∞
In this section we study a CLSE associated with a conditioned process, and obtain asymptotic properties as the number of observations n tends to infinity in the subcritical case, despite the almost sure extinction of the original process (X k ) k 0 . We point out that this estimator is particularly adapted for the study of the extinction phase of an epidemic, and does not require a large initial number of cases.
For this purpose, let us consider instead of (X k ) k 0 the associated process conditioned on non-extinction at each time-step. By this we mean the homogeneous Markov chain (Z k ) k 0 with deterministic initial value Z 0 = x 0 and with the following transition probabilities Denoting by P (i, j) the transition probabilities of the process (X k ) k 0 , we thus have By definition of (X k ) k 0 , P (i, j) = 0 as soon as (j 2 , . . . , j d ) = (i 1 , . . . , i d−1 ), hence the process (Z k ) k 0 satisfies for all i = 2 . . . d and k 0, We also define the one-dimensional d-Markovian process (Z k ) k 0 corresponding to the first coordinate: for all n 0, Z k := Z k,1 . For all d-dimensional vector u, we define the truncated sum ⌈u⌉ := u 1 + · · · + u d−1 . By definition, We consider the CLSE corresponding to the normalized process Z k / a · Z k−1 , where Θ := ]θ min , θ max [ with θ max > θ min > 0, and This estimator has no closed expression, but the continuity of S n on Θ ensures the existence of a least-squares estimate, which can be numerically approximated by minimization of S n on a suitable lattice embedded into Θ as described in Remark 3.11.
Denoting by f ′ the derivative of f with respect to θ, we thus have, for all θ ∈ Θ and all j ∈ N d , j = 0, (3.14) We finally define ε k : and the conditional variance of the error term ε k in the stochastic regression equation is consequently bounded. Indeed, defining for any vector u, u := min i u i and u := max i u i , we have Moreover, if the process (X k ) k 0 is subcritical, then the estimatorθ is asymptotically normally distributed: The proof of this theorem can be found in Appendix C.
Remark 3.9. For many observations, the estimations of θ 0 provided by θ andθ are by construction equal. More precisely, for any N d -valued sequence (x 0 , . . . , x n ) such that for all k n, x k,i = x k−1,i−1 and such that x k / ∈ S, where S is the subset of N d defined by  Remark 3.11. Since the estimatorθ defined by (3.12) has no closed expression, we provide a simple algorithm for its calculation, whose pseudo-code is written below. The algorithm consists in computing the value of S n (θ) for each θ on a finite lattice embedded into Θ, and in returning one value of the lattice for which S n reaches its minimum. The input arguments of the algorithm are the vectors a, b (denoted a and b in the pseudo-code), the vector Z = (Z −d+1 , . . . , Z n ), the bounds θ min , θ max of the open set Θ, and the step length of the lattice chosen by the user.
function S(a, b, Z, θ) ⊲ definition of the function Sn(θ) The accuracy of the result provided by this algorithm then obviously depends on the choice of the step. We can see on examples (e.g. Table 1) that this is entirely satisfying. Indeed, for any observation (x −d+1 , . . . , x n ) which does not con-tain d − 1 successive null values, hence for whichθ(x 0 , . . . , x n ) = θ(x 0 , . . . , x n ), the previous algorithm provides the rounded value (according to the step length of the lattice) of the estimation given by the closed-form estimator θ.

Comparison and choice of the estimators
The goal of this section is to investigate and compare the performances of the three estimators θ, θ andθ discussed in this paper. Due to the many unknown variables and constants involved in the asymptotic results of Theorems 3.2, 3.5, 3.6 and 3.8, and because of the numerous cases and different asymptotic limits which are studied, a unified theorem about the asymptotic relative efficiencies, which could be applicable in practice, does not seem conceivable. We consequently choose to compare numerically the efficiencies in Section 4.1. We then deduce in Section 4.2 which estimator should be chosen in practice for a given scenario, which we illustrate on a simulated multi-phase epidemic.

Comparison of the estimators
In order to compare the performance of the estimators presented above, we perform a large number of simulations and examine the efficiencies of the estimators for finite sample sizes. More precisely, for a given infection parameter θ 0 and sample size (|x 0 |, n), we define the average squared error of an estimator T of θ 0 as follows: 1 1000 (4.1) and the approximated relative efficiency of two estimators T 1 and T 2 : where for each i = 1 . . . 1000, x (i) is a simulation of length n of the branching process (2.1) with infection parameter θ 0 , such that |x (i) 0 | = |x 0 | and x (i) n = 0. For the sake of simplicity we omit in the notation of e θ0 (T 1 , T 2 ) the dependence in (|x 0 |, n). The large number of replications ensures that the average squared error (4.1) is an accurate approximation of the mean squared error In what follows, we say that for a given θ 0 and a given sample size (|x 0 |, n), an estimator T 1 is more efficient than T 2 if e θ0 (T 1 , T 2 ) > 1.
All simulations are performed using the branching process (2.1) with parameters d = 2, a = (0.01, 0.08) and b = (0.05, 0.05). The criticality threshold of such a process is then θ 0 = 10 (see (3.2)). Note that in this simple 2-dimensional case, we are able to determine the value of λ 2 , which plays a major role in the asymptotic limit distribution of θ (see Theorem 3.2). We compute here that λ 2 < ρ 0 , except for very large values of θ 0 (θ 0 > 23) which will not be taken account. As a consequence we are in this example only considering the case (3.8) of Theorem 3.2, for which the convergence rates in n of θ − θ 0 and θ − θ 0 to the normal distribution are equal up to a multiplicative constant. If λ 2 ρ 0 , the convergence rate of θ − θ 0 to its limiting distribution is smaller than for θ − θ 0 , hence we do not investigate this case.
We first plot in Figure 1 the average squared errors of the estimators against the initial population size |x 0 |, for θ 0 ∈ {9, 10, 11} and for a number of observations fixed at n = 10. The average errors are computed over the same 1000 simulations for each of the three estimators. Keeping in mind Remark 3.9, this explains why the plots for θ andθ are indistinguishable. We observe that the average squared error for θ is also extremely close and decreases as rapidly as for θ. We then plot in Figure 2 the average squared errors against the number of observations n, for θ 0 ∈ {9, 11} and for an initial population size fixed at |x 0 | = 100. For θ 0 = 10 and θ 0 = 11 (critical and supercritical cases), the average squared errors for the three estimators are again indistinguishable at this scale. For θ 0 = 9, we observe that the average errors are increasing as n becomes very large. This is due to the fact that, if n is large compared to |x 0 | = 100, the probability that X n = 0 is low in the subcritical case. Hence the 1000 simulations correspond to rare events and are not typical. As we will see in Figure 4, the estimatorθ is in this case the most appropriate, intuitively because it takes into account the additional information that the simulated data have still not become extinct at time n.
Finally, in order to differentiate more precisely the performances of the estimators, we focus on estimators which have asymptotic properties for the same class of criticality and study their approximated relative efficiency defined by (4.2). We thus compare θ against θ by plotting e θ0 ( θ, θ) in Figure 3, andθ against θ by plotting e θ0 (θ, θ) in Figure 4, for θ 0 ∈ [8,12]. We consider two sample sizes for |x 0 | (|x 0 | = 10 and |x 0 | = 1000) and for n (n = 10 and n = 100). What appears from these results is that θ is here the most efficient estimator when |x 0 | = 10 and n = 10, no matter the class of criticality. It appears also that in the supercritical case, θ and θ perform equivalently well and better than θ, no matter the sample sizes. Finally,θ is here the most efficient estimator in the subcritical case, for |x 0 | = 10 and n = 100.

Choice of the estimator in practice and illustration with a simulated multi-phase epidemic
Let us deduce from the above simulated results and from the theoretical study of Section 3 how to choose the appropriate estimator, given epidemic data, and illustrate our method on an example plotted in Figure 5. Our example mimics the evolution of an epidemic during which three control measures are taken, at time 50, 60 and 70. It is obtained by simulation of the branching process (2.1) with parameters d = 2, a = (0.01, 0.08), b = (0.05, 0.05) and initial value x 0 = (1, 1). First we choose a supercritical infection parameter θ 0 = 13 and simulate over 50 time steps, and then simulate three phases with respective infection parameter θ 0 = 11, θ 0 = 9.5, θ 0 = 7, over, respectively, 10,

Table 1
Estimations of θ 0 and 95% confidence intervals based on the data set (x 0 , , . . . , x 120 ) plotted in Figure 5, for different phases corresponding to different data subsets. 10 and 50 time steps. We thus obtain the data (x 0 , . . . , x 120 ). The estimations provided by the three estimators on distinct phases are listed in Table 1, as well as the 95% confidence intervals stemming from Theorem 3.6 and Theorem 3.8.
On the first line of Table 1, we report the estimations based on the whole data (x 0 , . . . , x 120 ), which thus entails the different phases and does not correspond to a process with constant infection parameter.
Since each of the three estimators discussed in this paper is relevant for a branching process of the form (2.1) with a constant offspring distribution over time, it is necessary to work with data for which the infection parameter is known or assumed to remain unchanged. If the latter has been potentially modified during the course of the epidemic, for instance because of control or sanitary measures, then the corresponding phases of the epidemic must be isolated (as illustrated with our example in Figure 5). For each phase, the unknown infection parameter might then be estimated with the most suitable estimator.
Due to its numerous properties (Theorem 3.5 and Theorem 3.6), θ appears to be suitable in most cases, namely for any phase with large initial population size (such as the three last phases in Figure 5) or for a long growth phase (such as the first phase in Figure 5). Thanks to Theorem 3.6, θ provides in addition an explicit confidence interval, for |x 0 | large enough, which is of a great practical interest. By contrast, the rate of convergence of θ to its limiting distribution is in practice unknown (Theorem 3.2). For these different reasons, we prefer θ over θ, even though θ might in some cases perform slightly better than θ (see Figure 3).
Similarly, we prefer in most cases θ overθ. We recall that the estimations provided by these two estimators only differ from each other if the data set contains some sequences of d−1 consecutive null values. However, for long decay phases with small initial population size (such as (x 90 , . . . , x 120 ) in Figure 5, for which |x 90 | = 176), thenθ appears to be more relevant (see Figure 4 for n = 100 and |x 0 | = 10 in the subcritical case) and would be chosen over θ.
Appendix A: Proofs related to Section 3.1 Proof of Theorem 3.2. We deduce Theorem 3.2 from Theorems 6.1-6.3 of [1], which state that if the process (X k ) k 0 is supercritical, then, on the set of non-extinction, the estimator ρ is strongly consistent: lim n→∞ ρ a.s.
and has the following asymptotic distribution: if λ 2 < ρ 0 , if λ 2 > ρ 0 , there exist random variables H n with lim |H n | < ∞, such that The strong consistency in Theorem 3.2 is immediate from (3.5) and (A.1). We then express θ − θ 0 as a function of ρ − ρ 0 , in order to deduce its asymptotic distribution from (A.2)-(A.4). We write and use the fact that, for all k = 1 . . . d, in order to obtain Appendix B: Proofs related to Section 3.2 Proof of Theorem 3.5. We deduce from the basic limit result in the supercritical case that for any vector u ∈ R d , lim n→∞ ρ −n 0 X n a.s.
Note that we have the identity We deduce from the martingale central limit theorem given by Theorem 2.2 in [1] (applied to the vectors denoted in [1] as a n = e 1 and to the real constants γ n = 1) that which combined with (B.1) leads to the desired asymptotic normality.
The proof of Theorem 3.6 can be found in [11].
Appendix C: Proofs related to Section 3.3 Proof of the strong consistency ofθ in Theorem 3.8. According to Proposition 3.1 in [9], sufficient conditions for the strong consistency ofθ are that f (., Z k−1 ) is Lipschitz, in the sense that there exists a nonnegative σ(Z 0 , . . . , Z k−1 )-measurable function C k satisfying for all θ 1 , θ 2 ∈ Θ, < ∞, and that for any δ > 0, The Lipschitz condition is satisfied thanks to (3.14), which shows that f ′ (., Z k−1 ) is bounded on Θ. The second condition follows from (3.15). Let δ > 0 and θ ∈ Θ such that |θ − θ 0 | δ. We assume for convenience that θ 0 > θ. In order to prove (C.1), we apply the mean value theorem to the function f (., Z k−1 ), and obtain that there exists some which implies (C.1).
In order to provide the asymptotic distribution ofθ − θ 0 in Theorem 3.8, we shall state the positive recurrence of the Markov chain (Z k ) k 0 , and the finiteness of the first and second-order moments of its stationary distribution.
Proposition C.1. Let us assume that the process (X k ) k 0 is subcritical. Then the homogeneous Markov chain (Z k ) k 0 is irreducible positive recurrent, and its stationary distribution ν θ0 satisfies for all i, j = 1 . . . d, Proof of Proposition C.1. Clearly, the chain is irreducible: due to the Poisson random variables coming in play, any nonzero state is attainable from any other nonzero state in a finite time. The positive recurrence then follows from a criterion given e.g. in [14], Theorem 3.1: positive recurrence is implied if there exists a finite set A ⊂ N d \{0} and a non-negative function g on N d \{0} such that where ξ is the eigenvector defined in Section 2. Then, for all i ∈ N d \{0}, .
Since ρ 0 < 1, the right term tends to +∞ as |i| → ∞. Consequently there exists some finite set A satisfying (C.3), and the chain is positive recurrent.
Let us now prove that its stationary measure ν θ0 admits finite first-order moments. First, we point out that by definition of Z n , we have for all i = 1 . . . d, It is thus enough to demonstrate (C.2) for i = 1. According to [15], Theorem 1, in order that i∈N d \{0} i 1 ν θ0 (i) < ∞, it suffices that for some non-empty finite set B and some function h with h(i) we have for all i = 0, h(i) i 1 , and which ensures the existence of some finite set B satisfying (C.5). Consequently, j∈N d \{0} j 1 ν θ0 (j) < ∞, and the quantity m ν θ 0 defined in (C.4) is finite. Due to the specific properties of the process (Z k ) k 0 , it is possible to deduce from this that the second-order moments of the stationary measure ν θ0 are finite as well. Indeed, for all i = 1 . . . d − 1, using the inequality Similarly, lim E θ0 (Z n Z n−k ) 4 + 4m ν θ 0 We then obtain by means of Fatou's lemma that for every i, j = 1 . . . d, The asymptotic normality ofθ − θ 0 in Theorem 3.8 is a consequence of the following result.
Proposition C.2. Let us assume that the process (X k ) k 0 is subcritical. Then the estimatorθ is asymptotically normally distributed: where f is given by (3.13).