Robust estimation in controlled branching processes: Bayesian estimators via disparities

This paper is concerned with Bayesian inferential methods for data from controlled branching processes that account for model robustness through the use of disparities. Under regularity conditions, we establish that estimators built on disparity-based posterior, such as expectation and maximum a posteriori estimates, are consistent and efficient under the posited model. Additionally, we show that the estimates are robust to model misspecification and presence of aberrant outliers. To this end, we develop several fundamental ideas relating minimum disparity estimators to Bayesian estimators built on the disparity-based posterior, for dependent tree-structured data. We illustrate the methodology through a simulated example and apply our methods to a real data set from cell kinetics.


Introduction
Branching processes are routinely used to model population evolution in a variety of scientific disciplines such as cell biology, population demography, biochemical processes, genetics, epidemiology, and actuarial sciences (see, for instance Devroye (1998), Haccou et al. (2005), González et al. (2010), Kimmel and Axelrod (2015) and del ). While several variants of branching processes are available, a particularly useful variant is the controlled branching process (CBP) with random control functions, on which this work is focused.
These processes are discrete time and discrete state stochastic processes -like classical Galton-Watson processes (GWPs) -describing the growth of a population over the generations using probability distributions for the reproduction. However, unlike the GWPs, the number of progenitors is determined by a random mechanism which is referred to as control functions. This fact allows for modelling real-life phenomenon such as random migratory movements (for instance, immigration and emigration), with great flexibility. Moreover, although the GWP and the CBP have a Markovian structure and individuals reproduce independently of the others in both models, in a CBP the evolution of the tree generated by a progenitor is not independent of the trees generated by the others. This lack of independence leads to substantial technical challenges in the theoretical developments and is described in detail in the manuscript. A good reference for the probabilistic theory and inferential issues developed until now for the CBPs is the recent monograph González et al. (2018). The great flexibility offered by CBPs comes with a cost; they require specification of multiple distributions such as the offspring distribution and the control distributions. Divergence-based methods have been used to provide methodologies for inference in these settings that are robust to presence of outliers and efficient when the posited model is correct, see Sriram and Vidyashankar (2000) for the GWP and the more general approach given in González et al. (2017) for the CBP. Analysing robust procedures in depth in the field of branching processes is of great interest since it is not unusual to find the existence of a small proportion of individuals in a population whose reproductive capacity is influenced by temporary events, leading to outliers in the data. While the aforementioned papers deal with the problem from the frequentist standpoint, divergence-based methods in the Bayesian framework in the context of branching processes with strong theoretical guarantees do not exist. This paper is focused on addressing this problem.
Bayesian inference in the presence of model misspecification has received much attention in recent years. In the context of classification problems, Jiang and Tanner (2008) studied the behaviour of the so-called Gibbs posterior under a variety of conditions on the risk functions and hence allowing for potential model misspecification. See also the classical reference Catoni (2004) and Bissiri et al. (2017). On the other hand, Hooker and Vidyshankar (2014) provided an alternative approach for Bayesian inference under model misspecification using divergences for conditionally independent and identically distributed (c.i.i.d.) random variables. Indeed, in their work they evaluate the effect of model misspecifications by studying the asymptotic behaviour of the posterior estimates under the posited model and misspecified model settings. Alternatively, again in the c.i.i.d. setting, Ghosh and Basu (2016) and Ghosh and Basu (2017) used power divergences to derive robust Bayesian methods. More recently, Miller and Dunson (2018) developed an alternative coarsening approach, for c.i.i.d. random variables, which essentially amounts to assuming that the observed data are within an -neighbourhood of posited model for a suitably defined neighbourhood and .
Returning to the setting of CBPs, in order to make inference of the offspring distribution, we assume that it belongs to some known general parametric family. This is a reasonable assumption in the context of branching processes, since we often have some knowledge on the reproduction process of the population. For instance, the geometric distribution is proposed as a reproduction law for a GWP applied to model data from a yeast cell colony in Guttorp (1991), or is used to establish analogies between branching processes and the thermodynamic properties of small systems in Corral et al. (2016). Additionally, a GWP with a generalized power series offspring distribution is considered as a model for the spread of an infectious disease in a population in Angelov and Slavtchova-Bojkova (2012). The classical reference of Holgate and Lakhani (1967) provides a broad description of parametric families of offspring distributions suitable in an ecological context. For example, a Poisson distribution is the natural offspring distribution to consider in a population where each progenitor produces a large number of eggs whose probability of becoming an individual is small and it is constant for all the eggs; a Bernoulli distribution with support {0, 2} is appropriate to model the reproduction of organisms by binary fission, such as bacteria, amoeba, etc. Additional examples involving a compound Poisson distribution, a negative binomial distribution or a zero-modified geometric distribution can be found in Holgate and Lakhani (1967).
The current paper describes the use of disparities for CBP data to provide robust estimation of the parameters of the offspring distribution. We assume that the number of parents giving birth to exactly k offspring is known for all k ≥ 0. For sake of brevity, henceforth this sample will be referred as the entire family tree. Roughly speaking, our proposed method, analogous to the c.i.i.d. case, consists in replacing the log-likelihood in the expression for the posterior distribution with an appropriately scaled disparity measure, and studying the expectation and the mode of the resulting disparity based posteriors to obtain the Bayes estimators, known as EDAP and MDAP estimators, respectively (see Section 3 for definition). However, unlike the c.i.i.d. case, two new issues arise: the scaling for the disparity is random and its asymptotic behavior depends on the parameters of interest. To address these two issues, it is common to use conditioning arguments; however, as explained previously, the lack of independence between the trees generated by the progenitor and other members of the population (due to the introduction of control functions) causes sufficient challenges and requires development of new mathematical techniques. Indeed, to establish the asymptotic properties, we need results that invoke the branching structure -see Lemma 5 in Supplementary Material (González et al., 2020) for details -and results that establish L 1 convergence of estimated posterior densities to the Gaussian density (see Theorem 4.2). It is worth mentioning that such results are not known even for likelihood based posteriors and to the best of our knowledge, the results in this paper are the first ones for dependent tree-structured data. As a special case, when the control function yields the population size of a generation, one obtains results for classical GWPs.
Moreover, turning to the assumptions, it is also important to mention that our class of probability distributions, Γ * , defined below in Section 3, refines and improves the condition (A5) in Hooker and Vidyshankar (2014). This is related to the identifiability of the model.
Beyond the challenges outlined above, we re-emphasize that even though the asymptotic properties of Bayes estimators are similar to those of the frequentist estimators, the proofs concerning Bayes estimators are highly non-trivial. Indeed, one of the biggest challenges is to verify that the posterior mean is close to the minimum disparity estimator at a specified rate (see Theorem 4.3 below). Furthermore working with the posterior mode introduces additional difficulties and requires uniform convergence of posterior densities, which we establish in this paper. Indeed, under further conditions we show that this yields the closeness of the posterior mode and the disparity estimator at the required rate (see Theorems 4.4 and 4.5 below).
Our algorithm to obtain the EDAP and MDAP estimators is quite different from those developed in Hooker and Vidyshankar (2014). Unlike their work, which relies on Metropolis-Hastings algorithm, we use numerical techniques such as Newton-Cotes methods and non-linear optimization methods to obtain the estimators. This approach decreases the computational burden significantly especially when replacing the likelihood by alternative disparities. Furthermore, we establish that the proposed methods yield estimators that are similar in numerical value to the traditional Bayesian estimators in the absence of data contaminations, and are more robust in the presence of outliers. More details are provided in Section 6.
Besides this Introduction, the rest of the paper is structured as follows: Section 2 describes the CBP model and states the hypotheses considered throughout the manuscript. Section 3 is concerned with the description of Bayes estimators under disparity measures and provides sharp probabilistic bounds for differences between the EDAP and MDAP estimators and the frequentist minimum disparity estimators. Section 4 is devoted to the asymptotic properties of the estimators while Section 5 deals with robustness properties. Section 6 contains two examples where we use the R programming environment. Our first example is concerned with a real data set of a oligodendrocyte cell population, firstly used in Yakovlev et al. (2008). This example illustrates the versatility of the proposed methodology since such data are typically modeled by a multi-type branching process. The second example is a simulation study to illustrate the robustness of the proposed methodology. Finally, the contributions of the manuscript are summarized in Section 7. The Appendix gathering some results on the existence and continuity of functionals related to EDAP and MDAP estimators, additional figures, simulated data, the proofs of the main results and the computational codes developed for the examples are presented in the Supplementary Material.

The probability model
A controlled branching process (CBP) with random control functions is a Markov chain {Z n } n∈N0 defined recursively as follows: {X nj : n ∈ N 0 ; j ∈ N} and {φ n (k) : n, k ∈ N 0 } are two independent families of non-negative integer valued random variables defined on a probability space (Ω, A, P ). The empty sum in (1) is taken to be 0. The random variables X nj , n ∈ N 0 , j ∈ N, are assumed to be i.i.d. and {φ n (k)} k∈N0 , n ∈ N 0 , are independent stochastic processes with the same one-dimensional probability distributions. Intuitively, this process models an evolving population in which each individual reproduces, independently of each other and of the previous generation population, according to the same probability distribution. However, unlike the classical GWP, the number of reproducing individuals in the n-th generation, φ n (Z n ), is a random function φ n (·) of the generation size Z n rather than Z n itself. In the case when φ n (x) ≡ x, one obtains the classical GWP. As in the GWP, the quantity X nj can be interpreted as the number of offspring produced by the j-th progenitor in the n-th generation. The terminology, number of progenitors of the n-th generation will sometimes be used to describe φ n (Z n ). We denote the offspring distribution by p = {p k } k∈N0 , p k = P [X 01 = k], k ∈ N 0 . Moreover, in relation to the moments of the process, we denote by m = E[X 01 ] and σ 2 = V ar[X 01 ], the offspring mean and variance (assumed to be finite), while by ε(k) = E[φ 0 (k)] and σ 2 (k) = V ar[φ 0 (k)], k ∈ N 0 , the mean and variance function of the random control functions.
As explained in the introduction, we focus on offspring distributions that are parametric; that is, we assume that p k (θ) = p k = P [X 01 = k], for k ∈ N 0 , θ ∈ Θ and Θ ⊆ R with a non-empty interior. We notice here that extension to multidimensional case is possible at the cost of more cumbersome notations. We denote the parametric family by F Θ = {p θ : θ ∈ Θ}, where p θ = {p k (θ) : k ∈ N 0 } is the offspring distribution for each θ ∈ Θ, and these satisfy regularity conditions such as differentiability in θ. We also assume that the parametric model satisfies the following identifiability condition: The main results that we describe in the paper will require the generation sizes to diverge to infinity with positive probability. While the assumption of supercriticality of the offspring distribution is sufficient in the GWP case, one needs a slight modification of such a condition (see Chapter 3 in González et al. (2018)). Assumption 1 in Section 4 fixes the framework we need in relation to this issue.
The following additional notation regarding the sample is needed for the theoretical developments: where I A represents the indicator function of the set A. Notice that Z l (k) represents the number of individuals in generation l who have exactly k offspring. Also, let We notice that Y l (k) represents the total number of progenitors who have exactly k offspring up to generation l. Furthermore, Y l and Δ l represent the total number of individuals and the total number of progenitors until the l-th generation.

Bayesian estimators using disparity measures
In this section, we describe Bayesian estimators using disparities. To this end, let L 1 denote the space of measurable functions which are Lebesgue integrable and equipped with the L 1 -norm. Also, let π(·) denote a prior density on Θ which belongs to L 1 , i.e., that the posterior density of θ, given the sample Z * n is, where KL(q, θ) is the Kullback-Leibler divergence between a probability distribution q defined on N 0 and p θ ; that is, andp n = {p k,n } k∈N0 is the non-parametric maximum likelihood estimator (MLE) of the offspring distribution, p, based on the sample Z * n (see González et al. (2016)); that is,p Two traditional summary Bayes estimators for the parameter θ are (i) the posterior mean or expectation a posteriori (EAP) and (ii) the posterior mode or maximum a posteriori (MAP). These are defined as follows: respectively. However, it is widely known, that these estimators fail to yield robust estimates in the i.i.d. case. We illustrate this fact for the offspring distribution of the CBP in the following example.
Example 1. Consider a CBP starting with Z 0 = 1 individual and for each k ∈ N 0 the control variable φ n (k) follows Poisson distribution with parameter λk, with λ = 0.3. In practice, these control functions are suitable to describe an environment with an expected emigration. For the offspring distribution, we consider a mixture model for gross errors (see (12) in Section 5 for definition); specifically, each individual at each generation produces offspring according to a geometric distribution with parameter θ 0 = 0.3 with probability 0.85, and with probability 0.15 it gives birth to L = 11 offspring. The offspring mean and variance, ignoring the presence of the contamination at the point L = 11, are m = m(θ 0 ) = (1 − θ 0 )/θ 0 = 2.333 and σ 2 = σ 2 (θ 0 ) = (1 − θ 0 )/θ 2 0 = 7.778, respectively. Using the statistical software R, we have simulated the first 45 generations of such a model, z * 45 . The evolution of the number of individuals and progenitors is shown in Figure 1 (left), where growth in both groups is observed. Next, in order to estimate the posterior density function of θ using the sample z * 45 , we used a beta prior distribution with parameters 1/2 and 1/2. This choice was motivated by the fact that, according to Berger and Bernardo (1992), this is a non-informative prior distribution for a parameter between 0 and 1. We do not discuss further about the choice of the prior distribution, but a sensitivity analysis is provided in the Supplementary Material to show the negligible impact of the prior distribution. The estimate of the posterior density function of θ is plotted in Figure 1 (centre), where one notices the poor estimate for the true offspring parameter, θ 0 . For a better illustration, in Figure 1 (right) we have also plotted the evolution of the EAP and MAP estimates for θ 0 through the generations. We remark that, similar to the case for the MLE in a frequentist setting, EAP and MAP estimators also possess unsatisfactory behavior in the presence of contamination. However, these estimators possess desirable asymptotic properties -consistency and asymptotic normality -in a contamination-free context.  Situations similar to the above example clearly showcase the need of robust estimators against outlier contamination. Since outlier contamination can frequently be described using a mixture model, the robustness of the Bayes estimators to outliers can be cast in the general framework of model misspecification. The general disparity-based approach, as described in Hooker and Vidyshankar (2014) and adapted to the family tree data Z * n generated by the CBP, facilitates such an analysis. As in Hooker and Vidyshankar (2014), the equation (3) suggests defining a density function by replacing the Kullback-Leibler divergence with a suitable disparity measure D defined on Γ × Θ, where Γ is the set of all probability distributions defined on the non-negative integers. A disparity measure is defined (see Lindsay (1994)) using a strictly convex and thrice differentiable function G : [−1, ∞) → R satisfying G(0) = 0 as follows: where q = {q k } k∈N0 , and δ(q, θ, k) denotes the "Pearson residual at k", that is, The resulting function after replacing the Kullback-Leibler divergence with a disparity-based quantity is referred to as D-posterior density function atp n and is defined as: More generally, the previous definition can be extended for an arbitrary probability distribution q ∈ Γ, yielding the D-posterior density function at q that is given by Observe that (6) is well defined for each q ∈ Γ if and only if supp(π)∩{θ ∈ Θ : D(q, θ) < ∞} has non-null Lebesgue measure, where supp(π) denotes the support of the prior density function. In particular, if the disparity measure D is bounded on Γ × Θ, then supp(π) ∩ {θ ∈ Θ : D(q, θ) < ∞} = supp(π), for each q ∈ Γ, and the density function in (6) is well defined. Taking into consideration this fact, we focus our attention on the following set: Analogous to the classical Bayes estimators, one can summarize the information from the D-posterior using the estimators expectation a D-posteriori (EDAP ) and maximum a D-posteriori (MDAP ); these are defined as follows: • Expectation a D-posteriori (EDAP ): • Maximum a D-posteriori (MDAP): Specifically, the EDAP estimator mimics the Bayes point estimators for the posterior density under the squared error loss function, whereas the MDAP estimator does it under 0 − 1 loss function. We next focus on a few useful examples of non-negative disparity measures and the corresponding D-posterior density functions and refer to Cressie and Read (1984) and Lindsay (1994) for additional examples.
Example 2. (a) The Kullback-Leibler divergence with the parametric family F Θ , defined above, is provided by the function G(δ) = (δ +1) log(δ +1)− δ. Consequently, the D-posterior distribution and the EDAP and MDAP estimators using this disparity coincide with the posterior distribution and the EAP and MAP estimators, respectively.
The D-posterior distribution using this disparity is referred to as HD-posterior distribution and the EDAP and MDAP estimators are denoted by θ * HD n and θ +HD n , respectively, for each n ∈ N.
The D-posterior distribution using this disparity is referred to as NED-posterior distribution and the EDAP and MDAP estimators are denoted by θ * NED n and θ +NED n , respectively, for each n ∈ N.
For the study of asymptotic and robustness properties of the EDAP and MDAP estimators, which we address in Sections 4 and 5, we introduce the EDAP functions, defined for each n ∈ N as follows: , and the MDAP functions, defined as: whenever this minimum exists. With these definitions, θ * D n (ω) = T n (p n )(ω), and θ +D n (ω) = T n (p n )(ω). Note that these functions depend on the total number of progenitors, and one has different EDAP and MDAP functions for each disparity measure and for each prior distribution; however, in order to ease the notation we will not make explicit this relation.
Despite the similarities between EDAP and MDAP functions defined in the context of branching processes and those in the case of i.i.d. random variables, there is a key difference. While in the i.i.d. case these are deterministic functions, in the current context they are random variables. This randomness arises due to the random sample size Δ n−1 and leads to questions concerning the existence and measurability of these functionals that must be treated carefully (see, for instance Brown and Purves (1973)). However, in order to ease the reading of the paper we omit those details here and we relegate the study of those questions to the Appendix in the Supplementary Material. Taking into account these considerations, we assume the conditions that guarantee the existence and uniqueness of T n (q), q ∈ Γ, throughout the paper.

Consistency and asymptotic normality
In this section, we focus our attention on the asymptotic behaviour of the EDAP and MDAP estimators. The proofs of the results of this section are presented in Section 2 of the Supplementary Material. One of our main results in this section is that EDAP and MDAP estimators are asymptotically efficient when the posited offspring distribution belongs to the parametric family. This will be obtained as a consequence of a more general result; viz., the L 1 almost sure convergence of posterior density to a Gaussian density with mean 0 and variance equal to the inverse of the Fisher information.
As stated in Section 2, we impose the following assumption on the asymptotic behaviour of the process: Assumption 1 (On the supercriticality of CBPs). The CBP satisfies: In González et al. (2016), it is established that under the Assumption 1, Δ n → ∞ a.s., andp k,n is a strongly consistent estimator of p k , for each k ∈ N 0 , on the set {Z n → ∞}. Recall that if the offspring distribution generating the CBP is parametric, then there exists an interior point θ 0 ∈ Θ such that p = p θ0 .
To establish the results, we need additional regularity conditions and hence, in the remainder of this paper we assume that for each q ∈ Γ, the first and the second derivative of D(q, θ) with respect to θ exist and we denote them byḊ(q, θ) andD(q, θ); the reader is referred to Section 4 in González et al. (2017) for conditions that guarantee their existence. We also denote by I D (θ) =D(p, θ), and I D n (θ) =D(p n , θ), where p is the posited offspring distribution andp n was defined in (4). Henceforth, we also assumewithout loss of generality (see Lindsay (1994), p. 1089) -that G (0) = 0 and G (0) = 1. Thus, taking into account the previous hypotheses and the identifiability condition (2), if p = p θ0 , one has that I D (θ 0 ) reduces to the Fisher information at θ 0 denoted by I(θ 0 ) (see González et al. (2017)).
In the proofs of the results in this section the asymptotic properties of the frequentist minimum disparity estimator will play a critical role. As it was introduced in González et al. (2017), the minimum disparity estimator (MDE) of θ 0 based onp n is defined as: and the associated disparity function defined as: whenever this minimum exists. For details about conditions for the existence, uniqueness, and continuity of the function T , we refer the reader to Theorems 3.1-3.3 in González et al. (2017). Note that analogous to EDAP and MDAP functions, the disparity function depends on the disparity measure which we suppress in our notations. With this notation, T (p n ) =θ D n . Furthermore, we strengthen our regularity conditions on the disparity measure and the prior distribution as below: Assumption 2 (On the disparity function and the prior distribution).
Assumption 3 (On the continuity of the disparity measure and the MDE). The following properties concerning the disparity measure and the MDE hold: where d − → represents the convergence in distribution with respect to the probability P [·|Z n → ∞].

(b) Sufficient conditions for (b) in Assumption 3 can be found in Section 4 in
(c) Conditions that guarantee the convergence of (9) and (10) were established in Theorems 3.4 and 4.1, respectively, in González et al. (2017).
We now introduce the following subset of probability distributions which we need in the statement of our next theorem. Let In the following ϕ(t; θ) denotes the density function of a normal distribution with mean 0 and variance I D (θ) −1 . Additionally, let ϕ n (t) denote the density function of a normal distribution with mean 0 and variance I D n (θ D n ) −1 . The following results hold true for the offspring distribution p without assuming that it belongs to the parametric family. In the case when p = p θ0 and under the identifiability condition (2), then the same results hold with θ p = θ 0 and I D (θ p ) = I(θ 0 ). We state our first main result of this section.
The following theorem shows that the EDAP estimator mimics the MDE at the rate Δ 1/2 n−1 . This feature leads to the asymptotic normality of the centered and scaled EDAP estimator.
Theorem 4.3. Under the hypotheses of Theorem 4.2, the following convergences hold: To establish the asymptotic properties of the MDAP estimator, we need the uniform convergence of posterior density to the Gaussian density. This is the content of our next theorem.

Robustness properties
In this section we describe the robustness properties of Bayesian disparity estimators EDAP and MDAP. The proofs of the results of this section are gathered in Section 3 of Supplementary Material. In the context of i.i.d. data some of these issues were investigated in Hooker and Vidyshankar (2014). The study of disparities in the frequentist context for branching processes has been investigated for the Hellinger distance in Sriram and Vidyashankar (2000) and for a general standpoint in González et al. (2017). It is pertinent to note here that the standard robustness concepts such as influence function, breakdown point (both finite and asymptotic) and α-influence curves, α ∈ (0, 1), take a familiar form as the i.i.d. case and hence we will keep the discussion rather brief.
We begin with a brief description of the framework. As is common and widely accepted in the study of robustness problems, we focus on the gross error contamination model. This model is a particular case of the semiparametric gross error introduced in Hampel (1974), and is given by where θ ∈ Θ, α ∈ (0, 1), L ∈ N 0 , and η L is a point mass distribution at L. A key ingredient used in the investigation of robustness is the α-influence function, where 0 < α < 1. The α-influence function of a random variable T : Γ × Ω → Θ, is a mapping such that In our first result in this section we consider the disparity function T as a degenerate random variable, i.e., T (q)(ω) = T (q), for each ω ∈ Ω and each q ∈ Γ, and establish that the α-influence function of the EDAP (respectively, MDAP) function and of the disparity function are close and characterize the difference in terms of the random sample size Δ n−1 . To this end, we need the following additional assumption.
( b) D is a disparity measure associated with a function G(·) which satisfies that G (·) and G (·) are bounded on [−1, ∞).
The following result is an immediate consequence of Theorem 3 in the Appendix in the Supplementary Material.
(ii) If Θ is complete and separable and ( b) in Assumption 2 holds, then, An immediate consequence of this proposition is that for each α ∈ (0, 1), and n ∈ N large enough, the α-influence curve of the disparity functional at p provides a good approximation of the α-influence curves of T n and of T n at this probability distribution. In addition, under conditions given in Theorem 5.1 in González et al. (2017), one has that lim L→∞ lim n→∞ IF α (L, T n , p) = 0 a.s. and lim L→∞ lim n→∞ IF α (L, T n , p) = 0 a.s.
The study of the α-influence curves as a measure for the robustness of an estimator was driven by the fact that the influence functions do not always provide a good description of the robustness properties of an estimator. Nonetheless, we establish a sufficient condition for the influence functions of the EDAP estimators to be bounded, and consequently, from the classical viewpoint, the EDAP estimators are robust. Recall the definition of the influence function for EDAP estimators at p is given by Theorem 5.2. If ( b) in Assumption 4 holds, then |IF (L, T n , p)| < ∞ a.s., for each L ∈ N 0 and n ∈ N.
Next, we turn our attention to the study of breakdown point. While the notion of influence curves represents the effect of a single outlier in the asymptotic behaviour of the estimator, the breakdown point intuitively represents the percentage of contamination that the estimator can asymptotically bear without taking arbitrarily large values. Classically, the breakdown point of a general function T at q ∈ Γ is defined as: That sequence is called sequence of contaminating probability distributions. The notion of breakdown point can be extended in a natural way to a function T : Γ × Ω → Θ, such that for each q ∈ Γ, T (q) is a random variable. We will study the breakdown point of the EDAP and MDAP functions when the contaminating probability distributions and the parametric family satisfy the following assumptions: Assumption 5 (On the parametric family and the sequence of contaminating probability distributions).
We notice here that the above conditions represent the worst possible contamination context (see, for instance, Park and Basu (2004), p. 28). The next result, which in content is analogous to the i.i.d. case, describes the role of the prior in the robustness of the EDAP and MDAP estimators. It is to be noted that the asymptotic breakdown point of the minimum disparity estimators is at least 50%.

Theorem 5.3. Suppose that ( c) in Assumption 4 holds true.
(i) If D(q, θ) is bounded for any q ∈ Γ and θ ∈ Θ, then the breakdown point of the EDAP at p is 1.

(ii) Assume that ( b) in Assumption 4, and (c) in Assumption 5 hold. Then, for any family of contaminating distributions {q L } L∈N0 satisfying (a) and (b) in Assumption 5, the breakdown point of the MDAP at p is 1. The result also holds for the Hellinger distance.
We next turn our attention to a new notion for the study of robustness for a sequence of estimators which are dependent on the sample size. The asymptotic breakdown point for a sequence of estimators {T n } n∈N at q ∈ Γ is defined as: Before we describe the behaviour of EDAP and MDAP with respect to the above measure, we introduce the following assumption: Assumption 6 (On the sequence of contaminating probability distributions). The sequence of contaminating probability distributions {q L } L∈N0 satisfies that for any α ∈ (0, 1) there exists δ > 0 such that Our next theorem describes the asymptotic breakdown point of EDAP and MDAP estimators. To that end, we need some additional regularity conditions on the parametric family; indeed, assume condition (A4) -alternatively, (A6) for the Hellinger distancein González et al. (2017).

(ii) If Θ is complete and separable and ( b) in Assumption 2 holds, then the asymptotic breakdown point of { T n } n∈N at p is equal to the breakdown point of T at p.
The result also holds for the Hellinger distance.
While the above results are concerned with EDAP and MDAP estimators, in the context of the methods proposed in the manuscript it is important to know the asymptotic behaviour of the corresponding D-posterior density functions at a contaminating model. Our next result addresses this issue. Specifically, given α ∈ (0, 1), and a family of contaminating distributions {q L } L∈N0 , the following result establishes the asymptotic behaviour, as L → ∞, of the D-posterior density function at (1 − α)p + αq L .
The result also holds for the Hellinger distance.
Example 1 revisited: To illustrate the results of Theorem 5.5, in the data analysis example described in Section 6 below, we will revisit Example 1. We consider a slight modification of the model described previously, namely the offspring distribution is a mixture of (1 − α) · 100% of a geometric distribution of parameter 0.3 and α · 100% contamination at L, and the control distributions are Poisson of parameter λk = 0.45k, where α is fixed at 0.01 and L increases. Concretely, in Figure 8

Examples
In this section, we illustrate the methodology proposed via two examples. The first example is based on a real data set in the context of cell kinetics populations. In this example, we show that the EDAP and MDAP estimators based on different disparity measures are as good as the ones based on the Kullback-Leibler divergence in absence of contamination. The second example of this section, which is a continuation of Example 1 introduced in Section 3, demonstrates the accuracy of the method in a contamination context. Thus, unlike the EAP and MAP estimators based on the likelihood disparity, the twofold goodness (in contamination and free-contamination contexts) of EDAP and MDAP estimators is exemplified. In both cases, the results are illustrated by considering the Hellinger distance and the negative exponential disparity. The proposed methodology was implemented in R programming environment. The corresponding codes can be found in the Supplementary Material.

Example -Oligodendrocytes
In this example, we consider data from a study on cell proliferation published by Hyrien et al. (2006) and re-analyzed in Yakovlev et al. (2008). Both the papers were concerned with modeling the proliferation of oligodendrocyte percursor cells and their transformation into terminally differentiated oligodendrocytes through multitype age-dependent branching processes. The cell populations considered consist of two types of cells: the oligodendrocyte precursor cells, referred as type T 1 cells, and the terminally differentiated oligodendrocytes, referred as type T 2 cells. The development of both types of cells is as follows: type T 1 cells can die without any offspring or divide under normal conditions; when stimulating to division, precursor cells are capable of producing either direct progeny (two daughter cells of the same type) or a single terminally differentiated nondividing oligodendrocyte. The data are supplied by time-lapse video recording of oligodendrocyte populations. These experimental techniques enable us to record all observable events (division, differentiation, or death), as well as their timing, in the development of each individual cell. Due to the design of these experimental techniques it is reasonable to assume absence of contamination in the observed sample (the multidaughter cell division can be observed, although rarely, in cancer studies -see Tse et al. (2012) -this is not the case in our observed sample). A special feature is the presence of censoring effects due to migration of precursor cells out of the microscopic field of observation, modelled as a process of emigration of the type T 1 cells. Yakovlev et al. (2008) proposed frequentist estimators for the offspring distribution of the continuous time evolution by embedding it in a discrete-time multitype branching process. They incorporated cell-emigration in an artificial way by modifying the reproduction law. As a more natural alternative, we propose a two-type controlled branching process to describe the embedded discrete branching structure of the continuous time branching process previously described. In the framework given by this process, we address the problem of estimating the offspring distribution of the cell population from a Bayesian standpoint by making use of disparity measures.
Specifically, we consider a two-type controlled branching process with binomial control, which is a particular case of the model introduced in González et al. (2005). The process is defined recursively as follows: n,j ) : j ∈ N, n ∈ N 0 } and {φ n (z) : n ∈ N 0 , z ∈ N 2 0 } are two independent families of non-negative integer valued random variables satisfying the following conditions: (i) For each z = (z (1) , z (2) ) ∈ N 2 0 , the random variables {φ n (z) : n ∈ N 0 } are i.i.d. following a binomial distribution with parameters z (1) and γ ∈ (0, 1).

(ii) The random vectors {(X
(1) n,j , X (2) n,j ) : j ∈ N}, n ∈ N 0 are i.i.d. whose probability distribution is as follows: (iii) If n 1 , n 2 ∈ N 0 are such that n 1 = n 2 , then, the sequences {(X Intuitively, Z (j) n denotes the number of cells of type T j , j = 1, 2, in the nth generation, and X (1) n,j and X (2) n,j represent the number of cells of type T 1 and T 2 , respectively, produced by the jth progenitor of type T 1 in the generation n. The random variables φ n (z) are introduced to model the cell emigration, and therefore to determine the number of progenitors of type T 1 cells in the nth generation provided the population size at that generation is the vector z. Consequently, γ is the probability that a cell of type T 1 completes successfully its mitotic-cycle regardless of its outcome (and then 1 − γ is the probability of emigration of that T 1 cell).
We focus our attention on the offspring distribution, {p 0 , p 1 , p 2 }, where p 0 is the probability that a cell of type T 1 dies, p 1 is the probability that a cell of type T 1 divides into two cells of type T 1 and p 2 is the probability that a cell of type T 1 differentiates into a new cell of type T 2 . Using the time-lapse video recording, one can observe the entire family tree for that process, that is, the following sample Z * n = {Z (1) (1) l (j) is the number of cells of type T 1 in the l-th generation having exactly j offspring of type T 1 , j = 0, 2, and Λ l is the number of cells of type T 1 in the lth generation having exactly one offspring of type T 2 . Our aim is to estimate the offspring distribution by using disparity measures, to provide HPD regions for the corresponding D-posterior densities and to estimate the probability that the production of precursor cells follows a supercritical probability distribution.
Using the Markov property and the independence between the emigration and reproduction phases, one can see that the likelihood function f (Z * n |p 0 , p 1 , γ) satisfies is the total number of cells of type T 1 in the first n generations having exactly j offspring of type T 1 , j = 0, 2, Ψ n−1 = n−1 l=0 Λ l is the total number of cells of type T 1 in the first n generations having exactly one offspring of type T 2 , Δ n−1 = n−1 l=0 φ l (Z l ) is the total number of observed progenitor cells of type T 1 in the first n generations and Y (1) is the total number of individuals of type T 1 in the first n generations. Moreover, observe that Δ n−1 = Y Thus, the MLEs of p 0 , p 1 , p 2 and γ based on the sample Z * n are given, respectively, by Let us write p = {p 0 , p 1 , p 2 } and p n = { p 0,n , p 1,n , p 2,n }. Consider q = {q 0 , q 1 , q 2 } a probability distribution on the state space {(0, 0), (2, 0), (0, 1)}, π(·) a prior distribution on the space of the probability distributions defined on such a state space and θ = (q 0 , q 1 ). With analogous arguments as those in Section 3, we define the D-posterior density function of θ = (q 0 , q 1 ) as where Θ e −Δn−1D(pn,θ) π(θ)dθ = Θ e −Δn−1D(pn,q) π(q)dq 0 dq 1 and Θ = {(x 0 , x 1 ) ∈ (0, 1) × (0, 1) : x 0 + x 1 ≤ 1}. Note that in this case, the parametric family which we consider is F Θ = {{q 0 , q 1 , 1 − q 0 − q 1 } : (q 0 , q 1 ) ∈ Θ}. As a consequence, we propose as EDAP estimators of θ 0 = (p 0 , p 1 ) the following ones: and as MDAP estimators: (p +D 0,n , p +D 1,n ) = arg max θ∈Θ π n D (θ|p n ), where π n D (θ|p n ) was introduced in (13), and π n D (q 0 |p n ) and π n D (q 1 |p n ) are the marginal density functions of q 0 and q 1 , respectively.
The data that we consider are provided from two experiments developed under two different experimental conditions: in the first one, the cells were cultured in a vehicle solution without agent stimulating differentiation, whereas in the second experiment, they were cultured with a specific hormone promoting oligodendrocyte generation. The first experiment started with 34 precursor cells cultured in a vehicle solution (with no agent promoting oligodendrocyte generation) and whose evolution was observed until the generation n = 7. On the other hand, the initial number of cells of type T 1 in experiment 2 was 30, which were cultured in solution with a specific hormone promoting oligodendrocyte generation and whose branching tree was observed until the generation n = 5. Based on the observed branching trees, we computed the non-parametric MLE for the parameters related to the reproduction and the emigration processes, which are given in Table 2.   Table 2: MLE for the main parameters of the model.
Using the MLE of p,p n = { p 0,n , p 1,n , p 2,n }, for each experiment, we estimated the D-posterior density functions of (q 0 , q 1 ) atp n for the Hellinger distance, the negative exponential disparity and the Kullback-Leibler divergence. Since the D-posterior density function is known, we generated 10 3 -uniformly distributed -points from the set Θ and determined the value of the D-posterior density functions at them. In all the cases, we consider the Dirichlet distribution with parameter (1/2, 1/2, 1/2) as the prior distribution due to the fact that no prior knowledge about the offspring distribution -apart from its support -is available (see Berger and Bernardo (1992)). Moreover, in order to estimate the integral in (13) we used the Monte Carlo integration method, by drawing a sample of 2 · 10 6 points from the previous prior distribution. To that end, we used the function rdirichlet() from the package gtools (see Warnes et al. (2018)).
The contour plots of these D-posterior density functions for experiments 1 and 2 are represented in Figures 1 and 2 respectively, of Subsection 4.1 in the Supplementary Material, along with the 95% HPD regions for (q 0 , q 1 ). We also calculated the EDAP and MDAP estimators of the parameters p 0 and p 1 for the aforementioned disparity measures, which are presented in Tables 3 and 4 for the first and the second experiment, respectively. To compute the EDAP estimates, we used again the Monte Carlo integration method based on a sample of 2·10 6 points, whereas to determine the MDAP estimates we applied numerical methods, and more specifically, we used the function nloptr() from the package nloptr (see Ypma et al. (2018) for details). Note that, the EDAP and MDAP estimates are close to the non-parametric estimators in Table 2. As was explained at the beginning of example, it is reasonable to assume that the observed data has no contamination. Under this assumption, one would expect, as proved in Theorems 4.2-4.5, the estimators based on KL posterior, HD posterior, and NED posterior to be similar. A numerical illustration of this issue via a simulated example is presented in the next subsection. Therefore, in a contamination-free context, it seems that the choice of the disparity measure has no impact in the estimates obtained, as expected.
Finally, we performed a sensitivity analysis to examine the influence of the parameters of Dirichlet distribution on EDAP and MDAP estimates. We mention here that there were no significant changes to the estimates due to changes in the parameters.  Table 4: Estimates of p 0 , p 1 and p 2 based on the EDAP and MDAP estimators for HD, NED and KL in the second experiment.
An interesting issue to tackle in this Bayesian framework is to determine the probability of having a supercritical offspring distribution governing the production of precursor cells. This is an important problem since if the mean of the variable X (1) 0,1 , m = 2p 1 , is less than 1, then the population of cells of type T 1 will become extinct with probability 1, so does the population of type T 2 cells. Using the approximation of the D-posterior distribution atp n in both experiments, we present an estimation of the probability that m > 1, with respect to the D-posterior distribution atp n , in Table 1 of Subsection 4.1 of the Supplementary Material. Note that the aforementioned probability is greater in the second experiment and this could mean that the solution with the hormone is effective at promoting cell reproduction. Indeed, by comparing Tables 3 and 4, one also sees that the probability that a cell of type T 1 dies with no offspring is smaller in the second experiment whereas the probability of differentiating into a type T 2 cell is greater.

Simulated example
This example is a continuation of Example 1 introduced in Section 2 and its purpose is to show the behaviour of EDAP and MDAP estimators in presence of contamination, specifically gross error contamination model. Recall that the process starts with Z 0 = 1 individual, the offspring distribution is a geometric distribution with parameter θ 0 = 0.3 and which is affected by outliers, as described in Example 1, and the control variables φ n (k) follow Poisson distributions with mean λk, for each k ∈ N 0 and each n ∈ N 0 , with λ = 0.3. It is important to mention that the geometric distribution is a natural offspring distribution to use in the context of branching processes, as explained in the Introduction. Observe that we consider an extreme contamination framework; indeed, in this process, each progenitor has exactly 11 offspring with probability 0.15 and, with probability 0.85, it has offspring according to the aforementioned geometric distribution. From a practical viewpoint, this might seem to be too much extreme example, however, its choice is motivated by the fact that it allows to illustrate appropriately the accuracy of the method in a contamination context.
In order to approximate the D-posterior density functions based on the Hellinger distance (HD) and the negative exponential disparity (NED), one can consider different approaches. One of them is to obtain a sample of such a density function by applying the Metropolis-Hastings algorithm, and use this sample to estimate the corresponding density function (this approach was used in Hooker and Vidyshankar (2014)). However, since an expression of the D-posterior density function is available (see (5)), we have opted for estimating both the D-posterior density function and the EDAP and MDAP estimators by using numerical methods. The reason for the choice of the latter approach is that it is computationally much faster than the former; this fact is especially remarkable when considering a disparity measure different from the KL disparity since in that case it is not possible to use a conjugate families of distributions (see Hooker and Vidyshankar (2014)). Specifically, to approximate the integrals involved in (5) and (7) we used the Newton-Cotes formulas implemented in the function cotes() of the library pracma (see Borchers (2019)). On the other hand, to compute the MDAP estimates we have made use of the function nloptr() of the package nloptr again. We also remark that the computational cost is not significantly increased when replacing the KL disparity with the HD or the NED. Indeed, it only takes 0.08, 0.22, and 0.11 seconds to estimate the D-posterior density and to compute the EDAP and MDAP estimates for the HD, the NED, and the KL, respectively, for a single generation. This clearly shows the advantage of using numerical methods over other algorithms such as Metropolis-Hastings or Gibbs sampler in this context. Moreover, the differences in the computational time are essentially caused by the number of operations involved in the definition of the HD, NED, and KL. Computations were performed on an Intel Core i7-8650U CPU running at 1.90 GHz with 32 GB of RAM.  The strong consistency of EDAP and MDAP estimates based on HD and NED are illustrated in Figure 2. The evolution of HPD intervals is also plotted. These estimates are shown to be accurate in contrast to those plotted in Figure 1 (right). Moreover, by comparing the results obtained with the HD and the NED we conclude that there is no significant difference between them. This is not surprising due to the fact that each of them behaves as reasonably well as the other one against outlier contamination as our results in Section 5 state. The NED is, however, preferable under inlier contamination as explained in Lindsay (1994). The influence function plots and the robustness of HD and NED are illustrated in Figure 3, for generations 25, 35 and 45. One can check a similar behaviour of the influence function plots for other values of n in Figure 6 in Subsection 5.3 of the Supplementary Material. We have also investigated the dependence of the influence function on the prior distribution as a part of the sensitivity study developed in Subsection 5.5 in the Supplementary Material. However, these graphs are omitted since no appreciable differences were observed. Taking into account that the offspring mean and variance can be written as continuous functions of the offspring parameter, we have used the EDAP and MDAP estimates of the offspring parameter to obtain estimates of them. The evolution of m(θ * D n ), and σ 2 (θ * D n ), for n = 5, . . . , 45, for D ∈ {HD, NED, KL}, are shown in Figure 4, whereas m(θ +D n ), and σ 2 (θ +D n ), for n = 5, . . . , 45, for D ∈ {HD, NED, KL}, are shown in Figure 5, both of them in Subsection 5.2 of the Supplementary Material. Moreover, note that due to the continuity, these estimators prove to be strongly consistent estimators for the corresponding parameter. All the above empirical results, together with the theoretical results in previous sections, allow us to conclude that the HD and the NED are better choices for disparity measures than the likelihood disparity, because in absence of contamination, the three of them possess the same asymptotic properties, however under contamination, the HD and the NED are the only ones that gives robust estimates.
A sensitivity analysis was performed to determine the influence of the choice of the prior distribution in the D-posterior density and the corresponding point estimators.
Since the parameter to be estimated lies in the interval [0, 1], we use as prior distribution beta distributions with parameters (ρ, β). Now, to apply the method, we take the parameters (ρ, β) belonging to the grid given by the Cartesian product of the set {n + 0.1 * k : n = 0, 1, . . . , 4; k = 1, 2, . . . 10} with itself. In Table 5 some results are summarized, especially those corresponding to prior distributions which are highly concentrated at extreme values far away from the parameter of interest. Despite this fact, one can observe a limited influence of the choice of the prior distribution on the EDAP and MDAP estimates. In addition, although the results are not shown here, it is interesting to mention that the effect of the prior distribution on the estimates decrease as the generation numbers increase (see Subsection 5.5 of the Supplementary Material for further details).
We also investigated other contamination models, namely, the mixture model for gross error by considering a geometric distribution with parameter θ 0 = 0.3 and Poisson control distributions with mean λk = 0.3k, with values of α and L different from 0.15 and 11, respectively. We emphasize here that the parameter λ = 0.3 is fixed and the role of k, as explained in the introduction, is to adapt to the number of parents in a given generation. Since the information in the sample until generation n depends on the mean growth rate, τ m , it is a function of α and L. Thus, to make meaningful comparisons, we choose α and L that yield similar values of τ m . As was already discussed, while Figure 1 illustrates the undesirable properties of EAP and MAP, the D-posteriors are unaffected by the contamination at different L and α. A summary of the behavior of EDAP estimates using KL, HD, and NED can be found in Figure 7 in Subsection 5.4 in the Supplementary Material. These results confirm that HD and NED are always preferable over KL. A similar observation also holds for MDAP.
We complete the illustration of the behaviour of the proposed methodology by considering the same mixture model for gross error of the previous offspring distribution and Poisson control distributions with mean λk = 0.45k. In such a situation any value of α ∈ [0, 1] leads to a supercritical CBP (see Assumption 1). In Figure 8 in Subsection 5.4 in the Supplementary Material, similar behaviour of the estimates from the uncontaminated model (α = 0) is illustrated; however, for the contaminated model (α = 0.01) and as L increases, the EDAP and MDAP based on HD and NED are unaffected while that based on KL is considerably affected.

Concluding remarks
For controlled branching processes, this paper addresses the robust estimation of the parameters of the offspring distribution in a Bayesian context making use of disparitybased methods. We have assumed that the offspring distribution belongs to a general parametric family and the inference on the parameter of the reproduction law is based on the sample given by the entire family tree.
First, we have defined the D-posterior density, a density function which is obtained by replacing the log-likelihood in Bayes rule with a conveniently scaled disparity measure. The expectation and mode of the D-posterior density, referred to as EDAP and MDAP estimators, respectively, are proposed as Bayes estimators for the offspring parameter by emulating the point estimators under the squared error loss function or under 0 − 1 loss function, respectively, for the posterior density. As a initial step for the analysis of the asymptotic and robustness properties of these estimators, their existence and measurability are studied. Moreover, sufficient conditions for the strong consistency and asymptotic normality of the EDAP and MDAP estimators, once they have been suitably centered and normalized, are provided.
In this paper we paid special attention to robust Bayesian inference for dependent tree-structured data. The EDAP and MDAP estimators, in the current context, are harder to analyze than the corresponding ones in the i.i.d. setting, due to the random sample size and the correlations induced by the random control function.
The analyses involve continuity and differentiability properties of Bayesian disparity functionals. These properties play an important role in the further development of robustness ideas in the context of the current data. We illustrate that the proposed estimators exhibit the robustness features when the number of generations increase.
Although the results are provided for a general class of disparity measures, we have focused our attention on the Hellinger distance and the negative exponential disparity in the examples presented. Contrary to the likelihood disparity, these disparity measures are known to provide robust estimators in an i.i.d. setting, while keeping desirable properties of estimators such as consistency and efficiency. Our results in the previous sections show that these features also hold in the branching framework, and thus, these disparities are preferable to the likelihood disparity. In the first example, we applied the methodology to real data from oligodendrocyte cell populations and estimated the main parameters of the process suggested to model such populations. The results in this example illustrate that in the absence of contamination, the properties of the EDAP and MDAP estimators are similar to traditional estimators based on likelihood disparity. The second example considers simulated data with the purpose of showing and analysing the robustness properties of the EDAP and MDAP estimators established in this paper. Thus, taken together both examples illustrate that EDAP and MDAP estimators, are efficient in a contamination-free context and robust to model misspecification and presence of outliers. additional figures, simulated data, summaries of the sensitivity analysis performed, and the computational codes developed for the examples are provided in the Supplementary Material.