Nonparametric estimation of the incubation time distribution

Nonparametric maximum likelihood estimators (MLEs) in inverse problems often have non-normal limit distributions, such as Chernoff's distribution. However, if one considers smooth functionals of the model, with corresponding functionals of the MLE, one gets normal limit distributions and faster rates of convergence. We demonstrate this for a model for the incubation time of a disease. The usual approach in such models is to fit parametric distributions, like the Weibull and gamma distributions, which leads to inconsistent estimators when the parametric model is misspecified. Smoothed bootstrap methods are discussed for constructing confidence intervals; the classical bootstrap, based on the nonparametric MLE itself, has been proved to be inconsistent in this situation.

1. Introduction. We consider the following model, used for estimating the distribution of the incubation time of a disease. There is an exposure time E with an absolutely continuous distribution function F_E on an interval [0, M_2], and an infection time U which, conditionally on E, is uniformly distributed on [0, E]. Moreover, there is an incubation time V with an absolutely continuous distribution function F_0 on an interval [0, M_1], and a time of getting symptomatic S = U + V. We assume that U and V are independent, conditionally on E. Our observations consist of the pairs of exposure times and times of getting symptomatic (E_i, S_i), i = 1, ..., n.
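To make the sampling scheme concrete, the model can be simulated directly. The following Python sketch is purely illustrative (the scripts accompanying the paper in [5] are in R); the uniform exposure distribution on [1, 30] and the Weibull incubation time distribution with the parameter values of Section 4 are assumptions of this sketch, not part of the model.

```python
import numpy as np

rng = np.random.default_rng(42)

def simulate_incubation_data(n, sample_V, e_low=1.0, e_high=30.0):
    """Draw (E_i, S_i): E ~ F_E, U | E ~ Uniform[0, E],
    V ~ F_0 independent of U given E, and S = U + V."""
    E = rng.uniform(e_low, e_high, size=n)   # exposure times
    U = rng.uniform(0.0, E)                  # infection times, uniform on [0, E]
    V = sample_V(n)                          # incubation times
    return E, U + V

# Illustrative F_0: Weibull with F_0(t) = 1 - exp(-beta * t^alpha), using the
# parameter values of Section 4 (truncation at M_1 = 20 is ignored here;
# its effect is negligible for these parameters).
alpha, beta = 3.03514, 0.0026195
sample_V = lambda n: (rng.exponential(1.0, size=n) / beta) ** (1.0 / alpha)

E, S = simulate_incubation_data(1000, sample_V)
```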
We define the (convolution) density q_F of (E_i, S_i) by

q_F(e, s) = e^{-1} {F(s) - F(s - e)} = e^{-1} \int_{(s-e)_+}^{s} dF(u),  e > 0, s ∈ [0, M],  (1.1)

w.r.t. µ, which is the product of the measure dF_E of the exposure time E and Lebesgue measure on [0, M], where M = M_1 + M_2 is the upper bound for the time S of getting symptomatic. We define the underlying measure Q_0 for (E_i, S_i) by dQ_0(e, s) = q_F(e, s) ds dF_E(e), s ∈ [0, M], e ∈ (0, M_2]. For estimating the distribution function F_0 of the incubation time, usually parametric distributions are used, like the Weibull, log-normal or gamma distribution. However, in [6] the nonparametric maximum likelihood estimator is used. The maximum likelihood estimator F̂_n maximizes the log likelihood over all distribution functions F on R which satisfy F(x) = 0, x ≤ 0; see [6].
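The density (1.1) is straightforward to evaluate for a given F. A minimal Python sketch, where the uniform incubation time distribution F_0 is an illustrative assumption of the sketch, not of the paper:

```python
import numpy as np

def q_F(e, s, F):
    """Convolution density (1.1): q_F(e, s) = {F(s) - F((s - e)_+)} / e."""
    return (F(s) - F(np.maximum(s - e, 0.0))) / e

# Illustrative incubation time distribution: uniform on [0, 20]
F0 = lambda t: np.clip(np.asarray(t, dtype=float) / 20.0, 0.0, 1.0)

# q_F(10, 5) = {F0(5) - F0(0)} / 10 = 0.25 / 10
val = q_F(10.0, 5.0, F0)
```

As a check, for fixed e the function s ↦ q_F(e, s) integrates to 1, since ∫ {F(s) − F(s − e)} ds = e for any distribution function F.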
Although the model is rather different, the algorithmic problem of computing the MLE has similarities with the problem of computing the MLE in the so-called interval censoring, case 2, model. In the interval censoring, case 2, model the log likelihood is built from observation times U_i < V_i, and we only have information on whether our hidden variable of interest, with distribution function F, is to the left of U_i (∆_{i1} = 1), between U_i and V_i (∆_{i2} = 1), or to the right of V_i (∆_{i3} = 1); see [9] and [11].
For the incubation time model we have a formally similar way of writing the log likelihood, as can be seen in the following way. First of all, we can introduce, as in [6], the indicator ∆_{i1} = 1_{{S_i ≤ E_i}}. If ∆_{i1} = 1, then S_i ≤ E_i, leading to a term log F(S_i) in the log likelihood. We can also introduce a second and third indicator, ∆_{i2} and ∆_{i3}, which depend on whether S_i ≤ max_j(S_j − E_j) or S_i > max_j(S_j − E_j). Note that if S_i > max_j(S_j − E_j), the distribution function F maximizing (1.3) will take the value 1 at S_i, since there is no term log{F(S_j) − F(S_j − E_j)} with S_j − E_j ≥ S_i. So there is no impediment to giving F(S_i) its maximal value, which is 1.
Hence, defining the indicators in this way, we can write the log likelihood in the incubation time model in the form (1.8), with the ∆_{ij}'s defined by (1.5) to (1.7), using a preliminary reduction of the maximization problem that was also used in [11] for the interval censoring, case 2, problem. Note, however, that the indicators ∆_{i2} and ∆_{i3} have a very different meaning in the interval censoring model. REMARK 1.1. Note that if S_i > M_1, where M_1 is the (unknown) upper bound for the length of the incubation time, we must have S_i > max_{j:S_j>E_j}(S_j − E_j), since max_{j:S_j>E_j}(S_j − E_j) is a lower bound for M_1. So a distribution function F, maximizing the log likelihood, will assign the value 1 to F(S_i) if S_i > M_1.
The limit distribution of the nonparametric MLE of the incubation time distribution was unknown, but because of the (at least algorithmic) similarity of its computation to the computation of the nonparametric MLE in the interval censoring problem, one would expect that similar techniques could be used in its derivation. The algorithmic similarity was indeed used in [6], but the limit distribution remained an open problem.
On the other hand, the limit distribution of the MLE for the interval censoring problem was derived in [4], in the so-called strictly separated case, where the length of the observation intervals has a strictly positive lower bound. Unfortunately, the proof is very complicated, and the result seems to be little known. Nevertheless, in the present absence of other methods, this is also the way we proceed in this paper, where we prove the limit result using similar methods, in the hope that this will revive the interest in these matters.
We give the result on the convergence of the rescaled MLE to Chernoff's limit distribution in Section 4, under a condition that seems somewhat similar to the strict separation hypothesis in the interval censoring problem. There is the general expectation that Chernoff's distribution will often occur as a universal limit distribution in this context of inverse problems, but the difficulty of proving it lies in the fact that we have to derive it from the properties of solutions of (Fredholm) integral equations and that we do not have explicit representations to go on.
As a preparation to this result, we first characterize the MLE as the derivative of the least convex minorant of a self-induced cusum diagram in section 2. The cusum diagram is an often used tool in the theory of isotonic regression (because the distribution function is monotone, our estimation problem is an isotonic regression problem), but the peculiar feature of the cusum diagrams used here is that they contain the solution F̂_n (the MLE) itself in their definition. The necessary and sufficient conditions in this characterization of the MLE are given in section 2. We illustrate the (iterative) algorithm for computing the MLE on a data set on COVID-19, also analyzed in [6]. Algorithms of this type (iterative convex minorant algorithms) were proved to converge in [15].
The characterization of section 2 is used to prove consistency of the MLE in section 3. The convergence to Chernoff's distribution is then given in section 4, where also some numerical results on its variance are given.
Then, in section 5 we define the Smoothed Maximum Likelihood Estimator (SMLE) F̂_{nh}(t) at points t away from the boundary by (1.9), where IK is an integrated kernel, defined by IK(t) = ∫_{−∞}^{t} K(u) du, and K is a symmetric kernel of the usual kind, used in density estimation. Near the boundary we use the Schuster-type boundary correction, also used in Section 11.3 of [9] in the definition of the SMLE. We assume that K has support [−1, 1]. It is shown that the SMLE has a normal limit distribution and a faster rate of convergence than the nonparametric MLE itself (rate n^{2/5} instead of n^{1/3}). In spite of the fact that its variance is implicitly defined as the solution of an integral equation, we can compute bootstrap confidence intervals for the distribution function via the SMLE, where we do not have to assume that the (asymptotic) variance is known. In this section we also give a method for determining the asymptotically optimal bandwidth automatically (see subsection 5.1).
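For concreteness, (1.9) can be evaluated directly once the jump points and masses of the discrete MLE are known. The sketch below assumes the triweight kernel K(u) = (35/32)(1 − u²)³ on [−1, 1] (one symmetric kernel of the usual kind; the paper does not prescribe this particular K), uses toy jump points, and omits the boundary correction.

```python
import numpy as np

def IK(u):
    """Integrated triweight kernel IK(u) = int_{-1}^{u} (35/32)(1 - x^2)^3 dx."""
    u = np.clip(u, -1.0, 1.0)
    return 0.5 + (35.0 / 32.0) * (u - u**3 + 0.6 * u**5 - u**7 / 7.0)

def smle(t, jump_points, masses, h):
    """SMLE (1.9): F_nh(t) = sum_j IK((t - tau_j) / h) * p_j for a discrete
    MLE with jumps of size p_j at tau_j (no boundary correction)."""
    t = np.atleast_1d(np.asarray(t, dtype=float))
    return np.array([np.sum(IK((ti - jump_points) / h) * masses) for ti in t])

# Toy discrete "MLE": mass 1/2 at 4 and mass 1/2 at 8
tau, p = np.array([4.0, 8.0]), np.array([0.5, 0.5])
vals = smle([0.0, 6.0, 12.0], tau, p, h=2.0)  # 0 far left, 0.5 between, 1 far right
```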
Basing the confidence intervals on the SMLE instead of the MLE has several advantages. It follows from recently developed theory (see, e.g., [18]) that direct bootstrap confidence intervals, based on the MLE, are inconsistent. In principle one can base bootstrap confidence intervals on the MLE, using the SMLE as an intermediate to center the bootstrap intervals, as is done in [18] for the interval censoring model. But these intervals again converge at a slower rate and have the unpleasant property of having jumps at the locations of the jumps of the MLE, which is a kind of artefact of the method. Since one needs the SMLE anyway for making the bootstrap confidence intervals consistent, it seems preferable to also base the confidence intervals on the SMLE itself (as is done in the present paper), and not on the MLE. Moreover, the MLE has the property of jumping to the value 1 too early, something that is avoided if one uses the SMLE.
Finally, in sections 6 to 7 we show that the SMLE is competitive with the parametric models, even if the parametric assumptions are satisfied, and avoids the inconsistency that is inherent in the use of the latter methods. Moreover, one is relieved of the duty of trying several of these parametric models, since there is no sound reason to choose any one of them.
Our computations are reproducible by using the R scripts in [5].
2. Characterization of the nonparametric maximum likelihood estimator (MLE) for the incubation time distribution. First, just as in [6], we reduce the problem to maximizing a function on the cone {y ∈ R_+^m : y = (y_1, ..., y_m), 0 ≤ y_1 ≤ ... ≤ y_m}, where the y_j represent the values F(S_i) and F(S_i − E_i) and where m is suitably chosen (see [6]).
Since, again just as in [6], we want to reduce the problem to a maximization problem in the interior of the cone, enabling us to differentiate w.r.t. the variables y_i, we fix the values of F at certain points (those of type (2.3) and (2.4) below); other choices of F at these points would make the log likelihood smaller. For convenience of notation, for a distribution function F on R_+ satisfying these conditions, we define the process W_{n,F} by (2.2), where we use the convention 0/0 = 0, and where Q_n is the empirical distribution of (E_1, S_1), ..., (E_n, S_n). Note that the MLE is only determined at the points S_i and (S_i − E_i)1_{{S_i>E_i}} and is zero at 0.
We restrict the distribution functions occurring in the maximization of the likelihood to the following set. DEFINITION 2.1. Let F_n be the set of discrete distribution functions F which only have mass at the points S_i or (S_i − E_i)1_{{S_i>E_i}}, where S_i is not of type (2.3) and S_i − E_i is not of type (2.4). Then 0 < F(T_i) < 1 for i = 1, ..., m, if the log likelihood for F is finite. The log likelihood for F ∈ F_n, divided by n, can be written as in (2.5). The corresponding function of y = (y_1, ..., y_m) = (F(T_1), ..., F(T_m)) is denoted by ϕ_n(y), see (2.6). We have the following lemma. LEMMA 2.1. Let W_{n,F} be defined by (2.2) and let ϕ_n be defined by (2.6). Moreover, let T_1 < ... < T_m be the points S_i and (S_i − E_i)1_{{S_i>E_i}} which are not of type (2.3) or (2.4), arranged in strictly increasing order. Then ∂ϕ_n(y)/∂y_j = ∆W_{n,F}(T_j), where ∆W_{n,F}(T_j) is the increment of the process W_{n,F} at T_j.
PROOF. If T_j corresponds to a value S_i such that S_i ≤ E_i, the corresponding term in (2.5) is of the form log F(S_i), and differentiation of (2.6) w.r.t. F(S_i) gives a contribution to the partial derivative ∂ϕ_n(y)/∂y_j, where we sum over k to allow for possible ties at S_i. If T_j corresponds to a point S_i such that T_j = S_i > E_i and S_i ≤ m_n, we deal with the argument F(S_i) of the term log{F(S_i) − F(S_i − E_i)}, where we again sum over k to allow for possible ties at S_i. Finally, if T_j corresponds to a point S_i − E_i such that S_i > m_n ∨ E_i, we deal with a term log{F(S_i) − F(S_i − E_i)}, and differentiation of ϕ_n(y) w.r.t. F(S_i − E_i) gives the remaining contribution. Adding the contributions gives the result. The following lemma characterizes the MLE.
Conversely, if F̂_n maximizes the likelihood and y = (F̂_n(T_1), ..., F̂_n(T_m)), the conditions hold. The uniqueness can be proved along the same lines as in the proof of Proposition 1.3 in [11]; we omit the details.

[Figure 1. The point process {(T_i, W_{n,F̂_n}(T_i)), i = 1, 2, ...}, for points T_i running through the points S_i and S_i − E_i between the minimum of the S_i and the maximum of the S_i − E_i. Sample size n = 100. The data correspond to a truncated Weibull distribution for the incubation time, used in simulations of the incubation time distribution. The points are connected by line segments.]
A picture of the point process {(T_i, W_{n,F̂_n}(T_i)), i = 1, 2, ...} in a simulation of the incubation time distribution, for T_i running through the points S_i and (S_i − E_i)_+ between the minimum of the S_i and the maximum of the (S_i − E_i)_+, is given in Figure 1 for sample size n = 100. The process W_{n,F̂_n} touches zero at points just to the left of points of mass of F̂_n.
We define the process V_n by

V_n(t) = ∫_{[0,t]} F̂_n(u) dG_n(u) + W_{n,F̂_n}(t),  (2.10)

where G_n = G_{n,F̂_n} is defined by (2.13) for F = F̂_n. Thus F̂_n is obtained by taking the left-continuous slope of the "self-induced" cusum diagram, defined by (0, 0) and the points (G_n(T_i), V_n(T_i)). As explained in [6], one can compute the MLE by the iterative convex minorant algorithm, where one iteratively computes the greatest convex minorant of the cusum diagram with the point (0, 0) and the points (G_{n,F}(T_i), V_{n,F}(T_i)), where the "weight process" G_{n,F} is defined by (2.13), and F is the temporary estimate of the distribution function at an iteration. The MLE F̂_n corresponds to a stationary point of this algorithm and is given by the left-continuous slope of the greatest convex minorant of the cusum diagram, see Figure 2. See [6] for further remarks on this algorithm. The algorithm is implemented in the R scripts in [5].
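The slope-of-the-greatest-convex-minorant step is generic and can be sketched independently of the self-induced weights. The routine below computes the left-continuous slopes of the greatest convex minorant of a cusum diagram; the toy diagram is the cusum of the data 3, 1, 2 with unit weights (an ordinary isotonic regression), not one of the self-induced diagrams of this section.

```python
import numpy as np

def gcm_slopes(x, y):
    """Left-continuous slopes of the greatest convex minorant of the
    diagram {(x_i, y_i)}, i = 0, ..., m, with x strictly increasing.
    Returns the slope at x_1, ..., x_m (a nondecreasing sequence)."""
    hull = [0]                      # indices of the minorant's vertices
    for i in range(1, len(x)):
        hull.append(i)
        while len(hull) >= 3:       # restore convexity of the last three vertices
            a, b, c = hull[-3], hull[-2], hull[-1]
            if (y[b] - y[a]) * (x[c] - x[b]) >= (y[c] - y[b]) * (x[b] - x[a]):
                hull.pop(-2)        # b lies on or above the chord from a to c
            else:
                break
    slopes = np.empty(len(x) - 1)
    for a, b in zip(hull[:-1], hull[1:]):
        slopes[a:b] = (y[b] - y[a]) / (x[b] - x[a])
    return slopes

# Toy cusum diagram of the data 3, 1, 2 with unit weights
x = np.array([0.0, 1.0, 2.0, 3.0])
y = np.array([0.0, 3.0, 4.0, 6.0])
fit = gcm_slopes(x, y)   # isotonic fit: all slopes equal 2
```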
[Figure 2. The cusum diagram, where T_i runs through the ordered points (S_i − E_i)_+ and S_i and V_n is defined by (2.10), together with its greatest convex minorant (red curve). Sample size n = 100.]
The steps in the iterative algorithm are modified by a line search, see section 7.3 of [9] and in particular the description of this modified iterative convex minorant algorithm on p. 173. The modified iterative convex minorant algorithm is guaranteed to converge by Theorem 7.3 of [9], see also [15]. The modified version is used in the R scripts accompanying this paper [5]. EXAMPLE 2.1. The first time the methods just described were used for the estimation of the incubation time distribution for COVID-19 was in the analysis of data on 88 travelers from Wuhan in [6]. The data set is included in [6] and extracted from the supplementary material of [1]. In this case m = 6, ϕ_n is defined by (2.5), and the matrix (N_{ij}), 0 ≤ i < j ≤ m + 1, is computed from the data. The corresponding points T_1, ..., T_6 are given by T_i = i + 2 and the MLE is shown in Figure 3. For comparison, the maximum likelihood estimator under the assumption that the incubation time distribution is a Weibull distribution is also given in this picture. The nonparametric MLE F̂_n was denoted by Ĝ_n in [6].
The result can be reproduced by running the R script analysis ICM.R in [5]. 3. Consistency of the MLE. We have the following result.
THEOREM 3.1. Let the incubation time distribution function F_0 have a strictly positive continuous density f_0 on (0, M_1). Moreover, let the exposure time distribution function F_E satisfy F_E(ε) = 0 for some ε ∈ (0, M_2), and let F_E have a continuous strictly positive derivative f_E on [ε, M_2]. Let F̂_n ∈ F_n be the nonparametric MLE, where F_n is the set of distribution functions defined in Definition 2.1. Then F̂_n converges almost surely to F_0 on [0, M_1] in the supremum metric.
There are many different ways to prove consistency, but we prefer the elegant method in [14], which is used in the proof below.
PROOF. We use the same method as in section 4.2 of the second part of the book [11]. Let ψ(F) denote the limiting log likelihood functional, defined for distribution functions F on R which satisfy F(x) = 0, x ≤ 0. Since F̂_n is the MLE, the corresponding inequality for ψ holds; the limit exists because of the concavity of ψ. Evaluating this limit and proceeding as in section 4.2 of the second part of the book [11], we get, if F is a limit point of F̂_n (using Helly's compactness theorem), the inequality (3.1). We want to show that the minimum over F on the left side of (3.1) is equal to 1, and that this can only be attained if F = F_0. If F were not equal to F_0, it would differ from F_0 on an interval, by the monotonicity of F and F_0 and the continuity of F_0, and then the left side of (3.3) would be strictly larger than 2. Hence, by (3.2), the left side of (3.1) would be strictly larger than 1, a contradiction.
4. Asymptotic distribution of the MLE in the model for the incubation time. We have the following result for the MLE in the model for the incubation time. THEOREM 4.1. Let the conditions of Theorem 3.1 be satisfied and let, moreover, f_E have a bounded derivative on the interval (ε, M_2). Let F̂_n ∈ F_n be the nonparametric MLE, where the set of distribution functions F_n is defined in Definition 2.1, and let F_0 be the distribution function of the incubation time. Then, at a point t_0 ∈ (0, M_1), n^{1/3}{F̂_n(t_0) − F_0(t_0)} converges in distribution to c_E times the location of the minimum of W(t) + t², where W is two-sided Brownian motion on R, originating from zero, and c_E is a constant depending on the underlying distributions. The result shows that the limit distribution is given by Chernoff's distribution. The jump in difficulty of the proof in going from the corresponding result for the current status model (not discussed in this paper) to more general cases of interval censoring models and to the model for the incubation time distribution is considerable. One expects in fact that Chernoff's distribution will often occur as (a universal) limit distribution in these contexts, but proving this might be very hard.
The proof of Theorem 4.1 is given in the Appendix, section 9.3. For the interval censoring, case 2, model the limit distribution of the MLE, under the so-called strict separation condition, was derived in [4]. The strict separation condition in the interval censoring, case 2, model seems somewhat comparable to the condition that E_i has no mass on an interval [0, ε] in the present model. That the exposure time (as observed!) has a strictly positive lower bound does not seem such an unreasonable assumption.
At several places in the proof in section 9.3 of the appendix, we use arguments that improve on those in [4], and the proof of the limit result for the interval censoring model in [4] could be improved similarly. But we will not go into these matters in this paper.
It is of interest to investigate whether the nonparametric MLE has sample variances that resemble the asymptotic variance of Theorem 4.1. To this end we computed the variances over 1000 simulations of the MLE F̂_n(t) at t = 6, where the underlying distribution function is the truncated Weibull distribution function F_{α,β}, a Weibull distribution function with shape parameter α and scale parameter β, conditioned on [0, M_1], where we choose M_1 = 20. We chose α = 3.03514 and β = 0.0026195, which were the values of the estimates in the data on the Wuhan travelers, discussed in [6] and section 5 of the present paper, if one assumes that F_0 is a truncated Weibull distribution. The distribution function F_E of the exposure time was taken to be the uniform distribution on the interval [1, 30]. The asymptotic variance, given by Theorem 4.1, was computed using the software package Mathematica, where the asymptotic variance σ² of the location of the minimum of W(t) + t², t ∈ R, was taken from Table 4 of [12], where the value σ² = 0.26355964 is given.
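The value σ² = 0.26355964 from Table 4 of [12] can be sanity-checked by Monte Carlo: simulate two-sided Brownian motion on a grid, add the parabola, and record the location of the minimum. The grid half-width, step size and number of replications in this Python sketch are assumptions of the approximation, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)

def chernoff_argmin(half_width=3.0, dt=0.01):
    """One draw of argmin_t {W(t) + t^2}, W two-sided Brownian motion
    originating from zero, approximated on a grid of step dt."""
    t = np.arange(0.0, half_width + dt, dt)
    steps = rng.normal(0.0, np.sqrt(dt), size=(2, len(t) - 1))
    right = np.concatenate(([0.0], np.cumsum(steps[0])))   # W on [0, half_width]
    left = np.concatenate(([0.0], np.cumsum(steps[1])))    # W on [-half_width, 0]
    ts = np.concatenate((-t[::-1], t[1:]))
    w = np.concatenate((left[::-1], right[1:]))
    return ts[np.argmin(w + ts**2)]

locs = np.array([chernoff_argmin() for _ in range(2000)])
var_hat = locs.var()   # should be close to 0.26355964 (Table 4 of [12])
```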
The results suggest a downward trend of the sample variances to the asymptotic value.

5. Confidence intervals for the distribution function. We now construct pointwise confidence intervals for the distribution function on the basis of the SMLE (smoothed maximum likelihood estimator), defined by (1.9). We have the following result. THEOREM 5.1. Let the conditions of Theorem 4.1 be satisfied and let h_n be a bandwidth such that h_n ∼ cn^{−1/5}, as n → ∞, for some c > 0. Moreover, let the density f_0 be differentiable at t ∈ (0, M_1), and let K_h be defined by K_h(u) = h^{−1}K(u/h) for the symmetric kernel K which is the derivative of IK. Finally, let the SMLE F̂_{nh} be defined by (1.9). Then n^{2/5}{F̂_{n,h_n}(t) − F_0(t)} converges in distribution to a normal distribution with mean µ and variance σ², defined in (5.2) and (5.3), where the function ϕ_{n,t,F_0} solves the integral equation (5.4). The proof is given in Section 9.4. We here give an outline of the proof. We consider the function ϕ_{F̂_n} which solves the integral equation obtained by replacing F_0 by F̂_n in (5.4). Note that ϕ_{F̂_n} has both discrete and absolutely continuous parts. We do not have an explicit expression for ϕ_{F̂_n} or ϕ_{n,t,F_0}, but we can compute them numerically, see [5]. Let Q_n be the empirical measure of (E_1, S_1), ..., (E_n, S_n). The proof can then be completed by proving a representation of ∫ IK_h(t − u) d(F̂_n − F_0)(u) in terms of an integral ∫ θ_{n,t,F_0} d(Q_n − Q_0) in the observation space, see Section 9.4.
Using Theorem 5.1, we can construct confidence intervals via the bootstrap, where we keep the exposure times E_i fixed. Our bootstrap sample consists of the pairs (E_i, S_i*), i = 1, ..., n, where S_i* = U_i* + V_i*, U_i* is uniform on [0, E_i], and V_i* is generated from the SMLE of the incubation time with a bandwidth h_0 of order n^{−1/9}. The oversmoothing is used to deal with the bias (see below) and was introduced in [13] in the context of nonparametric regression analysis.
The random variables V_i* are computed by generating Uniform(0, 1) random variables W_i and solving the equation F̂_{nh_0}(x) = W_i (5.8) in x, using golden-section search, and taking V_i* = x for the x solving (5.8). See the code bootstrap SMLE.cpp in [5], which is used in the corresponding R script bootstrap SMLE.R.
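The inversion step (5.8) exploits the fact that F̂_{nh_0} is nondecreasing. The Python sketch below uses plain bisection instead of the golden-section search of the C++ code in [5]; both find the same root of the monotone equation. The quadratic distribution function is a stand-in for the SMLE, an assumption of the sketch.

```python
import numpy as np

def invert_cdf(F, w, lo, hi, tol=1e-10):
    """Solve F(x) = w on [lo, hi] for a continuous nondecreasing F by
    bisection (the code in [5] uses golden-section search instead)."""
    a, b = lo, hi
    while b - a > tol:
        mid = 0.5 * (a + b)
        if F(mid) < w:
            a = mid
        else:
            b = mid
    return 0.5 * (a + b)

rng = np.random.default_rng(7)
F = lambda x: x**2 / 400.0                 # stand-in smooth df on [0, 20]
W = rng.uniform(size=5)                    # Uniform(0, 1) variables W_i
V_star = np.array([invert_cdf(F, w, 0.0, 20.0) for w in W])
```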
The 95% bootstrap confidence intervals are given by (5.9), where Q*_{0.025}(t) and Q*_{0.975}(t) are the 2.5th and 97.5th percentiles of the values of F*_{nh}(t) − F̂_{n,h_0}(t) for 1000 (bootstrap) samples of (5.7), and where F*_{nh} is the SMLE with bandwidth h, corresponding to the MLE F̂_n* computed for a bootstrap sample of size n.
EXAMPLE 5.1. We consider again the data on the Wuhan travelers, considered in Example 2.1. For Figure 4 all 1000 values F*_{nh}(t) − F̂_{n,h_0}(t) and the percentiles Q*_{0.025}(t) and Q*_{0.975}(t) were determined. This gives the bootstrap intervals (5.9). Figure 4 can be reproduced by running the R script bootstrap SMLE.R in [5].
The following result shows that we can expect the SMLEs computed on the basis of the bootstrap samples to behave asymptotically in the same way as the original SMLEs.

[Figure 4. Bootstrap confidence intervals for the data on the Wuhan travelers. The red curve is the SMLE F̂_{n,h}, where h = 6.21628.]
The proof of this result follows the lines of the proof of Theorem 5.1 in Section 9.4, apart from the fact that in this case the distribution function generating the (bootstrap) incubation times is F̂_{nh_0} (instead of F_0), which converges almost surely to F_0, so that Lindeberg-Feller conditions are used for the central limit theorem because of the varying underlying measure. This matter is discussed in the context of a nonparametric regression model in [10].
Note that the centering constant µ arises from a difference of bias terms, and that we need here that h_0 tends to zero more slowly than n^{−1/5}; see section 5.1 below.

5.1. Bandwidth selection.
We propose a bootstrap method to find an approximately MSE-optimal bandwidth for estimating F_0(t) at a point t ∈ (0, M_1). The MSE we want to minimize as a function of h is given by (5.10). The analogous bootstrap quantity (using oversmoothing, in the sense that h_0 ≍ n^{−1/9}) is given by (5.11), where h_0 is called a "pilot" bandwidth. We shall show that (5.11) is asymptotically independent of the constant c_0 in the pilot bandwidth if we take h_0 = c_0 n^{−1/9}.
Using integration by parts and the decomposition MSE = variance + squared bias, we can write the bootstrap MSE as the sum of a variance term and a squared bias term, see (5.12). We have the following result.
LEMMA 5.1. Let the conditions of Theorem 5.1 be satisfied. Moreover, let h_0 = h_{n,0} ∼ c_0 n^{−1/9}, as n → ∞. Then n^{4/5} times the second term of (5.12) tends to a limit that does not depend on the constant c_0. For reasons of space we omit the proof here. More details on this type of lemma can be found in [10] in the context of monotone regression. REMARK 5.1. Note that this convergence result does not hold if the pilot bandwidth h_0 is of order n^{−1/5}. For this reason the method suggested in [18], where the pilot bandwidth is chosen of order n^{−1/5}, will not work, since in that case the variance of F̂''_{nh_0}(t) will not tend to zero. Another way out is to use subsampling, as in [8], but choosing the right subsample size is a rather hard problem.
Since the first-order behavior of the first term of (5.12) also depends only on h and n and not on h_0, the dependence on the constant c_0 in the pilot bandwidth disappears to first order, as n → ∞, which indicates robustness of this bandwidth choice procedure. In fact, n^{4/5} times the first term of (5.12) tends to the limit variance (5.3) and n^{4/5} times the second term of (5.12) tends to the squared bias µ², where µ is defined by (5.2), irrespective of the constant c_0 in the pilot bandwidth h_0 = c_0 n^{−1/9}. Instead of minimizing (5.11) we minimize a Monte Carlo approximation of a modified version of (5.11), given in (5.13), where the F*^{,i}_{nh}(t_j), i = 1, ..., B, j = 1, ..., m, are the estimates in B bootstrap samples on a grid of equidistant points t_j. The script for this procedure is the R script bandwidth choice df.R in [5]. This script produced the bandwidth in Figure 4. We took B = 10,000 in this case.
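Once the bootstrap estimates are tabulated, minimizing the Monte Carlo criterion (5.13) is elementary. A Python sketch (the array layout and the toy inputs are assumptions of the sketch; the paper's implementation is the R script bandwidth choice df.R in [5]):

```python
import numpy as np

def bootstrap_mse(F_boot, F_pilot):
    """Monte Carlo bootstrap MSE per bandwidth, cf. (5.13):
    F_boot[b, k, j] = F*_{nh_k}(t_j) in bootstrap sample b,
    F_pilot[j] = pilot (oversmoothed) estimate at grid point t_j."""
    return ((F_boot - F_pilot[None, None, :]) ** 2).mean(axis=(0, 2))

def optimal_bandwidth(F_boot, F_pilot, h_grid):
    """Bandwidth on the grid minimizing the Monte Carlo MSE."""
    return h_grid[int(np.argmin(bootstrap_mse(F_boot, F_pilot)))]

# Toy inputs rigged so that the criterion is smallest at h = 4
rng = np.random.default_rng(3)
h_grid = np.linspace(1.0, 8.0, 15)
F_pilot = np.linspace(0.1, 0.9, 9)
spread = 0.05 * (h_grid - 4.0) ** 2        # bootstrap spread, zero at h = 4
F_boot = F_pilot[None, None, :] + rng.normal(size=(200, 15, 9)) * spread[None, :, None]
h_opt = optimal_bandwidth(F_boot, F_pilot, h_grid)   # 4.0
```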
A picture of the bootstrap MSE (5.13) as a function of h for the data on the travelers from Wuhan is shown in Figure 5.
6. Estimation of quantiles and comparison with parametric methods. We now illustrate the difference between the nonparametric approach and the approach using distributions like the Weibull, log-normal, etc. for the incubation time distribution. This is shown for the problem of estimating the 95th percentile of the distribution. To this end we generated 1000 samples of size n = 500 and also of size n = 1000, using the same Weibull distribution to generate the incubation times as we used in Section 5 for constructing confidence intervals. This example is also given (for sample size n = 500) in [7].
In the nonparametric maximum likelihood approach we simply maximize the likelihood over all distribution functions F. This gives the nonparametric MLE F̂_n, from which we compute the SMLE F̂_{n,h_n}(t) = ∫ IK_h(t − y) dF̂_n(y) and the estimate of the 95th percentile F̂^{−1}_{n,h_n}(0.95). The bandwidth was chosen to be h_n = 6n^{−1/5} here. Note that, using the delta method, one can derive the asymptotic variance of the parametric quantile estimate, in which the (truncated) Weibull density f_{α,β}, corresponding to the distribution function F_{α,β} defined by (4.3), appears.
The results of this simulation for 1000 samples of size n = 500 are shown in the box plot of Figure 6, and the corresponding picture for sample size n = 1000 in Figure 7. The black line segments in the boxes are at the position of the median. Finally, the red line denotes the value of F^{−1}_{α,β}(0.95) ≈ 10.17716, where (α, β) are the parameters of the Weibull distribution. R scripts for all methods are given in the directory "simulations" of [5]. It can be seen that, since the incubation time data were generated from a Weibull distribution, the estimates of the quantiles assuming this distribution indeed have the smallest variation. But the nonparametric estimates, which do not assume that the distribution is of Weibull type, are also pretty good, whereas the estimates assuming a log-normal distribution are completely off (in fact, these estimates are inconsistent). The SMLE adapts to the underlying distribution and provides consistent estimates, using the consistency of the MLE itself, derived in Section 3, and the consistency of the SMLE, which can be deduced from this.
One sees that the uncertainty about what interests us is not much larger when one only uses the nonparametric SMLE than when one assumes that the incubation time distribution is Weibull (which is the correct distribution in this simulation setting), but much larger when one assumes a log-normal distribution, while there is absolutely no scientific (medical) reason to "believe" Weibull or to "believe" log-normal. They lead to completely different statistical inferences, and hence could lead to completely different policy recommendations.
7. Other smooth functionals. The first moment ∫ t dF(t) is the prototype of a smooth functional. The asymptotic normality and √n convergence of the plug-in estimate ∫ t dF̂_n(t), where F̂_n is the nonparametric MLE, was proved for the current status model in [11], Theorem 5.5 of Part 2. The asymptotic variance there is explicit, involving the density g of the observation times and the distribution function F_0 of the hidden variable.
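For the discrete MLE the plug-in estimate of the first moment needs no bandwidth at all: it is a finite sum over the jump points, equivalently ∫_0^{M_1} (1 − F(t)) dt for the step function F. A Python sketch with toy jump points (not data from the paper):

```python
import numpy as np

def first_moment(jump_points, masses):
    """Plug-in first moment int t dF_n(t) of a discrete distribution."""
    return float(np.sum(jump_points * masses))

def first_moment_from_df(jump_points, F_vals, M1):
    """Equivalent form int_0^{M1} (1 - F(t)) dt for the right-continuous
    step function F with values F_vals at the (sorted) jump points."""
    t = np.concatenate(([0.0], jump_points, [M1]))
    F = np.concatenate(([0.0], F_vals, [1.0]))
    return float(np.sum((1.0 - F[:-1]) * np.diff(t)))

tau, p = np.array([2.0, 5.0, 11.0]), np.array([0.2, 0.5, 0.3])
m1 = first_moment(tau, p)                           # 2*0.2 + 5*0.5 + 11*0.3 = 6.2
m2 = first_moment_from_df(tau, np.cumsum(p), 20.0)  # same value
```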
Similar results for more general cases of interval censoring are given in Chapter 10 of [9], but in those cases the expression for the asymptotic variance comes from the solution of an integral equation and is no longer explicit, as it is in the case of the current status model. A similar situation holds for the model for the incubation time distribution.
We have the following asymptotic normality result for the estimate of the first moment, based on the nonparametric MLE F̂_n for the incubation time model, if the support of the incubation time distribution is [0, M_1]: the standardized estimate converges in distribution to N(0, σ²), a normal distribution with mean zero and variance σ², where ϕ_{F_0} is the solution of the corresponding equation in ϕ. The distribution function F_E of the exposure time was again chosen to be the uniform distribution function on [1, 30]. The derivation of this result is given in the Appendix. The inconsistency of the estimate based on the log-normal model is again clearly seen from the box plot in Figure 8. However, if we had generated the incubation times from a log-normal distribution, the estimate based on the Weibull distribution would be inconsistent, so Figure 8 cannot be interpreted as showing the superiority of the Weibull distribution.
Examples of the behavior of the density estimate, which converges at rate n^{2/7} if h ∼ cn^{−1/7}, c > 0, are given in [6] and [7]. 8. Conclusion. We proved that the nonparametric MLE in a model for the incubation time distribution converges in distribution, after standardization, to Chernoff's distribution. The rate of convergence is cube root n, if n is the sample size, under a separation condition for the exposure time. We also discussed (locally) differentiable functionals of the model, estimated by corresponding functionals of the nonparametric MLE, which converge after standardization to a normal distribution at faster rates, where the constants are given by the solution of an integral equation.
This provides an alternative to the parametric models that are usually applied in this context, which estimate the incubation time distribution by, e.g., Weibull, gamma or log-normal distributions. There is in fact no scientific or medical reason to choose Weibull, gamma, Erlang, log-normal, etc. (for this reason one usually sees several of these distributions applied at the same time), and if the chosen parametric model does not hold, the estimates are inconsistent, as we demonstrate in Sections 6 and 7. As shown in Section 7, for parameters like the first moment we do not have to choose a bandwidth parameter, while the behavior of the estimate based on the nonparametric MLE is competitive with the parametric estimates in this case, even if the model for the parametric estimate is right.
R scripts for computing the estimates are given in [5]. 9. Appendix.

9.1. Integral equations.
As explained in the Appendix of [6], the theory of the estimation of smooth functionals in the model is based on certain integral equations. For the incubation time model an extra indicator ∆_i was introduced in [6] to indicate whether S_i ≤ E_i. But introducing such an indicator is not necessary, so we define the score function θ without these indicators (compare to (A.1) in [6]); it is the conditional expectation of a(X) in the "hidden" space of the variable of interest (the incubation time), given our observation (E, S). Note that we changed the notation somewhat w.r.t. [6], and denote the distribution function of the incubation time by F instead of G. Note also that in [6] the distribution function F_E was taken to be uniform, just as an example, but we do not assume that here. Differentiating (9.1) w.r.t. x, we get the integral equation (9.2), with on the right-hand side the derivative of the functional we want to estimate, denoted by ψ. This is called a Fredholm integral equation of the second kind. Here and in the sequel, we assume that distribution functions are right-continuous.
We assume that F E satisfies the following condition:

(F1) F E is concentrated on an interval [ε, M 2 ] for some ε > 0 and has a continuous strictly positive derivative on this interval.

We define the class of distribution functions F in the following way.
(F2) F consists of the distribution functions F on [0, M 1 ] with only a finite number of jumps, contained in (0, M 1 ), and satisfying where ε > 0 is defined as in condition (F1). The distribution functions are extended to [0, ∞) by defining them to be equal to 1 on [M 1 , ∞).
Note that the MLE satisfies the conditions of (F2) for sufficiently large n, by the conditions of Theorem 4.1 (in particular the fact that the density f 0 is strictly positive on (0, M 1 )) and the consistency of the MLE (Theorem 3.1).

LEMMA 9.1. Let F E be a distribution function on [0, M 2 ], satisfying condition (F1), and let F ∈ F , where F is defined in (F2). Moreover, let F ∈ F satisfy: Finally, let ψ : [0, M 1 ] → R be a bounded right-continuous function with left limits (cadlag) on [0, M 1 ]. Then equation (9.2) has a unique solution ϕ.

PROOF. 1 {e<t} dF E (e). (9.5) Since F (0) = 0 and F (M 1 ) = 1, the solution has to be zero at t = 0 and t = M 1 . Suppose there exists a point t ∈ (0, M 1 ) such that ϕ(t) > 0 for a solution ϕ. If the maximum of ϕ is attained at a point s ∈ (0, M 1 ), the right-hand side of (9.5) is nonpositive at t = s, whereas the left-hand side is > 0, a contradiction. Note that we use condition (9.4) here.
If the supremum is not attained, we take the limit from the left and get that the limit on the left-hand side is strictly positive, whereas the limit on the right-hand side is nonpositive, which again leads to a contradiction.
A similar argument holds if there exists a point t such that ϕ(t) < 0. It now follows from Theorem 3.4 in [16] that the integral equation has a unique solution (see also Lemma 10.1 on p. 294 of [9]).

EXAMPLE 9.1. We consider the example where ψ ≡ 1, F E is the uniform distribution on [1, 30] and F 0 is uniform on [0, 20]. In that case all conditions of Lemma 9.1 are satisfied. One could think of an exposure time of at least one and at most 30 days in the incubation time model. It is shown in Section 9.5 that where ϕ F0 solves (9.2) for ψ ≡ 1 and F = F 0 , and where we choose in the present example F 0 to be the uniform distribution on [0, M 1 ]. We can solve equation (9.2) approximately by a matrix equation, see [5].
Next consider ψ = 1 [0,a] , where a ∈ (0, M 1 ). Then ψ is not continuous on [0, M 1 ], but still satisfies the conditions of the lemma. In this case the solution has a jump at a, as is shown in Figure 10 for a = 10, where F 0 is uniform on [0, 20] and F E is uniform on [1, 30].
This corresponds to the functional where the function g a is given by Note that g a is continuous, a fact that is absolutely essential in the smooth functional theory.
Integration by parts gives: Hence we get generally: where Fn is the nonparametric MLE and F 0 is the underlying distribution function, and in showing results like we need a functional of this type in the proof of the limit result for the MLE in the incubation time model.

9.2. Properties of the solution of the integral equation (9.2). The integral equation (9.2) has the same structure as the integral equations for the interval censoring, case 2, model. This equation is given by (10.33) on p. 292 of [9].
Let, for F ∈ F , the function d F be defined by We have: e −1 dF E (e) > 0, which follows from our condition (F1) on F E and is implied by condition (9.3). This means that we can prove that solutions of the integral equation are uniformly bounded for F ∈ F . We get the following result.

PROOF. The equation can be written
1 {e<t} dF E (e). (9.9) where d F is defined by (9.6). Since d F (t) −1 ψ(t) is clearly uniformly bounded, using (9.7), (9.8) and the boundedness of ψ, we only have to consider the second term on the right-hand side of (9.9). But for this term we use the same type of argument that we used in the proof of Lemma 9.1. If the solution ϕ F attains its supremum at s ∈ [0, M 1 ], this term is nonpositive at s; otherwise we take the limit from the left. So the solution is bounded above by or by its limit from the left at s if the supremum is not attained.
The same type of argument applies to the lower bound.
Analogously to the interval censoring model, we have to consider the function ξ F , defined by where ϕ F solves (9.2) and where we define 0/0 = 0. The function ξ F satisfies an integral equation similar to the one for ϕ F . This is seen in the following way.
We have: So we find, writing ϕ = ϕ F and ξ = ξ F : In a similar way we find: So we find that ξ F solves the equation in ξ, where c(t) is defined by: So the equations have the same properties as the integral equations in the interval censoring, case 2, model, and we get in particular the following lemma (Lemma 10.5 on p. 297 of [9]).

LEMMA 9.3. Let F be the class of distribution functions on [0, M 1 ] that are either absolutely continuous, with a continuous density staying away from zero on [0, M 1 ], or piecewise constant with a finite number of jumps, satisfying assumption (F2). Let ψ be continuous, with a bounded derivative except at an at most countable number of points, where right and left limits exist. Then: (i) The derivative of ϕ F at the points of continuity is bounded, uniformly over F ∈ F and the points of continuity, implying if y and x are in the same interval between jumps. K 1 is independent of F and x. The same holds when ϕ F is replaced by ξ F , with K 2 > 0 and K 3 > 0 independent of x and F .
The proof follows the steps of the proof of Lemma 10.5 on p. 297 of [9] and is therefore omitted. Note, however, that we have to extend this lemma, because we want to treat discontinuous functions ψ such as 1 [0,a] on the right-hand side of the integral equation.
We will need the following extension, analogous to Corollary 4.2 in [4].
COROLLARY 9.1. Let F be the class of distribution functions on [0, M 1 ] that are either absolutely continuous, with a continuous density staying away from zero on (0, M 1 ), or piecewise constant with a finite number of jumps, satisfying assumption (F2). Let ψ be a bounded right-continuous function with at most a finite set of discontinuities D, and with a bounded derivative except at an at most finite number of points, where right and left limits exist. Then: (i) The derivative of ϕ F at the points of continuity is bounded, uniformly over F ∈ F and the points of continuity, implying if y and x are in the same interval between jumps. K 1 is independent of F and x. The same holds when ϕ F is replaced by ξ F . (ii) At the discontinuity points x of F or ψ: where K 2 and K 3 are positive constants independent of x and F .
The proof follows from the preceding results in the same way as the proofs of Lemma 4.3 and Corollary 4.2 in [4] follow from the corresponding results on the integral equation there.

9.3. Proof of Theorem 4. A fundamental tool in our proof is the so-called "switch relation", see, e.g., Section 3.8 in [9]. We define, for a ∈ (0, 1), where V n is defined by (3.7) and G n = G n, Fn is defined by (3.6) for F = Fn . Then we have the so-called switch relation: see, e.g., (3.35) and Figure 3.7 in Section 3.8 of [9].
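In the switch relation, the value of the MLE at a point is read off from the slope of the greatest convex minorant of the cusum diagram {(G n (T i ), V n (T i ))} shown in Figure 2, and the argmin of V n − a G n sits where that slope crosses the level a. As a minimal sketch of how such slopes can be computed (the function name and the stack-based formulation are ours, not the implementation of [5]):

```python
import numpy as np

def gcm_slopes(x, y):
    """Slopes of the greatest convex minorant of the cusum diagram
    {(x_i, y_i)}, i = 0..n-1, with x strictly increasing. Returns, for each
    i >= 1, the slope of the minorant on (x_{i-1}, x_i]; the slopes are
    nondecreasing, which is what makes the minorant convex."""
    verts = [0]                     # indices of the vertices of the minorant
    for i in range(1, len(x)):
        verts.append(i)
        # pop the middle vertex while the last three violate convexity
        while len(verts) >= 3:
            i0, i1, i2 = verts[-3:]
            if (y[i1] - y[i0]) * (x[i2] - x[i1]) <= (y[i2] - y[i1]) * (x[i1] - x[i0]):
                break
            verts.pop(-2)
    slopes = np.empty(len(x) - 1)
    for j in range(1, len(verts)):
        i0, i1 = verts[j - 1], verts[j]
        slopes[i0:i1] = (y[i1] - y[i0]) / (x[i1] - x[i0])
    return slopes
```

For a convex diagram the slopes are just the successive chord slopes; for a non-convex one, points above the minorant are bridged by a single segment.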
We have: where a 0 = F 0 (t 0 ). Using the property that the argmin does not change if we add constants to the objective function, we get: where

REMARK 9.1. Note that by the assumptions on F E and F 0 we may assume that Fn (s) − Fn (s − e) stays away from zero for n sufficiently large if e ≥ ε and s − e ≥ 0.
Let m = max j (S j − E j ) + , and let X n be defined as in Lemma 9.4 below. Then, letting dQ 0 (e, s) dQ 0 (e, s). By a change of variables, we can write the last expression in the form: For future reference, we define We have the following lemma.

LEMMA 9.4. Let, under the conditions of Theorem 4, the process X n be defined by: where δ 1 = 1 {s≤e} , δ 2 = 1 {e<s≤M1} and δ 3 = 1 − δ 1 − δ 2 . Let t 0 be an interior point of the support of f 0 . Then n 2/3 X n converges in distribution, in the Skorohod topology, to the process where c E and W are the same as in Theorem 4.
PROOF OF LEMMA 9.4. This follows from the convergence to the same limit process of and the consistency of Fn , together with the entropy with bracketing for the L 2 -norm of the functions for M > 0 and distribution functions F such that {F (s) − F (s − e)}1 {e≥ε} stays away from zero for s in the relevant interval (see, e.g., p. 59 of [9]). Note that we get for the variance of (9.11): 2 dQ 0 (e, s) 2 dQ 0 (e, s) We will also need the following rate result for the L 2 -distance.

LEMMA 9.5.

PROOF. We define the (convolution) density q F by (1.1) and follow the exposition in [19], Example 7.4.4. The condition that the exposure time E stays away from zero is comparable to the condition (7.41) on p. 116 of [19] that the intervals [U, V ] have a length which stays away from zero. In this case we find that the squared Hellinger distance satisfies: ds dF E (e) = O p (n −2/3 ), see (7.42) in [19].
We can write the next-to-last term above in the form (using Fubini's theorem and a change of variables): So we get in particular the corresponding bound for Fn (s) − F 0 (s), by our assumptions on F E . The relation between the Hellinger distance and the L 2 -distance now gives the result.
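The relation between the L 2 -distance and the Hellinger distance used here can be checked numerically: pointwise, (p − q) 2 = (√p + √q) 2 (√p − √q) 2 ≤ 4 max(p, q)(√p − √q) 2 , so for bounded densities the squared L 2 -distance is bounded by a multiple of the squared Hellinger distance. A minimal sketch (all names below are ours):

```python
import numpy as np

def hellinger_sq(p, q, w):
    """Squared Hellinger distance (1/2) int (sqrt p - sqrt q)^2 dmu for
    densities p, q evaluated on a grid with quadrature weights w."""
    return 0.5 * float(np.sum(w * (np.sqrt(p) - np.sqrt(q)) ** 2))

def l2_sq(p, q, w):
    """Squared L2-distance int (p - q)^2 dmu on the same grid."""
    return float(np.sum(w * (p - q) ** 2))

# two bounded densities on [0, 1], with trapezoidal quadrature weights
t = np.linspace(0.0, 1.0, 101)
w = np.full(101, 0.01)
w[0] = w[-1] = 0.005
p = np.ones_like(t)
q = 1.0 + 0.5 * np.sin(2.0 * np.pi * t)
# squared L2-distance <= 8 * sup max(p, q) * squared Hellinger distance
bound = 8.0 * max(p.max(), q.max()) * hellinger_sq(p, q, w)
```

The pointwise inequality above makes the bound hold exactly, not just asymptotically, for any pair of bounded densities on the grid.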
The following lemma, corresponding to Lemma 4.4 on p. 146 of [4], is crucial in our proof.
LEMMA 9.6. Let the conditions of Theorem 4 be satisfied and let Fn be the MLE. Then:

PROOF. Let ϕ t, Fn be the solution of the integral equation see (9.2) above, and let θ t, Fn be defined by where A picture of such a ϕ t, Fn is given in Figure 11. Also note that for the argument x + e > M 1 we use The argument now follows the reasoning of the proof of Lemma 4.4 in [4], where we use the properties of the solutions of the integral equations, discussed in Section 9.1 above, and Lemma 9.5. For example, we change the function ϕ t, Fn to a piecewise constant version φt, Fn , piecewise constant on the same intervals as Fn , except possibly the interval containing t, for example using (4.37) on p. 146 of [4], and define where δ 1 = 1 {s≤e} , δ 2 = 1 {e<s≤M1} and δ 3 = 1 − δ 1 − δ 2 . Then, using Lemma 9.5 above, we find: Next we get: which gives the desired result.
COROLLARY 9.2. Let the conditions of Theorem 4 be satisfied and let Fn be the MLE. Moreover, let G be a set of right-continuous functions with left limits g : [0, M 1 ] → R which are of uniformly bounded variation. Then: The proof of this corollary follows in the same way as the proof of Corollary 4.3 in [4].
The following rough upper bound will also be useful.
LEMMA 9.7. Let the conditions of Theorem 4 be satisfied. Then

PROOF. The proof is analogous to the proof of Corollary 4.4 in [4].
LEMMA 9.8. Let the conditions of Theorem 4 be satisfied, and let the process Y n be defined by (9.10). Then, for arbitrary M > 0 and t ∈ [−M, M ]:

PROOF. Let Y n be defined by (9.10). We can write: We also have: Tools of that type are needed only to show that B n (t) is of order o p (n −2/3 ). One could say that this is the heart of the difficulty of the proof. We will use the following lemma.
LEMMA 9.9. Let the conditions of Theorem 4 be satisfied, and let B n (t) be defined by (9.17) in Lemma 9.8. Then, for arbitrary M > 0 and t ∈ [−M, M ]:

PROOF. Using Lemma 9.7 again, it is sufficient to show that Bn (t) = O p (n −5/6 ), where Bn (t) is defined by Let t 0 > ε, where ε > 0 is defined as in the conditions of Theorem 4. We can write uniformly for s ∈ [t 0 , t 0 + n −1/3 t), by Corollary 9.2 and the continuity of the function for s ∈ [t 0 , t 0 + n −1/3 t), if u stays away from s. So: If t 0 ≤ ε, the integration interval for e is either empty or of order n −1/3 , which yields, using sup We finally need the following "tightness" lemma, which follows from the negligibility of B n (t) in (9.16) of Lemma 9.8, which, in turn, follows from Lemma 9.9.

LEMMA 9.10. Let the conditions of Theorem 4 be satisfied and let a 0 ∈ (0, 1). Then, for each δ > 0 and K 1 > 0, a K 2 > 0 can be found such that for all large n.
We now have: which converges in distribution to the argmin of the process where W is two-sided Brownian motion, originating from zero. Theorem 4 now follows from Brownian scaling. So we get, using integration by parts, if ϕ = ϕ t, Fn solves (9.20) for F = Fn : Let θ t, Fn be defined by and let ϕ n,t,F0 solve the integral equation (6.5). The asymptotic variance σ 2 is therefore given by lim n→∞ n −1/5 ∥θ n,t,F0 ∥ 2 Q0 .
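As an illustration of the limit distribution (not part of the proof), one can simulate the argmin of the standardized process t ↦ W (t) + t 2 on a grid; by Brownian scaling, the general case with the constants of Theorem 4 reduces to this one, and the location of the minimum has Chernoff's distribution. All names below are ours.

```python
import numpy as np

rng = np.random.default_rng(0)

def chernoff_argmin(n_rep=1000, T=4.0, n_grid=4000):
    """Monte Carlo sketch of the argmin of t -> W(t) + t^2, with W two-sided
    Brownian motion originating from zero; the argmin has Chernoff's
    distribution. The grid truncates the time axis to [-T, T], which is
    harmless because the argmin concentrates near zero."""
    dt = T / n_grid
    t = np.linspace(-T, T, 2 * n_grid + 1)
    mins = np.empty(n_rep)
    for r in range(n_rep):
        # glue two independent one-sided Brownian paths at the origin
        right = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_grid))))
        left = np.concatenate(([0.0], np.cumsum(rng.normal(0.0, np.sqrt(dt), n_grid))))
        w = np.concatenate((left[::-1][:-1], right))
        mins[r] = t[np.argmin(w + t * t)]
    return mins

samples = chernoff_argmin()
```

The resulting sample is symmetric around zero; its empirical distribution approximates the non-normal limit appearing in Theorem 4 after rescaling.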
PROOF. We consider the "mean functional": x dF (x). The score operator (see Section 9.6) is of the form where N (0, σ 2 ) is a normal distribution with mean zero and variance In fact, using ϕ(M 1 ) = a(x) dF (x) = 0, we get: ϕ F0 (u) du.

9.6. Score operators and adjoint equations. We need the concept of Hellinger differentiability. Let the unknown distribution P on (Y, B) be contained in some class of probability measures P, which is dominated by a σ-finite measure µ. Let P have density p with respect to µ. We are interested in estimating some real-valued function Θ(P ) of P .
Let, for some δ > 0, the collection {P t } with t ∈ (0, δ) be a 1-dimensional parametric submodel which is smooth in the following sense: The function a in this definition is also called the score function or score. The collection of scores a, obtained by considering all possible one-dimensional Hellinger differentiable parametric submodels, is a linear space, the tangent space at P , denoted by T (P ).
In the models for inverse problems considered here, we work with a so-called hidden space and an observation space. All Hellinger differentiable submodels that can be formed in the observation space, together with the corresponding score functions, are induced by the Hellinger differentiable paths of densities on the hidden space, according to the following theorem.

THEOREM 9.1. Let P ≪ µ be a class of probability measures on the hidden space (Y, B), and let P ∈ P be induced by the random vector Y . Suppose that the path {P t } to P satisfies for some a ∈ L 0 2 (P ), where the superscript 0 means that a dP = 0. Let S : (Y, B) → (Z, C) be a measurable mapping. Suppose that the induced measures Q t = P t S −1 and Q = P S −1 on (Z, C) are absolutely continuous with respect to µS −1 , with densities q t and q. Then the path {Q t } is also Hellinger differentiable, satisfying ∫ { t −1 ( √q t − √q ) − ½ ā √q } 2 dµS −1 → 0 as t ↓ 0, with ā(z) = E P (a(Y ) | S = z).
For a proof, see [2]. Note that ā ∈ L 0 2 (Q). The relation between the scores a in the hidden tangent space T (P ) and the induced scores ā is expressed by the mapping A : a(•) → E P (a(Y )|S = •). (9.27) This mapping is called the score operator. It is continuous and linear. Its range is the induced tangent space, which is contained in L 0 2 (Q). Now Θ : P → R is pathwise differentiable at P if for each Hellinger differentiable path {P t }, with corresponding score a, we have

Fig 1: The point process {(T i , W n, Fn (T i )), i = 1, 2, . . .} for points T i , running through the points S i and S i − E i between the minimum of the S i and the maximum of the S i − E i . Sample size n = 100. The data correspond to a truncated Weibull distribution for the incubation time, used in simulations of the incubation time distribution. The points are connected by line segments.

Fig 2: The cusum diagram {(G n (T i ), V n (T i )), i = 1, 2, . . .}, where T i runs through the ordered points (S i − E i ) + and S i , and V n is defined by (2.10), together with its greatest convex minorant (red curve). Sample size n = 100.

Fig 3: The nonparametric MLE Fn of the incubation time distribution function (blue), and the MLE using the Weibull distribution (red, dashed), for the 88 Wuhan travelers.

Fig 4: Smoothed bootstrap confidence intervals for the Wuhan data set, based on the SMLE. The red curve is the SMLE Fn,h , where h = 6.21628.

Fig 5: The bootstrap MSE for the Wuhan data.

Fig 6: Box plot of 95th percentile estimates for the nonparametric, Weibull and log-normal maximum likelihood estimators for 1000 samples of size n = 500.The incubation time data are generated from a Weibull distribution.The red line denotes the value of the true percentile.

Fig 7: Box plot of 95th percentile estimates for the nonparametric, Weibull and log-normal maximum likelihood estimators for 1000 samples of size n = 1000.The incubation time data are generated from a Weibull distribution.The red line denotes the value of the true percentile.

Fig 8: Box plot of estimates of the first moment of the incubation time distribution for the nonparametric, Weibull and log-normal maximum likelihood estimators, for 1000 samples of size n = 5000. The incubation time data are generated from a Weibull distribution. The red line denotes the value of the true first moment.

LEMMA 9.2. Let F E be a distribution function on [0, M 2 ], satisfying condition (F1), and let the class of distribution functions F be defined by (F2). Let ψ : [0, M 1 ] → R be a bounded right-continuous function with left limits (cadlag) on [0, M 1 ]. Then the solutions ϕ F of equation (9.2) are uniformly bounded in F ∈ F .

TABLE 1: n 2/3 times the variances of Fn (6) for 1000 simulations of the model where F E is uniform on [1, 30] and F 0 is a truncated Weibull distribution on [0, 20]. The limit value of Theorem 4.1 is denoted by ∞.

n 2/3 · variance: 0.38990 0.36071 0.32329 0.28816 0.27188 0.27489

dµ → 0 as t ↓ 0, for some a ∈ L 2 (P ). Such a submodel is called Hellinger differentiable. This property can be seen as an L 2 version of the pointwise differentiability of log p t (x) at t = 0 (with p 0 = p), with the function a playing the role of the so-called score function ∂/∂t log p t (•)| t=0 in classical statistics.