## Electronic Journal of Statistics

### Nonparametric estimation of the lifetime and disease onset distributions for a survival-sacrifice model

#### Abstract

In carcinogenicity experiments with animals where the tumor is not palpable, it is common to observe only the time of death of the animal, the cause of death (the tumor or another independent cause, such as sacrifice), and whether the tumor was present at the time of death. The last two are indicator variables, evaluated at autopsy. Defining the non-negative variables $T_{1}$ (time of tumor onset), $T_{2}$ (time of death from the tumor) and $C$ (time of death from an unrelated cause), we observe $(Y,\Delta_{1},\Delta_{2})$, where $Y=\min\left\{T_{2},C\right\}$, $\Delta_{1}=1_{\left\{T_{1}\leq C\right\}}$, and $\Delta_{2}=1_{\left\{T_{2}\leq C\right\}}$. The random variables $T_{1}$ and $T_{2}$ are independent of $C$ and have a joint distribution such that $P(T_{1}\leq T_{2})=1$. Some authors call this model a “survival-sacrifice model”.
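The observation scheme can be illustrated with a small simulation. The sketch below is only a toy example: the exponential distributions and their rates, and the function name `simulate_survival_sacrifice`, are assumptions made for illustration and are not taken from the paper or its R scripts. The only features that come from the model are $P(T_{1}\leq T_{2})=1$, the independence of $C$, and the definitions of $Y$, $\Delta_{1}$ and $\Delta_{2}$.

```python
import random


def simulate_survival_sacrifice(n, seed=0):
    """Draw n observations (Y, Delta1, Delta2) from a toy survival-sacrifice model.

    All distributional choices below are illustrative assumptions.
    """
    rng = random.Random(seed)
    data = []
    for _ in range(n):
        t1 = rng.expovariate(1.0)        # T1: tumor onset time (assumed Exp(1))
        t2 = t1 + rng.expovariate(2.0)   # T2: death from tumor, built so that T1 <= T2
        c = rng.expovariate(0.5)         # C: death from an unrelated cause, independent of (T1, T2)
        y = min(t2, c)                   # Y = min{T2, C}: observed time of death
        d1 = 1 if t1 <= c else 0         # Delta1 = 1{T1 <= C}
        d2 = 1 if t2 <= c else 0         # Delta2 = 1{T2 <= C}
        data.append((y, d1, d2))
    return data
```

Because $T_{1}\leq T_{2}$, an observation with $\Delta_{2}=1$ always has $\Delta_{1}=1$, so only three of the four indicator combinations can occur.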

Van der Laan, Jewell and Peterson [20] (henceforth LJP (1997)) proposed a weighted least squares estimator for $F_{1}$ (the marginal distribution function of $T_{1}$), using the Kaplan-Meier estimator of $F_{2}$ (the marginal distribution function of $T_{2}$). The authors claimed that their estimator is more efficient than the maximum likelihood estimator (MLE) of $F_{1}$ and that the Kaplan-Meier estimator is more efficient than the MLE of $F_{2}$. However, we show that the MLE of $F_{1}$ was not computed correctly, and that the claimed MLE of $F_{1}$ is not even defined when constraints are active.
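For orientation, the Kaplan-Meier estimator of $F_{2}$ treats $Y$ as a right-censored observation of $T_{2}$ with event indicator $\Delta_{2}$. The following is a minimal textbook sketch of the product-limit computation; the function name `kaplan_meier_cdf` is hypothetical, and this is not the code used in LJP (1997) or in the R scripts accompanying the paper.

```python
def kaplan_meier_cdf(times, events):
    """Return [(t, F_hat(t))] at the distinct event times, with F_hat = 1 - S_hat.

    times  : observed times Y_i
    events : indicators Delta_2i (1 = death from the tumor, 0 = censored by C)
    """
    pairs = sorted(zip(times, events))
    n_at_risk = len(pairs)
    surv = 1.0
    steps = []
    i = 0
    while i < len(pairs):
        t = pairs[i][0]
        deaths = 0
        removed = 0
        # Group all observations tied at time t.
        while i < len(pairs) and pairs[i][0] == t:
            deaths += pairs[i][1]
            removed += 1
            i += 1
        if deaths > 0:
            # Product-limit update: multiply by the conditional survival at t.
            surv *= 1.0 - deaths / n_at_risk
            steps.append((t, 1.0 - surv))
        n_at_risk -= removed
    return steps
```

With four observations at times 1, 2, 3, 4, of which the second is censored, the estimate jumps to 0.25, 0.625 and 1.0 at times 1, 3 and 4.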

In our simulation study we used a primal-dual interior point algorithm to obtain the true MLE of $F_{1}$. The MLE of $F_{1}$ performed better than the weighted least squares estimator of LJP (1997) at points where $F_{1}$ is close to $F_{2}$. Moreover, for the model used in the simulation study of LJP (1997) and sample sizes from 100 up to 5000, the MLE-based estimators of the first and second moments of both $F_{1}$ and $F_{2}$ had smaller variances than the estimates based on the weighted least squares estimator for $F_{1}$ proposed in LJP (1997) and the Kaplan-Meier estimator for $F_{2}$. R scripts are provided for computing the estimates either with the primal-dual interior point method or by the EM algorithm.

In spite of the long history of the model in the biometrics literature (since about 1982), basic properties of the actual maximum likelihood estimator (MLE) were still unknown. We give necessary and sufficient conditions for the MLE (Theorem 3.1), as an element of a cone whose number of generators increases quadratically with the sample size. From this and a self-consistency equation, turned into a Volterra integral equation, we derive the consistency of the MLE (Theorem 4.1). We conjecture that, under some natural conditions, the methods used to prove consistency can be extended to prove that the MLE is $\sqrt{n}$-consistent for $F_{2}$ and converges at the cube root $n$ rate for $F_{1}$, but this has not yet been proved.
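As a reading aid, the log likelihood that such an MLE maximizes can be written down from the definitions of $(Y,\Delta_{1},\Delta_{2})$ alone. The display below is a sketch derived from those definitions, with terms involving the distribution of $C$ dropped; the notation $F_{2}(Y_{i}-)$ for the left limit is an assumption of this note, and the paper's own formulation may be parametrized differently.

$$
\ell(F_{1},F_{2})=\sum_{i=1}^{n}\Big\{\Delta_{2i}\log\big(F_{2}(Y_{i})-F_{2}(Y_{i}-)\big)+\Delta_{1i}(1-\Delta_{2i})\log\big(F_{1}(Y_{i})-F_{2}(Y_{i})\big)+(1-\Delta_{1i})\log\big(1-F_{1}(Y_{i})\big)\Big\},
$$

to be maximized over pairs of distribution functions with $F_{2}\leq F_{1}$, a constraint that follows from $P(T_{1}\leq T_{2})=1$. The three terms correspond to death from the tumor at $Y_{i}$, death from another cause with the tumor present ($T_{1}\leq Y_{i}<T_{2}$), and death from another cause without the tumor ($T_{1}>Y_{i}$).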

#### Article information

Source
Electron. J. Statist., Volume 13, Number 2 (2019), 3195-3242.

Dates
First available in Project Euclid: 24 September 2019

https://projecteuclid.org/euclid.ejs/1569290687

Digital Object Identifier
doi:10.1214/19-EJS1598

Subjects
Primary: 62G09: Resampling methods; 62N01: Censored data models

#### Citation

Gomes, Antonio Eduardo; Groeneboom, Piet; Wellner, Jon A. Nonparametric estimation of the lifetime and disease onset distributions for a survival-sacrifice model. Electron. J. Statist. 13 (2019), no. 2, 3195–3242. doi:10.1214/19-EJS1598. https://projecteuclid.org/euclid.ejs/1569290687

#### References

• [1] Michael G. Akritas. The central limit theorem under censoring. Bernoulli, 6(6):1109–1120, 2000. ISSN 1350-7265. URL https://doi.org/10.2307/3318473.
• [2] R. E. Barlow, D. J. Bartholomew, J. M. Bremner, and H. D. Brunk. Statistical inference under order restrictions. The theory and application of isotonic regression. John Wiley & Sons, London-New York-Sydney, 1972. Wiley Series in Probability and Mathematical Statistics.
• [3] P. J. Bickel, C. A. J. Klaassen, Y. Ritov, and J. A. Wellner. Efficient and adaptive estimation for semiparametric models. Springer-Verlag, New York, 1998. ISBN 0-387-98473-9. Reprint of the 1993 original.
• [4] G. E. Dinse and S. W. Lagakos. Nonparametric estimation of lifetime and disease onset distributions from incomplete observations. Biometrics, 38:921–932, 1982.
• [5] Richard Gill. Large sample behaviour of the product-limit estimator on the whole line. Ann. Statist., 11(1):49–58, 1983. ISSN 0090-5364. URL https://doi.org/10.1214/aos/1176346055.
• [6] A. E. Gomes. Consistency of the non-parametric maximum pseudo-likelihood estimator of the disease onset distribution function for a survival-sacrifice model. J. Nonparametr. Stat., 20(1):39–46, 2008. ISSN 1048-5252. URL https://doi.org/10.1080/10485250701830121.
• [7] Antonio Eduardo Gomes. Asymptotics for a weighted least squares estimator of the disease onset distribution function for a survival-sacrifice model. Ann. Inst. Statist. Math., 56(4):683–700, 2004. ISSN 0020-3157. URL https://doi.org/10.1007/BF02506483.
• [8] P. Groeneboom. Lectures on inverse problems. In Lectures on probability theory and statistics (Saint-Flour, 1994), volume 1648 of Lecture Notes in Math., pages 67–164. Springer, Berlin, 1996. URL http://dx.doi.org/10.1007/BFb0095675.
• [9] P. Groeneboom and J. A. Wellner. Information bounds and nonparametric maximum likelihood estimation, volume 19 of DMV Seminar. Birkhäuser Verlag, Basel, 1992. ISBN 3-7643-2794-4.
• [10] Piet Groeneboom. R scripts for the survival-sacrifice model. https://github.com/pietg/survival-sacrifice-model, 2018.
• [11] Piet Groeneboom and Geurt Jongbloed. Nonparametric estimation under shape constraints. Estimators, algorithms and asymptotics, volume 38 of Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, New York, 2014. ISBN 978-0-521-86401-5. URL https://doi.org/10.1017/CBO9781139020893.
• [12] M. G. Gu and C.-H. Zhang. Asymptotic properties of self-consistent estimators based on doubly censored data. Ann. Statist., 21(2):611–624, 1993. ISSN 0090-5364. URL https://doi.org/10.1214/aos/1176349140.
• [13] J. Huang and J. A. Wellner. Asymptotic normality of the NPMLE of linear functionals for interval censored data, case 1. Statist. Neerlandica, 49:153–163, 1995. ISSN 0039-0402. URL http://dx.doi.org/10.1111/j.1467-9574.1995.tb01462.x.
• [14] R. Kodell, G. Shaw, and A. Johnson. Nonparametric joint estimators for disease resistance and survival functions in survival/sacrifice experiments. Biometrics, 38:43–58, 1982.
• [15] Johan Lim, Seung Jean Kim, and Xinlei Wang. Estimating stochastically ordered survival functions via geometric programming. J. Comput. Graph. Statist., 18(4):978–994, 2009. ISSN 1061-8600. URL https://doi.org/10.1198/jcgs.2009.06140.
• [16] David G. Luenberger. Optimization by vector space methods. John Wiley & Sons, Inc., New York-London-Sydney, 1969.
• [17] Lu Mao. Nonparametric identification and estimation of current status data in the presence of death. Statistica Neerlandica, 73:to appear, 2019.
• [18] Winfried Stute. The central limit theorem under random censorship. Ann. Statist., 23(2):422–439, 1995. ISSN 0090-5364. URL https://doi.org/10.1214/aos/1176324528.
• [19] B. W. Turnbull and T. J. Mitchell. Nonparametric estimation of the distribution of time to onset for specific diseases in survival/sacrifice experiments. Biometrics, 40(1):41–50, 1984. URL https://www.jstor.org/stable/2530742.
• [20] M. J. van der Laan, N. P. Jewell, and D. Peterson. Efficient estimation of the lifetime and disease onset distribution. Biometrika, 84:539–554, 1997.
• [21] A. W. van der Vaart. On differentiable functionals. Ann. Statist., 19:178–204, 1991. ISSN 0090-5364. URL http://dx.doi.org/10.1214/aos/1176347976.
• [22] S. J. Wright. Primal-dual interior-point methods, volume 54. SIAM, 1997.