Censored count data regression with missing censoring information

: We investigate estimation in Poisson regression model when the count response is right-censored and the censoring indicators are missing at random. We propose several estimators based on the regression calibration, multiple imputation and augmented inverse probability weighting methods. Under appropriate regularity conditions, we prove the consistency of our estimators and we derive their asymptotic distributions. Simulation experiments are carried out to investigate the ﬁnite sample behaviour and relative performance of the proposed estimates. These estimates are illustrated on a real data set.


Introduction
Poisson regression is a popular tool for modeling the relationship between a count response (such as the number of cases of a specific disease in epidemiology, or the number of insurance claims within a given period of time) and a set of predictors or covariates. Over the past years, Poisson regression has been extended to accommodate censored count data. Although censoring is usually associated to lifetime data analysis, count data can also be censored, the most common type being right-censoring, which occurs when it is only known that the true count is higher than the observed one. For example, consider a study investigating the smoking habits of some population, where people report their number of cigarettes smoked per day. If one possible answer is "20 cigarettes or more", all cigarettes counts greater than 20 are right-censored at 20. Ignoring censoring is known to yield biased estimates and thus, incorrect inferences. Statistical inference in censored Poisson regression and extensions was therefore addressed by several authors; see, for example, [39], [8], [11], [50], [26] for censored generalized Poisson regression, [22] for finite mixtures of censored Poisson regressions and [33], [29] for zero-inflated censored Poisson regression.
Censored models for count data can be conveniently specified by introducing a censoring indicator which is set to 1 if the observed count is not censored and 0 otherwise. In this paper, we consider the situation where the censoring indicator is missing for some sample individuals. In the context of survival analysis, this issue has been considered by several authors. For example, [24] and [36,37] address estimation of the survival function of a random survival time with missing censoring indicators. [27], [9] and [5] consider estimation in the proportional hazards and additive hazard regression models with missing censoring indicators. [45], [46] and [6] propose various nonparametric estimates of the hazard and conditional hazard functions with censoring indicators missing at random. [44] and [52] estimate the linear regression and partially linear single-index models for survival data with missing censoring indicators. A similar issue arises in competing risks data analysis with missing cause of failure, see for example [2,3,51,28].
Estimation in censored Poisson regression with missing censoring information is still an open problem. Our aim in this paper is to provide and compare several estimates adapted to this setting.
Missing data problems have given rise to a rich literature and several adapted estimation methods have been proposed. A common and simple approach, called complete-case analysis, is to exclude individuals with missing data. This method can induce bias and substantial variance increase. Two alternatives are regression calibration and multiple imputation. Regression calibration is a general method for handling missing or mismeasured variables. It consists in replacing missing data by their conditional expectation given observed data. We refer to [7] for a detailed account of this method, which has been used in a variety of contexts, including the linear regression model [19], proportional hazards regression model for survival data [43,10,25], generalized linear models [16,47]. In multiple imputation, missing data are replaced by data generated from an imputation model. This imputation is repeated M times, generating M completed data sets. Each of them is analysed and an overall estimator is obtained by combining the estimates of the M completed samples. Multiple imputation was also used in a number of settings, including linear regression [17], generalized linear models [21], proportional hazards regression [49,20] and count data models [23]. Both regression calibration and multiple imputation require a model for the missing data given the observed data. Inverse probability weighting constitutes another alternative method for dealing with missing data (see for example [34] for a review of this method). Similarly to the complete-case analysis, inverse probability weighting only uses complete cases, but weights are used to rebalance the set of complete cases. Calculating these weights requires a model for the probability that an individual has complete data. Augmented inverse probability weighting was then proposed to ensure robustness against misspecification of the missingness model (see, for example, [40] for a detailed account on the method).
In this paper, we investigate, both theoretically and numerically, the regression calibration, multiple imputation and augmented inverse probability weighting estimators of the regression parameter in the censored Poisson regression model with missing censoring indicators. Our analysis of these estimates will be based on parametric assumptions for the conditional models for missing data and the missingness mechanism. The plan of the paper is as follows. In Section 2, we describe the model setup and we introduce the notations that will be used throughout the paper. In Section 3, we introduce our regression calibration estimator and we establish its consistency and asymptotic normality. In Sections 4 and 5, we propose our multiple imputation and augmented inverse probability weighted estimators, and we derive their asymptotic properties. All our theoretical derivations are based on an incomplete gamma function formulation of the distribution function of the Poisson regression model. Consistent asymptotic variance estimates are also proposed for the regression calibration, multiple imputation and augmented inverse probability weighted estimators. In Section 6, we conduct a simulation study to assess the finite sample performance and robustness to parametric assumptions of the proposed estimates. We also illustrate the proposed estimates on a real data set. Discussion and perspectives are given in Section 7. All proofs are deferred to appendices.

Model, data, notations
Let Y denote the count of interest and X = (1, X 2 , . . . , X p ) be a p-vector of covariates ( denotes the transpose operator). We assume that the conditional distribution of Y given X is given by a Poisson regression model with parameter λ = exp(β X), where β ∈ R p is a vector of unknown parameters.
We consider the situation where Y can be right-censored, that is, instead of the true Y , we eventually observe a value which is smaller than Y . This can be formalised by introducing a finite random variable C such that we observe either Y if Y < C or C if Y ≥ C, and an indicator δ (called censoring indicator thereafter) which is equal to 1 if Y < C and 0 if Y ≥ C. In what follows, we assume that Y and C are independent conditionally on X and that the distribution of C does not depend on β. These conditions are reminiscent of survival analysis, where they are called the independent censoring and noninformative censoring hypotheses respectively. These hypotheses are reasonable if censoring is due to an external event (i.e., the value of C is not directly driven by the value of Y ) or is fixed by design. Otherwise, one needs to model the joint distribution of (Y, C). We denote by Y * the observed count value (that is, Assume that n independent individuals are available and that for each of them, we observe the triplet ( . . , n}). Under the above hypotheses, the likelihood of β is calculated as: from which we easily deduce the loglikelihood n (β) = log L n (β): Standard asymptotic theory implies that the maximum likelihood estimator β n = arg max β n (β) is consistent and asymptotically normal with variance −E[∂ 2 1 (β)/∂β∂β ]. Remark. The censoring variable C does not have to be a count. For example, if Y i = 4 and C i = 2.5, then Y * i = 2.5 (and δ i = 0) and the contribution of the denotes the smallest integer not less than Y * i . If C is continuous, contributions of uncensored observations are unchanged since these observations are integer values.
Overall, the loglikelihood (2.1) remains valid when C is continuous, with the appropriate change of notation P(Y i ≥ Y * i |X i ) for censored observations. In order to keep notations simple, and without loss of generality, we assume that C is discrete (as is also the case in most applications). Now, we consider the situation where some additional uncertainty can arise in the observations. Precisely, we consider the situation where the censoring indicator δ i is missing for some individuals. Let ξ be a missingness indicator, that is, ξ = 1 if δ is observed and ξ = 0 otherwise. Then, for individual i ∈ {1, . . . , n}, the observed data are We consider a missing at random (MAR) mechanism, which means that ξ and δ are independent given all other observed variables (a more restrictive assumption is that ξ and δ are independent, which is called "missing completely at random"). In the next sections, we propose, investigate and compare several estimators of β in this context.

The proposed estimator
Our first estimator is based on the regression calibration idea. It consists in replacing any missing δ i in (2.1) by its conditional expectation E(δ i |W i ), where W i contains the observed variables Y * i and X i and eventually (if available) some observed surrogate variables V i for δ i . Thus, we let W i = (Y * i , X i , V i ) (we denote by q the dimension of W i ). An approximated version of δ i can then be defined as:δ The conditional expectation E(δ i |W i ) (or conditional probability P(δ i = 1|W i )) will generally be unknown and will have to be estimated. As is usual with the regression calibration approach, we assume that E(δ i |W i ) can be specified by a parametric model m(W i , θ), where θ is an unknown q-dimensional parameter with true value θ 0 .

Remark.
A convenient candidate for m(·, ·) is the logistic regression model m(W i , θ) = logit −1 (θ W i ) but other choices, such as the probit, are possible. One may also allow for polynomial, spline and interaction terms in these models, in order to make them as flexible as desired. In what follows, we assume a general model m(W i , θ) with some regularity conditions stated in section 3.2.
At a first stage, we estimate θ 0 by maximizing a likelihood based on complete cases i ∈ {1, . . . , n|ξ i = 1} only: where for any column vector u, u ⊗2 = uu . Then it is rather straightforward to see thatθ n is asymptotically linear with influence function Finally, it will be useful to note that if Y is distributed as Poisson with parameter λ, then for any u ∈ N, , we define our regression calibration estimator of β as is an approximated version of (2.1).

Regularity conditions and asymptotic results
The following regularity conditions are needed to establish the asymptotic properties of the regression calibration estimator. We assume:

C1
The covariates vectors X i and V i are bounded, for every i = 1, 2, . . . C2 The true parameter values β 0 and θ 0 lie in the interior of some bounded sets B ⊂ R p and Θ ⊂ R q respectively. C3 We have P(Y * ≥ 1|ξδ = 0) = 1 and P(δ = 1) > 0. C4 The function m(w, θ) is differentiable with respect to θ, for every w. For Remark. Condition C3 requires that a minimum amount of information is available on the count response when it is either censored (δ = 0) or its censoring status is unknown (ξ = 0). Intuitively, the observation {Y * = 0} carries no information if it is unknown that δ = 1 (i.e., that it is a "genuine" zero count), since all counts are non-negative.
Before stating the asymptotics ofβ n , we introduce some further notations. Let h β be the function defined by: for any β ∈ R p , x ∈ R p and y ∈ N\{0}. Let also π(W) = P(ξ = 1|W) and define the matrices We are now in position to state our first theorem. The proof is given in Appendix A.

Remark.
If π(W) is identically equal to 1 (that is, if there is no missing data), Σ reduces to the asymptotic variance of the maximum likelihood estimatorβ n in (2.1), which in turn reduces to the usual asymptotic variance (E[XX e β 0 X ]) −1 in Poisson regression if m(W, θ 0 ) is identically equal to 1 (that is, no censoring can affect the data).
A consistent estimator of Σ is given by The consistency proof of the variance estimator uses similar arguments as the proof of consistency ofβ n , it is thus omitted. The estimatorβ n will be evaluated in the simulation study of Section 6. Several methods have been proposed to address missing data problems in regression. Among them is the multiple imputation, which provides an alternative, popular and widely-used approach. The basic idea is to create several (say M ) completed data sets, by filling in plausible values for the missing data. Then, each filled sample is analysed as if it were the complete data set. Finally, the M imputed-samples inferences are combined into a single overall inference. In the next section, we investigate this approach for estimating β in our problem.

Multiple imputation
In this section, we assume, as in Section 3, that the conditional expectation E(δ i |W i ) can be specified by a parametric model m(W i , θ 0 ), and we denote byθ n the maximum likelihood estimator of θ 0 based on the complete cases The imputation procedure is as follows. Each missing δ i is replaced by a random draw from the Bernoulli distribution B(m(W i ,θ n )). We obtain a completed data set. This procedure is repeated M times to form M imputed data sets. For a given θ, let D i,j (θ) ∼ B(m(W i , θ)) denote the imputation of δ i in the j-th completed data set (j = 1, . . . , M). Let also be the random variable which is equal to δ i is ξ i = 1 (that is, if δ i is observed) and to D i,j (θ) if ξ i = 0 (that is, if δ i is missing) (note the difference between the imputation method, where δ * i,j (θ) ∈ {0, 1}, and the regression calibration approach, whereδ i (θ) ∈ [0, 1]). A single-imputation estimatorβ * n,j of β 0 is obtained by maximizing the imputed log-likelihood * The final multiple imputation estimatorβ * n is obtained by averaging the M estimatorsβ * n,j as:β * The next theorem gives the asymptotic properties ofβ * n . Its proof is given in Appendix B.
A consistent estimator of Σ * can be obtained as and Σ 2,n , Σ 3,n and Θ n are as given in Section 3. Regression calibration and multiple imputation rely on the ability of the investigator to formulate an appropriate model for E(δ|W). Misspecifying this model is likely to yield biased estimates of the parameters of interest. An alternative approach is to specify the selection probabilities π(W i ) = P(ξ i = 1|W i ) and to use the inverse probability weighting (IPW) of complete-case technique of [18]. The basic idea of IPW is to adjust a complete-case analysis by weighting individuals with no missing data by the inverse of their selection probability. Selection probabilities are generally unknown and have to be estimated. Again, misspecifying the π(W i ), i = 1, . . . , n is likely to yield biased inference. Moreover, by discarding individuals with missing data, IPW is also known to yield loss of efficiency.
For these reasons, the augmented IPW approach [AIPW henceforth, see 32] was proposed to improve the basic IPW. Since its introduction, the method has been shown to be doubly robust in several models, such as the proportional hazards model [42], the single-index model [14], the additive hazards model [38] and the accelerated failure time model [35]. Double robustness refers to the fact that the AIPW estimates are consistent as long as either the selection probability model or the conditional expectation of the missing data is correctly specified. In the next section, we propose an augmented IPW estimating equation adapted to our problem, and we investigate the asymptotic properties of the resulting estimator.

Augmented inverse probability weighted estimation
Inspired by [18], the inverse probability weighting of complete cases has become a classical estimation method in missing data problems. One drawback of the method is that the observed variables of subjects with missing data are not fully used, except through the estimation of the unknown selection probabilities. The AIPW method improves IPW by introducing an additional term involving contributions from individuals with some missing data (we refer to [40] for a detailed account on the method and numerous references). Adapting this idea, we propose the following augmented IPW estimating equation for β: The quantities E(δ i |W i ) and π(W i ) are unknown and have to be estimated. We assume that they can be specified by some parametric models m(W i , θ) and π(W i , γ) respectively, where θ and γ are unknown q-dimensional parameters with true values θ 0 and γ 0 . Letθ n andγ n be the maximum likelihood estimates of θ 0 and γ 0 .θ n is given by (3.1). Similarly,γ n can be obtained aŝ Finally, our AIPW estimatorβ n of β solves the estimating equation˘ n (β,θ n , γ n ) = 0, wherȇ Before stating the asymptotic properties ofβ n , we introduce some further notations and regularity conditions. For any θ, γ ∈ R q , we let Assuming the parametric model π(W i , γ) for the selection probabilities, the maximum likelihood estimatorγ n is asymptotically linear with influence func- That is: If the models m(W i , θ) and π(W i , γ) are misspecified, then by [48], there exists θ * and γ * such thatθ n P → θ * andγ n P → γ * . Moreover, the asymptotic linear expansions forθ n andγ n are given by (3.2) and (5.1), with θ 0 and γ 0 replaced by θ * and γ * respectively. If the model m( Finally, let We assume the following additional regularity conditions:

C5
The parameter space for γ is a bounded set G ⊂ R q and the true parameter value γ 0 lies in the interior of G. C6 The function π(w, γ) is strictly greater than 0 for all value of w in the support of W and all γ ∈ G. C7 The function π(w, γ) is differentiable with respect to γ, for every w. For Conditions C5 and C7 for γ and π(·, ·) are similar to conditions C2 and C4 for θ and m(·, ·). We are now in position to state the asymptotic properties of our AIPW estimator of β. From this result, the proposed estimatorβ n is doubly robust, in the sense that it estimates consistently β 0 as long as one of m(W i , θ) and π(W i , γ) is correctly modeled.
Remark. The basic idea of regression calibration and multiple imputation is to replace a missing δ i by an approximation whose conditional expectation given observed variables is equal to Here is the intuition underlying the AIPW method, and the seemingly complicated expression ofδ i (θ, γ).
The proof of Theorem 5.1 is given in Appendix C. The next theorem describes the asymptotic distribution ofβ n . Its proof is given in Appendix D.
In order to estimate the asymptotic variance ofβ n , let: where Σ 3,n and Θ n are as given in Section 3. Then a consistent estimator of J is given by: The proof of consistency of J n is omitted.

Simulation design
In this section, we investigate the finite sample performance of the regression calibration (RC), multiple imputation (MI) and AIPW estimators. The simulation design is as follows. For each of n individuals, the count response Y is simulated from a Poisson regression model with parameter [2,5]. The censoring and missingness mechanisms are set to be logit(m(W, θ) where θ and γ are chosen to yield the desired fractions of censored and missing data. We run a series of experiments in order to assess the effect of the sample size, censoring rate (CR) and missing rate (MR) on estimation: • experiment 1: we take n = 250, CR = 20%, MR = 20%, • experiment 2: we take n = 500, CR = 20%, MR = 20%, • experiment 3: we take n = 500, CR = 20%, MR = 40%, • experiment 4: we take n = 500, CR = 40%, MR = 20%, We can assess the effect of the sample size by comparing results of experiments 1 and 2. Similarly, by comparing experiments 2 and 3 (respectively 2 and 4), we can assess the effect of the missing rate (respectively censoring rate). Within each experiment, we compare the RC, MI and AIPW estimates under three scenario: (i) only m(W, θ) is correctly modeled, (ii) only π(W, γ) is correctly modeled, (iii) both m(W, θ) and π(W, γ) are correctly modeled. In the first scenario, Our simulation results are based on N = 1000 simulated samples. For each estimator, we report the average bias, average standard error (SE), empirical root mean square error (RMSE) and empirical coverage probability (CP) of 95%level confidence intervals. MI estimates are obtained with M = 50 (from our numerical experiments, this is large enough to ensure stability of the estimates). To establish a benchmark for comparisons, we also include an estimator based on the full data set with no missing censoring indicators and the complete-case (CC) estimator which maximizes the log-likelihood (2.1) on the subsample of complete cases only. Results of experiment j are summarized in Table j, for j = 1, . . . , 4. All estimates are obtained using the Newton-Raphson algorithm, implemented in R [31].
Remark. The EM algorithm is a popular tool for calculating maximum likelihood estimates in missing data problems. In the context of Poisson regression with missing data, it has been used by several authors. For example, [12], [4] and [22] use EM in finite mixtures of Poisson, bivariate Poisson and censored Poisson regression models. In these works, the EM algorithm is motivated by the missing data formulation of mixture models, where the unknown mixture component indicator is treated as the missing data. EM was also used in zeroinflated Poisson regression [15]. Here, the missing data is the unobserved state variable (zero state vs Poisson state). [1] use EM in bivariate Poisson regression with missing outcome. The EM algorithm could also be used in our setting, and it would be interesting to investigate the convergence rate of the sequence of EM estimates. This, however, falls beyond the scope of our paper and constitutes a topic for future work.

Results
As expected, the performance of the estimators improve when sample size increases. In the first scenario of each experiment, the RC, MI and AIPW methods appear to have similar performance. Coverage probabilities are close to the nominal confidence level, indicating that the asymptotic variances are appropriately estimated.
In the second scenario, the AIPW method generally achieves the smallest SE and RMSE, while the bias of the RC and MI estimates increase substantially, resulting in coverage probabilities smaller than desired (this is particularly noticeable when the censoring rate is large, see Table 4). This result was expected since m(W, θ) is misspecified. On the other hand, when censoring is moderate (Table 1-Table 3), the bias of the AIPW estimate stays moderate and of the same order of magnitude (but generally slightly larger, see our explanation below) as in the first scenario, which is also expected due to the double robustness property stated in Theorem 5.1. When the censoring rate is high, the bias of the AIPW estimate is more important and coverage probabilities can be affected (but less than for the RC and MI methods). This suggests that in finite samples, the AIPW estimator is more sensitive to a misspecification of m(W, θ) than of π(W, γ). This can be explained by the expression ofδ i (θ, γ), which is equal θ) is wrong, every individual i contributes to the likelihood with a misspecified term, whatever ξ i is. On the other hand, when π(W, γ) is wrong, only individuals with ξ i = 1 contribute to the likelihood with a misspecified term, since π(W, γ) does not appear in the contribution of individuals with ξ i = 0. This unbalance may explain the greater sensitivity of the AIPW estimate to a misspecification of m(W, θ).
Finally, when both models are correct (third scenario), all three methods perform similarly (results for the RC and MI methods are the same as for the first scenario).
Overall, this simulation study confirms the theoretical results stated in the previous sections. The regression calibration, multiple imputation and robust IPW methods provide similar results when either m(W, θ) or both m(W, θ) and π(W, γ) are correctly specified. When m(W, θ) is misspecified, the AIPW approach performs better than RC and MI, in particular in terms of point estimation (with substantially smaller bias for AIPW). The CC estimates are outperformed by the three methods in all scenarios.

Asymptotic variance estimation
Estimation of Σ, Σ * and J (the asymptotic variances of the RC, MI and AIPW estimates respectively) is a crucial issue for statistical inference purpose. In this section, we investigate the accuracy of their respective estimates Σ n , Σ * n and J n . First, note that although Σ, Σ * and J have explicit expressions, they cannot be calculated analytically, due to the complex expectations involved in the Σ j , j = 1, . . . , 8. Therefore, we propose to compare Σ n (respectively Σ * n , J n ) to some "oracle" estimate Σ or (respectively Σ * ,or , J or ), which is obtained as follows: we simulate a very large number (here, 15000) of observations (Y * i , X i , δ i , ξ i ), and we calculate empirical versions of the Σ j where all expectations are replaced by sample averages and parameters are fixed at their true value (hence the name oracle). We expect these oracles to be as close as possible of the true unknown asymptotic variances. Comparisons between Σ n , Σ * n , J n and the oracles are based on the results of the above simulation study.
For each experiment and each of the RC, MI and AIPW method, we calculate the relative differences 100 × |Σ 1/2 n,(j,j) − (Σ or (j,j) ) 1/2 |/(Σ or (j,j) ) 1/2 , 100 × |(Σ * n,(j,j) ) 1/2 − (Σ * ,or (j,j) ) 1/2 |/(Σ * ,or (j,j) ) 1/2 and 100 × |J 1/2 n,(j,j) − (J or (j,j) ) 1/2 |/(J or (j,j) ) 1/2 (where A (j,j) is the j-th diagonal element of a matrix A) between the estimated and oracle standard deviations of β j (j = 1, . . . , 5) (we calculate the relative error for standard deviations rather than for variance, since standard deviations are used to obtain confidence intervals and Wald test statistics, which are the cornerstones of statistical inference in regression models). Results are averaged over the N simulated samples and reported in Table 5. We also report the RMSE of the RC, MI and AIPW variance estimates (the corresponding oracle variance estimates are used as reference). For example, the RMSE of the RC variance estimate of β j is calculated as where Σ ( ) n,(j,j) denotes the RC variance estimate of β j in the -th simulated sample. Results are given in Table 6. Regarding AIPW, we evaluate the three variance estimates given in (5.2) (results are reported at line "AIPW1" for misspecified π(W, γ), "AIPW2" for misspecified m(W, θ), "AIPW3" when both models are correct, in both Tables 5 and 6).
From these results, it appears that both RMSE and relative differences between estimated and oracle standard deviations decrease when sample size increases and censoring and missing rates decrease. Relative errors and RMSE are the smallest for the AIPW estimate when both m(W, θ) and π(W, γ) are correctly specified (line AIPW3). In each scenario, the relative errors and RMSE of AIPW are smaller when π(W, γ) is misspecified than when m(W, θ) is misspecified. In fact, relative errors and RMSE show little sensitivity to misspecification of π(W, γ) (lines AIPW1 and AIPW3 are close to each other). On the other hand, when m(W, θ) is misspecified and censoring is high, the AIPW relative error can be substantial (around 20%, yielding the low coverage probabilities -around 75% -reported for β 1 , β 4 , β 5 in Table 4). But overall, all variance estimates essentially provide low relative errors (most of them being less than 6%). When m(W, θ) is well specified and π(W, γ) is misspecified: i) the RC (respectively MI) variance estimate performs slightly better (respectively a little less well) than AIPW in terms of relative error, ii) AIPW variance estimate performs better than RC and MI in terms of RMSE. These observations suggest that the AIPW variance estimator is superior to RC and MI when both m(W, θ) and π(W, γ) are correct, and that AIPW and RC variance estimates have similar performance when π(W, γ) is misspecified.

A real data analysis
We apply the proposed estimates to a data set from a survey of daily fruits and vegetables intake. The data were collected by the Office for National Statistics (UK), as part of a larger opinion survey. Respondents were asked about their usual daily intake of fruits and vegetables. Precisely, we have the number of portions of fruits and vegetables eaten by each respondent the day before the survey, and we know whether this number coincides with the respondent's usual intake or whether it is less than the usual intake. In this latter case, the usual intake is right-censored. The total sample size is n = 928. The censoring information is missing for 228 respondents (that is, 24.6% of the sample) and 29.6% of the respondents with known censoring information have a rightcensored daily intake. Covariates are gender, age, marital status (married vs single/divorced/separated), educational level (with three levels: "General Certificate of Secondary Education (GCSE) or no qualification", "A-level or equivalent", "higher education") and a factor coding respondents appreciation of their daily intake of fruits and vegetables ("enough", "not enough", "more than enough"). We use logistic regression models (with covariates the number of portions reported by the respondents and the five variables mentioned above) for the conditional expectation of the censoring indicator and the selection probabilities. A forward-and-backward elimination strategy based on the AIC is used to select the final models. Finally, we calculate the RC, MI (with M = 50 completed data sets) and AIPW estimates in a Poisson regression model for the usual daily intake of fruits and vegetables.
Results are presented in Table 7 (in this table, gender =1 for male and 0 for female; single=1 for a single/divorced/separated respondent and 0 for a married respondent; GCSE/no qualif.=1 if the respondent has either no qualification or has obtained a GCSE, and 0 otherwise, A-level or equiv.=1 if the respondent has obtained a A-level or an equivalent diploma; more than enough=1 if the respondent considers that her/his daily intake of fruits and vegetables is more than enough and 0 otherwise, not enough=1 if the respondent considers that her/his daily intake is not enough and 0 otherwise).
All methods conclude that age has a significant effect on the daily intake of fruits and vegetables, with older people consuming more than younger ones. The gender effect is not significant (at level 5%) for the CC, RC and AIPW analysis but is significant for the MI method, with women consuming more fruits and vegetables than men. All methods find that being married is associated with increased fruits and vegetables intakes, while being single, separated or divorced is associated with lower consumption. For example, using the MI estimate, we find that on average, being single yields a (1 − e −0.0944 ) × 100 ≈ 9% decrease in the daily intake (holding fixed all the other effects). Our results also suggest that individuals with higher education have a higher consumption of fruits and vegetables than those with lower education (the reference level in Table 7 is "higher education"). The difference in daily intake between respondents with an A-level or equivalent diploma and respondents with a higher degree is not significant (although not being far from it, for all methods except the complete-case analysis) but there is a very significant difference between respondents with a GCSE or no qualification and those with a high degree. Finally, respondents who perceive their intake as more than enough (respectively not enough) indeed consume more (respectively less) fruits and vegetables, which may reflect the fact that respondents are well-informed on the usual recommandations about fruits and vegetables intake. Our results are coherent with the findings of previous stud- incorrect m(W, θ) / correct π(W, γ) both models correct estimator    Table 3. Simulation results for n = 500, censoring rate = 20%, missing rate = 40%.  Table 4. Simulation results for n = 500, censoring rate = 40%, missing rate = 20%.   ies that investigated factors that influence fruits and vegetables consumption, see [30] for example. Although here, the complete-case analysis yields the same conclusions as the RC, MI and AIPW methods, we observe some differences between the CC estimates and the RC, MI and AIPW estimates, which might reflect the bias of the CC method observed in the simulation study. Moreover, the CC estimates have usually larger standard errors, which reflects the loss of efficiency of the method.
In this example, the three AIPW variance estimates given in (5.2) are equal up to 3 digits (for this reason, only one standard error is reported in Table  7). This suggests that both the missingness model and conditional model for the censoring indicators given observed variables are correctly specified. For this reason and in view of the conclusions of the simulation study, we would recommand to use the AIPW estimate for further statistical inference on this data set.

Remark.
A naive (but easy to implement, using standard statistical softwares) estimation method consists in fitting an uncensored Poisson regression model to the data (that is, the censored intakes are treated as if they were uncensored). Using this method, the estimated constant is 1.338 (from Table 7, a consensus estimate is around 1.6). As expected, this naive method underestimates the baseline level of fruits and vegetables consumption.

Discussion
In this article, we have investigated several estimators of the regression parameter of the censored Poisson regression model when censoring indicators are partially missing. The regression calibration and multiple imputation estimates and their asymptotic variance estimators lead to reliable inferences when the model for the missing data given the observed variables is correctly specified, while the augmented inverse probability weighted estimator is asymptotically robust against misspecification of either the model for the missing data or the missingness mechanism. In finite samples, the AIPW estimator seems to be be more sensitive to a misspecification of the censoring mechanism than of the missingness mechanism. Now, several issues deserve attention. First, in this work, we considered missing censoring indicators in the Poisson regression model, which assumes equidispersion. A similar issue may arise with under-or over-dispersed counts. The generalized Poisson regression model (see [11] for example) is an appealing model for such data. The negative binomial regression model is an other option for modeling over-dispersed counts. When over-dispersion is due to zero-inflation, zero-inflated regression models (such as zero-inflated Poisson, zero-inflated generalized Poisson or zero-inflated negative binomial models) are appropriate. The estimates proposed in our paper may be adapted to these models and similar techniques could be used to investigate their asymptotic properties.
An other topic for further research is as follows. Our estimators rely on parametric models for the missing data and missingness mechanism. It is important to assess the sensitivity of the statistical inference to deviations to these models. An alternative estimation strategy may use semiparametric or nonparametric estimation of the models for missing data and missingness mechanism, and is also the topic for our future work.
Appendix A: Proof of Theorem 3.1 Consistency. The consistency ofβ n can be proved by verifying the conditions of the inverse function theorem [13]. We describe the main steps of the proof and omit calculation details.
Secondly, we need to show that n −1˙ n (β 0 ,θ n ) = o P (1). To see this, we decompose n −1˙ n (β 0 ,θ n ): By the weak law of large numbers, n −1˙ n (β 0 , θ 0 ) converges in probability to where the second line follows by taking the conditional expectation given W.
Having proved the conditions of the inverse function theorem, we conclude thatβ n converges in probability to β 0 .
Asymptotic normality. A Taylor's expansion of˙ n (β n ,θ n ) around (β 0 , θ 0 ) yields We have: Combining this and (3.2), we can write: since under the missing at random assumption, we have: We consider now the covariance structure of U i . We have Finally, Theorem 3.1 follows from the multivariate central limit theorem and Slutsky's theorem.
Proof of Lemma 7.1. In this proof, for notational simplicity, we will write f θ instead of f β0,θ,j . First, note that where G n f θ denotes the empirical process evaluated at f θ . To prove the lemma, we first prove that the class of functions {f θ : θ ∈ Θ} is Donsker (see, for example, [41] for a detailed account on empirical processes and Donsker classes).
It follows that the sequence of processes {G n f θ : θ ∈ Θ} converges in distribution to a tight limit process, and as such, is stochastically equicontinuous. Thus, Lemma 14.3 of [40] and the consistency ofθ n imply that G n fθ n − G n f θ0 P −→ 0, which is exactly (7.3). This concludes the proof.

Now, letting Q
(1) n,1, and Q i, denote the -th component of the vectors Q (1) n,1 and Q i respectively (for = 1, . . . , p), we have: Conditions C1 and C6 ensure that there exists a finite positive constant c 3 such that and the condition C7 implies that ≤ c 3 (u + o P (1)) γ * −γ n .
Using similar arguments as for Q (1) n,3 (respectively Q (1) n and Q (1) n,2 ), we can show that T (1) n (respectively T (2) n and T (3) n ) converge to 0 as n → ∞. Details are omitted. Now, n −1 ∂˘ n (β, θ 0 , γ * )/∂β converges to E[X i X i (−δ i (θ 0 , γ * )(e β Xi + Q i,β ) + Q i,β )] in probability. If the model m(W i , θ) is correctly specified (that is, m(W i , θ 0 ) = E(δ i |W i )), we have It follows that n −1 ∂˘ n (β,θ n ,γ n )/∂β converges in probability to −Σ 1 (β). Uniformity of the convergence follows by the same arguments as in the proof of Theorem 3.1. Finally, having proved conditions i, ii and iii, we apply the inverse function theorem of [13] and conclude thatβ n converges in probability to β 0 if m(W i , θ) is correctly specified. The consistency proof ofβ n when model π(W i , γ) is correctly specified proceeds along the same lines and is omitted.