Uniform improvement of empirical likelihood for missing response problem

Abstract: An empirical likelihood (EL) estimator was proposed by Qin and Zhang (2007) for improving inverse probability weighting estimation in a missing response problem. The authors showed by simulation studies that the finite sample performance of the EL estimator is better than that of certain existing estimators, and they also established large sample results for the estimator. However, the empirical likelihood estimator does not in general have a uniformly smaller asymptotic variance than other existing estimators. We consider several modifications to the empirical likelihood estimator and show that the proposed estimator dominates the empirical likelihood estimator and several other existing estimators in terms of asymptotic efficiency under missing at random. The proposed estimator also attains the minimum asymptotic variance among estimators having influence functions in a certain class and enjoys certain double robustness properties.


Introduction and existing estimators
Suppose we are interested in estimating the mean µ of a random variable Y, but Y is only partially observed subject to missingness. Let X be a vector of covariates that are fully observable and R be an indicator that Y is observed. The observed data are (r_i, r_i y_i, x_i) for i = 1, ..., n and are i.i.d. realizations from (R, RY, X). Under a missing at random assumption that P(R = 1|Y, X) = P(R = 1|X) = π_0(X), µ can be consistently estimated by the inverse probability weighting (IPW) estimator μ̂_IPW = n^{-1} Σ_{i=1}^n r_i y_i / π_0(x_i). In missing data applications the non-missing probability is usually unknown and is instead modeled. Suppose P(R = 1|X) = π(X; β_0), where β_0 is a finite dimensional parameter. Based on (r_1, x_1), ..., (r_n, x_n), the parameter β_0 can be estimated by solving a likelihood score equation n^{-1} Σ_{i=1}^n l(x_i, r_i; β) = 0, where l(x, r; β) = [1 − π(x; β)]^{-1} [r − π(x; β)] ∂π(x; β)/∂β, and we denote the solution by β̂. In IPW estimation we usually replace π_0(x_i) by the estimated probability π(x_i; β̂).
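As an illustration, IPW estimation with an estimated missing data model can be sketched as follows. The data generating model, its coefficients, and the use of ordinary logistic maximum likelihood for the working model (rather than the score equation above) are illustrative assumptions, not part of the paper's setup.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 2000
x = rng.normal(size=n)
y = 2 + 3 * x + rng.normal(size=n)        # outcome with true mean mu = E(Y) = 2
pi0 = 1 / (1 + np.exp(-(0.5 + 0.5 * x)))  # pi_0(x) = P(R = 1 | X = x), missing at random
r = rng.binomial(1, pi0)                  # non-missing indicator; y is observed only when r = 1

def fit_logistic(x, r, iters=25):
    """Newton-Raphson for a logistic working model pi(x; beta)."""
    X = np.column_stack([np.ones_like(x), x])
    beta = np.zeros(2)
    for _ in range(iters):
        pi = 1 / (1 + np.exp(-X @ beta))
        grad = X.T @ (r - pi)                          # logistic score
        hess = -(X * (pi * (1 - pi))[:, None]).T @ X   # observed information (negated)
        beta -= np.linalg.solve(hess, grad)
    return beta

beta_hat = fit_logistic(x, r)
pi_hat = 1 / (1 + np.exp(-(beta_hat[0] + beta_hat[1] * x)))
mu_ipw = np.mean(r * y / pi_hat)          # IPW estimate of mu with estimated weights
```

The complete-case mean would be biased here because missingness depends on X; the inverse weighting corrects for that.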
The IPW estimator is intuitive and easy to implement but is inefficient in general, because information from X is not fully utilized when Y is not observed. To improve efficiency, Qin and Zhang (2007) proposed an empirical likelihood estimator, where a = (a_1, ..., a_p)^T is a fixed vector function of dimension p < n, θ̂ = n^{-1} Σ_{i=1}^n π(x_i; β̂) and â = n^{-1} Σ_{i=1}^n a(x_i). Let s(x; β, θ, a) = {1 − θπ^{-1}(x; β), π^{-1}(x; β)[a(x) − â]^T}^T and n_1 = Σ_{i=1}^n r_i. Solving the constrained maximization problem, the empirical likelihood weights p_i^EL are expressed in terms of a vector of Lagrange multipliers λ̂_EL, and the Lagrange multipliers satisfy a system of estimating equations. Information from incomplete observations is utilized implicitly in the construction of the weights p_i^EL through the constraints. When Y and a(X) are correlated, the empirical likelihood estimator usually improves upon the IPW estimator in terms of estimation efficiency. Although the EL method works for an arbitrary choice of a(X), Qin and Zhang (2007) showed that optimal efficiency is attained when the conditional expectation E(Y|X) is a linear combination of a(X). In practice, one could model E(Y|X) from the observed data, but the optimal case cannot be achieved because no model is perfect. We therefore consider the general case in which E(Y|X) may not be a linear combination of a(X), and a broader optimality result is shown under this general case. The main contribution of the paper is to propose a uniform improvement of the existing empirical likelihood estimator for arbitrary pre-specified a(X).
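A minimal numerical sketch of the weight construction is given below, assuming for simplicity that π_0 is known and a(X) = X. The specific weight form p_i^EL = n_1^{-1}{1 + λ^T s_i}^{-1}, the starting value for the root finder, and the data generating model are illustrative assumptions made for this sketch.

```python
import numpy as np
from scipy.optimize import root

rng = np.random.default_rng(1)
n = 2000
x = rng.normal(size=n)
y = 2 + 3 * x + rng.normal(size=n)               # true mean mu = 2
pi0 = 1 / (1 + np.exp(-(0.5 + 0.5 * x)))         # response probabilities, taken as known here
r = rng.binomial(1, pi0).astype(bool)
n1 = r.sum()

theta_hat = pi0.mean()                           # estimate of theta = P(R = 1)
a_bar = x.mean()                                 # a-hat with a(X) = X
# s(x) = (1 - theta/pi(x), [a(x) - a_bar]/pi(x))^T, evaluated at the respondents
s = np.column_stack([1 - theta_hat / pi0[r], (x[r] - a_bar) / pi0[r]])

def lagrange_eq(lam):
    # sum over respondents of s_i / (1 + lam^T s_i) = 0 determines lambda
    return (s / (1 + s @ lam)[:, None]).sum(axis=0)

lam = root(lagrange_eq, np.zeros(2)).x
p_el = 1 / (n1 * (1 + s @ lam))                  # empirical likelihood weights
mu_el = np.sum(p_el * y[r])                      # EL estimate of mu
```

By the Lagrange-multiplier construction, the weights automatically sum to one once λ solves the estimating equations.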
We note that the empirical likelihood estimator of Qin and Zhang (2007) is different from the general empirical likelihood methodology of Qin and Lawless (1994) for estimation from overidentified systems of estimating equations and for optimally combining estimating equations. Unlike the single-step estimator discussed in Qin and Lawless (1994), the empirical likelihood estimator of Qin and Zhang (2007) is computed in a two-step manner, which is much easier to implement. However, two-step estimators may have different efficiency compared to single-step estimators. Following Qin and Lawless (1994), Qin, Zhang and Leung (2009) studied a single-step empirical likelihood estimator for missing data problems. Other empirical likelihood estimators have also been proposed by Wang and Rao (2002) and Wang and Chen (2009), among others, for missing data problems under different settings. In missing data analysis, there is often an interest in studying robust estimators that give consistent estimates under certain conditions even when the missing data mechanism is misspecified, a so-called double robustness property. The empirical likelihood estimator of Qin and Zhang (2007), as well as the modified estimator proposed here, enjoys the double robustness property, but double robustness was not discussed in Qin, Zhang and Leung (2009). We will further discuss this point in section 5. For clarity, we use "empirical likelihood estimator" to refer to the estimator of Qin and Zhang (2007) in the rest of the paper.
The empirical likelihood estimator has nice small sample properties as shown in simulations, but it does not theoretically dominate other existing estimators in terms of asymptotic efficiency for arbitrary pre-specified a(X). A partial review of other existing methods can be found in Kang and Schafer (2007). To study asymptotic efficiencies of estimators, we consider estimators whose influence functions belong to the class L = { (R/π_0(X))[Y − m(X)] + [m(X) − µ] : m(X) = c^T ã(X) for some vector c }, where ã = (1, a_1, ..., a_p, (1 − π_0)^{-1} ∂π^T/∂β)^T. The class L is a large class of influence functions and contains many important existing estimators. The EL estimator, together with the survey calibration estimator [Deville and Särndal (1992)] and the augmented inverse probability weighting estimator [Robins, Rotnitzky and Zhao (1994)], all have influence functions in the class L. However, none of these estimators attains the minimum variance in class L, and none strictly dominates the others under the missing at random assumption with estimated weights. The main purpose of this paper is to study modifications of the empirical likelihood estimator that attain the minimum asymptotic variance within class L, and therefore theoretically dominate the EL estimator and many other existing estimators when the same amount of covariate information is used. To motivate our main results, section 2 first discusses the optimality of empirical likelihood under the stronger missing completely at random assumption. In section 3, we show that the EL estimator is suboptimal in general under missing at random with estimated missing probability, and we go on to discuss modifications that improve efficiency and achieve optimality. Section 4 presents simulation studies comparing the finite sample performance of the estimators. The choice of the class L among other possible classes of influence functions is further discussed in section 5.

Optimality of empirical likelihood under missing completely at random
To motivate the modifications of the EL estimator under the missing at random assumption, we first investigate the optimality of the EL estimator under the stronger assumption of missing completely at random, i.e. P(R = 1|Y, X) = π_0, a constant independent of Y and X. We assume that π_0 is a known constant in this section. Under this special case, the EL weights are again indexed by Lagrange multipliers that satisfy a system of estimating equations, and we consider estimators whose influence functions belong to the class L′ = { (R/π_0)[Y − m(X)] + [m(X) − µ] : m(X) = c^T a′(X) for some vector c }, where a′(X) = (1, a(X)^T)^T. Before we show that empirical likelihood attains the minimum variance within class L′, we first characterize the condition for m(X) to achieve minimum variance. For an estimator having influence function in class L′, its asymptotic variance is
(2.3)  (1 − π_0)/π_0 × E[{Y − m(X)}^2] + Var(Y).
To characterize the optimal estimator, let m_0(X) = c_0^T a′(X), where c_0 = {E[a′(X)a′(X)^T]}^{-1} E[a′(X)Y]. By the definition of c_0, the orthogonality conditions E[a′(X){Y − m_0(X)}] = 0 are satisfied, and the asymptotic variance (2.3) is minimized within the class L′. Furthermore, we show the optimality of the empirical likelihood estimator by expanding μ̂_EL into two summations as in (2.5), where the second last equality follows from (2.1) and (2.2). The second summation in (2.5) corresponds to the optimal estimator in class L′. To show that the empirical likelihood estimator is optimal within class L′, we therefore need to show that the first summation in (2.5) is o_p(n^{-1/2}). Under mild regularity conditions, it can be shown from asymptotic properties of estimating equations [Newey and McFadden (1994)] that √n(λ̂_EL − 0) converges weakly to a Gaussian distribution. By a Taylor series expansion, the first summation in (2.5) is o_p(n^{-1/2}), and hence μ̂_EL is asymptotically equivalent to the minimum variance estimator in class L′.
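The characterization of m_0 under missing completely at random can be checked numerically: c_0 is an ordinary least squares projection, the orthogonality conditions hold by the normal equations, and (2.3) decreases as m moves toward that projection. The data generating model below is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
x = rng.normal(size=n)
y = 1 + 2 * x + 0.5 * x**2 + rng.normal(size=n)   # E(Y|X) is not linear in a'(X)
pi0 = 0.7                                          # MCAR: constant response probability

A = np.column_stack([np.ones(n), x])               # a'(X) = (1, a(X))^T with a(X) = X
c0, *_ = np.linalg.lstsq(A, y, rcond=None)         # c0 = {E[a' a'^T]}^{-1} E[a' Y]
m0 = A @ c0                                        # optimal m_0(X) = c0^T a'(X)

def avar(m):
    # empirical version of the asymptotic variance (2.3)
    return (1 - pi0) / pi0 * np.mean((y - m) ** 2) + y.var()
```

Because (1 − π_0)/π_0 is a constant, minimizing (2.3) over linear m(X) is exactly an unweighted least squares problem, which is why the ordinary projection is optimal here.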

Modified empirical likelihood under missing at random with estimated missing probability
In section 2 we showed that empirical likelihood is optimal in the sense that it attains the minimum asymptotic variance within the class L′ under the missing completely at random assumption. However, the empirical likelihood estimator does not attain the minimum asymptotic variance within the class L under the missing at random assumption. We can see this by arguments similar to those in section 2. First, an estimator having influence function in class L has asymptotic variance E[{1 − π_0(X)}/π_0(X) × {Y − m(X)}^2] + Var(Y). Since m(X) = c^T ã(X) in class L, the minimum asymptotic variance in class L is
(3.1)  E[{1 − π_0(X)}/π_0(X) × {Y − m_0(X)}^2] + Var(Y),
where q is the dimension of ã. Next, we characterize the minimum variance estimator in class L. Let m_0(X) = c_0^T ã(X), where c_0 = (E[{1 − π_0(X)}/π_0(X) × ã(X)ã(X)^T])^{-1} E[{1 − π_0(X)}/π_0(X) × ã(X)Y]. The variance of (R/π_0(X))[Y − m_0(X)] + [m_0(X) − µ] is the minimum within the class L, as in (3.1). Note that the optimal estimator has {1 − π_0(X)}/π_0(X) inside the expectation in (3.1), whereas (1 − π_0)/π_0 can be written outside the expectation in (2.3). By the definition of c_0, the orthogonality conditions
(3.2)  E[{1 − π_0(X)}/π_0(X) × ã(X){Y − m_0(X)}] = 0
are satisfied.
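Under missing at random, c_0 is instead a weighted least squares projection with weights {1 − π_0(X)}/π_0(X), and the orthogonality conditions (3.2) are its normal equations. A numerical sketch, with an assumed data generating model and ã truncated to (1, a(X)) for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000
x = rng.normal(size=n)
y = 1 + 2 * x + 0.5 * x**2 + rng.normal(size=n)
pi0 = 1 / (1 + np.exp(-(0.5 + 0.5 * x)))          # MAR response probability pi_0(X)
w = (1 - pi0) / pi0                               # weights inside the expectation in (3.1)

A = np.column_stack([np.ones(n), x])              # illustrative a-tilde(X) = (1, a(X))^T
Aw = A * w[:, None]
# c0 = {E[w a a^T]}^{-1} E[w a Y]: a weighted least squares projection
c0 = np.linalg.solve(A.T @ Aw / n, Aw.T @ y / n)
m0 = A @ c0

def avar(m):
    # empirical asymptotic variance in class L: E[w {Y - m(X)}^2] + Var(Y)
    return np.mean(w * (y - m) ** 2) + y.var()

c_ols, *_ = np.linalg.lstsq(A, y, rcond=None)     # unweighted projection for comparison
```

In contrast to the MCAR case, the weights now vary with X, so the unweighted projection is no longer optimal in general.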
We then show that, unlike under the special case of missing completely at random, the empirical likelihood estimator does not achieve minimum variance within class L under missing at random. As shown in Qin and Zhang (2007), √n(λ̂_EL − 0) converges weakly to a Gaussian distribution. By Taylor series expansions, the influence function of μ̂_EL involves a matrix A_EL. Unlike the case of missing completely at random, there is no direct relationship between A_EL and the orthogonality conditions (3.2), and the matrix is generally nonzero.
Note that even when the missing probability is known, A_EL is nonzero in general, and therefore the empirical likelihood estimator is still suboptimal within class L.
Our aim is to construct an estimator attaining the minimum asymptotic variance (3.1). When the missing probability is estimated, we employ three strategies in constructing the optimal estimator: (1) we modify both the empirical likelihood weights and the estimating equations for finding the Lagrange multipliers so that the corresponding matrix A_MEL, defined later, is a zero matrix; (2) we augment the number of constraints to ensure that the corresponding matrix B, defined later, is also zero; (3) we remove the influence of the estimate θ̂ by removing it entirely from the estimation. To achieve these goals, we consider the following modifications. First, starting from point (3), since θ̂ is a consistent estimator of P(R = 1), we replace θ̂/n_1 in (1.1) by 1/n. Next, we replace s(x_i; β̂, θ̂, â) by an augmented version s*(x_i; β̂, â, b̂). The modified empirical likelihood (MEL) weights p_i^MEL are then indexed by a pseudo Lagrange multiplier λ̂_MEL obtained by solving the corresponding system of estimating equations, (3.5) and (3.6). We now show that the modified empirical likelihood estimator constructed above attains the minimum asymptotic variance within class L. Again, we note that μ̂_MEL admits an expansion whose second last equality follows from (3.5), (3.6) and the definition of s*. Under mild regularity conditions [Newey and McFadden (1994)], it can be shown that √n(λ̂_MEL − 0) and √n(β̂ − β_0) converge weakly to Gaussian distributions. Taylor series expansions then yield (3.7) and (3.8). For modified empirical likelihood, the matrices A_MEL and B are both 0 by the orthogonality conditions (3.2). Therefore, it follows from (3.7) and (3.8) that the influence function of μ̂_MEL is (R/π_0(X))[Y − m_0(X)] + [m_0(X) − µ], which attains the minimum variance within the class L.
For the special case where E(Y|X) is a linear combination of ã(X), we have m_0(X) = E(Y|X). In this case, the modified empirical likelihood estimator attains the semiparametric efficiency bound. Empirical likelihood also attains the same semiparametric efficiency bound under correct specification of the outcome regression model. However, for arbitrary pre-specified a(X), the modified empirical likelihood estimator in general has a smaller asymptotic variance than empirical likelihood and other estimators having influence functions in class L.
The modified empirical likelihood estimator also possesses a double robustness property, as does the empirical likelihood estimator. Suppose E(Y|X) = b_0 + b_1^T a(X) = m_0(X) but the missing data model π(x; β) is misspecified. The estimates β̂, λ̂_MEL and b̂ converge in probability to some constants β*, λ* and b* [White (1982), Hall and Inoue (2003)], and λ* is usually nonzero. From (3.7) we note that μ̂_MEL remains consistent for µ. That is, the modified empirical likelihood estimator is consistent when the outcome regression model is correctly specified even when the missing data model is misspecified.

Simulation studies
The first simulation study followed the scenario of Kang and Schafer (2007). The correctly specified outcome and missing data models were regression models with Z as covariates, whereas the misspecified models used X as covariates instead of Z. Kang and Schafer (2007) showed that the misspecified models are nearly correctly specified. In each case we considered four possible combinations of correct and misspecified missing data and outcome regression models: (a) both correct; (b) correct missing data model and incorrect outcome regression; (c) incorrect missing data model but correct outcome regression; and (d) both incorrect. We took a(Z) = (Z_1, Z_2, Z_3, Z_4) for the correctly specified outcome model and a(X) = (X_1, X_2, X_3, X_4) for the misspecified outcome model. We compared the performances of the augmented inverse probability weighted estimator μ̂_AIPW, the empirical likelihood estimator μ̂_EL, the survey calibration estimator μ̂_CAL and the modified empirical likelihood estimator μ̂_MEL. Relative efficiency (RE) is defined as the ratio of the mean squared error of an estimator to the mean squared error of the modified empirical likelihood estimator. The results are shown in Table 1. The EL, CAL, AIPW and MEL estimators all had relatively small bias when either the missing data model or the outcome regression model was correctly specified. When both models were correctly specified, all estimators had very similar performance because all of them were
semiparametric locally efficient. When only one of the two models was correctly specified, the empirical likelihood, calibration and modified empirical likelihood estimators were more efficient than the AIPW estimator. When both models were misspecified, the AIPW estimator had considerable bias and variability, but the other empirical likelihood based estimators showed much better performance. When the outcome regression model was misspecified, the modified empirical likelihood estimator had smaller bias and variability than the empirical likelihood and calibration estimators, consistent with the theoretical results. In this simulation study, the modified empirical likelihood estimator performed consistently better than the other estimators.
The second simulation study was an adaptation of the scenario in Qin and Zhang (2007). The sample size for each simulated data set was 200, 500 or 1000, and 1000 Monte Carlo data sets were generated. For each observation, a standard normal random variable X was generated. Conditional on X = x, Y was generated from a normal distribution with mean 2 + 3x^2 and variance x^2. The non-missing indicator R followed a generalized linear model with a logit or a complementary log-log link. In both cases, we assumed a logistic working model for R, so the working model is misspecified under the complementary log-log model. We considered a(X) = X^2 and a(X) = X; the latter corresponds to a misspecified working regression model. The simulation results are shown in Table 2. Under this scenario, we reached very similar conclusions as in Table 1. In particular, the modified empirical likelihood estimator performed consistently better than the other estimators, including the empirical likelihood estimator. The improvement in efficiency was greater when the working outcome regression model was misspecified.
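The data generating mechanism of the second study can be sketched as follows. The coefficients in the linear predictor for R are illustrative assumptions, since the original specification is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=n)                         # X ~ N(0, 1)
y = rng.normal(2 + 3 * x**2, np.abs(x))        # Y | X = x ~ N(2 + 3x^2, x^2)

eta = 0.5 + 0.5 * x                            # hypothetical linear predictor for R
pi_logit = 1 / (1 + np.exp(-eta))              # logit link
pi_cloglog = 1 - np.exp(-np.exp(eta))          # complementary log-log link
r = rng.binomial(1, pi_logit)                  # non-missing indicator under the logit model
```

Fitting a logistic working model to data generated under the complementary log-log link is what produces the misspecified missing data model in this study.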

Discussions
Empirical likelihood has gained increasing attention in missing data analysis because it has good small sample performance and is easy to compute. In this paper we studied modifications of the empirical likelihood estimator that attain a uniform improvement in asymptotic efficiency. Results from the simulation studies were consistent with the theoretical finding that the modifications improve efficiency.
We considered the class of influence functions L, noting that certain important existing estimators lie within that class, including the empirical likelihood estimator of Qin and Zhang (2007). A different empirical likelihood estimator was proposed by Qin, Zhang and Leung (2009) that minimizes the asymptotic variance within another class of influence functions, L″, where l is the dimension of β and g(X) is a fixed function. The function g(X) plays a similar role to m(X) in L; however, we allow m(X) to be any linear combination of ã in the class L, whereas g(X) is fixed in advance. Another major difference is that estimators in the class L″ may not be doubly robust when a and b are not identical, whereas estimators having influence functions in class L enjoy double robustness properties in general. Since double robustness offers protection against misspecification of the missing data model and can be useful in practice, we restrict our attention to the class L.
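The double robustness enjoyed by estimators in class L can be illustrated with the augmented inverse probability weighting estimator, whose influence function lies in L: with a correctly specified outcome regression, it remains consistent under a deliberately misspecified (constant) missing data model, while plain IPW with the same wrong model is biased. The data generating model is an illustrative assumption.

```python
import numpy as np

rng = np.random.default_rng(6)
n = 200_000
x = rng.normal(size=n)
y = 1 + 2 * x + rng.normal(size=n)            # true mean mu = 1, with E(Y|X) = 1 + 2X
pi0 = 1 / (1 + np.exp(-(0.3 + 0.8 * x)))      # true missing data model
r = rng.binomial(1, pi0)

m_correct = 1 + 2 * x                         # correctly specified outcome regression
pi_wrong = np.full(n, r.mean())               # misspecified (constant) missing data model

# AIPW: mean of r*y/pi - (r/pi - 1)*m; consistent if either model is correct
mu_aipw = np.mean(r * y / pi_wrong - (r / pi_wrong - 1) * m_correct)
mu_ipw_wrong = np.mean(r * y / pi_wrong)      # plain IPW with the wrong model, biased
```

Here the augmentation term absorbs the error of the wrong weights because the residual Y − m(X) has conditional mean zero under the correct outcome model.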
The method of Qin, Zhang and Leung (2009) was designed for general estimating functions. A reviewer suggested extending the modified empirical likelihood estimation to this general case. Let U(Y, X; β) be an unbiased estimating function and consider the corresponding class of influence functions, where β_0 is the true value of β. Since p_i^MEL can be computed once ã(X) is specified, it seems reasonable to solve the weighted estimating equation Σ_{i=1}^n r_i p_i^MEL U(y_i, x_i; β) = 0 over the complete-case observations (i.e. those with r_i = 1) to obtain an estimate of β. We note that the empirical likelihood weights p_i^EL maximize the empirical log-likelihood l = Σ_{i=1}^n r_i log p_i subject to the constraints Σ_i r_i p_i = 1 and Σ_i r_i p_i a(x_i) = â, which correspond to the constraints in empirical likelihood estimation. For the modified weights, we additionally have Σ_i r_i p_i^MEL {1 − π(x_i; β̂)}^{-1} ∂π(x_i; β̂)/∂β = b̂. Unlike (1.1) and (1.2), however, (3.5) and (3.6) are not implied by a constrained maximization problem; the reason is similar to the fact that not all estimating functions are derivatives of log-likelihood functions. Statistical properties of the weighted estimating equation will be studied in the future.
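The reviewer's suggestion can be sketched for a scalar parameter: solve the weighted estimating equation over the complete cases by root finding. Uniform complete-case weights stand in for p_i^MEL here, and U(y; β) = y − β is an illustrative choice of unbiased estimating function; both are assumptions of this sketch, not the paper's proposal.

```python
import numpy as np
from scipy.optimize import brentq

rng = np.random.default_rng(5)
n = 1000
y = rng.normal(2.0, 1.0, size=n)              # true beta_0 = E(Y) = 2
r = rng.binomial(1, 0.7, size=n).astype(bool) # MCAR missingness for simplicity
n1 = r.sum()
p = np.full(n1, 1.0 / n1)                     # placeholder for the MEL weights p_i^MEL

def U(y, beta):
    return y - beta                           # unbiased: E[U(Y; beta_0)] = 0

# solve sum_i r_i p_i U(y_i; beta) = 0 over the complete cases
beta_hat = brentq(lambda b: float(np.sum(p * U(y[r], b))), -10.0, 10.0)
```

With uniform weights the root is simply the complete-case mean; replacing p with data-driven weights changes the solution while keeping the same solving scheme.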

Table 1
Comparisons among estimators under the Kang and Schafer scenario with four possible combinations of correct and misspecified missing data and outcome regression models: (a) both correct; (b) correct missing data model and incorrect outcome regression; (c) incorrect missing data model but correct outcome regression; and (d) both incorrect. RMSE represents the square root of the sampling mean squared error. RE represents the relative efficiency compared to the modified empirical likelihood estimator

Table 2
Comparisons among estimators under the Qin and Zhang scenario with four possible combinations of correct and misspecified missing data and outcome regression models: (a) both correct; (b) correct missing data model and incorrect outcome regression; (c) incorrect missing data model but correct outcome regression; and (d) both incorrect. RMSE represents the square root of the sampling mean squared error. RE represents the relative efficiency compared to the modified empirical likelihood estimator