Abstract
We consider independent observations on a random pair $(X,Y)$, where the response $Y$ is allowed to be missing at random but the covariate vector $X$ is always observed. We demonstrate that characteristics of the conditional distribution of $Y$ given $X$ can be estimated efficiently using complete case analysis: one can simply omit the incomplete cases and apply an estimator that is efficient in the model without missing responses, and this estimator remains efficient. In particular, we do not have to use imputation or work with inverse probability weights. Asymptotically, those approaches can never outperform the complete case method.
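As an illustration of complete case analysis, the following minimal sketch simulates a linear regression with responses missing at random and fits it using only the complete cases. Everything in it is an assumption made for illustration and is not taken from the paper: the function names `simulate` and `complete_case_ols`, the logistic missingness mechanism, the normal errors, and the choice of ordinary least squares as a stand-in for the "appropriate efficient estimator" (least squares is not claimed to be efficient in general).

```python
import numpy as np


def complete_case_ols(x, y, observed):
    """Fit y = a + b*x by least squares, using only rows with an observed response."""
    x_cc = x[observed]
    y_cc = y[observed]
    design = np.column_stack([np.ones_like(x_cc), x_cc])
    coef, *_ = np.linalg.lstsq(design, y_cc, rcond=None)
    return coef  # array [intercept, slope]


def simulate(n=20000, intercept=1.0, slope=2.0, seed=0):
    """Hypothetical data generator: linear model with responses missing at random."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    eps = rng.normal(size=n)  # errors independent of x (homoscedastic)
    y = intercept + slope * x + eps
    # Missing at random: the chance of observing y depends on x only, not on y.
    prob_observed = 1.0 / (1.0 + np.exp(-(0.5 + x)))
    observed = rng.random(n) < prob_observed
    y_with_gaps = np.where(observed, y, np.nan)  # missing responses stored as NaN
    return x, y_with_gaps, observed


x, y, observed = simulate()
coef = complete_case_ols(x, y, observed)
print("share of complete cases:", observed.mean())
print("complete case estimates (intercept, slope):", coef)
```

Although the complete cases over-represent large values of $X$ in this setup, the fitted coefficients should still be close to the true values, because the missingness depends on $X$ alone and the regression is conditional on $X$.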
This efficiency transfer is a general result and holds true for all regression models for which the distribution of $Y$ given $X$ and the marginal distribution of $X$ do not share common parameters. We apply it to the general homoscedastic semiparametric regression model. This includes models where the conditional expectation is modeled by a complex semiparametric regression function, as well as all basic models such as linear regression and nonparametric regression. We discuss estimation of various functionals of the conditional distribution, for example, of regression parameters and of the error distribution.
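For orientation, the setting can be written out as follows. The notation is assumed here rather than taken from the abstract: $\delta$ is an indicator that equals 1 when $Y$ is observed, $\pi$ is the propensity of observing the response, $r$ is the regression function, and $\varepsilon$ is the error.

```latex
% Missing at random: given X, the indicator \delta is independent of Y
P(\delta = 1 \mid X, Y) = P(\delta = 1 \mid X) = \pi(X)

% Homoscedastic regression: the error does not depend on the covariate
Y = r(X) + \varepsilon, \qquad \varepsilon \text{ independent of } X
```

Under this reading, linear regression corresponds to $r(X) = \vartheta^{\top} X$ and nonparametric regression to an unspecified smooth $r$; functionals of interest include the regression parameter and the distribution function of $\varepsilon$.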
Citation
Ursula U. Müller, Anton Schick. "Efficiency transfer for regression models with responses missing at random." Bernoulli 23 (4A) 2693-2719, November 2017. https://doi.org/10.3150/16-BEJ824