We study semiparametric efficiency bounds and efficient estimation of parameters defined through general moment restrictions with missing data. Identification relies on auxiliary data containing information about the distribution of the missing variables conditional on proxy variables that are observed in both the primary and the auxiliary database, when such distribution is common to the two data sets. The auxiliary sample can be independent of the primary sample, or can be a subset of it. For both cases, we derive bounds when the probability of missing data given the proxy variables is unknown, or known, or belongs to a correctly specified parametric family. We find that the conditional probability is not ancillary when the two samples are independent. For all cases, we discuss efficient semiparametric estimators. An estimator based on a conditional expectation projection is shown to require milder regularity conditions than one based on inverse probability weighting.
"Semiparametric efficiency in GMM models with auxiliary data." Ann. Statist. 36 (2) 808 - 843, April 2008. https://doi.org/10.1214/009053607000000947