The Annals of Statistics

Semiparametric efficiency in GMM models with auxiliary data

Xiaohong Chen, Han Hong, and Alessandro Tarozzi

Source: Ann. Statist. Volume 36, Number 2 (2008), 808-843.

Abstract

We study semiparametric efficiency bounds and efficient estimation of parameters defined through general moment restrictions with missing data. Identification relies on auxiliary data containing information about the distribution of the missing variables conditional on proxy variables that are observed in both the primary and the auxiliary database, when this distribution is common to the two data sets. The auxiliary sample can be independent of the primary sample or can be a subset of it. For both cases, we derive bounds when the probability of missing data given the proxy variables is unknown, is known, or belongs to a correctly specified parametric family. We find that the conditional probability is not ancillary when the two samples are independent. For all cases, we discuss efficient semiparametric estimators. An estimator based on a conditional expectation projection is shown to require milder regularity conditions than one based on inverse probability weighting.
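The abstract contrasts two ways of using the auxiliary data: projecting the moment function onto the proxy variables, and reweighting the auxiliary sample by inverse probability weights. The sketch below illustrates both ideas in the simplest possible case. It is not the paper's efficient estimator. It assumes independent samples, a scalar parameter θ = E_primary[Y] (moment m = Y − θ), a power-series sieve, and a logit model for P(auxiliary | x). The function names and the simulated design are hypothetical.

- The projection version estimates E[Y | X] in the auxiliary sample and averages the fitted values over the primary X.
- The IPW version recovers f_primary(x)/f_aux(x) from the pooled membership probability p(x) as (1 − p)/p, up to a constant that cancels once the weights are normalized.

```python
import numpy as np


def poly_basis(x, degree):
    """Power-series sieve basis 1, x, ..., x^degree (an assumed sieve choice)."""
    return np.vander(x, degree + 1, increasing=True)


def projection_estimate(x_aux, y_aux, x_prim, degree=3):
    """Conditional-expectation projection sketch for theta = E_primary[Y].

    Step 1: series least-squares regression of Y on the proxy X (auxiliary sample).
    Step 2: average the fitted E[Y | X] over the primary sample's proxy values.
    """
    coef, *_ = np.linalg.lstsq(poly_basis(x_aux, degree), y_aux, rcond=None)
    return float(np.mean(poly_basis(x_prim, degree) @ coef))


def fit_logit(B, d, iters=50):
    """Newton-Raphson logistic regression of the indicator d on the basis B."""
    beta = np.zeros(B.shape[1])
    for _ in range(iters):
        p = 1.0 / (1.0 + np.exp(-B @ beta))
        grad = B.T @ (d - p)
        hess = B.T @ (B * (p * (1.0 - p))[:, None])
        beta += np.linalg.solve(hess, grad)
    return beta


def ipw_estimate(x_aux, y_aux, x_prim, degree=2):
    """Inverse-probability-weighting sketch for theta = E_primary[Y].

    Pools both samples with D = 1 for auxiliary observations, fits P(D = 1 | x),
    and reweights auxiliary Y by (1 - p) / p, which is proportional to
    f_primary(x) / f_aux(x). Normalizing the weights removes the constant.
    """
    x_pool = np.concatenate([x_aux, x_prim])
    d = np.concatenate([np.ones(len(x_aux)), np.zeros(len(x_prim))])
    beta = fit_logit(poly_basis(x_pool, degree), d)
    p_aux = 1.0 / (1.0 + np.exp(-poly_basis(x_aux, degree) @ beta))
    w = (1.0 - p_aux) / p_aux
    return float(np.sum(w * y_aux) / np.sum(w))


def simulate(rng, n_aux=20_000, n_prim=20_000):
    """Hypothetical design: X is distributed differently across samples,
    but Y | X is common. Y is treated as unobserved in the primary sample."""
    x_aux = rng.normal(0.0, 1.0, n_aux)
    x_prim = rng.normal(1.0, 1.0, n_prim)
    y_aux = 1.0 + 2.0 * x_aux + 0.5 * x_aux**2 + rng.normal(0.0, 1.0, n_aux)
    return x_aux, y_aux, x_prim


if __name__ == "__main__":
    x_aux, y_aux, x_prim = simulate(np.random.default_rng(0))
    # In this design the true value is E_primary[Y] = 1 + 2*1 + 0.5*(1 + 1) = 4.
    print(projection_estimate(x_aux, y_aux, x_prim))
    print(ipw_estimate(x_aux, y_aux, x_prim))
```

In this toy design both estimates land near 4. The IPW weights grow roughly like exp(x) for auxiliary draws, so that estimator is noisier. This loosely echoes why weighting can demand stronger conditions, but the sketch does not reproduce the paper's regularity analysis.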

Primary Subjects: 62H12, 62D05
Secondary Subjects: 62F12, 62G20
Keywords: Semiparametric efficiency bounds; GMM; measurement error; missing data; auxiliary data; sieve estimation

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aos/1205420520
Digital Object Identifier: doi:10.1214/009053607000000947
Mathematical Reviews number (MathSciNet): MR2396816
Zentralblatt MATH identifier: 1133.62023

2009 © Institute of Mathematical Statistics