The Annals of Applied Statistics

Using missing types to improve partial identification with application to a study of HIV prevalence in Malawi

Zhichao Jiang and Peng Ding

Empirical studies are frequently plagued with missing data. When the data are missing not at random, the parameter of interest is not identifiable in general. Without additional assumptions, we can derive bounds on the parameters of interest, which, unfortunately, are often too wide to be informative. Therefore, it is of great importance to sharpen these worst-case bounds by exploiting additional information. Traditional missing data analysis uses only the information in the binary missing data indicator, that is, whether a certain data point is missing or not. Nevertheless, real data often provide more information than a binary missing data indicator, and they often record different types of missingness. In a motivating HIV status survey, missing data may be due to the units’ unwillingness to respond to the survey items or their hospitalization during the visit, and may also be due to the units’ temporary absence or relocation. It is apparent that some missing types are more likely to be missing not at random, whereas other missing types are more likely to be missing at random. We show that making full use of the missing types results in narrower bounds on the parameters of interest. In a real-life example, we demonstrate a substantial improvement, a reduction of more than 50% in bound widths, for estimating the prevalence of HIV in rural Malawi. As we illustrate using the HIV study, our strategy is also useful for conducting sensitivity analysis by gradually increasing or decreasing the set of types that are missing at random. In addition, we propose an easy-to-implement method to construct confidence intervals for partially identified parameters with bounds expressed as the minimums and maximums of finitely many parameters, which is useful not only for our problem but also for many other problems involving bounds.
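The idea of narrowing worst-case bounds with missing types can be illustrated with a deliberately simplified sketch. It is not the estimator of the paper, which is not reproduced on this page: it ignores covariates and the longitudinal structure, and it assumes that units with a "missing at random" type have the same prevalence as the observed units. The function name `prevalence_bounds`, the type labels, and all counts below are hypothetical.

```python
def prevalence_bounds(n_pos, n_neg, missing_counts, mar_types=()):
    """Bounds on the prevalence P(Y = 1) for a binary outcome with missing data.

    n_pos, n_neg   : observed counts of positive and negative outcomes
    missing_counts : dict mapping a missing type (hypothetical labels) to its count
    mar_types      : missing types assumed to share the observed prevalence
                     (a simplifying assumption made for this sketch)
    """
    n_obs = n_pos + n_neg
    n_total = n_obs + sum(missing_counts.values())
    p_obs = n_pos / n_obs  # prevalence among respondents

    n_mar = sum(c for t, c in missing_counts.items() if t in mar_types)
    n_mnar = sum(c for t, c in missing_counts.items() if t not in mar_types)

    # Lower bound: every unit with an unrestricted missing type is negative.
    lower = p_obs * (n_obs + n_mar) / n_total
    # Upper bound: every unit with an unrestricted missing type is positive.
    upper = lower + n_mnar / n_total
    return lower, upper


if __name__ == "__main__":
    # Made-up counts, for illustration only.
    missing = {"refused": 100, "hospitalized": 20, "absent": 180, "moved": 100}

    # Nested sets of types treated as missing at random, from none to several.
    for mar in [(), ("moved",), ("moved", "absent")]:
        lo, hi = prevalence_bounds(60, 540, missing, mar_types=mar)
        print(f"MAR types {mar}: [{lo:.3f}, {hi:.3f}], width {hi - lo:.3f}")
```

With no types treated as missing at random, the function returns the worst-case bounds, whose width equals the overall missing proportion. Each type moved into `mar_types` removes its share from the width, which mirrors the sensitivity analysis described in the abstract: enlarge or shrink the set step by step and watch how the bounds respond.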

Article information

Ann. Appl. Stat., Volume 12, Number 3 (2018), 1831-1852.

Received: August 2017
Revised: December 2017
First available in Project Euclid: 11 September 2018

Keywords: Longitudinal data; partial identification; sensitivity analysis; sharp bound; testable condition


Jiang, Zhichao; Ding, Peng. Using missing types to improve partial identification with application to a study of HIV prevalence in Malawi. Ann. Appl. Stat. 12 (2018), no. 3, 1831--1852. doi:10.1214/17-AOAS1133.


Supplemental materials

  • Supplement to “Using missing types to improve partial identification with application to a study of HIV prevalence in Malawi”. The supplementary material consists of four parts. Section S1 gives the proofs of the theorems on the bounds. Section S2 gives the testable conditions with multiple time points. Section S3 gives the proofs of the theorem and corollary for constructing confidence intervals. Section S4 shows the results of the simulation studies.