Brazilian Journal of Probability and Statistics

Estimating the population mean when some responses are missing

Jingsong Lu, Edward J. Stanek III, and Elaine Puleo

Full-text: Open access

Abstract

We develop a design-based prediction approach to estimating the finite population mean in a simple setting where some responses are missing. The approach is based on indicator sampling random variables that operate on labeled units (subjects). We define missing data mechanisms that may depend on a subject or on a selection (for example, when the study design assigns groups of selected subjects to different interviewers). Using an approach usually reserved for model-based inference, we develop a predictor that equals the sample total divided by the expected sample size. The methods are based on best linear unbiased prediction in finite population mixed models. When the probability of a missing response is estimated from the sample, the empirical estimator simplifies to the mean of the realized nonmissing responses. Notation that accounts for the labels and sample selections makes the different missing data mechanisms explicit. Counterintuitively, the mean squared error (MSE) of the empirical estimator is smaller than the MSE of the predictor when the probability of a missing response is known.
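
The MSE comparison stated in the abstract can be explored with a small Monte Carlo sketch. This is an illustration under stated assumptions, not the authors' derivation. It assumes simple random sampling without replacement, responses missing completely at random with the same probability for every selected subject, and it reads "sample total divided by the expected sample size" as the total of the observed responses divided by n(1 - p). The function name `simulate_mse`, the synthetic population, and all parameter values are hypothetical.

```python
import random


def simulate_mse(population, n, p_missing, reps=20000, seed=0):
    """Monte Carlo MSE of two estimators of the finite population mean.

    Assumptions (not taken from the paper): simple random sampling without
    replacement, and each selected response is missing independently with
    probability p_missing (MCAR).
    """
    rng = random.Random(seed)
    true_mean = sum(population) / len(population)
    expected_respondents = n * (1.0 - p_missing)

    sq_err_known = 0.0
    sq_err_empirical = 0.0
    used = 0
    for _ in range(reps):
        sample = rng.sample(population, n)
        # Keep each selected response with probability 1 - p_missing.
        observed = [y for y in sample if rng.random() >= p_missing]
        if not observed:
            continue  # the respondent mean is undefined with no respondents
        total = sum(observed)
        # Known p: observed total over the expected number of respondents.
        known = total / expected_respondents
        # Estimated p: plugging in the observed missing fraction reduces
        # the same ratio to the mean of the nonmissing responses.
        empirical = total / len(observed)
        sq_err_known += (known - true_mean) ** 2
        sq_err_empirical += (empirical - true_mean) ** 2
        used += 1
    return sq_err_known / used, sq_err_empirical / used


# Hypothetical population of 200 subjects with mean near 50.
pop_rng = random.Random(1)
population = [pop_rng.gauss(50, 10) for _ in range(200)]

mse_known, mse_empirical = simulate_mse(population, n=20, p_missing=0.3)
print(f"MSE with known p:     {mse_known:.3f}")
print(f"MSE with estimated p: {mse_empirical:.3f}")
```

In this setup the known-p predictor divides by a fixed number, so random variation in the number of respondents passes straight into the estimate. The respondent mean divides by the realized count, which removes that source of variation.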

Article information

Source
Braz. J. Probab. Stat., Volume 25, Number 2 (2011), 171-182.

Dates
First available in Project Euclid: 31 March 2011

Permanent link to this document
https://projecteuclid.org/euclid.bjps/1301577152

Digital Object Identifier
doi:10.1214/10-BJPS115

Mathematical Reviews number (MathSciNet)
MR2793924

Zentralblatt MATH identifier
1298.62020

Keywords
Simple random sampling; missing data; MCAR; finite population; best linear unbiased estimator (BLUE); prediction

Citation

Lu, Jingsong; Stanek III, Edward J.; Puleo, Elaine. Estimating the population mean when some responses are missing. Braz. J. Probab. Stat. 25 (2011), no. 2, 171–182. doi:10.1214/10-BJPS115. https://projecteuclid.org/euclid.bjps/1301577152


