Statistical Science

What Is Meant by “Missing at Random”?

Shaun Seaman, John Galati, Dan Jackson, and John Carlin

Full-text: Open access

Abstract

The concept of missing at random is central in the literature on statistical analysis with missing data. In general, inference using incomplete data should be based not only on observed data values but should also take account of the pattern of missing values. However, it is often said that if data are missing at random, valid inference using likelihood approaches (including Bayesian) can be obtained ignoring the missingness mechanism. Unfortunately, the term “missing at random” has been used inconsistently and not always clearly; there has also been a lack of clarity around the meaning of “valid inference using likelihood”. These issues have created potential for confusion about the exact conditions under which the missingness mechanism can be ignored, and perhaps fed confusion around the meaning of “analysis ignoring the missingness mechanism”. Here we provide standardised precise definitions of “missing at random” and “missing completely at random”, in order to promote unification of the theory. Using these definitions we clarify the conditions that suffice for “valid inference” to be obtained under a variety of inferential paradigms.

Article information

Source
Statist. Sci., Volume 28, Number 2 (2013), 257-268.

Dates
First available in Project Euclid: 21 May 2013

Permanent link to this document
https://projecteuclid.org/euclid.ss/1369147915

Digital Object Identifier
doi:10.1214/13-STS415

Mathematical Reviews number (MathSciNet)
MR3112409

Zentralblatt MATH identifier
1331.62036

Keywords
Ignorability; direct-likelihood inference; frequentist inference; repeated sampling; missing completely at random

Citation

Seaman, Shaun; Galati, John; Jackson, Dan; Carlin, John. What Is Meant by “Missing at Random”? Statist. Sci. 28 (2013), no. 2, 257–268. doi:10.1214/13-STS415. https://projecteuclid.org/euclid.ss/1369147915


