The Annals of Applied Statistics

For objective causal inference, design trumps analysis

Donald B. Rubin

Full-text: Open access


For obtaining causal inferences that are objective, and therefore have the best chance of revealing scientific truths, carefully designed and executed randomized experiments are generally considered to be the gold standard. Observational studies, in contrast, are generally fraught with problems that compromise any claim for objectivity of the resulting causal inferences. The thesis here is that observational studies have to be carefully designed to approximate randomized experiments, in particular, without examining any final outcome data. Often a candidate data set will have to be rejected as inadequate because of lack of data on key covariates, or because of lack of overlap in the distributions of key covariates between treatment and control groups, often revealed by careful propensity score analyses. Sometimes the template for the approximating randomized experiment will have to be altered, and the use of principal stratification can be helpful in doing this. These issues are discussed and illustrated using the framework of potential outcomes to define causal effects, which greatly clarifies critical issues.
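The overlap check mentioned in the abstract can be illustrated with a small sketch. It assumes that estimated propensity scores are already available for every unit, and it uses a simple min/max rule for the region of common support; that rule is one common convention, not necessarily the procedure used in the paper. The function names `common_support` and `outside_support` and the example data are hypothetical. No outcome variable appears anywhere, in keeping with the requirement that the design be done without examining final outcome data.

```python
def common_support(scores, treated):
    """Return (low, high): the propensity-score range covered by both groups.

    scores  -- estimated propensity scores, one per unit (assumed given)
    treated -- 1 for treated units, 0 for control units
    """
    t = [s for s, w in zip(scores, treated) if w == 1]
    c = [s for s, w in zip(scores, treated) if w == 0]
    if not t or not c:
        raise ValueError("need at least one treated and one control unit")
    # The overlap region starts at the larger of the two minima
    # and ends at the smaller of the two maxima.
    return max(min(t), min(c)), min(max(t), max(c))


def outside_support(scores, treated):
    """Return the indices of units whose score falls outside the overlap region."""
    low, high = common_support(scores, treated)
    return [i for i, s in enumerate(scores) if s < low or s > high]


# Hypothetical example: controls have low scores, treated units have high scores.
scores = [0.10, 0.20, 0.35, 0.50, 0.60, 0.55, 0.70, 0.90, 0.95]
treated = [0, 0, 0, 0, 0, 1, 1, 1, 1]

low, high = common_support(scores, treated)
dropped = outside_support(scores, treated)
print(f"common support: [{low}, {high}]")
print(f"units outside common support: {dropped}")
```

In this made-up example only two of the nine units fall inside the overlap region, which is the kind of finding that could lead a candidate data set to be rejected as inadequate at the design stage.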

Article information

Ann. Appl. Stat. Volume 2, Number 3 (2008), 808-840.

First available in Project Euclid: 13 October 2008

Keywords: Average causal effect; causal effects; complier average causal effect; instrumental variables; noncompliance; observational studies; propensity scores; randomized experiments; Rubin Causal Model


Rubin, Donald B. For objective causal inference, design trumps analysis. Ann. Appl. Stat. 2 (2008), no. 3, 808–840. doi:10.1214/08-AOAS187.

