Annals of Applied Statistics

Isolation in the construction of natural experiments

José R. Zubizarreta, Dylan S. Small, and Paul R. Rosenbaum

Full-text: Open access


A natural experiment is a type of observational study in which treatment assignment, though not randomized by the investigator, is plausibly close to random. A process that assigns treatments in a highly nonrandom, inequitable manner may, in rare and brief moments, assign aspects of treatments at random or nearly so. Isolating those moments and aspects may extract a natural experiment from a setting in which treatment assignment is otherwise quite biased, far from random. Isolation is a tool that focuses on those rare, brief instances, extracting a small natural experiment from otherwise useless data. We discuss the theory behind isolation and illustrate its use in a reanalysis of a well-known study of the effects of fertility on workforce participation. Whether a woman becomes pregnant at a certain moment in her life and whether she brings that pregnancy to term may reflect her aspirations for family, education and career, the degree of control she exerts over her fertility, and the quality of her relationship with the father; moreover, these aspirations and relationships are unlikely to be recorded with precision in surveys and censuses, and they may confound studies of workforce participation. However, given that a woman is pregnant and will bring the pregnancy to term, whether she will have twins or a single child is, to a large extent, simply luck. Given that a woman is pregnant at a certain moment, the differential comparison of workforce participation following two types of pregnancies, twins or a single child, may be close to randomized, not biased by unmeasured aspirations. In this comparison, we find in our case study that mothers of twins had more children but only slightly reduced workforce participation, approximately 5% less time at work for an additional child.
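The logic of the differential comparison can be illustrated with a toy simulation. The sketch below is not the authors' analysis (the paper works with matching and sensitivity analysis on real data); every number in it is a hypothetical assumption chosen only for illustration, including the twin rate of 0.1, the effect of -2 hours per child, and the way the unmeasured aspiration `u` drives both pregnancy and hours worked. It contrasts a naive comparison of pregnant versus non-pregnant women with the isolated comparison of twin versus single births among pregnant women.

```python
import math
import random
import statistics


def simulate(n=200_000, effect_per_child=-2.0, seed=0):
    """Simulate (pregnant, twins, hours) triples under hypothetical assumptions."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u = rng.gauss(0.0, 1.0)  # unmeasured career aspiration (assumed)
        # Biased step: higher aspiration lowers the chance of pregnancy.
        pregnant = rng.random() < 1.0 / (1.0 + math.exp(u))
        # Near-random step: twins do not depend on u (0.1 is an assumed rate).
        twins = pregnant and rng.random() < 0.1
        children = (1 if pregnant else 0) + (1 if twins else 0)
        # Hours worked depend on aspiration and on the number of children.
        hours = 30.0 + 5.0 * u + effect_per_child * children + rng.gauss(0.0, 3.0)
        rows.append((pregnant, twins, hours))
    return rows


def mean_hours(rows, keep):
    """Mean hours among rows selected by keep(pregnant, twins)."""
    return statistics.fmean(h for p, t, h in rows if keep(p, t))


rows = simulate()

# Naive comparison: pregnant vs. not pregnant, confounded by u.
naive = mean_hours(rows, lambda p, t: p) - mean_hours(rows, lambda p, t: not p)

# Isolated comparison: twins vs. single birth, among pregnant women only.
isolated = (mean_hours(rows, lambda p, t: t)
            - mean_hours(rows, lambda p, t: p and not t))

print(f"naive difference:    {naive:.2f} hours")
print(f"isolated difference: {isolated:.2f} hours (simulated effect per child: -2.00)")
```

Under these assumptions the naive difference mixes the effect of children with the effect of aspirations, while the isolated difference should land close to the simulated effect of one additional child, because twinning was generated independently of `u`.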

Article information

Ann. Appl. Stat., Volume 8, Number 4 (2014), 2096-2121.

First available in Project Euclid: 19 December 2014

Keywords: differential effect; generic bias; risk-set matching; sensitivity analysis


Zubizarreta, José R.; Small, Dylan S.; Rosenbaum, Paul R. Isolation in the construction of natural experiments. Ann. Appl. Stat. 8 (2014), no. 4, 2096–2121. doi:10.1214/14-AOAS770.

