The Annals of Applied Statistics

How strong is strong enough? Strengthening instruments through matching and weak instrument tests

Luke Keele and Jason W. Morgan



In a natural experiment, treatment assignments arise from a haphazard process that is thought to be as-if random. In one form of natural experiment, encouragements to accept treatment, rather than treatments themselves, are assigned through this haphazard process. Such an encouragement to accept treatment is often referred to as an instrument. Instruments vary in strength depending on how much encouragement they provide. Weak instruments that provide little encouragement may produce biased inferences, particularly when assignment of the instrument is not strictly randomized. A specialized matching algorithm can strengthen an instrument by selecting a subset of matched pairs in which the encouragement is strongest. We demonstrate how weak instrument tests can guide the matching process to ensure that the instrument has been sufficiently strengthened. Specifically, we combine a matching algorithm for strengthening instruments with weak instrument tests in a study of whether turnout influences party vote share in US elections. It is thought that when turnout is higher, Democratic candidates receive a higher vote share. Using excess rainfall as an instrument, we hope to observe an instance in which unusually wet weather produces lower turnout in an as-if random fashion. Consistent with statistical theory, we find that strengthening the instrument reduces sensitivity to bias from an unobserved confounder.
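The weak instrument tests referred to in the abstract are commonly based on the first-stage F-statistic: regress the treatment (here, turnout) on the instrument (here, excess rainfall) and check whether the instrument's explanatory power clears a threshold such as the Staiger–Stock rule of thumb of F > 10. A minimal sketch, using synthetic data and hypothetical variable names (`rain`, `turnout`) purely for illustration — this is not the authors' algorithm or data:

```python
import numpy as np

def first_stage_f(z, d):
    """First-stage F-statistic for a single instrument z and treatment d.

    Regresses d on z (with an intercept); with one instrument, the F-statistic
    equals the squared t-statistic of the instrument's coefficient.
    """
    n = len(z)
    X = np.column_stack([np.ones(n), z])            # design matrix: intercept + instrument
    beta, *_ = np.linalg.lstsq(X, d, rcond=None)    # OLS coefficients
    resid = d - X @ beta
    sigma2 = resid @ resid / (n - 2)                # residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)           # coefficient covariance matrix
    t = beta[1] / np.sqrt(cov[1, 1])                # t-statistic of the instrument
    return t ** 2

# Synthetic illustration: rainfall depresses turnout in an as-if random way.
rng = np.random.default_rng(0)
n = 500
rain = rng.gamma(2.0, 1.0, n)                       # hypothetical excess-rainfall instrument
turnout = 60 - 2.0 * rain + rng.normal(0, 5, n)     # turnout falls with rainfall, plus noise

f_stat = first_stage_f(rain, turnout)
print(f"first-stage F = {f_stat:.1f}")
```

In the matching approach the paper describes, one would recompute such a test statistic on the retained subset of matched pairs to verify that discarding pairs with little encouragement has in fact strengthened the instrument.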

Article information

Ann. Appl. Stat. Volume 10, Number 2 (2016), 1086-1106.

Received: February 2015
Revised: March 2016
First available in Project Euclid: 22 July 2016


Keywords: causal inference; instrumental variables; matching; weak instruments


Keele, Luke; Morgan, Jason W. How strong is strong enough? Strengthening instruments through matching and weak instrument tests. Ann. Appl. Stat. 10 (2016), no. 2, 1086--1106. doi:10.1214/16-AOAS932.

