The Annals of Applied Statistics

Split-door criterion: Identification of causal effects through auxiliary outcomes

Amit Sharma, Jake M. Hofman, and Duncan J. Watts

Full-text: Open access


We present a method for estimating causal effects in time series data when fine-grained information about the outcome of interest is available. Specifically, we examine what we call the split-door setting, where the outcome variable can be split into two parts: one that is potentially affected by the cause being studied and another that is independent of it, with both parts sharing the same (unobserved) confounders. We show that under these conditions, the problem of identification reduces to that of testing for independence among observed variables, and propose a method that uses this approach to automatically find subsets of the data that are causally identified. We demonstrate the method by estimating the causal impact of Amazon’s recommender system on traffic to product pages, finding thousands of examples within the dataset that satisfy the split-door criterion. Unlike past studies based on natural experiments that were limited to a single product category, our method applies to a large and representative sample of products viewed on the site. In line with previous work, we find that the widely used click-through rate (CTR) metric overestimates the causal impact of recommender systems; depending on the product category, we estimate that 50–80% of the traffic attributed to recommender systems would have happened even without any recommendations. We conclude with guidelines for using the split-door criterion as well as a discussion of other contexts where the method can be applied.
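The procedure in the abstract (test whether the cause is independent of the unaffected part of the outcome, keep only the time windows where it is, and estimate the effect there) can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the authors' implementation. The variable names are hypothetical: `x` stands for traffic to a focal product, `y_r` for visits to a recommended product that arrive through the recommendation, and `y_d` for all other ("direct") visits to that product, with `u` an unobserved demand shock. The linear data-generating model, the permutation test on Pearson correlation (swapped in for whatever independence test a real analysis would use), the significance threshold, and the per-window OLS slope estimator are all illustrative choices.

```python
import numpy as np

# Hypothetical setup (illustration only): x = traffic to a focal product,
# y_r = visits to the recommended product arriving via the recommendation,
# y_d = other ("direct") visits to it, u = unobserved demand confounder.


def perm_corr_pvalue(x, y, rng, n_perm=999):
    """Permutation p-value for H0: x independent of y, statistic |Pearson r|.
    A simple stand-in for a more general independence test."""
    xs = (x - x.mean()) / x.std()
    ys = (y - y.mean()) / y.std()
    r_obs = abs(np.mean(xs * ys))
    # Shuffling y destroys any x-y dependence, giving draws under the null.
    perms = np.array([rng.permutation(ys) for _ in range(n_perm)])
    r_null = np.abs(perms @ xs) / len(xs)
    return (1 + np.sum(r_null >= r_obs)) / (n_perm + 1)


def ols_slope(x, y):
    """Least-squares slope of y on x (model with intercept)."""
    xc = x - x.mean()
    return float(np.dot(xc, y - y.mean()) / np.dot(xc, xc))


def simulate(rng, n_windows=40, window_len=30, rho=0.1):
    """Toy data: even-numbered windows are confounded (u drives x);
    odd-numbered windows move x for reasons unrelated to u."""
    windows = []
    for w in range(n_windows):
        u = rng.normal(0, 1, window_len)
        if w % 2 == 0:
            x = 100 + 20 * u + rng.normal(0, 5, window_len)
        else:
            x = 100 + rng.normal(0, 20, window_len)
        y_d = 50 + 15 * u + rng.normal(0, 5, window_len)      # not affected by x
        y_r = rho * x + 10 * u + rng.normal(0, 2, window_len)  # true effect rho
        windows.append((x, y_r, y_d))
    return windows


def split_door_estimate(windows, rng, alpha=0.05):
    """Keep windows where independence of x and y_d is not rejected,
    then average the per-window slope of y_r on x over those windows."""
    valid = [i for i, (x, _, y_d) in enumerate(windows)
             if perm_corr_pvalue(x, y_d, rng) > alpha]
    est = float(np.mean([ols_slope(*windows[i][:2]) for i in valid]))
    return est, valid


rng = np.random.default_rng(0)
windows = simulate(rng)
est, valid = split_door_estimate(windows, rng)
# Naive estimate: pool every window and ignore confounding entirely.
naive = ols_slope(np.concatenate([w[0] for w in windows]),
                  np.concatenate([w[1] for w in windows]))
print(f"{len(valid)}/{len(windows)} windows pass the split-door test")
print(f"split-door estimate {est:.3f} vs naive pooled slope {naive:.3f} (true 0.1)")
```

In this toy setup the naive pooled slope absorbs the shared demand shock and lands well above the true effect of 0.1, while averaging only over windows that pass the independence test recovers a value close to it. This mirrors, in miniature, the abstract's point that a naive attribution overstates the causal effect. Failing to reject independence is not proof of independence, and testing many windows at once raises multiple-comparison issues that a real analysis would need to handle.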

Article information

Ann. Appl. Stat., Volume 12, Number 4 (2018), 2699-2733.

Received: January 2017
Revised: April 2018
First available in Project Euclid: 13 November 2018

Keywords: causal inference; data mining; causal graphical model; natural experiment; recommendation systems


Sharma, Amit; Hofman, Jake M.; Watts, Duncan J. Split-door criterion: Identification of causal effects through auxiliary outcomes. Ann. Appl. Stat. 12 (2018), no. 4, 2699–2733. doi:10.1214/18-AOAS1179.
