## Bayesian Analysis

### Posterior Predictive p-Values with Fisher Randomization Tests in Noncompliance Settings: Test Statistics vs Discrepancy Measures

#### Abstract

In randomized experiments with noncompliance, one might wish to focus on compliers rather than on the overall sample. In this vein, Rubin (1998) argued that testing for the complier average causal effect and averaging permutation-based $p$-values over the posterior distribution of the compliance types could increase power as compared to general intent-to-treat tests. The general scheme is a repeated two-step process: impute missing compliance types and conduct a permutation test with the completed data. In this paper, we explore this idea further, comparing the use of discrepancy measures—which depend on unknown but imputed parameters—to classical test statistics and contrasting different approaches for imputing the unknown compliance types. We also examine consequences of model misspecification in the imputation step, and discuss to what extent this additional modeling undercuts the advantage of permutation tests being model independent. We find that, especially for discrepancy measures, modeling choices can impact both power and validity. In particular, imputing missing compliance types under the null can radically reduce power, but not doing so can jeopardize validity. Fortunately, using covariates predictive of compliance type in the imputation can mitigate these results. We also compare this overall approach to Bayesian model-based tests, that is, tests that are directly derived from posterior credible intervals, under both correct and incorrect model specification.
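The repeated two-step scheme described in the abstract (impute the missing compliance types, then run a permutation test on the completed data, and average the resulting $p$-values) can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes one-sided noncompliance (control units cannot take the treatment, so compliance type is observed only in the treated arm), a simple Beta-Bernoulli imputation model that ignores outcomes and covariates, and a classical test statistic (the difference in mean outcomes among compliers) rather than a discrepancy measure. All function and parameter names are hypothetical.

```python
import numpy as np


def complier_diff_in_means(y, z, complier):
    """Difference in mean outcomes between treated and control compliers."""
    treated = complier & (z == 1)
    control = complier & (z == 0)
    if treated.sum() == 0 or control.sum() == 0:
        return 0.0  # statistic undefined for this draw; treat as no evidence
    return y[treated].mean() - y[control].mean()


def impute_compliance(z, d, rng, a=1.0, b=1.0):
    """Draw one completed vector of compliance types.

    Assumption: one-sided noncompliance, so treated units reveal their type
    (d == 1 means complier) and only control units need imputation.
    The complier proportion gets a Beta(a, b) prior.
    """
    observed = d[z == 1] == 1
    # Posterior draw of the complier proportion given the treated arm.
    pi = rng.beta(a + observed.sum(), b + (~observed).sum())
    complier = np.zeros(z.shape[0], dtype=bool)
    complier[z == 1] = observed
    complier[z == 0] = rng.random((z == 0).sum()) < pi
    return complier


def posterior_predictive_frt_pvalue(y, z, d, n_imputations=50,
                                    n_permutations=200, seed=0):
    """Average permutation p-values over imputed compliance types."""
    rng = np.random.default_rng(seed)
    pvals = np.empty(n_imputations)
    for m in range(n_imputations):
        # Step 1: impute the missing compliance types.
        complier = impute_compliance(z, d, rng)
        observed_stat = abs(complier_diff_in_means(y, z, complier))
        # Step 2: permutation test on the completed data. Under the sharp
        # null, outcomes and compliance types are fixed; only z is shuffled.
        exceed = 0
        for _ in range(n_permutations):
            z_perm = rng.permutation(z)
            stat = abs(complier_diff_in_means(y, z_perm, complier))
            exceed += int(stat >= observed_stat)
        pvals[m] = (1 + exceed) / (1 + n_permutations)
    return float(pvals.mean())
```

Calling `posterior_predictive_frt_pvalue(y, z, d)` with NumPy arrays for the outcome, the binary assignment, and the binary treatment received returns a single averaged $p$-value. Because the imputation here ignores outcomes and covariates, it does not reproduce the modeling choices the paper compares; it only shows where those choices enter the procedure.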

#### Article information

Source
Bayesian Anal., Volume 13, Number 3 (2018), 681-701.

Dates
First available in Project Euclid: 17 August 2017

https://projecteuclid.org/euclid.ba/1502935324

Digital Object Identifier
doi:10.1214/17-BA1062

Mathematical Reviews number (MathSciNet)
MR3807862

#### Citation

Forastiere, Laura; Mealli, Fabrizia; Miratrix, Luke. Posterior Predictive $p$-Values with Fisher Randomization Tests in Noncompliance Settings: Test Statistics vs Discrepancy Measures. Bayesian Anal. 13 (2018), no. 3, 681–701. doi:10.1214/17-BA1062. https://projecteuclid.org/euclid.ba/1502935324

#### References

• Angrist, J. D., Imbens, G. W., & Rubin, D. B. (1996). Identification of causal effects using instrumental variables (with discussion). Journal of the American Statistical Association, 91, 444–472.
• Fisher, R. A. (1925). Statistical Methods for Research Workers. 1st ed. Oliver and Boyd, Edinburgh.
• Fisher, R. A. (1926). The arrangement of field experiments. Journal of the Ministry of Agriculture of Great Britain, 33, 503–513.
• Fisher, R. A. (1935). The Design of Experiments. Oliver and Boyd, Edinburgh.
• Frangakis, C. E. & Rubin, D. B. (2002). Principal stratification in causal inference. Biometrics, 58, 21–29.
• Gelman, A., Meng, X. L., & Stern, H. S. (1996). Posterior predictive assessment of model fitness via realized discrepancies (with discussion). Statistica Sinica, 6, 733–807.
• Gelman, A. (2013). Two simple examples for understanding posterior $p$-values whose distributions are far from uniform. Electronic Journal of Statistics, 7, 2595–2602.
• Guttman, I. (1967). The use of the concept of a future observation in goodness-of-fit problems. Journal of the Royal Statistical Society B, 29(1), 83–100.
• Imbens, G. W. & Angrist, J. D. (1994). Identification and estimation of local average treatment effects. Econometrica, 62, 467–476.
• Meng, X. L. (1994a). Posterior predictive $p$-values. Annals of Statistics, 22, 1142–1160.
• Meng, X. L. (1994b). Multiple-imputation inferences under uncongeniality. Statistical Science, 9(4), 538–573.
• Neyman, J. (1923). On the application of probability theory to agricultural experiments. Essay on principles. Section 9. Roczniki Nauk Rolniczych Tom X [in Polish]; translated in Statistical Science, 5, 465–480.
• Neyman, J. (1934). On two different aspects of the representative method: The method of stratified sampling and the method of purposive selection with discussion. Journal of the Royal Statistical Society, 97, 558–625.
• Robins, J. M., van der Vaart, A., & Ventura, V. (2000). Asymptotic distribution of $p$ values in composite null models. Journal of the American Statistical Association, 95, 1143–1156.
• Rubin, D. B. (1974). Estimating causal effects of treatments in randomized and nonrandomized studies. Journal of Educational Psychology, 66, 688–701.
• Rubin, D. B. (1978). Bayesian inference for causal effects. Annals of Statistics, 6, 34–58.
• Rubin, D. B. (1980). Comment on "Randomization Analysis of Experimental Data in the Fisher Randomization Test" by D. Basu. Journal of the American Statistical Association, 75, 591–593.
• Rubin, D. B. (1981). Estimation in parallel randomized experiments. Journal of Educational Statistics, 6(4), 377–401.
• Rubin, D. B. (1984). Bayesianly justifiable and relevant frequency calculations for the applied statistician. Annals of Statistics, 12(4), 1151–1172.
• Rubin, D. B. (1996a). Discussion of “Posterior predictive $p$-values?” by Gelman, A., Meng, X. L. and Stern, H. Statistica Sinica, 6, 787–792.
• Rubin, D. B. (1996b). Multiple imputation after 18+ years (with discussion). Journal of the American Statistical Association, 91, 473–520.
• Rubin, D. B. (1998). More powerful randomization-based $p$-values in double-blind trials with non-compliance. Statistics in Medicine, 17(3), 371–385.
• Tanner, M. A. & Wong, W. H. (1987). The calculation of posterior distributions by data augmentation (with discussion). Journal of the American Statistical Association, 82, 528–550.