## The Annals of Applied Statistics

### Rank tests in unmatched clustered randomized trials applied to a study of teacher training

#### Abstract

In the Teacher and Leader Performance Evaluation Systems study, schools were randomly assigned to receive new measures of teacher and principal performance. One outcome in the study, measured at the teacher level, was truncated at zero and displayed a long tail. Rank-based statistics are a natural choice for such outcomes: inferences are robust and exact, and they require no assumptions about the model that generated the data. We investigate four rank statistics that differ in how they weight clusters. Each test statistic has the correct level, but the statistics may differ in their power to detect departures from the null. We conduct power simulations that compare the rank tests to linear mixed models with Normal, $t$, and Cauchy errors. We obtain a point estimate and construct confidence intervals by applying the Tobit model of effects, which assumes that treatment increases the outcome by a constant amount, but only if the response under control would be positive. We also develop a formal randomization-based method for testing the appropriateness of the Tobit model of effects. In the data from the study, we find no evidence against the Tobit model of effects.
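
As a rough illustration of the ideas in the abstract, the sketch below codes a cluster-level randomization test with rank scores and the Tobit model of effects. It transcribes the abstract's wording as $Y(1) = Y(0) + \tau \cdot 1\{Y(0) > 0\}$ with $\tau \ge 0$, so that under a hypothesized value $\tau_0$ the control outcome of a treated unit is $\max(Y - \tau_0, 0)$. Everything here is an assumption made for illustration: the function names, the toy data, and the two cluster weightings (`"mean"` and `"sum"`) are hypothetical and are not claimed to be the four statistics studied in the paper.

```python
from itertools import combinations

import numpy as np


def midranks(y):
    """Ranks 1..N, with tied values sharing their average rank."""
    y = np.asarray(y, dtype=float)
    _, inverse, counts = np.unique(y, return_inverse=True, return_counts=True)
    last = np.cumsum(counts)       # largest rank in each tie group
    first = last - counts + 1      # smallest rank in each tie group
    return ((first + last) / 2.0)[inverse]


def impute_control(y, z, tau0):
    """Control outcomes implied by the hypothesis tau = tau0 (assumes tau0 >= 0)."""
    y = np.asarray(y, dtype=float)
    z = np.asarray(z)
    return np.where(z == 1, np.maximum(y - tau0, 0.0), y)


def cluster_scores(y, cluster, weight="mean"):
    """One score per cluster: the mean or the sum of its units' ranks."""
    ranks = midranks(y)
    cluster = np.asarray(cluster)
    scores = {}
    for k in np.unique(cluster):
        r = ranks[cluster == k]
        scores[k.item()] = r.mean() if weight == "mean" else r.sum()
    return scores


def exact_p_value(y, cluster, z, tau0=0.0, weight="mean"):
    """One-sided exact p-value, enumerating every assignment of clusters."""
    cluster = np.asarray(cluster)
    z = np.asarray(z)
    scores = cluster_scores(impute_control(y, z, tau0), cluster, weight)
    treated = sorted({k.item() for k in np.unique(cluster[z == 1])})
    observed = sum(scores[k] for k in treated)
    null = np.array([
        sum(scores[k] for k in combo)
        for combo in combinations(sorted(scores), len(treated))
    ])
    # Small tolerance guards against floating-point ties with the observed value.
    return float(np.mean(null >= observed - 1e-9))


# Toy data (made up): 6 clusters of 3 units each, clusters 0-2 treated.
y = np.array([5, 6, 7, 8, 9, 10, 11, 12, 13,
              0, 0, 1, 0, 2, 3, 0, 0, 4], dtype=float)
cluster = np.repeat(np.arange(6), 3)
z = (cluster < 3).astype(int)

p_null = exact_p_value(y, cluster, z, tau0=0.0)   # test of no effect
p_tau6 = exact_p_value(y, cluster, z, tau0=6.0)   # test of tau = 6
print(p_null, p_tau6)
```

Under these assumptions, a confidence set for $\tau$ could be formed by collecting the values of `tau0` that the test does not reject. Full enumeration is only feasible with a handful of clusters, so a larger trial would need Monte Carlo draws of the assignment instead.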

#### Article information

Source
Ann. Appl. Stat., Volume 12, Number 4 (2018), 2151-2174.

Dates
Revised: October 2017
First available in Project Euclid: 13 November 2018

https://projecteuclid.org/euclid.aoas/1542078040

Digital Object Identifier
doi:10.1214/18-AOAS1147

Mathematical Reviews number (MathSciNet)
MR3875696

#### Citation

Ding, Peng; Keele, Luke. Rank tests in unmatched clustered randomized trials applied to a study of teacher training. Ann. Appl. Stat. 12 (2018), no. 4, 2151–2174. doi:10.1214/18-AOAS1147. https://projecteuclid.org/euclid.aoas/1542078040
