## Statistical Science

- Statist. Sci.
- Volume 23, Number 2 (2008), 219-236.

### Covariate Balance in Simple, Stratified and Clustered Comparative Studies

**Full-text: Open access**

#### Abstract

In randomized experiments, treatment and control groups should be roughly the same—balanced—in their distributions of pretreatment variables. But how nearly so? Can descriptive comparisons meaningfully be paired with significance tests? If so, should there be several such tests, one for each pretreatment variable, or should there be a single, omnibus test? Could such a test be engineered to give easily computed *p*-values that are reliable in samples of moderate size, or would simulation be needed for reliable calibration? What new concerns are introduced by random assignment of clusters? Which tests of balance would be optimal?

To address these questions, Fisher’s randomization inference is applied to the question of balance. Its application suggests the reversal of published conclusions about two studies, one clinical and the other a field experiment in political participation.
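As a rough illustration of how randomization inference can be applied to balance, the sketch below simulates re-randomizations and asks how often they produce covariate imbalance at least as large as the one observed. It assumes complete random assignment of individual units with fixed group sizes (no strata, no clusters). The statistic, a sum of squared standardized mean differences, and the function names `balance_statistic` and `randomization_p_value` are illustrative assumptions and not the authors' omnibus test. In particular, this statistic ignores correlation among covariates, and the p-value is calibrated by simulation rather than by a large-sample approximation.

```python
import random
import statistics


def balance_statistic(covariates, treated):
    """Sum over covariates of squared standardized mean differences.

    covariates: list of rows, one per unit, each a list of pretreatment values.
    treated: list of booleans, one per unit (both groups must be non-empty).
    """
    n_cov = len(covariates[0])
    total = 0.0
    for j in range(n_cov):
        column = [row[j] for row in covariates]
        t_vals = [x for x, z in zip(column, treated) if z]
        c_vals = [x for x, z in zip(column, treated) if not z]
        # Scale by the SD over all units, which does not change
        # when the assignment is re-drawn.
        sd = statistics.pstdev(column)
        if sd == 0:
            continue  # a constant covariate cannot be imbalanced
        diff = statistics.fmean(t_vals) - statistics.fmean(c_vals)
        total += (diff / sd) ** 2
    return total


def randomization_p_value(covariates, treated, draws=2000, seed=0):
    """Monte Carlo p-value under complete randomization, fixed group sizes."""
    rng = random.Random(seed)
    observed = balance_statistic(covariates, treated)
    assignment = list(treated)
    at_least_as_extreme = 0
    for _ in range(draws):
        rng.shuffle(assignment)  # new assignment, same number treated
        if balance_statistic(covariates, assignment) >= observed:
            at_least_as_extreme += 1
    # Add one to numerator and denominator so the estimate is never zero.
    return (at_least_as_extreme + 1) / (draws + 1)


if __name__ == "__main__":
    # Hypothetical data: 20 units, two pretreatment covariates.
    x = [[float(i), float(i % 3)] for i in range(20)]
    z = [i % 2 == 0 for i in range(20)]
    print(randomization_p_value(x, z))
```

A single omnibus p-value of this kind addresses the multiplicity that arises when each pretreatment variable is tested separately. A clustered design would need whole clusters to be re-assigned together rather than individual units.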

#### Article information

**Source**

Statist. Sci. Volume 23, Number 2 (2008), 219-236.

**Dates**

First available in Project Euclid: 21 August 2008

**Permanent link to this document**

http://projecteuclid.org/euclid.ss/1219339114

**Digital Object Identifier**

doi:10.1214/08-STS254

**Mathematical Reviews number (MathSciNet)**

MR2516821

**Keywords**

Cluster; contiguity; community intervention; group randomization; randomization inference; subclassification

#### Citation

Hansen, Ben B.; Bowers, Jake. Covariate Balance in Simple, Stratified and Clustered Comparative Studies. Statist. Sci. 23 (2008), no. 2, 219-236. doi:10.1214/08-STS254. http://projecteuclid.org/euclid.ss/1219339114.

