The Annals of Applied Statistics

Are a set of microarrays independent of each other?

Bradley Efron

Full-text: Open access


Having observed an m×n matrix X whose rows are possibly correlated, we wish to test the hypothesis that the columns are independent of each other. Our motivation comes from microarray studies, where the rows of X record expression levels for m different genes, often highly correlated, while the columns represent n individual microarrays, presumably obtained independently. The presumption of independence underlies all the familiar permutation, cross-validation and bootstrap methods for microarray analysis, so it is important to know when independence fails. We develop nonparametric and normal-theory testing methods. The row and column correlations of X interact with each other in a way that complicates test procedures, essentially by reducing the accuracy of the relevant estimators.
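The interaction described in the last sentence of the abstract can be illustrated with a small simulation. The sketch below is not the paper's test procedure. It is a minimal illustration under assumed settings: the function names, the single-factor model with random-sign loadings, and the values of m, n and rho are all hypothetical choices made here. It builds two matrices whose columns are independent by construction, one with independent rows and one with correlated rows, and compares the root-mean-square off-diagonal sample correlation between columns.

```python
import numpy as np

def simulate(m, n, rho, rng):
    """Return an m x n matrix with independent columns (illustrative model).

    Within each column, rows i and i' have correlation +rho or -rho,
    induced by one shared factor per column with random-sign loadings.
    rho = 0 gives fully independent entries.
    """
    signs = rng.choice([-1.0, 1.0], size=(m, 1))   # random-sign loading for each row
    factor = rng.standard_normal((1, n))           # one independent factor per column
    noise = rng.standard_normal((m, n))
    return np.sqrt(rho) * signs * factor + np.sqrt(1.0 - rho) * noise

def rms_offdiag_col_corr(X):
    """Root-mean-square of the off-diagonal sample correlations between columns."""
    C = np.corrcoef(X, rowvar=False)               # n x n, columns treated as variables
    off = C[~np.eye(C.shape[0], dtype=bool)]       # drop the diagonal of ones
    return float(np.sqrt(np.mean(off ** 2)))

rng = np.random.default_rng(0)
m, n = 2000, 20                                    # assumed sizes, chosen for speed

rms_indep = rms_offdiag_col_corr(simulate(m, n, 0.0, rng))
rms_dep = rms_offdiag_col_corr(simulate(m, n, 0.5, rng))

print(f"independent rows: RMS column correlation = {rms_indep:.3f}")
print(f"correlated rows:  RMS column correlation = {rms_dep:.3f}")
print(f"reference 1/sqrt(m) = {1 / np.sqrt(m):.3f}")
```

With independent rows, the sample column correlations scatter around zero at roughly the 1/sqrt(m) scale. With correlated rows they are much more dispersed, even though the columns are still independent, so a test calibrated as if X contained m independent rows would overstate the evidence against column independence.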

Article information

Ann. Appl. Stat., Volume 3, Number 3 (2009), 922-942.

First available in Project Euclid: 5 October 2009

Keywords: total correlation; effective sample size; permutation tests; matrix normal distribution; row and column correlations


Efron, Bradley. Are a set of microarrays independent of each other? Ann. Appl. Stat. 3 (2009), no. 3, 922–942. doi:10.1214/09-AOAS236.


