Detecting column dependence when rows are correlated and estimating the strength of the row correlation

Omkar Muralidharan

doi:10.1214/10-EJS592

Electron. J. Statist. 4: 1527-1546 (2010). DOI: 10.1214/10-EJS592

Abstract

Microarray experiments often yield a normal data matrix X whose rows correspond to genes and columns to samples. We commonly calculate test statistics Z=Xw, where Z_i is a test statistic for the ith gene, and apply false discovery rate (FDR) controlling methods to find interesting genes. For example, Z could measure the difference in expression levels between treatment and control groups and we could seek differentially expressed genes. The empirical cdf of Z is important for FDR methods, since its mean and variance determine the bias and variance of FDR estimates. Efron (2009b) has shown that if the columns of X are independent, the variance of the empirical cdf of Z only depends on the mean-squared row correlation.
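As a minimal sketch of the setup described above, the following Python snippet forms Z = Xw for a two-group comparison and evaluates the empirical cdf of Z at a few points. The helper names (`contrast_weights`, `empirical_cdf`), the simulated data, the group sizes, and the unit-norm scaling of w are all illustrative assumptions; they are not taken from the paper or from the colcor package.

```python
import numpy as np


def contrast_weights(is_treatment):
    """Hypothetical helper: treatment-minus-control weights, scaled to unit norm.

    With unit norm, Z_i = (Xw)_i has variance 1 when the entries of row i
    are independent with unit variance (an assumption of this sketch).
    """
    is_treatment = np.asarray(is_treatment, dtype=bool)
    n1 = is_treatment.sum()
    n0 = (~is_treatment).sum()
    w = np.where(is_treatment, 1.0 / n1, -1.0 / n0)
    return w / np.linalg.norm(w)


def empirical_cdf(z, points):
    """Fraction of the z-values that are <= each entry of `points`."""
    z_sorted = np.sort(np.asarray(z, dtype=float))
    return np.searchsorted(z_sorted, points, side="right") / z_sorted.size


# Simulated stand-in for a microarray matrix: 1000 genes (rows), 20 samples (columns).
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 20))
labels = np.arange(20) < 10          # first 10 columns play the role of treatment

w = contrast_weights(labels)
Z = X @ w                            # one test statistic per gene
F_hat = empirical_cdf(Z, np.array([-1.0, 0.0, 1.0]))
```

In this simulation the rows are independent by construction, so it does not reproduce the row correlation that the abstract is concerned with; it only shows how Z and its empirical cdf are obtained from X and w.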

Microarray data, however, frequently show signs of column dependence. In this paper, we show that Efron’s result still holds under column dependence, and give a conservative (upwardly biased) estimator for the mean-squared row correlation. We show Fisher’s transformation for sample correlations is still normalizing and variance stabilizing under column dependence, and use it to construct a permutation-invariant test of column independence. Finally, we argue that estimating the mean-squared row correlation under column dependence is impossible in general. Code to perform our test is provided in the R package “colcor” on CRAN.
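For readers unfamiliar with Fisher’s transformation, the sketch below applies the classical form, z = atanh(r), to sample correlations between rows of a simulated matrix. It is not the paper’s test and does not use the colcor package. The helper `fisher_z`, the simulated data, and the reference standard deviation 1/sqrt(n - 3) are assumptions of this illustration; that reference value is the textbook approximation for n independent observations, which here means independent columns.

```python
import numpy as np


def fisher_z(r):
    """Classical Fisher transformation z = atanh(r) of a correlation r."""
    # Clip slightly inside (-1, 1) so that atanh stays finite.
    r = np.clip(np.asarray(r, dtype=float), -1 + 1e-12, 1 - 1e-12)
    return np.arctanh(r)


# Simulated data with independent rows and independent columns (assumption).
rng = np.random.default_rng(1)
n_cols = 50
X = rng.standard_normal((200, n_cols))

# np.corrcoef treats each row as a variable and each column as an observation,
# so R holds the sample correlations between pairs of rows.
R = np.corrcoef(X)
upper = np.triu_indices_from(R, k=1)   # each pair of distinct rows once
z = fisher_z(R[upper])

# Textbook approximation to the standard deviation of z when the true
# correlation is zero and the n_cols observations are independent.
classical_sd = 1.0 / np.sqrt(n_cols - 3)
observed_sd = z.std()
```

Comparing `observed_sd` with `classical_sd` gives a rough check of the variance-stabilizing property in the independent-column case; how the behavior changes under column dependence is the subject of the paper itself.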

Citation

Omkar Muralidharan. "Detecting column dependence when rows are correlated and estimating the strength of the row correlation." Electron. J. Statist. 4 1527 - 1546, 2010. https://doi.org/10.1214/10-EJS592