Statistical Science

Laplace Approximated EM Microarray Analysis: An Empirical Bayes Approach for Comparative Microarray Experiments

Haim Bar, James Booth, Elizabeth Schifano, and Martin T. Wells

Abstract

A two-groups mixed-effects model for the comparison of (normalized) microarray data from two treatment groups is considered. Most competing parametric methods that have appeared in the literature are obtained as special cases or by minor modifications of the proposed model. Approximate maximum likelihood fitting is accomplished via a fast and scalable algorithm, which we call LEMMA (Laplace approximated EM Microarray Analysis). The posterior odds of treatment × gene interactions, derived from the model, involve shrinkage estimates of both the interactions and the gene-specific error variances. Genes are classified as being associated with treatment based on the posterior odds and the local false discovery rate (f.d.r.) with a fixed cutoff. Our model-based approach also allows one to declare the non-null status of a gene by controlling the false discovery rate (FDR). It is shown in a detailed simulation study that the approach outperforms well-known competitors. We also apply the proposed methodology to two previously analyzed microarray examples. Extensions of the proposed method to paired treatments and multiple treatments are also discussed.
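
The two classification rules mentioned in the abstract (a fixed cutoff on the local f.d.r., and control of the FDR) can be illustrated with a generic two-groups mixture. The following is a minimal sketch only, not the LEMMA algorithm: it assumes the mixture parameters are already known rather than estimated by EM, and it assumes zero-mean normal densities for both the null and non-null groups. The function names, the example z-scores, and the parameter values (`p0`, `null_sd`, `alt_sd`) are hypothetical choices made for illustration.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def local_fdr(z, p0, null_sd, alt_sd):
    """Posterior probability that a gene is null, given its statistic z.

    Assumption: null and non-null statistics are both zero-mean normal,
    with the non-null group having the larger standard deviation.
    """
    f0 = p0 * normal_pdf(z, 0.0, null_sd)
    f1 = (1.0 - p0) * normal_pdf(z, 0.0, alt_sd)
    return f0 / (f0 + f1)


def select_by_fdr(fdr_values, level):
    """Take genes in increasing order of local fdr while the running
    mean of the selected local fdr values stays at or below `level`."""
    order = sorted(range(len(fdr_values)), key=lambda i: fdr_values[i])
    selected, running_sum = [], 0.0
    for rank, i in enumerate(order, start=1):
        running_sum += fdr_values[i]
        if running_sum / rank > level:
            break
        selected.append(i)
    return selected


# Hypothetical per-gene statistics and mixture parameters.
z_scores = [0.1, -0.4, 3.5, 0.8, -4.2, 1.9]
fdr = [local_fdr(z, p0=0.9, null_sd=1.0, alt_sd=3.0) for z in z_scores]

# Rule 1: fixed cutoff on the local fdr.
by_cutoff = [i for i, v in enumerate(fdr) if v <= 0.2]

# Rule 2: control the average local fdr of the selected set.
by_fdr = select_by_fdr(fdr, level=0.05)
```

In this sketch the posterior odds of a gene being non-null are `(1 - fdr) / fdr`, so thresholding the local f.d.r. at a fixed cutoff is equivalent to thresholding the posterior odds. The second rule relies on the fact that, under the assumed model, the mean local f.d.r. over a selected set estimates that set's false discovery rate.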

Article information

Source
Statist. Sci., Volume 25, Number 3 (2010), 388-407.

Dates
First available in Project Euclid: 4 January 2011

Permanent link to this document
https://projecteuclid.org/euclid.ss/1294167966

Digital Object Identifier
doi:10.1214/10-STS339

Mathematical Reviews number (MathSciNet)
MR2791674

Zentralblatt MATH identifier
1329.62114

Keywords
EM algorithm; empirical Bayes; Laplace approximation; LEMMA; LIMMA; linear mixed models; local false discovery rate; microarray analysis; mixture model; two-groups model

Citation

Bar, Haim; Booth, James; Schifano, Elizabeth; Wells, Martin T. Laplace Approximated EM Microarray Analysis: An Empirical Bayes Approach for Comparative Microarray Experiments. Statist. Sci. 25 (2010), no. 3, 388–407. doi:10.1214/10-STS339. https://projecteuclid.org/euclid.ss/1294167966

