The Annals of Statistics

A multivariate empirical Bayes statistic for replicated microarray time course data

Yu Chuan Tai and Terence P. Speed


In this paper we derive one- and two-sample multivariate empirical Bayes statistics (the MB-statistics) to rank genes in order of interest from longitudinal replicated developmental microarray time course experiments. We first use conjugate priors to develop our one-sample multivariate empirical Bayes framework for the null hypothesis that the expected temporal profile stays at 0. This leads to our one-sample MB-statistic and a one-sample $\widetilde{T}^2$-statistic, a variant of the one-sample Hotelling $T^2$-statistic. Both the MB-statistic and the $\widetilde{T}^2$-statistic can be used to rank genes in order of evidence of nonzero mean, incorporating the correlation structure across time points, moderation and replication. We also derive the corresponding MB-statistics and $\widetilde{T}^2$-statistics for the one-sample problem where the null hypothesis states that the expected temporal profile is constant, and for the two-sample problem where the null hypothesis is that two expected temporal profiles are the same.
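To make the idea of a moderated Hotelling-type statistic concrete, the sketch below computes the classical one-sample Hotelling $T^2$ for the null hypothesis of a zero mean profile, alongside a moderated variant in which the gene-specific sample covariance is shrunk toward a common matrix. The weighted-average form of the shrinkage, the function names, and the hand-supplied hyperparameters `prior_cov` and `prior_df` are assumptions made for illustration. The paper estimates its hyperparameters from the data, and the MB-statistic itself is not computed here.

```python
import numpy as np


def hotelling_t2(x):
    """Classical one-sample Hotelling T^2 for H0: mean profile = 0.

    x has shape (n, k): n replicates of one gene at k time points.
    Requires n > k so that the sample covariance is invertible.
    """
    n = x.shape[0]
    xbar = x.mean(axis=0)
    s = np.cov(x, rowvar=False)  # k x k sample covariance, divisor n - 1
    return float(n * xbar @ np.linalg.solve(s, xbar))


def moderated_t2(x, prior_cov, prior_df):
    """Moderated variant (illustrative assumption, not the authors' code).

    The sample covariance is replaced by a weighted average of itself
    and a common k x k matrix prior_cov, with weights n - 1 and prior_df.
    """
    n = x.shape[0]
    xbar = x.mean(axis=0)
    s = np.cov(x, rowvar=False)
    s_mod = ((n - 1) * s + prior_df * prior_cov) / (n - 1 + prior_df)
    return float(n * xbar @ np.linalg.solve(s_mod, xbar))


def rank_genes(data, prior_cov, prior_df):
    """Rank genes by decreasing moderated statistic.

    data has shape (G, n, k): G genes, n replicates, k time points.
    Returns (gene indices in rank order, per-gene scores).
    """
    scores = np.array([moderated_t2(x, prior_cov, prior_df) for x in data])
    return np.argsort(-scores), scores
```

With few replicates relative to the number of time points, the unmoderated sample covariance is singular or unstable, whereas the shrunken matrix stays invertible as long as `prior_cov` is positive definite and `prior_df` is positive. Setting `prior_df` to 0 recovers the classical statistic.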

Article information

Ann. Statist., Volume 34, Number 5 (2006), 2387-2412.

First available in Project Euclid: 23 January 2007

Primary: 62M10: Time series, auto-correlation, regression, etc. [See also 91B84]
Secondary: 62C12: Empirical decision procedures; empirical Bayes procedures
Secondary: 92D10: Genetics {For genetic algebras, see 17D92}

Keywords: Microarray time course; longitudinal; multivariate empirical Bayes; moderation; gene ranking; replication


Tai, Yu Chuan; Speed, Terence P. A multivariate empirical Bayes statistic for replicated microarray time course data. Ann. Statist. 34 (2006), no. 5, 2387--2412. doi:10.1214/009053606000000759.

