The Annals of Applied Statistics

Bayesian testing of many hypotheses × many genes: A study of sleep apnea

Shane T. Jensen, Ibrahim Erkan, Erna S. Arnardottir, and Dylan S. Small



Substantial statistical research has recently been devoted to the analysis of large-scale microarray experiments, which measure the simultaneous expression of thousands of genes under a particular condition. A typical goal is to compare gene expression between two conditions (e.g., diseased vs. nondiseased) and detect genes that show differential expression. Classical hypothesis testing procedures have been applied to this problem, and more recent work has employed sophisticated models that share information across genes. However, many recent gene expression studies have experimental designs with several conditions, which require a more involved hypothesis testing approach. In this paper, we use a hierarchical Bayesian model to address the situation where many hypotheses must be tested simultaneously for each gene. In addition to handling many hypotheses within each gene, our analysis addresses the more typical multiple comparison issue of testing many genes simultaneously. We illustrate our approach with an application to a study of genes involved in obstructive sleep apnea in humans.

Article information

Ann. Appl. Stat., Volume 3, Number 3 (2009), 1080-1101.

First available in Project Euclid: 5 October 2009

Keywords: Bayesian hypothesis testing; FDR control; hierarchical models; multiple comparisons


Jensen, Shane T.; Erkan, Ibrahim; Arnardottir, Erna S.; Small, Dylan S. Bayesian testing of many hypotheses × many genes: A study of sleep apnea. Ann. Appl. Stat. 3 (2009), no. 3, 1080–1101. doi:10.1214/09-AOAS241.



