The Annals of Applied Statistics

A Bayesian model averaging approach for observational gene expression studies

Xi Kathy Zhou, Fei Liu, and Andrew J. Dannenberg
Source: Ann. Appl. Stat. Volume 6, Number 2 (2012), 497-520.

Abstract

Identifying differentially expressed (DE) genes associated with a sample characteristic is the primary objective of many microarray studies. As more studies are carried out with observational rather than well-controlled experimental samples, it becomes important to evaluate and properly control the impact of sample heterogeneity on DE gene finding. Typical methods for identifying DE genes rank all genes by a preselected statistic from a single model for comparing two or more groups, with or without adjustment for other covariates. Such single-model approaches unavoidably result in model misspecification, which can lead to increased error due to bias for some genes and reduced efficiency for others. We evaluate the impact of this model misspecification on detecting DE genes and identify parameters that affect its magnitude. To properly control for sample heterogeneity and to provide a flexible and coherent framework for simultaneously identifying DE genes associated with one or more sample characteristics and/or their interactions, we propose a Bayesian model averaging approach that corrects the model misspecification by averaging over the model space formed by all relevant covariates. An empirical approach is suggested for specifying prior model probabilities. We demonstrate through simulated microarray data that this approach improves DE gene identification compared with the single-model approaches. The flexibility of the approach is demonstrated through our analysis of data from two observational microarray studies.
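
The idea of averaging over the model space formed by all relevant covariates can be illustrated for a single gene with a minimal sketch. It is not the authors' implementation: it assumes a linear model for log expression, approximates each model's marginal likelihood by BIC, and uses a uniform prior over models rather than the empirical prior model probabilities suggested in the abstract. The function name `bma_inclusion_probs`, the covariates `smoking` and `age`, and the simulated data are all hypothetical.

```python
import itertools
import numpy as np

def bma_inclusion_probs(y, X, names):
    """Posterior inclusion probability of each covariate for one gene.

    Assumptions (not from the paper): BIC approximation to the marginal
    likelihood and a uniform prior over all 2^p covariate subsets.
    """
    n, p = X.shape
    # Model space: every subset of the p covariates, including the null model.
    models = [m for k in range(p + 1)
              for m in itertools.combinations(range(p), k)]
    log_post = np.empty(len(models))
    for i, m in enumerate(models):
        # Design matrix: intercept plus the covariates in this model.
        design = np.column_stack([np.ones(n)] + [X[:, j] for j in m])
        beta, *_ = np.linalg.lstsq(design, y, rcond=None)
        rss = np.sum((y - design @ beta) ** 2)
        bic = n * np.log(rss / n) + design.shape[1] * np.log(n)
        log_post[i] = -0.5 * bic  # log posterior up to a constant
    # Normalize on the log scale for numerical stability.
    weights = np.exp(log_post - log_post.max())
    weights /= weights.sum()
    # Inclusion probability: total weight of the models containing covariate j.
    return {name: float(sum(w for w, m in zip(weights, models) if j in m))
            for j, name in enumerate(names)}

# Simulated single-gene example: expression depends on smoking, not on age.
rng = np.random.default_rng(0)
n = 60
smoking = np.repeat([0.0, 1.0], n // 2)
age = rng.normal(50.0, 10.0, size=n)
y = 2.0 * smoking + rng.normal(0.0, 1.0, size=n)

probs = bma_inclusion_probs(y, np.column_stack([smoking, age]),
                            ["smoking", "age"])
print(probs)
```

In a genome-wide setting, a calculation of this kind would be repeated for every gene, and genes would be ranked by the inclusion probability of the covariate of interest instead of by a statistic from one preselected model.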

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1339419605
Digital Object Identifier: doi:10.1214/11-AOAS526
Zentralblatt MATH identifier: 06062728
Mathematical Reviews number (MathSciNet): MR2976480


2013 © Institute of Mathematical Statistics
