The Annals of Applied Statistics

Modeling dependent gene expression

Donatello Telesca, Peter Müller, Giovanni Parmigiani, and Ralph S. Freedman
Source: Ann. Appl. Stat. Volume 6, Number 2 (2012), 542-560.

Abstract

In this paper we propose a Bayesian approach for inference about dependence of high throughput gene expression. Our goals are to use prior knowledge about pathways to anchor inference about dependence among genes; to account for this dependence while making inferences about differences in mean expression across phenotypes; and to explore differences in the dependence itself across phenotypes. Useful features of the proposed approach are a model-based parsimonious representation of expression as an ordinal outcome, a novel and flexible representation of prior information on the nature of dependencies, and the use of a coherent probability model over both the structure and strength of the dependencies of interest. We evaluate our approach through simulations and in the analysis of data on expression of genes in the Complement and Coagulation Cascade pathway in ovarian cancer.

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1339419607
Digital Object Identifier: doi:10.1214/11-AOAS525
Zentralblatt MATH identifier: 06062730
Mathematical Reviews number (MathSciNet): MR2976482

References

Albert, J. H. and Chib, S. (1993). Bayesian analysis of binary and polychotomous response data. J. Amer. Statist. Assoc. 88 669–679.
Mathematical Reviews (MathSciNet): MR1224394
Zentralblatt MATH: 0774.62031
Digital Object Identifier: doi:10.1080/01621459.1993.10476321
Beal, M., Falciani, F., Ghahramani, Z., Rangel, C. and Wild, D. (2005). A Bayesian approach to reconstructing genetic regulatory networks with hidden factors. Bioinformatics 21 349–356.
Benjamini, Y. and Hochberg, Y. (1995). Controlling the false discovery rate: A practical and powerful approach to multiple testing. J. Roy. Statist. Soc. Ser. B 57 289–300.
Mathematical Reviews (MathSciNet): MR1325392
Besag, J. (1974). Spatial interaction and the statistical analysis of lattice systems. J. Roy. Statist. Soc. Ser. B 36 192–236.
Mathematical Reviews (MathSciNet): MR373208
Brat, D. J., Bellail, A. C. and Van Meir, E. G. (2005). The role of interleukin-8 and its receptors in gliomagenesis and tumoral angiogenesis. Neuro-Oncology 7 122–133.
Braun, R., Cope, L. and Parmigiani, G. (2008). Identifying differential correlation in gene/pathway combinations. BMC Bioinformatics 9 488.
Broët, P. and Richardson, S. (2006). Detection of gene copy number changes in CGH microarrays using a spatially correlated mixture model. Bioinformatics 22 911–918.
Brown, P. J., Vannucci, M. and Fearn, T. (1998). Multivariate Bayesian variable selection and prediction. J. R. Stat. Soc. Ser. B Stat. Methodol. 60 627–641.
Mathematical Reviews (MathSciNet): MR1626005
Zentralblatt MATH: 0909.62022
Digital Object Identifier: doi:10.1111/1467-9868.00144
Carvalho, C. M. and Scott, J. G. (2009). Objective Bayesian model selection in Gaussian graphical models. Biometrika 96 497–512.
Mathematical Reviews (MathSciNet): MR2538753
Zentralblatt MATH: 1170.62020
Digital Object Identifier: doi:10.1093/biomet/asp017
Dawid, A. P. and Lauritzen, S. L. (1993). Hyper-Markov laws in the statistical analysis of decomposable graphical models. Ann. Statist. 21 1272–1317.
Mathematical Reviews (MathSciNet): MR1241267
Zentralblatt MATH: 0815.62038
Digital Object Identifier: doi:10.1214/aos/1176349260
Project Euclid: euclid.aos/1176349260
Dobra, A., Hans, C., Jones, B., Nevins, J. R., Yao, G. and West, M. (2004). Sparse graphical models for exploring gene expression data. J. Multivariate Anal. 90 196–212.
Mathematical Reviews (MathSciNet): MR2064941
Zentralblatt MATH: 1047.62104
Digital Object Identifier: doi:10.1016/j.jmva.2004.02.009
Drton, M. and Perlman, M. D. (2007). Multiple testing and error control in Gaussian graphical model selection. Statist. Sci. 22 430–449.
Mathematical Reviews (MathSciNet): MR2416818
Digital Object Identifier: doi:10.1214/088342307000000113
Project Euclid: euclid.ss/1199285042
Friedman, N., Linial, M., Nachman, I. and Pe’er, D. (2000). Using Bayesian networks to analyze expression data. J. Comput. Biol. 7 601–620.
Fröhlich, H., Speer, N., Poustka, A. and Beißbarth, T. (2007). GOSim: an R-package for computation of information theoretic GO similarities between terms and gene products. BMC Bioinformatics 8 166.
Garrett, E. S. and Parmigiani, G. (2004). A nested unsupervised approach to identifying novel molecular subtypes. Bernoulli 10 951–969.
Mathematical Reviews (MathSciNet): MR2108038
Digital Object Identifier: doi:10.3150/bj/1106314845
Project Euclid: euclid.bj/1106314845
George, E. I. and McCulloch, R. E. (1993). Variable selection via Gibbs sampling. J. Amer. Statist. Assoc. 88 881–889.
Giudici, P. and Green, P. J. (1999). Decomposable graphical Gaussian model determination. Biometrika 86 785–801.
Mathematical Reviews (MathSciNet): MR1741977
Zentralblatt MATH: 0940.62019
Digital Object Identifier: doi:10.1093/biomet/86.4.785
Green, P. J. (1995). Reversible jump Markov chain Monte Carlo computation and Bayesian model determination. Biometrika 82 711–732.
Mathematical Reviews (MathSciNet): MR1380810
Zentralblatt MATH: 0861.62023
Digital Object Identifier: doi:10.1093/biomet/82.4.711
Hoffmann, R. and Valencia, A. (2004). A gene network for navigating the literature. Nature Genetics 36 664.
Jones, B., Carvalho, C., Dobra, A., Hans, C., Carter, C. and West, M. (2005). Experiments in stochastic computation for high-dimensional graphical models. Statist. Sci. 20 388–400.
Mathematical Reviews (MathSciNet): MR2210226
Digital Object Identifier: doi:10.1214/088342305000000304
Project Euclid: euclid.ss/1137076659
Kolaczyk, E. D. (2009). Statistical Analysis of Network Data: Methods and Models. Springer, New York.
Mathematical Reviews (MathSciNet): MR2724362
Zentralblatt MATH: 05500989
Koster, J. T. A. (1996). Markov properties of nonrecursive causal models. Ann. Statist. 24 2148–2177.
Mathematical Reviews (MathSciNet): MR1421166
Zentralblatt MATH: 0867.62056
Digital Object Identifier: doi:10.1214/aos/1069362315
Project Euclid: euclid.aos/1069362315
Lauritzen, S. L. (1996). Graphical Models. Oxford Statistical Science Series 17. Clarendon Press, New York.
Mathematical Reviews (MathSciNet): MR1419991
Markiewski, M. M. and Lambris, J. D. (2009). Unwelcome complement. Cancer Research 69 6367.
Meinshausen, N. and Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. Ann. Statist. 34 1436–1462.
Mathematical Reviews (MathSciNet): MR2278363
Zentralblatt MATH: 1113.62082
Digital Object Identifier: doi:10.1214/009053606000000281
Project Euclid: euclid.aos/1152540754
Mistry, M. and Pavlidis, P. (2008). Gene ontology term overlap as a measure of gene functional similarity. BMC Bioinformatics 9 327.
Mukherjee, S. and Speed, T. P. (2008). Network inference using informative priors. PNAS 105 14313–14318.
Murphy, K. and Mian, S. (1999). Modeling gene expression data using dynamic Bayesian networks. Technical report, Computer Science Division, Univ. California, Berkeley.
Ong, I., Glasner, J. and Page, D. (2002). Modelling regulatory pathways in E. coli from time series expression profiles. Bioinformatics 18 S241–S248.
Parmigiani, G., Garrett, E. S., Anbazhagan, R. and Gabrielson, E. (2002). A statistical framework for expression-based molecular classification in cancer. J. R. Stat. Soc. Ser. B Stat. Methodol. 64 717–736.
Mathematical Reviews (MathSciNet): MR1979385
Zentralblatt MATH: 1067.62117
Digital Object Identifier: doi:10.1111/1467-9868.00358
Roach, L. E., Petrik, J. J., Plante, L. and LaMarre, J. (2002). Thrombin generation and presence of thrombin in ovarian follicles. Biology of Reproduction 66 1350–1358.
Ronning, G. and Kukuk, M. (1996). Efficient estimation of ordered probit models. J. Amer. Statist. Assoc. 91 1120–1129.
Mathematical Reviews (MathSciNet): MR1424612
Zentralblatt MATH: 0880.62122
Digital Object Identifier: doi:10.1080/01621459.1996.10476982
Sabidussi, G. (1966). The centrality index of a graph. Psychometrika 31 581–603.
Mathematical Reviews (MathSciNet): MR205879
Digital Object Identifier: doi:10.1007/BF02289527
Scott, J. G. and Berger, J. O. (2006). An exploration of aspects of Bayesian multiple testing. J. Statist. Plann. Inference 136 2144–2162.
Mathematical Reviews (MathSciNet): MR2235051
Zentralblatt MATH: 1087.62039
Digital Object Identifier: doi:10.1016/j.jspi.2005.08.031
Scott, J. G. and Carvalho, C. M. (2008). Feature-inclusion stochastic search for Gaussian graphical models. J. Comput. Graph. Statist. 17 790–808.
Mathematical Reviews (MathSciNet): MR2649067
Digital Object Identifier: doi:10.1198/106186008X382683
Sebastiani, P. and Ramoni, M. (2005). Normative selection of Bayesian networks. J. Multivariate Anal. 93 340–357.
Mathematical Reviews (MathSciNet): MR2162642
Zentralblatt MATH: 1066.62011
Digital Object Identifier: doi:10.1016/j.jmva.2004.03.005
Spirtes, P., Richardson, T. S., Meek, C., Scheines, R. and Glymour, C. (1998). Using path diagrams as a structural equation modeling tool. Sociol. Methods Res. 27 182–225.
Telesca, D., Müller, P., Parmigiani, G. and Freedman, R. S. (2011). Supplement to “Modeling dependent gene expression.” DOI:10.1214/11-AOAS525SUPP.
Terranova, P. F. and Rice, V. M. (1997). Review: Cytokine involvement in ovarian processes. American Journal of Reproductive Immunology 37 50–63.
Wang, X., Wang, E. and Kavanagh, J. (2005). Ovarian cancer, the coagulation pathway, and inflammation. Journal of Translational Medicine 3 25.
Wei, Z. and Li, H. (2007). A Markov random field model for network-based analysis of genomic data. Bioinformatics 23 1537–1544.
Wei, Z. and Li, H. (2008). A hidden spatial-temporal Markov random field model for network-based analysis of time course gene expression data. Ann. Appl. Stat. 2 408–429.
Mathematical Reviews (MathSciNet): MR2415609
Zentralblatt MATH: 1137.62081
Digital Object Identifier: doi:10.1214/07-AOAS145
Project Euclid: euclid.aoas/1206367827

2013 © Institute of Mathematical Statistics
