The Annals of Applied Statistics

An integrative analysis of cancer gene expression studies using Bayesian latent factor modeling

Daniel Merl, Julia Ling-Yu Chen, Jen-Tsan Chi, and Mike West

Full-text: Open access


We present an applied study in cancer genomics for integrating data and inferences from laboratory experiments on cancer cell lines with observational data obtained from human breast cancer studies. The biological focus is on improving understanding of transcriptional responses of tumors to changes in the pH level of the cellular microenvironment. The statistical focus is on connecting experimentally defined biomarkers of such responses to clinical outcome in observational studies of breast cancer patients. Our analysis exemplifies a general strategy for accomplishing this kind of integration across contexts. The statistical methodologies employed here draw heavily on Bayesian sparse factor models for identifying and modularizing signatures of aggregate changes in gene expression, and for correlating those signatures with clinical outcome. By projecting patterns of biological response linked to specific experimental interventions into observational studies where such responses may be evidenced via variation in gene expression across samples, we are able to define biomarkers of clinically relevant physiological states and outcomes that are rooted in the biology of the original experiment. Through this approach we identify microenvironment-related prognostic factors capable of predicting long-term survival in two independent breast cancer datasets. These results suggest possible directions for future laboratory studies, as well as indicate the potential for therapeutic advances through targeted disruption of specific pathway components.

Article information

Ann. Appl. Stat., Volume 3, Number 4 (2009), 1675-1694.

First available in Project Euclid: 1 March 2010

Keywords: Acidosis and neutralization pathways in cancer; Bayesian latent factor models; breast cancer genomics; gene expression signatures; integrative cancer genomics; micro-environmental parameters in cancer; Weibull survival models


Merl, Daniel; Chen, Julia Ling-Yu; Chi, Jen-Tsan; West, Mike. An integrative analysis of cancer gene expression studies using Bayesian latent factor modeling. Ann. Appl. Stat. 3 (2009), no. 4, 1675–1694. doi:10.1214/09-AOAS261.


  • Merl, D., Chen, J.L-Y., Chi, J. and West, M. (2009). Supplement to “An integrative analysis of cancer gene expression studies using Bayesian latent factor modeling.” DOI: 10.1214/09-AOAS261SUPPA, 10.1214/09-AOAS261SUPPB.
