Bayesian factor models for the detection of coherent patterns in gene expression data

Vinicius D. Mayrink; Joseph E. Lucas

doi:10.1214/13-BJPS226

Braz. J. Probab. Stat. 29(1): 1-33 (February 2015). DOI: 10.1214/13-BJPS226

Abstract

A common problem in the analysis of gene expression microarray data is the identification of groups of features that are coherently expressed. For example, one often wishes to know whether a group of genes, clustered because of correlation in one data set, is still highly co-expressed in another data set. Alternatively, for some expression array platforms there are many relatively short probes for each gene of interest. In this case, it is possible that a given probe is not measuring its targeted gene, but rather a different gene with a similar region (a phenomenon called cross-hybridization). Accurate detection of the collection of probe sets (groups of probes targeting the same gene) which demonstrate highly coherent expression patterns is the best approach to identifying which genes are present in the sample. We develop a Bayesian Factor Model (BFM) to address the general problem of detecting coherent patterns in gene expression data sets. We compare our method to “state of the art” methods for the identification of expressed genes in both synthetic and real data sets, and the results indicate that the BFM outperforms the other procedures for detecting transcripts. We also demonstrate the use of factor analysis to identify the presence/absence status of gene modules (groups of coherently expressed genes). Variation in the number of copies of regions of the genome is a well-known and important feature of most cancers. We examine a group of genes, representative of Copy Number Alteration (CNA) in breast cancer, then identify the presence/absence of CNA in this region of the genome for other cancers. Coherent patterns can also be evaluated in high-throughput sequencing data, a novel technology to measure gene expression. We analyze this type of data via a factor model and examine the detection calls in terms of read mapping uncertainty.

Citation

Vinicius D. Mayrink, Joseph E. Lucas. "Bayesian factor models for the detection of coherent patterns in gene expression data." Braz. J. Probab. Stat. 29 (1) 1 - 33, February 2015. https://doi.org/10.1214/13-BJPS226