Transcription factor binding site prediction with multivariate gene expression data
Nancy R. Zhang, Mary C. Wildermuth, Terence P. Speed
Ann. Appl. Stat. 2(1): 332-365 (March 2008). DOI: 10.1214/07-AOAS142


Multi-sample microarray experiments have become a standard experimental method for studying biological systems. A frequent goal in such studies is to unravel the regulatory relationships between genes. During the last few years, regression models have been proposed for the de novo discovery of cis-acting regulatory sequences using gene expression data. However, when applied to multi-sample experiments, existing regression-based methods model each sample separately. To better capture the dynamic relationships in multi-sample microarray experiments, we propose a flexible method for the joint modeling of promoter sequence and multivariate expression data.
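As a rough illustration of what jointly relating promoter sequence to multi-sample expression can look like, the sketch below regresses every sample's expression on the count of a candidate motif and reports the fraction of total variance explained across all samples at once. This is a toy example under stated assumptions, not the procedure of the paper: the function names `count_motif` and `joint_score`, the exact-match counting rule, and the pooled R-squared criterion are all hypothetical choices made for illustration.

```python
import numpy as np

def count_motif(seq, motif):
    """Count (possibly overlapping) exact occurrences of motif in seq."""
    return sum(1 for i in range(len(seq) - len(motif) + 1)
               if seq.startswith(motif, i))

def joint_score(promoters, expression, motif):
    """Pooled R^2 of a linear fit of all samples on motif counts.

    promoters  : list of promoter sequences, one per gene
    expression : genes x samples array of expression values
    Assumption (illustrative only): one score is computed over all
    samples together instead of one score per sample.
    """
    x = np.array([count_motif(s, motif) for s in promoters], dtype=float)
    X = np.column_stack([np.ones_like(x), x])      # intercept + motif count
    Y = np.asarray(expression, dtype=float)        # genes x samples
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)      # one coefficient column per sample
    resid = Y - X @ B
    total = Y - Y.mean(axis=0)
    return 1.0 - (resid ** 2).sum() / (total ** 2).sum()
```

A candidate motif whose counts track expression in every sample scores close to 1, while a motif unrelated to expression scores lower, so candidates can be ranked by a single number that uses all samples.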

In higher-order eukaryotic genomes, expression regulation usually involves combinatorial interactions between several transcription factors. Experiments have shown that the spacing between transcription factor binding sites can significantly affect their strength in activating gene expression. We propose an adaptive model-building procedure to capture such spacing-dependent cis-acting regulatory modules.
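One simple way to picture a spacing-dependent feature is to count how many occurrences of two motifs fall within a given distance of each other in a promoter. The sketch below does only that; it is an illustrative assumption, not the adaptive procedure of the paper, and the names `motif_positions`, `paired_within`, and the `max_gap` parameter are hypothetical.

```python
def motif_positions(seq, motif):
    """Start positions of all exact occurrences of motif in seq."""
    return [i for i in range(len(seq) - len(motif) + 1)
            if seq.startswith(motif, i)]

def paired_within(seq, motif_a, motif_b, max_gap):
    """Count (a, b) occurrence pairs whose start positions differ by at most max_gap.

    Assumption (illustrative only): distance is measured between start
    positions, and orientation and strand are ignored.
    """
    starts_a = motif_positions(seq, motif_a)
    starts_b = motif_positions(seq, motif_b)
    return sum(1 for i in starts_a for j in starts_b if abs(i - j) <= max_gap)
```

Such a count could serve as a predictor in a regression on expression, with `max_gap` controlling how close two sites must be to count as one module.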

We apply our methods to the analysis of microarray time-course experiments in yeast and in Arabidopsis. These experiments exhibit very different dynamic temporal relationships. For both data sets, we recover all of the well-known cis-acting regulatory elements in the relevant context and also predict novel elements.



Nancy R. Zhang, Mary C. Wildermuth, Terence P. Speed. "Transcription factor binding site prediction with multivariate gene expression data." Ann. Appl. Stat. 2(1): 332-365, March 2008.


Published: March 2008
First available in Project Euclid: 24 March 2008

zbMATH: 1137.62083
MathSciNet: MR2415606
Digital Object Identifier: 10.1214/07-AOAS142

Keywords: DNA motifs, gene expression, linear models, multivariate analysis, transcription regulation

Rights: Copyright © 2008 Institute of Mathematical Statistics

