Coupling hidden Markov models for the discovery of Cis-regulatory modules in multiple species



The Annals of Applied Statistics

Qing Zhou and Wing Hung Wong

Source: Ann. Appl. Stat. Volume 1, Number 1 (2007), 36-65.

Abstract

Cis-regulatory modules (CRMs) composed of multiple transcription factor binding sites (TFBSs) control gene expression in eukaryotic genomes. Comparative genomic studies have shown that these regulatory elements are more conserved across species due to evolutionary constraints. We propose a statistical method to combine module structure and cross-species orthology in de novo motif discovery. We use a hidden Markov model (HMM) to capture the module structure in each species and couple these HMMs through multiple-species alignment. Evolutionary models are incorporated to consider correlated structures among aligned sequence positions across different species. Based on our model, we develop a Markov chain Monte Carlo approach, MultiModule, to discover CRMs and their component motifs simultaneously in groups of orthologous sequences from multiple species. Our method is tested on both simulated and biological data sets in mammals and Drosophila, where significant improvement over other motif and module discovery methods is observed.
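
To make the idea of an HMM capturing module structure concrete, here is a minimal single-species sketch. It is not the MultiModule method: it uses no alignment coupling, no evolutionary model, and no Markov chain Monte Carlo. It assumes a toy two-state chain (background versus module) over nucleotides and decodes the most probable state path by Viterbi dynamic programming. All state names, probabilities, and the function name `viterbi` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical two-state model: neither the states nor the numbers come from the paper.
STATES = ("background", "module")
START = {"background": 0.9, "module": 0.1}
TRANS = {
    "background": {"background": 0.95, "module": 0.05},
    "module": {"background": 0.10, "module": 0.90},
}
EMIT = {
    "background": {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25},
    "module": {"A": 0.10, "C": 0.40, "G": 0.40, "T": 0.10},  # assumed GC-rich modules
}


def viterbi(seq):
    """Return the most probable state path for a non-empty DNA string."""
    # score[s] holds the log-probability of the best path ending in state s.
    score = {s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}
    back = []  # one dict of back-pointers per position after the first
    for base in seq[1:]:
        prev, score, pointers = score, {}, {}
        for s in STATES:
            # Best predecessor state for s at this position.
            best = max(STATES, key=lambda r: prev[r] + math.log(TRANS[r][s]))
            pointers[s] = best
            score[s] = (prev[best] + math.log(TRANS[best][s])
                        + math.log(EMIT[s][base]))
        back.append(pointers)
    # Trace back from the best final state.
    state = max(STATES, key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    print(viterbi("ATATGCGCGCGCGCGCATAT"))
```

In the setting the abstract describes, one such chain per species would be coupled through the multiple-species alignment, and the motif parameters would be sampled rather than fixed as they are here.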

Keywords: Cis-regulatory module; motif discovery; comparative genomics; coupled hidden Markov model; Markov chain Monte Carlo; dynamic programming

Links and Identifiers

Permanent link to this document: http://projecteuclid.org/euclid.aoas/1183143728
Digital Object Identifier: doi:10.1214/07-AOAS103
Mathematical Reviews number (MathSciNet): MR2393840


2009 © Institute of Mathematical Statistics