The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 13, Number 1 (2019), 606-637.
Bayesian hidden Markov tree models for clustering genes with shared evolutionary history
Determining the functions of poorly characterized genes is crucial for understanding biological processes and studying human diseases. Functionally associated genes are often gained and lost together through evolution. Therefore, identifying co-evolution of genes can predict functional gene-gene associations. We describe here the full statistical model and computational strategies underlying the original algorithm CLustering by Inferred Models of Evolution (CLIME 1.0) recently reported by us (Cell 158 (2014) 213–225). CLIME 1.0 employs a mixture of tree-structured hidden Markov models for the gene evolution process, and a Bayesian model-based clustering algorithm to detect gene modules with shared evolutionary histories (termed evolutionarily conserved modules, or ECMs). A Dirichlet process prior was adopted for estimating the number of gene clusters, and a Gibbs sampler was developed for posterior sampling. We further developed an extended version, CLIME 1.1, to incorporate the uncertainty in the evolutionary tree structure. Through simulation studies and benchmarks on real data sets, we show that CLIME 1.0 and CLIME 1.1 outperform traditional methods that use simple metrics (e.g., the Hamming distance or Pearson correlation) to measure co-evolution between pairs of genes.
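To make the baseline concrete, the following is a minimal sketch of the two simple pairwise metrics named above, applied to binary presence/absence profiles of genes across species. It is not the CLIME algorithm. The function names and the three gene profiles are hypothetical, chosen only for illustration. Both metrics compare profiles species by species and make no use of the evolutionary tree.

```python
from math import sqrt


def hamming_distance(a, b):
    """Count the species in which two presence/absence profiles disagree."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    return sum(x != y for x, y in zip(a, b))


def pearson_correlation(a, b):
    """Pearson correlation of two profiles; NaN if either profile is constant."""
    if len(a) != len(b):
        raise ValueError("profiles must cover the same species")
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    if var_a == 0 or var_b == 0:
        return float("nan")  # correlation is undefined for a constant profile
    return cov / sqrt(var_a * var_b)


# Hypothetical profiles over 8 species (1 = gene present, 0 = gene absent).
gene_a = [1, 1, 0, 0, 1, 1, 0, 1]
gene_b = [1, 1, 0, 0, 1, 0, 0, 1]  # differs from gene_a in one species
gene_c = [0, 0, 1, 1, 0, 0, 1, 0]  # exact complement of gene_a

print(hamming_distance(gene_a, gene_b))
print(pearson_correlation(gene_a, gene_b))
print(hamming_distance(gene_a, gene_c))
print(pearson_correlation(gene_a, gene_c))
```

Under either metric, gene_a and gene_b look similar and gene_a and gene_c look maximally dissimilar, regardless of how the eight species are related to one another.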
Received: June 2018
Revised: August 2018
First available in Project Euclid: 10 April 2019
Li, Yang; Ning, Shaoyang; Calvo, Sarah E.; Mootha, Vamsi K.; Liu, Jun S. Bayesian hidden Markov tree models for clustering genes with shared evolutionary history. Ann. Appl. Stat. 13 (2019), no. 1, 606-637. doi:10.1214/18-AOAS1208. https://projecteuclid.org/euclid.aoas/1554861662