The Annals of Applied Statistics

Nonparametric Bayesian learning of heterogeneous dynamic transcription factor networks

Xiangyu Luo and Yingying Wei

Gene expression is largely controlled by transcription factors (TFs) in a collaborative manner. Therefore, an understanding of TF collaboration is crucial for the elucidation of gene regulation. The co-activation of TFs can be represented by networks. These networks are dynamic in diverse biological conditions and heterogeneous across the genome within each biological condition. Existing methods for construction of TF networks lack solid statistical models, analyze each biological condition separately, and enforce a single network for all genomic locations within one biological condition, resulting in low statistical power and misleading spurious associations. In this paper, we present a novel Bayesian nonparametric dynamic Poisson graphical model for inference on TF networks. Our approach automatically teases out genome heterogeneity and borrows information across conditions to improve signal detection from very few replicates, thus offering a valid and efficient measure of TF co-activations. We develop an efficient parallel Markov chain Monte Carlo algorithm for posterior computation. The proposed approach is applied to study TF associations in ENCODE cell lines and provides novel findings.

Article information

Ann. Appl. Stat., Volume 12, Number 3 (2018), 1749-1772.

Received: March 2017
Revised: November 2017
First available in Project Euclid: 11 September 2018

Keywords: Poisson graphical model; nonparametric Bayes; parallel Markov chain Monte Carlo; next generation sequencing


Luo, Xiangyu; Wei, Yingying. Nonparametric Bayesian learning of heterogeneous dynamic transcription factor networks. Ann. Appl. Stat. 12 (2018), no. 3, 1749-1772. doi:10.1214/17-AOAS1129.

Supplemental materials

  • Supplementary Materials to “Nonparametric Bayesian learning of heterogeneous dynamic transcription factor networks”. The zip file provides the supplementary details referenced in the main text, the C code that implements HDPGM, and the datasets used in the simulation study and the real application.