A statistical model to assess (allele-specific) associations between gene expression and epigenetic features using sequencing data

Naim U. Rashid, Wei Sun, and Joseph G. Ibrahim

Sequencing techniques have been widely used to assess gene expression (i.e., RNA-seq) or the presence of epigenetic features (e.g., DNase-seq to identify open chromatin regions). In contrast to traditional microarray platforms, sequencing data are typically summarized in the form of discrete counts, and they are able to delineate allele-specific signals, which are not available from microarrays. The presence of epigenetic features is often associated with gene expression, and both have been shown to be affected by DNA polymorphisms. However, joint models with the flexibility to assess interactions between gene expression, epigenetic features and DNA polymorphisms are currently lacking. In this paper, we develop a statistical model to assess the associations between gene expression and epigenetic features using sequencing data, while explicitly modeling the effects of DNA polymorphisms in either an allele-specific or nonallele-specific manner. We show that in doing so we provide the flexibility to detect associations between gene expression and epigenetic features, as well as conditional associations given DNA polymorphisms. We evaluate the performance of our method using simulations and apply our method to study the association between gene expression and the presence of DNase I hypersensitive sites (DHSs) in HapMap individuals. Our model can be generalized to explore the relationships between DNA polymorphisms and any two types of sequencing experiments, a useful feature as the variety of sequencing experiments continues to expand.

Ann. Appl. Stat. Volume 10, Number 4 (2016), 2254-2273.

Received: September 2014
Revised: July 2016

Keywords: Bivariate binomial logistic-normal (BBLN) distribution; bivariate Poisson log-normal (BPLN) distribution; DNase-seq; genetics; genomics; RNA-seq


Supplemental materials

  • Supplement to “A Statistical model to assess (allele-specific) associations between gene expression and epigenetic features using sequencing data”. Contains details on numerical maximization procedures for the BBLN and BPLN models.