The Annals of Applied Statistics

A Bayesian nonparametric mixture model for selecting genes and gene subnetworks

Yize Zhao, Jian Kang, and Tianwei Yu

Full-text: Open access


Selecting informative features from the tens of thousands of features measured in high-throughput data is very challenging. Recently, several parametric and regression models have been developed that use gene network information to select genes or pathways strongly associated with a clinical or biological outcome. As an alternative, in this paper we propose a nonparametric Bayesian model for gene selection that incorporates network information. In addition to identifying genes strongly associated with a clinical outcome, our model can select genes with particular expression behavior, a setting in which regression models are not directly applicable. We show that the proposed model is equivalent to an infinite mixture model, for which we develop a posterior computation algorithm based on Markov chain Monte Carlo (MCMC) methods. We also propose two fast algorithms that approximate the posterior simulation with good accuracy at relatively low computational cost. We illustrate our methods through simulation studies and an analysis of the Spellman yeast cell cycle microarray data.
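To make the idea of "incorporating network information" into gene selection concrete, the sketch below shows a generic Ising-type prior on binary gene-selection indicators together with a single-site Gibbs sampler for it. This is an illustrative assumption, not the authors' model or algorithm: the parameterization (`a` controlling overall sparsity, `b` rewarding selection of neighboring genes), the function name `ising_gibbs` and the toy network are all hypothetical. In the paper, network information is combined with a Dirichlet process mixture, and the actual derivations and posterior algorithms are given in the supplement.

```python
import math
import random


def ising_gibbs(neighbors, a, b, n_sweeps=1000, burn_in=200, seed=0):
    """Gibbs sampler for a generic binary Ising prior on gene-selection indicators.

    Assumed (hypothetical) prior, for illustration only:
        p(gamma) ∝ exp(a * sum_i gamma_i + b * sum_{(i,j) in E} gamma_i * gamma_j)
    where gamma_i in {0, 1} marks gene i as selected and E is the gene network.

    neighbors: dict mapping each gene to the list of its network neighbors
               (assumed symmetric, with no self-loops).
    Returns the Monte Carlo estimate of each gene's marginal selection probability.
    """
    rng = random.Random(seed)
    genes = sorted(neighbors)
    gamma = {g: 0 for g in genes}   # start with no genes selected
    counts = {g: 0 for g in genes}  # post-burn-in selection counts
    for sweep in range(n_sweeps):
        for g in genes:
            # Full conditional: logit P(gamma_g = 1 | rest) = a + b * (# selected neighbors)
            eta = a + b * sum(gamma[h] for h in neighbors[g])
            p_on = 1.0 / (1.0 + math.exp(-eta))
            gamma[g] = 1 if rng.random() < p_on else 0
        if sweep >= burn_in:
            for g in genes:
                counts[g] += gamma[g]
    kept = n_sweeps - burn_in
    return {g: counts[g] / kept for g in genes}


# Hypothetical toy network: a chain 0-1-2-3 plus an isolated gene 4.
network = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2], 4: []}
probs = ising_gibbs(network, a=-1.0, b=2.0)
print(probs)
```

With `a = -1` and `b = 2`, the exact prior marginals for this toy network are roughly 0.81 for the interior genes 1 and 2, 0.64 for the end genes 0 and 3, and 0.27 for the isolated gene 4, so the estimates should land near those values: a positive `b` makes genes that are connected to other selected genes more likely to be selected themselves.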

Article information

Ann. Appl. Stat., Volume 8, Number 2 (2014), 999-1021.

First available in Project Euclid: 1 July 2014

Keywords: Dirichlet process mixture; Ising priors; density estimation; feature selection; microarray data


Zhao, Yize; Kang, Jian; Yu, Tianwei. A Bayesian nonparametric mixture model for selecting genes and gene subnetworks. Ann. Appl. Stat. 8 (2014), no. 2, 999--1021. doi:10.1214/14-AOAS719.

Supplemental materials

  • Supplementary material: Supplement to “A Bayesian nonparametric mixture model for selecting genes and gene subnetworks”. In this online supplemental article we provide (A) derivations of the proposed methods, (B) details of the main algorithms for posterior computations, (C) details of posterior inference for hyperparameters, (D) additional simulation studies and (E) sensitivity analysis.