The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 9, Number 1 (2015), 225-246.
A multi-functional analyzer uses parameter constraints to improve the efficiency of model-based gene-set analysis
We develop a model-based methodology for integrating gene-set information with an experimentally-derived gene list. The methodology uses a previously reported sampling model, but takes advantage of natural constraints in the high-dimensional discrete parameter space in order to work from a more structured prior distribution than is currently available. We show how the natural constraints are expressed in terms of linear inequality constraints within a set of binary latent variables. Further, the currently available prior gives low probability to these constraints in complex systems, such as Gene Ontology (GO), thus reducing the efficiency of statistical inference. We develop two computational advances to enable posterior inference within the constrained parameter space: one using integer linear programming for optimization and one using a penalized Markov chain sampler. Numerical experiments demonstrate the utility of the new methodology for a multivariate integration of genomic data with GO or related information systems. Compared to available methods, the proposed multi-functional analyzer covers more reported genes without mis-covering nonreported genes, as demonstrated on genome-wide data from association studies of type 2 diabetes and from RNA interference studies of influenza.
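As a rough illustration of how a logical constraint on binary latent variables becomes a linear inequality, the sketch below enumerates set-activity configurations for a tiny hypothetical gene-set system and keeps those that satisfy one such inequality. The toy sets, the function names, and the specific form of the constraint (a set must be active whenever all of its genes are active) are assumptions made for illustration; the abstract does not state them, and brute-force enumeration stands in here for the integer linear programming and penalized sampling the authors describe.

```python
from itertools import product

# Hypothetical toy system: each gene set ("whole") maps to its genes ("parts").
GENE_SETS = {
    "A": {"g1", "g2", "g3"},
    "B": {"g1", "g2"},
    "C": {"g3", "g4"},
}


def gene_activity(z):
    """Assumed rule: a gene is active when at least one set containing it is active."""
    genes = set().union(*GENE_SETS.values())
    return {
        g: max(z[w] for w, members in GENE_SETS.items() if g in members)
        for g in genes
    }


def satisfies_constraints(z):
    """Check the assumed constraint in linear form for every set w:

        sum_{g in w} a_g - |w| + 1 <= z_w

    The left side equals 1 only when every gene in w is active, so the
    inequality forces z_w = 1 in exactly that case and is slack otherwise.
    """
    a = gene_activity(z)
    return all(
        sum(a[g] for g in members) - len(members) + 1 <= z[w]
        for w, members in GENE_SETS.items()
    )


def feasible_configurations():
    """Enumerate all binary set-activity vectors and keep the feasible ones."""
    names = sorted(GENE_SETS)
    configs = (
        dict(zip(names, bits)) for bits in product((0, 1), repeat=len(names))
    )
    return [z for z in configs if satisfies_constraints(z)]


if __name__ == "__main__":
    feasible = feasible_configurations()
    print(f"{len(feasible)} of {2 ** len(GENE_SETS)} configurations are feasible")
    for z in feasible:
        print(z)
```

In this toy system, activating only set A makes every gene in B active while B itself is switched off, so that configuration is excluded. A prior that treats set activities as independent would still place positive probability on such configurations, which is the kind of inefficiency the abstract attributes to the unconstrained prior.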
First available in Project Euclid: 28 April 2015
Wang, Zhishi; He, Qiuling; Larget, Bret; Newton, Michael A. A multi-functional analyzer uses parameter constraints to improve the efficiency of model-based gene-set analysis. Ann. Appl. Stat. 9 (2015), no. 1, 225--246. doi:10.1214/14-AOAS777. https://projecteuclid.org/euclid.aoas/1430226091
- More on role modeling: We provide further details on violation probabilities, on estimating false-positive and true-positive error rates, on preparing data for the ILP algorithm, and on further data analysis findings in the T2D and RNAi examples.