Open Access
December 2019
Bayesian indicator variable selection to incorporate hierarchical overlapping group structure in multi-omics applications
Li Zhu, Zhiguang Huo, Tianzhou Ma, Steffi Oesterreich, George C. Tseng
Ann. Appl. Stat. 13(4): 2611-2636 (December 2019). DOI: 10.1214/19-AOAS1271

Abstract

Variable selection is a pervasive problem in modern high-dimensional data analysis, where the number of features often exceeds the sample size (the so-called small-n-large-p problem). Incorporating knowledge of group structure to improve variable selection has been widely studied. Here, we consider prior knowledge of a hierarchical overlapping group structure to improve variable selection in a regression setting. In genomics applications, for instance, a biological pathway contains tens to hundreds of genes, and a gene can be mapped to multiple experimentally measured features (such as its mRNA expression, copy number variation and methylation levels at possibly multiple sites). In addition to the hierarchical structure, groups at the same level may overlap (e.g., two pathways can share common genes). Incorporating such hierarchical overlapping groups in a traditional penalized regression setting remains a difficult optimization problem. As an alternative, we propose a Bayesian indicator model that accommodates this structure. We evaluate the model in simulations and two breast cancer examples, and demonstrate its superior performance over existing models. The model not only enhances prediction accuracy but also improves variable selection and model interpretation, leading to deeper biological insight into the disease.
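
To make the group structure described above concrete, the following is a minimal sketch of how a pathway-gene-feature hierarchy with overlapping pathways might be represented. It is an illustration only and not the authors' model or code: the pathway names, gene names, feature names and helper functions are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy annotation (not from the paper):
# level 1 maps pathways to genes, level 2 maps genes to measured features.
pathway_to_genes = {
    "pathway_A": ["GENE1", "GENE2"],
    "pathway_B": ["GENE2", "GENE3"],  # GENE2 is shared, so the groups overlap
}
gene_to_features = {
    "GENE1": ["GENE1_mRNA", "GENE1_CNV"],
    "GENE2": ["GENE2_mRNA", "GENE2_methyl_site1", "GENE2_methyl_site2"],
    "GENE3": ["GENE3_mRNA"],
}


def genes_to_pathways(p2g):
    """Invert the pathway -> genes map to gene -> pathways."""
    inverse = defaultdict(list)
    for pathway, genes in p2g.items():
        for gene in genes:
            inverse[gene].append(pathway)
    return dict(inverse)


def overlapping_genes(p2g):
    """Return genes that belong to more than one pathway."""
    membership = genes_to_pathways(p2g)
    return sorted(g for g, pathways in membership.items() if len(pathways) > 1)


def feature_paths(p2g, g2f):
    """List every (pathway, gene, feature) path through the hierarchy."""
    return [
        (pathway, gene, feature)
        for pathway, genes in p2g.items()
        for gene in genes
        for feature in g2f[gene]
    ]


shared = overlapping_genes(pathway_to_genes)
paths = feature_paths(pathway_to_genes, gene_to_features)
unique_features = {feature for _, _, feature in paths}
```

In this toy example a feature of a shared gene is reachable through more than one pathway, so there are more paths than distinct features. This is the kind of overlap that the abstract says is hard to handle in a penalized regression setting.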

Citation

Li Zhu, Zhiguang Huo, Tianzhou Ma, Steffi Oesterreich, George C. Tseng. "Bayesian indicator variable selection to incorporate hierarchical overlapping group structure in multi-omics applications." Ann. Appl. Stat. 13(4): 2611-2636, December 2019. https://doi.org/10.1214/19-AOAS1271

Information

Received: 1 November 2018; Revised: 1 May 2019; Published: December 2019
First available in Project Euclid: 28 November 2019

zbMATH: 07160952
MathSciNet: MR4037443
Digital Object Identifier: 10.1214/19-AOAS1271

Keywords: Bayesian variable selection, hierarchical overlapping group structure, overlapping groups, spike and slab

Rights: Copyright © 2019 Institute of Mathematical Statistics
