Open Access
December 2020
Feature selection for data integration with mixed multiview data
Yulia Baker, Tiffany M. Tang, Genevera I. Allen
Ann. Appl. Stat. 14(4): 1676-1698 (December 2020). DOI: 10.1214/20-AOAS1389

Abstract

Data integration methods that analyze multiple sources of data simultaneously can often provide more holistic insights than can separate inquiries of each data source. Motivated by the advantages of data integration in the era of “big data,” we investigate feature selection for high-dimensional multiview data with mixed data types (e.g., continuous, binary, count-valued). This heterogeneity of multiview data poses numerous challenges for existing feature selection methods. However, after critically examining these issues through empirical and theoretically guided lenses, we develop a practical solution, the Block Randomized Adaptive Iterative Lasso (B-RAIL), which combines the strengths of the randomized Lasso, adaptive weighting schemes, and stability selection. B-RAIL serves as a versatile data integration method for sparse regression and graph selection, and we demonstrate its effectiveness through extensive simulations and a case study to infer the ovarian cancer gene regulatory network. In this case study, B-RAIL successfully identifies well-known biomarkers associated with ovarian cancer and hints at novel candidates for future ovarian cancer research.

Citation

Yulia Baker, Tiffany M. Tang, Genevera I. Allen. "Feature selection for data integration with mixed multiview data." Ann. Appl. Stat. 14(4): 1676-1698, December 2020. https://doi.org/10.1214/20-AOAS1389

Information

Received: 1 March 2019; Revised: 1 January 2020; Published: December 2020
First available in Project Euclid: 19 December 2020

MathSciNet: MR4194243
Digital Object Identifier: 10.1214/20-AOAS1389

Keywords: data fusion, integrative genomics, Lasso/GLM Lasso, mixed graphical models, multimodal data, stability selection, variable selection

Rights: Copyright © 2020 Institute of Mathematical Statistics
