Sparse matrix linear models for structured high-throughput data

Jane W. Liang; Śaunak Sen

doi:10.1214/21-AOAS1444

Abstract

Recent technological advancements have led to the rapid generation of high-throughput biological data that can be used to address novel scientific questions in broad areas of research. These data can be thought of as a large matrix with covariates annotating both its rows and columns. Matrix linear models provide a convenient way to model such data. In many situations, sparse estimation of these models is desired. We present fast, general methods for fitting sparse matrix linear models to structured high-throughput data. We induce model sparsity using an $L_1$ penalty and consider the case when the response matrix and the covariate matrices are large. Due to data size, standard methods for estimation of these penalized regression models fail if the problem is converted to the corresponding univariate regression scenario. By leveraging matrix properties in the structure of our model, we develop several fast estimation algorithms (coordinate descent, FISTA and ADMM) and discuss their trade-offs. We evaluate our method’s performance on simulated data, E. coli chemical genetic screening data and two Arabidopsis genetic datasets with multivariate responses. Our algorithms have been implemented in the Julia programming language and are available at https://github.com/senresearch/MatrixLMnet.jl.
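
To make the contrast between the matrix formulation and the univariate conversion concrete, the following is a minimal sketch in Python with NumPy, not the authors' Julia implementation and not the MatrixLMnet.jl API. It assumes the model has the form Y = X B Z^T + E, with row covariates X and column covariates Z, and an unscaled objective 0.5 * ||Y - X B Z^T||_F^2 + lam * ||B||_1; the abstract does not state either, and the function names are hypothetical. The solver is plain proximal gradient descent (ISTA), the unaccelerated relative of FISTA, chosen only because it is short.

```python
import numpy as np


def soft_threshold(A, t):
    """Elementwise soft-thresholding, the proximal operator of t * ||.||_1."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)


def ista_matrix_lasso(Y, X, Z, lam, n_iter=500):
    """Approximately minimise 0.5 * ||Y - X B Z^T||_F^2 + lam * ||B||_1.

    Plain proximal gradient (ISTA) with a fixed step. The objective
    scaling is an assumption made for this sketch.
    """
    # The gradient is Lipschitz with constant ||X||_2^2 * ||Z||_2^2,
    # the largest eigenvalue of (Z^T Z) kron (X^T X).
    step = 1.0 / (np.linalg.norm(X, 2) ** 2 * np.linalg.norm(Z, 2) ** 2)
    B = np.zeros((X.shape[1], Z.shape[1]))
    for _ in range(n_iter):
        resid = Y - X @ B @ Z.T
        # Gradient of the smooth part, computed without forming kron(Z, X).
        grad = -X.T @ resid @ Z
        B = soft_threshold(B - step * grad, step * lam)
    return B


# Small simulated example (all sizes and values are illustrative).
rng = np.random.default_rng(0)
n, m, p, q = 30, 20, 5, 4
X = rng.standard_normal((n, p))
Z = rng.standard_normal((m, q))
B_true = np.zeros((p, q))
B_true[0, 0], B_true[2, 3] = 3.0, -2.0
Y = X @ B_true @ Z.T + 0.1 * rng.standard_normal((n, m))

# The univariate conversion uses vec(X B Z^T) = kron(Z, X) @ vec(B),
# where vec stacks columns; the design matrix then has n*m rows and p*q columns.
vec_lhs = (X @ B_true @ Z.T).flatten(order="F")
vec_rhs = np.kron(Z, X) @ B_true.flatten(order="F")
print("Kronecker identity holds:", np.allclose(vec_lhs, vec_rhs))
print("Kronecker design shape:", np.kron(Z, X).shape)

B_hat = ista_matrix_lasso(Y, X, Z, lam=5.0)
print("Nonzero estimated coefficients:", int(np.count_nonzero(B_hat)))
```

The Kronecker design matrix grows as (n m) by (p q), which is why the univariate route becomes impractical for large responses and covariate sets, whereas each iteration above only multiplies matrices of the original sizes.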

Funding Statement

We thank both UCSF and UTHSC for funding and supportive environments for this work. SS was partly supported by NIH Grants GM123489, GM070683, DA044223, AI121144 and ES022841.

Acknowledgments

This work was started when JWL was a summer intern at UCSF and continued when she was a scientific programmer at UTHSC.

We thank Jon Ågren, Thomas E. Juenger and Tracey J. Woodruff for granting permission to use their data for analysis.

Citation

Jane W. Liang, Śaunak Sen. "Sparse matrix linear models for structured high-throughput data." Ann. Appl. Stat. 16 (1) 169-192, March 2022. https://doi.org/10.1214/21-AOAS1444

Information

Received: 1 December 2019; Revised: 1 July 2020; Published: March 2022

First available in Project Euclid: 28 March 2022

MathSciNet: MR4400600

zbMATH: 1498.62232

Digital Object Identifier: 10.1214/21-AOAS1444

Keywords: ADMM, FISTA, gradient descent, Julia, Lasso, proximal gradient algorithms
