Open Access
March 2021
Spike-and-slab Lasso biclustering
Gemma E. Moran, Veronika Ročková, Edward I. George
Ann. Appl. Stat. 15(1): 148-173 (March 2021). DOI: 10.1214/20-AOAS1385

Abstract

Biclustering methods simultaneously group samples and their associated features. In this way, biclustering methods differ from traditional clustering methods, which use the entire set of features to distinguish groups of samples. Motivating applications for biclustering include genomics data, where the goal is to cluster patients or samples by their gene expression profiles; and recommender systems, which seek to group customers based on their product preferences. Biclusters of interest often manifest as rank-1 submatrices of the data matrix. This submatrix detection problem can be viewed as a factor analysis problem in which both the factors and loadings are sparse. In this paper, we propose a new biclustering method called Spike-and-Slab Lasso Biclustering (SSLB), which uses the Spike-and-Slab Lasso of Ročková and George (J. Amer. Statist. Assoc. 113 (2018) 431–444) to find such a sparse factorization of the data matrix. SSLB also incorporates an Indian Buffet Process prior to automatically choose the number of biclusters. Many biclustering methods make assumptions about the size of the latent biclusters, assuming either that the biclusters are all of the same size or that they are very large or very small. In contrast, SSLB can adapt to find biclusters that have a continuum of sizes. SSLB is implemented via a fast EM algorithm with a variational step. In a variety of simulation settings, SSLB outperforms other biclustering methods. We apply SSLB to both a microarray dataset and a single-cell RNA-sequencing dataset and highlight that SSLB can recover biologically meaningful structures in the data. The SSLB software is available as an R/C++ package at https://github.com/gemoran/SSLB.
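
To make the "rank-1 submatrices" view concrete, the following is a minimal sketch of the data model only, not of the SSLB algorithm or its R/C++ package. It builds a data matrix as a sum of outer products of sparse factors (over samples) and sparse loadings (over features), plus Gaussian noise. The function name `simulate_biclusters`, its parameters, and all numeric values are hypothetical and chosen purely for illustration.

```python
import random

def simulate_biclusters(n_samples, n_features, biclusters, noise_sd=0.1, seed=0):
    """Build Y = sum_k x_k * b_k^T + noise, with each x_k and b_k sparse.

    `biclusters` is a list of (sample_indices, feature_indices) pairs: the k-th
    factor x_k is nonzero only on sample_indices, and the k-th loading b_k is
    nonzero only on feature_indices. (Illustrative assumption, not SSLB itself.)
    """
    rng = random.Random(seed)
    # Start from pure Gaussian noise.
    Y = [[rng.gauss(0.0, noise_sd) for _ in range(n_features)]
         for _ in range(n_samples)]
    for rows, cols in biclusters:
        # Sparse factor (over samples) and sparse loading (over features).
        x = {i: rng.uniform(1.0, 2.0) for i in rows}
        b = {j: rng.uniform(1.0, 2.0) for j in cols}
        # Add the rank-1 outer product; it is zero outside rows x cols.
        for i, xi in x.items():
            for j, bj in b.items():
                Y[i][j] += xi * bj
    return Y

# Two biclusters of different sizes that share feature 3 (arbitrary choices).
biclusters = [([0, 1, 2], [0, 1, 2, 3]), ([4, 5], [3, 4])]
Y = simulate_biclusters(n_samples=6, n_features=5, biclusters=biclusters)
```

In this toy matrix, sample 3 belongs to no bicluster, so its row is noise only, while the block for samples 0 to 2 and features 0 to 3 carries a rank-1 signal. A biclustering method aims to recover such blocks, and how many there are, from `Y` alone.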

Acknowledgments

This research was supported by NSF Grants DMS-1916245, DMS-1944740 and the James S. Kemper Research Fund at the Booth School of Business. We would like to thank the Editor, Associate Editor and anonymous referees for helpful suggestions which improved this paper.

Citation

Gemma E. Moran, Veronika Ročková, Edward I. George. "Spike-and-slab Lasso biclustering." Ann. Appl. Stat. 15 (1) 148-173, March 2021. https://doi.org/10.1214/20-AOAS1385

Information

Received: 1 August 2019; Revised: 1 August 2020; Published: March 2021
First available in Project Euclid: 18 March 2021

Digital Object Identifier: 10.1214/20-AOAS1385

Keywords: Bayes, biclustering, factor analysis, hierarchical modeling, Spike-and-Slab Lasso, variable selection

Rights: Copyright © 2021 Institute of Mathematical Statistics
