Open Access
Profile likelihood biclustering (2020)
Cheryl Flynn, Patrick Perry
Electron. J. Statist. 14(1): 731-768 (2020). DOI: 10.1214/19-EJS1667

Abstract

Biclustering, the process of simultaneously clustering the rows and columns of a data matrix, is a popular and effective tool for finding structure in high-dimensional datasets. Many biclustering procedures appear to work well in practice, but most lack consistency guarantees. To address this shortcoming, we propose a new biclustering procedure based on profile likelihood. The procedure applies to a broad range of data modalities, including binary, count, and continuous observations. We prove that the procedure recovers the true row and column classes as the dimensions of the data matrix tend to infinity, even if the functional form of the data distribution is misspecified. The procedure requires a combinatorial search, which can be expensive in practice. Rather than performing this search directly, we propose a new heuristic optimization procedure based on the Kernighan-Lin heuristic, which has favorable computational properties and performs well in simulations. We demonstrate our procedure with applications to congressional voting records and microarray analysis.
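To make the two ingredients of the abstract concrete (a profile-likelihood criterion and a Kernighan-Lin-style search over row and column labels), here is a minimal sketch in Python. It assumes binary data modeled with a Bernoulli block model, one of the modalities mentioned above, and profiles out each block's mean by plugging in the block average. The function names `bernoulli_profile_loglik`, `kl_pass`, and `bicluster` are hypothetical, and the search is a simplified pass of single-label moves; this is not the authors' implementation and omits the paper's actual algorithmic details.

```python
import math
from typing import List, Sequence, Tuple


def bernoulli_profile_loglik(X: Sequence[Sequence[float]], rows: Sequence[int],
                             cols: Sequence[int], K: int, L: int) -> float:
    """Bernoulli log-likelihood with each block's mean replaced by its MLE.

    For block (k, l) with n entries and mean p, the profiled term is
    n * (p log p + (1 - p) log(1 - p)), with 0 log 0 taken as 0.
    """
    sums = [[0.0] * L for _ in range(K)]
    row_sizes = [0] * K
    col_sizes = [0] * L
    for k in rows:
        row_sizes[k] += 1
    for l in cols:
        col_sizes[l] += 1
    for i, row in enumerate(X):
        for j, x in enumerate(row):
            sums[rows[i]][cols[j]] += x
    total = 0.0
    for k in range(K):
        for l in range(L):
            n = row_sizes[k] * col_sizes[l]
            if n == 0:  # an empty block contributes nothing
                continue
            p = sums[k][l] / n
            for q in (p, 1.0 - p):
                if q > 0.0:
                    total += n * q * math.log(q)
    return total


def kl_pass(X, rows, cols, K, L) -> Tuple[List[int], float]:
    """One Kernighan-Lin-style pass over the row labels, columns held fixed.

    Repeatedly apply the best single-row relabeling among unlocked rows (even
    if it lowers the objective), lock that row, and finally return the best
    labeling seen along the way, which may be the starting one.
    """
    labels = list(rows)
    best = list(labels)
    best_value = bernoulli_profile_loglik(X, labels, cols, K, L)
    unlocked = set(range(len(labels)))
    while unlocked:
        move, move_value = None, -math.inf
        for i in unlocked:
            old = labels[i]
            for k in range(K):
                if k != old:
                    labels[i] = k
                    value = bernoulli_profile_loglik(X, labels, cols, K, L)
                    if value > move_value:
                        move, move_value = (i, k), value
            labels[i] = old
        if move is None:  # only one class: there is nothing to move
            break
        i, k = move
        labels[i] = k
        unlocked.remove(i)
        if move_value > best_value + 1e-12:
            best, best_value = list(labels), move_value
    return best, best_value


def bicluster(X, rows, cols, K, L, max_sweeps=20):
    """Alternate passes over rows and columns until a sweep stops improving."""
    Xt = [list(col) for col in zip(*X)]  # transpose: columns become rows
    value = bernoulli_profile_loglik(X, rows, cols, K, L)
    for _ in range(max_sweeps):
        rows, _ = kl_pass(X, rows, cols, K, L)
        cols, new_value = kl_pass(Xt, cols, rows, L, K)
        improved = new_value > value + 1e-12
        value = new_value
        if not improved:
            break
    return rows, cols, value


# Toy 6 x 6 binary matrix with a 2 x 2 block-diagonal structure;
# the starting row labels misplace row 5.
X = [[1, 1, 1, 0, 0, 0]] * 3 + [[0, 0, 0, 1, 1, 1]] * 3
rows, cols, value = bicluster(X, [0, 0, 0, 1, 1, 0], [0, 0, 0, 1, 1, 1], 2, 2)
print(rows, cols, value)  # [0, 0, 0, 1, 1, 1] [0, 0, 0, 1, 1, 1] 0.0
```

Because each pass returns the best labeling along its trajectory, including the starting one, the objective never decreases from sweep to sweep. Recomputing the full objective for every candidate move keeps the sketch short but is much slower than updating block sums incrementally.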

Citation


Cheryl Flynn. Patrick Perry. "Profile likelihood biclustering." Electron. J. Statist. 14 (1) 731 - 768, 2020. https://doi.org/10.1214/19-EJS1667

Information

Received: 1 July 2018; Published: 2020
First available in Project Euclid: 31 January 2020

zbMATH: 07163272
MathSciNet: MR4058380
Digital Object Identifier: 10.1214/19-EJS1667

Subjects:
Primary: 62-07
Secondary: 62G20

Keywords: Biclustering, block model, congressional voting, microarray data, profile likelihood

Vol.14 • No. 1 • 2020