Annals of Applied Statistics

Multiview cluster aggregation and splitting, with an application to multiomic breast cancer data

Antoine Godichon-Baggioni, Cathy Maugis-Rabusseau, and Andrea Rau

Multiview data, which represent distinct but related groupings of variables, can be useful for identifying relevant and robust clustering structures among observations. A large number of multiview classification algorithms have been proposed in the fields of computer science and genomics; here, we instead focus on the task of merging or splitting an existing hard or soft cluster partition based on multiview data. This article is specifically motivated by an application involving multiomic breast cancer data from The Cancer Genome Atlas, where multiple molecular profiles (gene expression, microRNA expression, methylation and copy number alterations) are used to further subdivide the five currently accepted intrinsic tumor subtypes into distinct subgroups of patients. In addition, we investigate the performance of the proposed multiview splitting and aggregation algorithms, as compared to single- and concatenated-view alternatives, in a set of simulations. The multiview splitting and aggregation algorithms developed here are implemented in the maskmeans R package.

Ann. Appl. Stat., Volume 14, Number 2 (2020), 752-767.

Received: November 2018
Revised: November 2019
First available in Project Euclid: 29 June 2020

Keywords: Clustering; multiview; cluster merging and splitting; multiomic data; TCGA


Supplemental materials

  • Multiview cluster aggregation and splitting, with an application to multiomic breast cancer data: Supplementary file. The Supplementary Material provides additional figures and the proofs of Propositions 3.1 and 3.2.