Annals of Applied Statistics

Multiview cluster aggregation and splitting, with an application to multiomic breast cancer data

Antoine Godichon-Baggioni, Cathy Maugis-Rabusseau, and Andrea Rau

Abstract

Multiview data, which represent distinct but related groupings of variables, can be useful for identifying relevant and robust clustering structures among observations. A large number of multiview classification algorithms have been proposed in the fields of computer science and genomics; here, we instead focus on the task of merging or splitting an existing hard or soft cluster partition based on multiview data. This article is specifically motivated by an application involving multiomic breast cancer data from The Cancer Genome Atlas, where multiple molecular profiles (gene expression, microRNA expression, methylation and copy number alterations) are used to further subdivide the five currently accepted intrinsic tumor subtypes into distinct subgroups of patients. In addition, we investigate the performance of the proposed multiview splitting and aggregation algorithms, as compared to single- and concatenated-view alternatives, in a set of simulations. The multiview splitting and aggregation algorithms developed here are implemented in the maskmeans R package.

Article information

Source
Ann. Appl. Stat., Volume 14, Number 2 (2020), 752–767.

Dates
Received: November 2018
Revised: November 2019
First available in Project Euclid: 29 June 2020

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1593449324

Digital Object Identifier
doi:10.1214/19-AOAS1317

Mathematical Reviews number (MathSciNet)
MR4117828

Zentralblatt MATH identifier
07239882

Keywords
Clustering; multiview; cluster merging and splitting; multiomic data; TCGA

Citation

Godichon-Baggioni, Antoine; Maugis-Rabusseau, Cathy; Rau, Andrea. Multiview cluster aggregation and splitting, with an application to multiomic breast cancer data. Ann. Appl. Stat. 14 (2020), no. 2, 752–767. doi:10.1214/19-AOAS1317. https://projecteuclid.org/euclid.aoas/1593449324


Supplemental materials

  • Multiview cluster aggregation and splitting, with an application to multiomic breast cancer data: Supplementary file. In this Supplementary Material, some additional figures are given as well as proofs of Propositions 3.1 and 3.2.