The Annals of Applied Statistics

A method for visual identification of small sample subgroups and potential biomarkers

Charlotte Soneson and Magnus Fontes

Full-text: Open access


In order to find previously unknown subgroups in biomedical data and generate testable hypotheses, visually guided exploratory analysis can be of tremendous importance. In this paper we propose a new dissimilarity measure that can be used within the Multidimensional Scaling framework to obtain a joint low-dimensional representation of both the samples and variables of a multivariate data set, thereby providing an alternative to conventional biplots. In comparison with biplots, the representations obtained by our approach are particularly useful for exploratory analysis of data sets where there are small groups of variables sharing unusually high or low values for a small group of samples.
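As a rough illustration of the Multidimensional Scaling step mentioned in the abstract, the following minimal sketch embeds items in two dimensions from a precomputed dissimilarity matrix using classical (Torgerson) MDS. It does not implement the dissimilarity measure proposed in the paper, which is not specified on this page: plain Euclidean distance on a hypothetical toy data set stands in for it. The function name `classical_mds` and the toy points are assumptions made for this example only.

```python
import numpy as np


def classical_mds(d, n_components=2):
    """Embed n items from an n-by-n dissimilarity matrix d (classical MDS)."""
    d = np.asarray(d, dtype=float)
    n = d.shape[0]
    # Double-centre the squared dissimilarities to obtain a Gram-like matrix.
    j = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * j @ (d ** 2) @ j
    # eigh returns eigenvalues in ascending order; keep the largest ones.
    eigvals, eigvecs = np.linalg.eigh(b)
    order = np.argsort(eigvals)[::-1][:n_components]
    # Clip negative eigenvalues, which can occur for non-Euclidean input.
    scale = np.sqrt(np.clip(eigvals[order], 0.0, None))
    return eigvecs[:, order] * scale


# Hypothetical toy input: the corners of a 3-by-4 rectangle.
# Euclidean distance is a stand-in, NOT the measure proposed in the paper.
points = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0], [3.0, 4.0]])
toy_d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)

coords = classical_mds(toy_d, n_components=2)
# Pairwise distances between the embedded points, for comparison with toy_d.
recovered = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
```

In a joint representation of the kind the abstract describes, the rows of the dissimilarity matrix would index both samples and variables, so a single embedding would place them in the same plot. How dissimilarities between a sample and a variable are defined is the paper's contribution and is not reproduced here.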

Article information

Ann. Appl. Stat., Volume 5, Number 3 (2011), 2131-2149.

First available in Project Euclid: 13 October 2011

Keywords: Principal Components Analysis; biplot; dimension reduction; multidimensional scaling; visualization


Soneson, Charlotte; Fontes, Magnus. A method for visual identification of small sample subgroups and potential biomarkers. Ann. Appl. Stat. 5 (2011), no. 3, 2131-2149. doi:10.1214/11-AOAS460.

Supplemental materials

  • Supplementary material A: Supplementary material. In the supplementary material we give a small schematic example showing the different steps of CUMBIA. Further, we show how to emphasize both over- and underexpressed variables in the visualization and how the choice of K and s affects the resulting visualization. We also provide scree plots obtained by CUMBIA and PCA for the three data sets studied in the paper.
  • Supplementary material B: Supplementary figures—Projection pursuit results. The supplementary figures show the result of the FastICA projection pursuit algorithm applied to the three data sets considered in the paper. Note that to facilitate the interpretation of the figures, the axes are ungraded and only the origin is marked.