The Annals of Applied Statistics

Torus principal component analysis with applications to RNA structure

Benjamin Eltzner, Stephan Huckemann, and Kanti V. Mardia

Several cutting-edge applications need PCA methods for data on tori. We propose a novel torus-PCA method that adaptively favors low-dimensional representations while preventing overfitting by a new test; both features can be applied generally and address shortcomings in two previously proposed PCA methods. Unlike tangent space PCA, our torus-PCA features structure fidelity by honoring the cyclic topology of the data space and, unlike geodesic PCA, produces nonwinding, nondense descriptors. These features are achieved by deforming tori into spheres with self-gluing and then using a variant of the recently developed principal nested spheres analysis. This analysis involves a step of subsphere fitting, and we provide a new test to avoid overfitting. We validate our torus-PCA by application to an RNA benchmark data set. Further, using a larger RNA data set, torus-PCA recovers previously found structure, now globally at the one-dimensional representation, which is not accessible via tangent space PCA.
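The cyclic topology mentioned in the abstract can be illustrated with a single angle. The following is a minimal sketch, not the authors' torus-PCA. It only shows why treating angles such as dihedral angles as ordinary real numbers can mislead near the wrap-around point. The function names `wrap`, `circular_mean`, and `circular_distance` are hypothetical helpers chosen for this illustration.

```python
import math


def wrap(angle_deg):
    """Map an angle in degrees to the interval [-180, 180)."""
    return (angle_deg + 180.0) % 360.0 - 180.0


def circular_mean(angles_deg):
    """Mean direction of angles on the circle, in degrees.

    Each angle is mapped to a unit vector, the vectors are summed,
    and the direction of the sum is returned.
    """
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c))


def circular_distance(a_deg, b_deg):
    """Shortest angular distance between two angles, in degrees."""
    return abs(wrap(a_deg - b_deg))


# Two angles that are close on the circle but far apart as real numbers.
angles = [350.0, 10.0]

# Arithmetic mean ignores the wrap-around and lands on the opposite side.
naive_mean = sum(angles) / len(angles)

# Circular mean respects the topology and lies between the two angles.
circ_mean = circular_mean(angles)

print("naive mean:", naive_mean)
print("circular mean:", circ_mean)
print("circular distance:", circular_distance(350.0, 10.0))
```

On a torus, the same wrap-around applies independently to every angular coordinate, which is why methods that ignore it can distort the data geometry.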

Article information

Ann. Appl. Stat., Volume 12, Number 2 (2018), 1332-1359.

Received: March 2017
Revised: July 2017
First available in Project Euclid: 28 July 2018

Statistics on manifolds; tori deformation; directional statistics; dimension reduction; dihedral angles; fitting small spheres; principal nested spheres analysis


Eltzner, Benjamin; Huckemann, Stephan; Mardia, Kanti V. Torus principal component analysis with applications to RNA structure. Ann. Appl. Stat. 12 (2018), no. 2, 1332-1359. doi:10.1214/17-AOAS1115.

Supplemental materials

  • Supplement A: Data. An illustration of how to choose data-driven parameters for torus PCA.
  • Supplement B: Data. RNA residue data used for the analysis in this paper.
  • Supplement C: Implementation. Source code of the T-PCA implementation used for this paper.