The Annals of Applied Statistics
- Ann. Appl. Stat.
- Volume 12, Number 2 (2018), 1332-1359.
Torus principal component analysis with applications to RNA structure
Several cutting-edge applications require PCA methods for data on tori. We propose a novel torus-PCA method that adaptively favors low-dimensional representations while preventing overfitting by means of a new test. Both features can be applied generally and address shortcomings in two previously proposed PCA methods. Unlike tangent space PCA, our torus-PCA preserves structure fidelity by honoring the cyclic topology of the data space and, unlike geodesic PCA, it produces nonwinding, nondense descriptors. These features are achieved by deforming tori into spheres with self-gluing and then applying a variant of the recently developed principal nested spheres analysis. This analysis involves a subsphere-fitting step, for which we provide a new test to avoid overfitting. We validate our torus-PCA by application to an RNA benchmark data set. Further, on a larger RNA data set, torus-PCA recovers previously found structure, now globally at the one-dimensional representation, which is not accessible via tangent space PCA.
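The abstract's point about honoring the cyclic topology of the data space can be illustrated with a small sketch. It is not the authors' torus-PCA and is not taken from the paper's supplement. It only shows, for a single angle, why treating angular data as ordinary Euclidean coordinates can mislead. The function names `linear_mean`, `circular_mean`, and `circular_distance` are hypothetical, chosen for this illustration.

```python
import math


def linear_mean(angles_deg):
    """Arithmetic mean that ignores the wrap-around at 360 degrees."""
    return sum(angles_deg) / len(angles_deg)


def circular_mean(angles_deg):
    """Mean direction: average the unit vectors, then convert back to an angle."""
    s = sum(math.sin(math.radians(a)) for a in angles_deg)
    c = sum(math.cos(math.radians(a)) for a in angles_deg)
    return math.degrees(math.atan2(s, c)) % 360.0


def circular_distance(a_deg, b_deg):
    """Shortest distance between two angles on the circle, in degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


# Two angles that lie close together on the circle, on either side of 0 degrees.
angles = [350.0, 10.0]

print("linear mean:  ", linear_mean(angles))
print("circular mean:", circular_mean(angles))
print("distance of circular mean from 0:", circular_distance(circular_mean(angles), 0.0))
```

The linear mean lands at 180 degrees, on the opposite side of the circle from both observations, while the mean direction stays next to them at the 0/360 seam. Dihedral angles of an RNA residue are several such angles at once, which is why the data space is a torus rather than a Euclidean box.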
Received: March 2017
Revised: July 2017
First available in Project Euclid: 28 July 2018
Eltzner, Benjamin; Huckemann, Stephan; Mardia, Kanti V. Torus principal component analysis with applications to RNA structure. Ann. Appl. Stat. 12 (2018), no. 2, 1332--1359. doi:10.1214/17-AOAS1115. https://projecteuclid.org/euclid.aoas/1532743497
- Supplement A: Data. An illustration of how to choose data-driven parameters for torus PCA.
- Supplement B: Data. RNA residue data used for the analysis in this paper.
- Supplement C: Implementation. Source code of the T-PCA implementation used for this paper.