## The Annals of Applied Statistics

### Principal nested shape space analysis of molecular dynamics data

#### Abstract

Molecular dynamics simulations produce huge datasets of temporal sequences of molecules. It is of interest to summarize the shape evolution of the molecules in a succinct, low-dimensional representation. However, Euclidean techniques such as principal components analysis (PCA) can be problematic as the data may lie far from a flat manifold. Principal nested spheres gives a fundamentally different decomposition of data from the usual Euclidean subspace-based PCA [Biometrika 99 (2012) 551–568]. Subspaces of successively lower dimension are fitted to the data in a backwards manner, with the aim of retaining signal and dispensing with noise at each stage. We adapt the methodology to 3D subshape spaces and provide some practical fitting algorithms. The methodology is applied to cluster analysis of peptides, where different states of the molecules can be identified. Also, the temporal transitions between cluster states are explored.
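
To make the backwards fitting idea concrete, the following is a minimal sketch of a single step on an ordinary unit sphere, not the shape-space algorithm of the paper. It fits a great subsphere (a great circle when the data lie on the 2-sphere) and projects the data onto it. The function name `fit_great_subsphere`, the simulated data, and the use of the smallest-eigenvalue eigenvector as the fitting criterion are all assumptions made for illustration; principal nested spheres as described by Jung, Dryden and Marron (2012) minimizes geodesic residuals and also allows small subspheres.

```python
import numpy as np

def fit_great_subsphere(X):
    """One illustrative backwards step on the unit sphere (hypothetical helper).

    X is an (n, d+1) array of unit vectors. Returns the unit normal v of the
    fitted great subsphere {x : x . v = 0}, the data projected onto that
    subsphere, and the signed geodesic residuals.
    """
    # The eigenvector of X^T X with the smallest eigenvalue minimizes
    # sum_i (x_i . v)^2 over unit vectors v. This is a simplification:
    # it penalizes the sine of the geodesic residual, not the residual itself.
    evals, evecs = np.linalg.eigh(X.T @ X)  # eigenvalues in ascending order
    v = evecs[:, 0]

    dots = X @ v
    # Signed geodesic distance from each point to the great subsphere.
    residuals = np.arcsin(np.clip(dots, -1.0, 1.0))

    # Remove the component along v, then renormalize back onto the sphere.
    # Assumes no data point is exactly parallel to v.
    P = X - np.outer(dots, v)
    P /= np.linalg.norm(P, axis=1, keepdims=True)
    return v, P, residuals

# Simulated data (an assumption): points near the equator of the 2-sphere.
rng = np.random.default_rng(0)
theta = rng.uniform(0.0, 2.0 * np.pi, 200)
true_normal = np.array([0.0, 0.0, 1.0])
X = np.column_stack([np.cos(theta), np.sin(theta),
                     0.05 * rng.standard_normal(200)])
X /= np.linalg.norm(X, axis=1, keepdims=True)

v, P, residuals = fit_great_subsphere(X)
```

In this sketch the residuals play the role of the variation dispensed with at the step, and the projected points `P` are what a further step would take as input after mapping the subsphere to a sphere of one dimension lower.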

#### Article information

Source
Ann. Appl. Stat., Volume 13, Number 4 (2019), 2213-2234.

Dates
Revised: March 2019
First available in Project Euclid: 28 November 2019

Permanent link to this document
https://projecteuclid.org/euclid.aoas/1574910042

Digital Object Identifier
doi:10.1214/19-AOAS1277

Mathematical Reviews number (MathSciNet)
MR4037428

#### Citation

Dryden, Ian L.; Kim, Kwang-Rae; Laughton, Charles A.; Le, Huiling. Principal nested shape space analysis of molecular dynamics data. Ann. Appl. Stat. 13 (2019), no. 4, 2213–2234. doi:10.1214/19-AOAS1277. https://projecteuclid.org/euclid.aoas/1574910042

#### References

• Cootes, T. F., Taylor, C. J., Cooper, D. H. and Graham, J. (1994). Image search using flexible shape models generated from sets of examples. In Statistics and Images, Vol. 2 (K. V. Mardia, ed.) 111–139. Carfax, Oxford.
• Damon, J. and Marron, J. S. (2014). Backwards principal component analysis and principal nested relations. J. Math. Imaging Vision 50 107–114.
• Dryden, I. L. (2018). shapes package. R Foundation for Statistical Computing, Vienna, Austria. Contributed package, Version 1.2.4.
• Dryden, I. L. and Mardia, K. V. (2016). Statistical Shape Analysis, with Applications in R, 2nd ed. Wiley, Chichester.
• Fletcher, P. T., Lu, C., Pizer, S. and Joshi, S. (2004). Principal geodesic analysis for the study of nonlinear statistics of shape. IEEE Trans. Med. Imag. 23 995–1005.
• Goodall, C. (1991). Procrustes methods in the statistical analysis of shape. J. Roy. Statist. Soc. Ser. B 53 285–339.
• Gower, J. C. (1975). Generalized Procrustes analysis. Psychometrika 40 33–51.
• Huckemann, S., Hotz, T. and Munk, A. (2010). Intrinsic shape analysis: Geodesic PCA for Riemannian manifolds modulo isometric Lie group actions. Statist. Sinica 20 1–100.
• Huckemann, S. and Ziezold, H. (2006). Principal component analysis for Riemannian manifolds, with an application to triangular shape spaces. Adv. in Appl. Probab. 38 299–319.
• Jolliffe, I. T. (2002). Principal Component Analysis, 2nd ed. Springer Series in Statistics. Springer, New York.
• Jung, S., Dryden, I. L. and Marron, J. S. (2012). Analysis of principal nested spheres. Biometrika 99 551–568.
• Kendall, D. G. (1984). Shape manifolds, Procrustean metrics, and complex projective spaces. Bull. Lond. Math. Soc. 16 81–121.
• Kenobi, K., Dryden, I. L. and Le, H. (2010). Shape curves and geodesic modelling. Biometrika 97 567–584.
• Kent, J. T. (1994). The complex Bingham distribution and shape analysis. J. Roy. Statist. Soc. Ser. B 56 285–299.
• Kent, J. T. and Mardia, K. V. (2001). Shape, Procrustes tangent projections and bilateral symmetry. Biometrika 88 469–485.
• Le, H. L. (1991). On geodesics in Euclidean shape spaces. J. Lond. Math. Soc. (2) 44 360–372.
• Le, H. L. and Kendall, D. G. (1993). The Riemannian structure of Euclidean shape spaces: A novel environment for statistics. Ann. Statist. 21 1225–1271.
• Margulis, C. J., Stern, H. A. and Berne, B. J. (2002). Helix unfolding and intramolecular hydrogen bond dynamics in small $\alpha$-helices in explicit solvent. J. Phys. Chem. B 106 10748–10752.
• Marron, J. S. and Alonso, A. M. (2014). Overview of object oriented data analysis. Biom. J. 56 732–753.
• Panaretos, V. M., Pham, T. and Yao, Z. (2014). Principal flows. J. Amer. Statist. Assoc. 109 424–436.
• Pennec, X. (2018). Barycentric subspace analysis on manifolds. Ann. Statist. 46 2711–2746.
• Salomon-Ferrer, R., Case, D. A. and Walker, R. C. (2013). An overview of the Amber biomolecular simulation package. Wiley Interdisciplinary Reviews: Computational Molecular Science 3 198–210.
• Wang, H. and Marron, J. S. (2007). Object oriented data analysis: Sets of trees. Ann. Statist. 35 1849–1873.
• Ward, J. H. Jr. (1963). Hierarchical grouping to optimize an objective function. J. Amer. Statist. Assoc. 58 236–244.