December 2021 Clustering on the torus by conformal prediction
Sungkyu Jung, Kiho Park, Byungwon Kim
Ann. Appl. Stat. 15(4): 1583-1603 (December 2021). DOI: 10.1214/21-AOAS1459

Abstract

Motivated by the analysis of torsion (dihedral) angles in the backbone of proteins, we investigate clustering of bivariate angular data on the torus [−π,π)×[−π,π). We show that naive adaptations to the torus of clustering methods designed for vector-valued data are not satisfactory, and we propose a novel clustering approach based on the conformal prediction framework. We construct several prediction sets for toroidal data with guaranteed finite-sample validity, based on a kernel density estimate and bivariate von Mises mixture models. From a prediction set built from a Gaussian approximation of the bivariate von Mises mixture, we propose a data-driven choice for the number of clusters and present algorithms for automated cluster identification and cluster membership assignment. The proposed prediction sets and clustering approaches are applied to the torsion angles extracted from three strains of coronavirus spike glycoproteins (including SARS-CoV-2, contagious in humans). The analysis reveals a potential difference in the clusters of the SARS-CoV-2 torsion angles, compared to the clusters found in torsion angles from two different strains of coronavirus, contagious in animals.
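
As a rough illustration of the kind of prediction set the abstract describes, the following Python sketch builds a split (inductive) conformal prediction set on the torus from a kernel density estimate. It is not the authors' implementation: the product von Mises kernel, the concentration value `kappa=25.0`, the half-and-half data split, and the function names `vm_kde`, `conformal_threshold`, and `in_prediction_set` are all assumptions made here for illustration. The finite-sample coverage of at least 1 − α relies only on exchangeability of the data.

```python
import numpy as np

def vm_kde(points, data, kappa=25.0):
    """Product von Mises kernel density estimate on the torus.

    points: (m, 2) angles where the density is evaluated.
    data:   (n, 2) angles used to build the estimate.
    kappa:  kernel concentration (assumed value, plays the role of a bandwidth).
    """
    points = np.atleast_2d(points)
    diff = points[:, None, :] - data[None, :, :]        # shape (m, n, 2)
    log_kernel = kappa * np.cos(diff).sum(axis=2)       # periodic in both angles
    norm = (2.0 * np.pi * np.i0(kappa)) ** 2            # product-kernel normalizer
    return np.exp(log_kernel).mean(axis=1) / norm

def conformal_threshold(data, alpha=0.1, kappa=25.0, seed=0):
    """Split the sample, fit the KDE on one half, calibrate on the other.

    Returns the fitting half and a density threshold t such that the set
    {y : vm_kde(y, fit) >= t} covers a new exchangeable point with
    probability at least 1 - alpha.
    """
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(data))
    half = len(data) // 2
    fit, calib = data[idx[:half]], data[idx[half:]]
    scores = np.sort(vm_kde(calib, fit, kappa))         # conformity scores
    k = int(np.floor(alpha * (len(calib) + 1)))         # rank of the cutoff score
    threshold = scores[k - 1] if k >= 1 else 0.0        # k = 0: whole torus
    return fit, threshold

def in_prediction_set(points, fit, threshold, kappa=25.0):
    """Boolean membership of each point in the conformal prediction set."""
    return vm_kde(points, fit, kappa) >= threshold
```

Evaluating `in_prediction_set` on a grid over [−π,π)×[−π,π) would give a picture of the set; its connected components are one naive way to read off clusters, whereas the paper's clustering is built on a mixture-based prediction set instead.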

Funding Statement

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. 2019R1A2C2002256).

Acknowledgments

We thank the Editor, the Associate Editor and two anonymous referees for their constructive comments which helped us to substantially improve our manuscript.

Citation

Sungkyu Jung. Kiho Park. Byungwon Kim. "Clustering on the torus by conformal prediction." Ann. Appl. Stat. 15 (4) 1583 - 1603, December 2021. https://doi.org/10.1214/21-AOAS1459

Information

Received: 1 September 2020; Revised: 1 January 2021; Published: December 2021
First available in Project Euclid: 21 December 2021

MathSciNet: MR4355067
zbMATH: 1498.62229
Digital Object Identifier: 10.1214/21-AOAS1459

Keywords: Density estimation, directional statistics, prediction set, protein structure, torsion angles, von Mises distribution

Rights: Copyright © 2021 Institute of Mathematical Statistics

JOURNAL ARTICLE
21 PAGES

