Electronic Journal of Statistics

On geometric probability distributions on the torus with applications to molecular biology

Alessandro Selvitella

Full-text: Open access

Abstract

In this paper, we study a family of probability distributions, called Inverse Stereographic Normal Distributions, that offers an alternative to the von Mises family. These distributions are the counterparts of the Gaussian Distribution on $\mathbb{S}^{1}$ (univariate) and $\mathbb{T}^{n}$ (multivariate). We discuss some key properties of the models, such as unimodality and closure under marginalization and conditioning. We compare this family of distributions with the von Mises family and the Wrapped Normal Distribution. We then discuss some inferential problems, introduce a notion of moments that is natural for inverse stereographic distributions, and revisit a version of the Central Limit Theorem (CLT) in this context. We construct point estimators, confidence intervals and hypothesis tests, and briefly discuss sampling methods. Finally, we conclude with some applications to molecular biology and some illustrative examples. This study is motivated by the Protein Folding Problem and by the fact that a large number of proteins involved in DNA metabolism assume a toroidal shape with some amorphous regions.

Article information

Source
Electron. J. Statist., Volume 13, Number 2 (2019), 2717-2763.

Dates
Received: June 2017
First available in Project Euclid: 21 August 2019

Permanent link to this document
https://projecteuclid.org/euclid.ejs/1566353061

Digital Object Identifier
doi:10.1214/19-EJS1579

Subjects
Primary: 62E15 (Exact distribution theory), 62H10 (Distribution of statistics), 62H11 (Directional data; spatial statistics)
Secondary: 62E10 (Characterization and structure theory), 62E20 (Asymptotic distribution theory), 62F03 (Hypothesis testing), 62F10 (Point estimation), 62F12 (Asymptotic properties of estimators), 62H15 (Hypothesis testing)

Keywords
Distributions on manifolds; von Mises distributions; inverse stereographic distributions; molecular biology; protein folding problem; DNA-metabolism

Rights
Creative Commons Attribution 4.0 International License.

Citation

Selvitella, Alessandro. On geometric probability distributions on the torus with applications to molecular biology. Electron. J. Statist. 13 (2019), no. 2, 2717-2763. doi:10.1214/19-EJS1579. https://projecteuclid.org/euclid.ejs/1566353061


