Bayesian Analysis

The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference

Daniel Hernandez-Stumpfhauser, F. Jay Breidt, and Mark J. van der Woerd

Full-text: Open access

Abstract

The general projected normal distribution is a simple and intuitive model for directional data in any dimension: a multivariate normal random vector divided by its length is the projection of that vector onto the surface of the unit hypersphere. Observed data consist of the projections, but not the lengths. Inference for this model has been restricted to the two-dimensional (circular) case, using Bayesian methods with data augmentation to generate the latent lengths and a Metropolis-within-Gibbs algorithm to sample from the posterior. We describe a new parameterization of the general projected normal distribution that makes inference in any dimension tractable, including the important three-dimensional (spherical) case, which has not previously been considered. Under this new parameterization, the full conditionals of the unknown parameters have closed forms, and we propose a new slice sampler to draw the latent lengths without the need for rejection. Gibbs sampling with this new scheme is fast and easy, leading to improved Bayesian inference; for example, it is now feasible to conduct model selection among complex mixture and regression models for large data sets. Our parameterization also allows straightforward incorporation of covariates into the covariance matrix of the multivariate normal, increasing the ability of the model to explain directional data as a function of independent regressors. Circular and spherical cases are considered in detail and illustrated with scientific applications. For the circular case, seasonal variation in time-of-day departures of anglers from recreational fishing sites is modeled using covariates in both the mean vector and covariance matrix. For the spherical case, we consider paired angles that describe the relative positions of carbon atoms along the backbone chain of a protein. 
We fit mixtures of general projected normals to these data, with the best-fitting mixture accurately describing biologically meaningful structures including helices, β-sheets, and coils and turns. Finally, we show via simulation that our methodology has satisfactory performance in some 10-dimensional and 50-dimensional problems.
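
The construction described in the abstract, a multivariate normal vector divided by its length, can be sketched in a few lines of Python. This is only a simulation of the forward model under assumed parameters, not the authors' inference scheme: the function name `sample_projected_normal` and the values of `mu` and `sigma` are illustrative choices that do not come from the paper.

```python
import numpy as np

def sample_projected_normal(mean, cov, size, seed=None):
    """Draw unit vectors by normalizing multivariate normal draws.

    Returns the directions (what would be observed) and the lengths
    (latent in the model, returned here only for illustration).
    """
    rng = np.random.default_rng(seed)
    x = rng.multivariate_normal(mean, cov, size=size)  # latent normal vectors
    r = np.linalg.norm(x, axis=1, keepdims=True)       # latent lengths
    return x / r, r.ravel()                            # projections onto the unit sphere

# Hypothetical parameters for a three-dimensional (spherical) example.
mu = np.array([1.0, 0.5, -0.5])
sigma = np.array([[1.0, 0.3, 0.0],
                  [0.3, 2.0, 0.0],
                  [0.0, 0.0, 1.0]])

u, r = sample_projected_normal(mu, sigma, size=1000, seed=0)
```

Each row of `u` has unit length, so it lies on the surface of the unit sphere. In a real analysis only `u` would be available, and the lengths `r` would have to be generated as latent variables, which is the role of the data augmentation step mentioned in the abstract.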

Article information

Source
Bayesian Anal., Volume 12, Number 1 (2017), 113-133.

Dates
First available in Project Euclid: 19 January 2016

Permanent link to this document
https://projecteuclid.org/euclid.ba/1453211962

Digital Object Identifier
doi:10.1214/15-BA989

Mathematical Reviews number (MathSciNet)
MR3597569

Zentralblatt MATH identifier
1384.62176

Keywords
circular data; directional data; Gibbs sampler; Markov chain Monte Carlo; protein structure analysis; spherical data

Citation

Hernandez-Stumpfhauser, Daniel; Breidt, F. Jay; van der Woerd, Mark J. The General Projected Normal Distribution of Arbitrary Dimension: Modeling and Bayesian Inference. Bayesian Anal. 12 (2017), no. 1, 113–133. doi:10.1214/15-BA989. https://projecteuclid.org/euclid.ba/1453211962

