Open Access
A nonparametric HMM for genetic imputation and coalescent inference
Lloyd T. Elliott, Yee Whye Teh
Electron. J. Statist. 10(2): 3425-3451 (2016). DOI: 10.1214/16-EJS1197


Genetic sequence data are well described by hidden Markov models (HMMs) in which latent states correspond to clusters of similar mutation patterns. Theory from statistical genetics suggests that these HMMs are nonhomogeneous (their transition probabilities vary along the chromosome) and place high probability on self-transitions. We develop a new nonparametric model of genetic sequence data, based on the hierarchical Dirichlet process, which supports both the self-transitions and the nonhomogeneity. Our model parameterizes the genetic process more parsimoniously than the more general nonparametric models that have previously been applied to population genetics. We provide truncation-free MCMC inference for our model using a new auxiliary sampling scheme for Bayesian nonparametric HMMs. In a series of experiments on male X chromosome data from the Thousand Genomes Project, and on data simulated from a population bottleneck, we show the benefits of our model over the popular finite model fastPHASE, which can itself be seen as a parametric truncation of our model. We find that the number of HMM states inferred by our model is correlated with the time to the most recent common ancestor (TMRCA) in population bottlenecks. This work demonstrates the flexibility of Bayesian nonparametrics applied to large and complex genetic data.
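
As a rough illustration of the two properties named in the abstract (nonhomogeneous transitions and strong self-transitions), the following toy sketch builds a per-site transition matrix for a finite HMM. It is not the authors' nonparametric model or their sampler. The parameterization (keep the current state with probability exp(-rate), otherwise redraw from fixed cluster weights) is an assumption chosen for illustration, in the general spirit of finite haplotype-cluster models, and the names `transition_matrix`, `sample_path`, `weights`, and `jump_rates` are hypothetical.

```python
import math
import random

def transition_matrix(jump_rate, weights):
    """Transition matrix for one step between adjacent sites.

    Assumed form: with probability exp(-jump_rate) the chain keeps its
    state; otherwise it redraws a state from `weights` (the redraw may
    land on the same state again).
    """
    stay = math.exp(-jump_rate)
    k = len(weights)
    return [
        [stay * (i == j) + (1.0 - stay) * weights[j] for j in range(k)]
        for i in range(k)
    ]

def sample_path(jump_rates, weights, rng):
    """Sample one latent state per site along a chromosome."""
    k = len(weights)
    states = [rng.choices(range(k), weights=weights)[0]]
    for rate in jump_rates:
        # The matrix is rebuilt at every step because the rate varies
        # along the sequence: this is what makes the chain nonhomogeneous.
        row = transition_matrix(rate, weights)[states[-1]]
        states.append(rng.choices(range(k), weights=row)[0])
    return states

# Hypothetical values: three clusters, four steps, one high-rate step.
weights = [0.5, 0.3, 0.2]
jump_rates = [0.01, 0.01, 2.0, 0.01]
rng = random.Random(0)
path = sample_path(jump_rates, weights, rng)
```

With small rates the diagonal of each matrix dominates, so sampled paths tend to stay in one cluster for long stretches and switch mainly where the rate is high.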



Lloyd T. Elliott, Yee Whye Teh. "A nonparametric HMM for genetic imputation and coalescent inference." Electron. J. Statist. 10 (2) 3425-3451, 2016.


Received: 1 January 2016; Published: 2016
First available in Project Euclid: 16 November 2016

zbMATH: 1357.62314
MathSciNet: MR3572855
Digital Object Identifier: 10.1214/16-EJS1197

Primary: 62F15
Secondary: 92D10

Keywords: Bayesian nonparametrics, genetic imputation, haplotype inference, HMMs, population genetics, statistical genetics, TMRCA inference

Rights: Copyright © 2016 The Institute of Mathematical Statistics and the Bernoulli Society

Vol.10 • No. 2 • 2016