Open Access
Modeling Population Structure Under Hierarchical Dirichlet Processes
Lloyd T. Elliott, Maria De Iorio, Stefano Favaro, Kaustubh Adhikari, Yee Whye Teh
Bayesian Anal. 14(2): 313-339 (June 2019). DOI: 10.1214/17-BA1093

Abstract

We propose a Bayesian nonparametric model to infer population admixture, extending the hierarchical Dirichlet process to allow for correlation between loci due to linkage disequilibrium. Given multilocus genotype data from a sample of individuals, the proposed model makes it possible to classify individuals as unadmixed or admixed, and to infer both the number of subpopulations ancestral to an admixed population and the population of origin of chromosomal regions. Our model does not assume any specific mutation process, and can be applied to most of the commonly used genetic markers. We present a Markov chain Monte Carlo (MCMC) algorithm to perform posterior inference from the model, and we discuss some methods to summarize the MCMC output for the analysis of population admixture. Finally, we demonstrate the performance of the proposed model in a real application, using genetic data from the ectodysplasin-A receptor (EDAR) gene, which is considered to be ancestry-informative due to well-known variations in allele frequency as well as phenotypic effects across ancestry. The structure analysis of this dataset leads to the identification of a rare haplotype in Europeans. We also conduct a simulation experiment and show that our algorithm outperforms parametric methods.
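
For readers unfamiliar with the hierarchical Dirichlet process, the following is a minimal sketch of how a plain HDP shares population weights across individuals. It is not the model proposed in the paper, which additionally accounts for linkage disequilibrium between loci. The function names, the truncation level `K`, and the concentration parameters `gamma` and `alpha` are all illustrative assumptions: global weights are drawn by truncated stick-breaking, and each individual's admixture proportions are then drawn from a Dirichlet distribution centered on those global weights.

```python
import random

def truncated_gem(gamma, K, rng):
    """Draw K global population weights by truncated stick-breaking."""
    weights, remaining = [], 1.0
    for _ in range(K - 1):
        v = rng.betavariate(1.0, gamma)  # fraction of the remaining stick
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last component absorbs the leftover stick
    return weights

def dirichlet(concentrations, rng):
    """Draw from a Dirichlet distribution by normalizing gamma variates."""
    # A tiny floor keeps the gamma shape parameter strictly positive.
    draws = [rng.gammavariate(max(c, 1e-12), 1.0) for c in concentrations]
    total = sum(draws)
    return [d / total for d in draws]

def sample_admixture(n_individuals, gamma=1.0, alpha=1.0, K=10, seed=0):
    """Hypothetical helper: global weights plus per-individual proportions."""
    rng = random.Random(seed)
    beta = truncated_gem(gamma, K, rng)
    # Each individual's proportions are centered on the shared weights beta.
    pis = [dirichlet([alpha * b for b in beta], rng)
           for _ in range(n_individuals)]
    return beta, pis

if __name__ == "__main__":
    beta, pis = sample_admixture(n_individuals=3)
    print("global weights:", [round(b, 3) for b in beta])
    for i, pi in enumerate(pis):
        print("individual", i, [round(p, 3) for p in pi])
```

In this sketch a small `alpha` tends to concentrate each individual's mass on few populations (closer to unadmixed), while a large `alpha` pulls every individual toward the shared global weights. The fixed truncation level `K` is a convenience of the sketch; the nonparametric model itself does not fix the number of subpopulations in advance.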

Citation

Lloyd T. Elliott, Maria De Iorio, Stefano Favaro, Kaustubh Adhikari, Yee Whye Teh. "Modeling Population Structure Under Hierarchical Dirichlet Processes." Bayesian Anal. 14(2): 313-339, June 2019. https://doi.org/10.1214/17-BA1093

Information

Published: June 2019
First available in Project Euclid: 19 May 2018

zbMATH: 07045433
MathSciNet: MR3934088
Digital Object Identifier: 10.1214/17-BA1093

Keywords: admixture modeling, Bayesian nonparametrics, Hierarchical Dirichlet process, linkage disequilibrium, MCMC algorithm, population stratification, single nucleotide polymorphism data
