Bayesian nonparametric analysis of Kingman’s coalescent

Stefano Favaro; Shui Feng; Paul A. Jenkins

doi:10.1214/18-AIHP910

Ann. Inst. H. Poincaré Probab. Statist. 55(2): 1087-1115 (May 2019). DOI: 10.1214/18-AIHP910

Abstract

Kingman’s coalescent is one of the most popular models in population genetics. It describes the genealogy of a population whose genetic composition evolves in time according to the Wright–Fisher model, or suitable approximations of it belonging to the broad class of Fleming–Viot processes. Ancestral inference under Kingman’s coalescent has received much attention in the literature, both in practical data analysis and from a theoretical and methodological point of view. Given a sample of individuals taken from the population at time $t>0$, most contributions have aimed at making frequentist or Bayesian parametric inference on quantities related to the genealogy of the sample. In this paper we propose a Bayesian nonparametric predictive approach to ancestral inference. That is, under the prior assumption that the composition of the population evolves in time according to a neutral Fleming–Viot process, and given the information contained in an initial sample of $m$ individuals taken from the population at time $t>0$, we estimate quantities related to the genealogy of an additional unobservable sample of size $m^{\prime}\geq1$. As a by-product of our analysis we introduce a class of Bayesian nonparametric estimators (predictors) which can be thought of as Good–Turing type estimators for ancestral inference. The proposed approach is illustrated through an application to genetic data.
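For readers unfamiliar with the underlying genealogy, the following is a minimal sketch of the standard Kingman coalescent, not the estimators proposed in the paper. It assumes time is measured in coalescent units, so that while $k$ lineages remain, the waiting time to the next merger is exponential with rate $k(k-1)/2$. The function names `ancestors_at_time` and `mean_ancestors` are hypothetical and chosen only for illustration.

```python
import random


def ancestors_at_time(m, t, rng):
    """Count the ancestral lineages of a sample of size m at time t in the past.

    Assumes the standard Kingman coalescent in coalescent time units:
    with k lineages, each of the k*(k-1)/2 pairs merges at rate 1.
    """
    k = m
    elapsed = 0.0
    while k > 1:
        rate = k * (k - 1) / 2.0
        elapsed += rng.expovariate(rate)  # waiting time to the next merger
        if elapsed >= t:
            break  # the next merger would happen further back than t
        k -= 1
    return k


def mean_ancestors(m, t, n_rep=10000, seed=0):
    """Monte Carlo estimate of the expected number of ancestors at time t."""
    rng = random.Random(seed)
    total = sum(ancestors_at_time(m, t, rng) for _ in range(n_rep))
    return total / n_rep
```

As a sanity check on the sketch, for $m=2$ the two lineages have not yet merged by time $t$ with probability $e^{-t}$, so the expected number of ancestors is $1+e^{-t}$, which `mean_ancestors(2, 1.0)` should approximate.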

Citation

Stefano Favaro, Shui Feng, Paul A. Jenkins. "Bayesian nonparametric analysis of Kingman’s coalescent." Ann. Inst. H. Poincaré Probab. Statist. 55 (2) 1087-1115, May 2019. https://doi.org/10.1214/18-AIHP910