Open Access
Estimating Transmission from Genetic and Epidemiological Data: A Metric to Compare Transmission Trees
Michelle Kendall, Diepreye Ayabina, Yuanwei Xu, James Stimson, Caroline Colijn
Statist. Sci. 33(1): 70-85 (February 2018). DOI: 10.1214/17-STS637


Reconstructing who infected whom is a central challenge in analysing epidemiological data. Recently, advances in sequencing technology have led to increasing interest in Bayesian approaches to inferring who infected whom using genetic data from pathogens. The logic behind such approaches is that isolates that are nearly genetically identical are more likely to have been recently transmitted than those that are very different. A number of methods have been developed to perform this inference. However, testing their convergence, examining posterior sets of transmission trees and comparing methods’ performance are challenged by the fact that the object of inference—the transmission tree—is a complicated discrete structure. We introduce a metric on transmission trees to quantify distances between them. The metric can accommodate trees with unsampled individuals, and highlights differences in the source case and in the number of infections per infector. We illustrate its performance on simple simulated scenarios and on posterior transmission trees from a TB outbreak. We find that the metric reveals where the posterior is sensitive to the priors, and where collections of trees are composed of distinct clusters. We use the metric to define median trees summarising these clusters. Quantitative tools to compare transmission trees to each other will be required for assessing MCMC convergence, exploring posterior trees and benchmarking diverse methods as this field continues to mature.
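The abstract does not state the metric's formula. The sketch below illustrates one plausible way to make a distance between transmission trees concrete. It is an assumption for illustration and is not taken from the paper. Each tree is summarised as a vector with two parts. For every pair of sampled cases, the vector records how many transmission steps separate the source case from the pair's most recent common infector. It then appends each case's own depth. The distance is the Euclidean distance between two such vectors. Unsampled individuals can appear in a tree as intermediate infectors. They change depths but are not compared directly. The "median" tree is also an assumption here: it is taken to be the tree whose vector lies closest to the mean vector of the collection. The dict encoding and all function names (`ancestors`, `mrci`, `tree_vector`, `distance`, `median_tree`) are hypothetical.

```python
import math
from itertools import combinations

# Hypothetical encoding: a transmission tree is a dict mapping each case to
# its infector; the source case maps to None. Unsampled intermediate
# infectors may appear as keys without being listed in `cases`.


def ancestors(tree, case):
    """Return the infection chain [case, infector, ..., source]."""
    chain = [case]
    while tree[chain[-1]] is not None:
        chain.append(tree[chain[-1]])
    return chain


def depth(tree, case):
    """Number of transmission steps from the source case to `case`."""
    return len(ancestors(tree, case)) - 1


def mrci(tree, a, b):
    """Most recent common infector: first case on a's chain that is also on b's."""
    b_chain = set(ancestors(tree, b))
    for c in ancestors(tree, a):
        if c in b_chain:
            return c
    raise ValueError("cases are not in the same tree")


def tree_vector(tree, cases):
    """Assumed summary: MRCI depth for each unordered pair, then each case's depth."""
    pair_depths = [depth(tree, mrci(tree, a, b)) for a, b in combinations(cases, 2)]
    own_depths = [depth(tree, c) for c in cases]
    return pair_depths + own_depths


def distance(t1, t2, cases):
    """Euclidean distance between the two trees' vectors over shared sampled cases."""
    v1, v2 = tree_vector(t1, cases), tree_vector(t2, cases)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(v1, v2)))


def median_tree(trees, cases):
    """Index of the tree whose vector is closest to the mean vector (assumed definition)."""
    vecs = [tree_vector(t, cases) for t in trees]
    n = len(vecs)
    mean = [sum(col) / n for col in zip(*vecs)]
    return min(range(n), key=lambda i: sum((x - m) ** 2 for x, m in zip(vecs[i], mean)))


# Usage: three candidate trees over the sampled cases A, B, C.
cases = ["A", "B", "C"]
t1 = {"A": None, "B": "A", "C": "A"}  # A infects both B and C
t2 = {"A": None, "B": "A", "C": "B"}  # chain A -> B -> C
t3 = {"B": None, "A": "B", "C": "B"}  # B is the source case
print(distance(t1, t2, cases))  # sqrt(2)
print(distance(t2, t3, cases))  # 2.0
print(median_tree([t1, t2, t3], cases))  # 0
```

In this construction, changing the source case shifts many depths at once, so trees that disagree about the source tend to lie far apart. Squared differences also penalise one large disagreement more than several small ones. Whether this matches the paper's weighting is not established by the abstract.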



Michelle Kendall. Diepreye Ayabina. Yuanwei Xu. James Stimson. Caroline Colijn. "Estimating Transmission from Genetic and Epidemiological Data: A Metric to Compare Transmission Trees." Statist. Sci. 33 (1) 70 - 85, February 2018.


Published: February 2018
First available in Project Euclid: 2 February 2018

zbMATH: 07031391
MathSciNet: MR3757505
Digital Object Identifier: 10.1214/17-STS637

Keywords: Bayesian inference, epidemiology, genomics, infectious diseases, modelling

Rights: Copyright © 2018 Institute of Mathematical Statistics
