Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks

File: journal.pcbi.1005495.pdf
Description: Published version
Size: 3.6 MB
Format: Adobe PDF
Title: Simultaneous inference of phylogenetic and transmission trees in infectious disease outbreaks
Authors: Klinkenberg, D
Backer, JA
Didelot, X
Colijn, C
Wallinga, J
Item Type: Journal Article
Abstract: Whole-genome sequencing of pathogens from host samples becomes more and more routine during infectious disease outbreaks. These data provide information on possible transmission events which can be used for further epidemiologic analyses, such as identification of risk factors for infectivity and transmission. However, the relationship between transmission events and sequence data is obscured by uncertainty arising from four largely unobserved processes: transmission, case observation, within-host pathogen dynamics and mutation. To properly resolve transmission events, these processes need to be taken into account. Recent years have seen much progress in theory and method development, but existing applications make simplifying assumptions that often break up the dependency between the four processes, or are tailored to specific datasets with matching model assumptions and code. To obtain a method with wider applicability, we have developed a novel approach to reconstruct transmission trees with sequence data. Our approach combines elementary models for transmission, case observation, within-host pathogen dynamics, and mutation, under the assumption that the outbreak is over and all cases have been observed. We use Bayesian inference with MCMC for which we have designed novel proposal steps to efficiently traverse the posterior distribution, taking account of all unobserved processes at once. This allows for efficient sampling of transmission trees from the posterior distribution, and robust estimation of consensus transmission trees. We implemented the proposed method in a new R package phybreak. The method performs well in tests of both new and published simulated data. We apply the model to five datasets on densely sampled infectious disease outbreaks, covering a wide range of epidemiological settings. 
Using only sampling times and sequences as data, our analyses confirmed the original results or improved on them: the more realistic infection times place more confidence in the inferred transmission trees.
Issue Date: 18-May-2017
Date of Acceptance: 3-Apr-2017
ISSN: 1553-7358
Publisher: Public Library of Science
Journal / Book Title: PLOS Computational Biology
Volume: 13
Issue: 5
Copyright Statement: © 2017 Klinkenberg et al. This is an open access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted use, distribution, and reproduction in any medium, provided the original author and source are credited.
Sponsor/Funder: National Institute for Health Research
Medical Research Council (MRC)
Funder's Grant Number: HPRU-2012-10080
Keywords: Science & Technology
Life Sciences & Biomedicine
Biochemical Research Methods
Mathematical & Computational Biology
Biochemistry & Molecular Biology
Bacterial Infections
Computational Biology
Disease Transmission, Infectious
Genome, Bacterial
Genome, Viral
Polymorphism, Single Nucleotide
Virus Diseases
06 Biological Sciences
08 Information And Computing Sciences
01 Mathematical Sciences
Publication Status: Published
Article Number: e1005495
Appears in Collections: Mathematics
Applied Mathematics and Mathematical Physics
Faculty of Natural Sciences
Epidemiology, Public Health and Primary Care

This item is licensed under a Creative Commons License.