Improving the estimation of genetic distances from Next-Generation Sequencing data
File(s)
draft_revised_v6FL.odt (102.21 KB)
Accepted version
Author(s)
Vieira, FG
Lassalle, F
Korneliussen, TS
Fumagalli, M
Type
Journal Article
Abstract
Next-Generation Sequencing (NGS) technologies have revolutionized research in evolutionary biology by increasing sequencing speed and reducing experimental costs. However, sequencing error rates are higher than in traditional technologies and, furthermore, many studies rely on low-depth sequencing. Under these circumstances, standard methods for inferring genotypes yield biased estimates of nucleotide variation, which can in turn bias all downstream analyses. Through simulations, we assessed the bias in estimating genetic distances under several different scenarios. The results indicate that naive methods for assigning individual genotypes greatly overestimate genetic distances. We propose a novel method to estimate genetic distances that is suitable for low-depth NGS data and takes the statistical uncertainty of genotype calls into account. We applied this method to investigate the genetic structure of domesticated and wild strains of rice. We implemented this approach in open-source software and discuss further directions for phylogenetic analyses within this novel probabilistic framework.
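The contrast the abstract draws between calling genotypes and carrying their uncertainty forward can be illustrated with a minimal sketch. This is not the authors' software or their exact estimator: the function names, the per-site score (absolute difference in alternate-allele count, divided by two), and the input layout (three posterior probabilities per individual per site) are all assumptions made for illustration only.

```python
from typing import Sequence

# Hypothetical input layout: for each site, the posterior probabilities
# (P(G=0), P(G=1), P(G=2)) of carrying 0, 1, or 2 alternate alleles.
Posterior = Sequence[float]


def site_score(g1: int, g2: int) -> float:
    """Assumed per-site distance: allele-count difference scaled to [0, 1]."""
    return abs(g1 - g2) / 2.0


def expected_distance(ind_a: Sequence[Posterior], ind_b: Sequence[Posterior]) -> float:
    """Average over sites of the score weighted by both genotype posteriors."""
    if len(ind_a) != len(ind_b) or not ind_a:
        raise ValueError("both individuals need the same, non-zero number of sites")
    total = 0.0
    for pa, pb in zip(ind_a, ind_b):
        total += sum(
            pa[g1] * pb[g2] * site_score(g1, g2)
            for g1 in range(3)
            for g2 in range(3)
        )
    return total / len(ind_a)


def called_distance(ind_a: Sequence[Posterior], ind_b: Sequence[Posterior]) -> float:
    """Naive alternative: call the most probable genotype, then score the calls."""
    if len(ind_a) != len(ind_b) or not ind_a:
        raise ValueError("both individuals need the same, non-zero number of sites")
    total = 0.0
    for pa, pb in zip(ind_a, ind_b):
        call_a = max(range(3), key=lambda g: pa[g])
        call_b = max(range(3), key=lambda g: pb[g])
        total += site_score(call_a, call_b)
    return total / len(ind_a)


if __name__ == "__main__":
    # Made-up posteriors for two individuals at two sites (illustration only).
    a = [(0.9, 0.1, 0.0), (0.4, 0.6, 0.0)]
    b = [(0.8, 0.2, 0.0), (0.6, 0.4, 0.0)]
    print("expected:", expected_distance(a, b))
    print("called:  ", called_distance(a, b))
```

In this sketch the expected distance changes smoothly as the posteriors change, whereas the called distance jumps whenever the most probable genotype flips, which is the kind of sensitivity to uncertain calls that matters at low sequencing depth.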
Date Issued
2015-03-30
Date Acceptance
2015-01-28
Citation
Biological Journal of the Linnean Society, 2015, 117 (1), pp.139-149
ISSN
0024-4066
Publisher
Wiley
Start Page
139
End Page
149
Journal / Book Title
Biological Journal of the Linnean Society
Volume
117
Issue
1
Copyright Statement
© 2015 The Linnean Society of London.
Sponsor
Human Frontier Science Program
Grant Number
LT000320/2014-L
Subjects
Evolutionary Biology
06 Biological Sciences
Publication Status
Published