A fast likelihood solution to the genetic clustering problem
File(s)
Beugin_et_al-2018-Methods_in_Ecology_and_Evolution.pdf (1.09 MB)
Published version
Author(s)
Beugin, Marie-Pauline
Gayet, Thibault
Pontier, Dominique
Devillard, Sébastien
Jombart, Thibaut
Type
Journal Article
Abstract
The investigation of genetic clusters in natural populations is a ubiquitous problem in a range of fields relying on the analysis of genetic data, such as molecular ecology, conservation biology and microbiology. Typically, genetic clusters are defined as distinct panmictic populations, or parental groups in the context of hybridisation. Two types of methods have been developed for identifying such clusters: model-based methods, which are usually computationally intensive but yield results that can be interpreted in the light of an explicit population genetic model, and geometric approaches, which are less interpretable but considerably faster.
Here, we introduce snapclust, a fast maximum-likelihood solution to the genetic clustering problem, which combines the advantages of both model-based and geometric approaches. Our method maximises the likelihood of a fixed number of panmictic populations by combining a geometric approach with fast likelihood optimisation via the Expectation-Maximisation (EM) algorithm. It can be used for assigning genotypes to populations and, optionally, for identifying various types of hybrids between two parental populations. Several goodness-of-fit statistics can also be used to guide the choice of the number of clusters to retain.
Using extensive simulations, we show that snapclust performs comparably to current gold standards for genetic clustering as well as hybrid detection, with some advantages for identifying hybrids after several backcrosses, while being orders of magnitude faster than other model-based methods. We also illustrate how snapclust can be used for identifying the optimal number of clusters and for subsequently assigning individuals to various hybrid classes simulated from an empirical microsatellite dataset.
snapclust is implemented in the package adegenet for the free software R, and is therefore easily integrated into existing pipelines for genetic data analysis. It can be applied to any kind of co-dominant markers, and can easily be extended to more complex models including, for instance, varying ploidy levels. Given its flexibility and computational efficiency, it provides a useful complement to the existing toolbox for the study of genetic diversity in natural populations.
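The abstract describes likelihood maximisation for a fixed number of panmictic populations using the EM algorithm. The sketch below illustrates that general idea only; it is not the snapclust implementation in adegenet. It assumes diploid, biallelic markers coded as counts of one allele (0, 1 or 2), Hardy-Weinberg proportions within each cluster, equal prior weight for every cluster, and a starting partition supplied by the caller (for example, from a geometric method). The function name em_cluster and all of its parameters are hypothetical.

```python
import math


def em_cluster(genotypes, init_labels, k, n_iter=50, eps=1e-3, tol=1e-8):
    """Soft EM for k panmictic clusters (illustrative sketch, not snapclust).

    genotypes   : list of individuals, each a list of allele counts (0, 1, 2)
    init_labels : starting cluster index (0..k-1) for each individual
    Returns (labels, membership probabilities, mixture log-likelihood).
    """
    n, n_loci = len(genotypes), len(genotypes[0])
    # Start from hard memberships given by the initial partition.
    resp = [[1.0 if init_labels[i] == c else 0.0 for c in range(k)]
            for i in range(n)]
    loglik = float("-inf")

    for _ in range(n_iter):
        # M-step: membership-weighted allele frequency per cluster and locus,
        # clamped away from 0 and 1 so that logarithms stay finite.
        freqs = []
        for c in range(k):
            weight = sum(resp[i][c] for i in range(n))
            row = []
            for l in range(n_loci):
                alt = sum(resp[i][c] * genotypes[i][l] for i in range(n))
                p = alt / (2.0 * weight) if weight > 0 else 0.5
                row.append(min(max(p, eps), 1.0 - eps))
            freqs.append(row)

        # E-step: posterior membership from Hardy-Weinberg genotype likelihoods.
        new_loglik = 0.0
        for i in range(n):
            logs = []
            for c in range(k):
                ll = 0.0
                for l in range(n_loci):
                    g, p = genotypes[i][l], freqs[c][l]
                    ll += (math.log(math.comb(2, g))
                           + g * math.log(p)
                           + (2 - g) * math.log(1.0 - p))
                logs.append(ll)
            # Log-sum-exp normalisation for numerical stability.
            m = max(logs)
            norm = m + math.log(sum(math.exp(x - m) for x in logs))
            resp[i] = [math.exp(x - norm) for x in logs]
            new_loglik += norm - math.log(k)  # equal cluster priors of 1/k

        if abs(new_loglik - loglik) < tol:
            loglik = new_loglik
            break
        loglik = new_loglik

    labels = [max(range(k), key=lambda c: resp[i][c]) for i in range(n)]
    return labels, resp, loglik
```

Calling em_cluster(genotypes, init_labels, k=2) returns a hard assignment, the per-individual membership probabilities and the final log-likelihood. The log-likelihood could in turn feed a goodness-of-fit statistic such as AIC or BIC when comparing values of k, although which statistics snapclust offers should be checked in the adegenet documentation.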
Date Issued
2018-04-01
Date Acceptance
2018-01-04
Citation
Methods in Ecology and Evolution, 2018, 9 (4), pp.1006-1016
URI
http://hdl.handle.net/10044/1/56007
DOI
https://www.dx.doi.org/10.1111/2041-210X.12968
ISSN
2041-210X
Publisher
Wiley
Start Page
1006
End Page
1016
Journal / Book Title
Methods in Ecology and Evolution
Volume
9
Issue
4
Copyright Statement
© 2018 The Authors. Methods in Ecology and Evolution published by John Wiley & Sons Ltd on behalf of British Ecological Society. This is an open access article under the terms of the Creative Commons Attribution License, which permits use, distribution and reproduction in any medium, provided the original work is properly cited.
License URL
http://creativecommons.org/licenses/by/4.0/
Sponsor
Medical Research Council (MRC)
Grant Number
MR/K010174/1B
Subjects
Science & Technology
Life Sciences & Biomedicine
Ecology
Environmental Sciences & Ecology
EM algorithm
genetic assignment
genetic clustering
hybridisation
microsatellites
population membership
relative performances
SNP
MULTILOCUS GENOTYPE DATA
POPULATION-STRUCTURE
R-PACKAGE
MULTIVARIATE-ANALYSIS
MAXIMUM-LIKELIHOOD
DNA-SEQUENCES
SNP DATA
MODEL
INFERENCE
MARKERS
0602 Ecology
0603 Evolutionary Biology
Publication Status
Published
Date Publish Online
2018-01-08