Computational approaches for metagenomic analysis of high-throughput sequencing data

File: Ainsworth-D-2017-PhD-Thesis.pdf
Description: Thesis
Size: 1.52 MB
Format: Adobe PDF
Title: Computational approaches for metagenomic analysis of high-throughput sequencing data
Authors: Ainsworth, David
Item Type: Thesis or dissertation
Abstract: High-throughput DNA sequencing has revolutionised microbiology and is the foundation on which the nascent field of metagenomics has been built. This ability to cheaply sample billions of DNA reads directly from environments has democratised sequencing and allowed researchers to gain unprecedented insights into diverse microbial communities. These technologies, however, are not without their limitations: the short length of the reads requires the production of vast amounts of data to ensure all information is captured. This “data deluge” has been a major bottleneck and has necessitated the development of new algorithms for analysis. Sequence alignment methods provide the most information about the composition of a sample, as they allow both taxonomic and functional classification, but existing alignment algorithms are prohibitively slow. This inefficiency has led to a reliance on faster algorithms which produce only simple taxonomic classification or abundance estimation, losing the valuable information given by full alignments against annotated genomes. This thesis describes k-SLAM, a novel ultra-fast method for the alignment and taxonomic classification of metagenomic data. Using a k-mer based method, k-SLAM achieves speeds three orders of magnitude faster than current alignment-based approaches, making full taxonomic classification and gene identification tractable on modern large datasets. The alignments found by k-SLAM can also be used to find variants and identify genes, along with their nearest taxonomic origins. A novel pseudo-assembly method produces more specific taxonomic classifications for species which have high sequence identity within their genus. This provides a significant (up to 40%) increase in accuracy on these species. Also described is a re-analysis of a Shiga toxin-producing E. coli O104:H4 isolate via alignment against bacterial and viral species to find antibiotic resistance and toxin-producing genes.
k-SLAM has been used by a range of research projects including FLORINASH and is currently being used by a number of groups.
Content Version: Open Access
Issue Date: Aug-2016
Date Awarded: Jan-2017
URI: http://hdl.handle.net/10044/1/44070
Supervisor: Sternberg, Michael; Butcher, Sarah; Knottenbelt, William
Sponsor/Funder: Biotechnology and Biological Sciences Research Council; Illumina Inc.
Funder's Grant Number: BB/I01585X/1
Department: Life Sciences
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Life Sciences PhD theses



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.
