Title: Identification and characterization of Coronaviridae genomes from Vietnamese bats and rats based on conserved protein domains
Authors: Phan, MVT
Tue, NT
Pham, HA
Baker, S
Kellam, P
Cotten, M
Bach, TK
Berto, A
Boni, MF
Bryant, JE
Bui, DP
Campbell, JI
Carrique-Mas, J
Dang, MH
Dang, TH
Dang, TO
Day, JN
Dinh, VT
Van Doorn, HR
Duong, AH
Farrar, JJ
Hau, TTT
Hoang, BL
Hoang, VD
Huynh, TKT
Lam, CC
Le, MH
Le, TP
Le, TP
Le, TP
Le, XL
Luu, TTH
Ly, VC
Mai, TPL
Nadjm, B
Ngo, TB
Ngo, TH
Nguyen, CT
Nguyen, DT
Nguyen, D
Nguyen, KC
Nguyen, NA
Nguyen, NV
Nguyen, QH
Nguyen, TD
Nguyen, TM
Nguyen, TB
Nguyen, THT
Nguyen, THT
Nguyen, TKC
Nguyen, TLN
Nguyen, TLH
Nguyen, TNL
Nguyen, TND
Nguyen, TN
Nguyen, TSC
Nguyen, TYC
Nguyen, TT
Nguyen, TV
Nguyen, VC
Nguyen, VH
Nguyen, VK
Nguyen, VMH
Nguyen, V
Nguyen, VT
Nguyen, VT
Nguyen, VVC
Nguyen, VX
Pham, HM
Pham, TMK
Pham, TTT
Pham, VL
Pham, VM
Phan, VBB
Rabaa, MA
Rahman, M
Thompson, C
Thwaites, G
Tran, DHN
Tran, HMC
Tran, KT
Tran, MP
Tran, TKH
Tran, TND
Tran, TTT
Tran, TTM
Tran, TN
Tran, TH
Trinh, QT
Vo, BH
Vo, NT
Vo, QC
Voong, VP
Wertheim, H
Bogaardt, C
Chase-Topping, M
Ivens, A
Lu, L
Dung, N
Rambaut, A
Simmonds, P
Woolhouse, M
Munnink, BO
Deijs, M
Van der Hoek, L
Jebbink, MF
Farsani, SMJ
Dodd, K
Euren, J
Lucas, A
Ortiz, N
Pennacchio, L
Rubin, E
Saylors, KE
Tran, MH
Wolfe, ND
Item Type: Journal Article
Abstract: The Coronaviridae family of viruses encompasses a group of pathogens with zoonotic potential, as observed in previous outbreaks of severe acute respiratory syndrome coronavirus and Middle East respiratory syndrome coronavirus. Accordingly, it seems important to identify and document the coronaviruses in animal reservoirs, many of which are uncharacterized and potentially missed by more standard diagnostic assays. A combination of sensitive deep sequencing technology and computational algorithms is essential for virus surveillance, especially for characterizing novel or distantly related virus strains. Here, we explore the use of profile Hidden Markov Model-defined Pfam protein domains (Pfam domains) encoded by new sequences as a Coronaviridae sequence classification tool. The encoded domains are used first in a triage step to identify potential Coronaviridae sequences, which are then processed using a Random Forest method to classify them to the Coronaviridae genus level. Applying this algorithm to Coronaviridae genomes assembled from agnostic deep sequencing data from surveillance of bats and rats in Dong Thap province (Vietnam) identified thirty-four Alphacoronavirus and eleven Betacoronavirus genomes. This collection of bat and rat coronavirus genomes provided essential information on the local diversity of coronaviruses, substantially expanded the number of coronavirus full genomes available from bats and rats, and may facilitate further molecular studies on this group of viruses.
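The two-stage scheme summarized in the abstract (domain-based triage, then Random Forest classification to genus) can be illustrated with a minimal, self-contained Python sketch. It assumes that Pfam domain hits for each sequence have already been obtained from a profile-HMM search (for example with HMMER) and reduced to sets of domain names. The marker labels (`core_1`, `alpha_specific`, ...), the toy training data and the parameters (`min_hits`, `n_trees`, `max_depth`) are hypothetical, and the small from-scratch forest only stands in for a library implementation such as scikit-learn's `RandomForestClassifier`. It is an illustration under these assumptions, not the authors' pipeline.

```python
import random
from collections import Counter

# HYPOTHETICAL placeholder domain labels; these are not real Pfam accessions.
CORE_MARKERS = frozenset({"core_1", "core_2"})


def triage(domain_hits, markers=CORE_MARKERS, min_hits=1):
    """Stage 1: flag a sequence as a potential coronavirus if it encodes
    at least `min_hits` of the marker domains."""
    return len(set(domain_hits) & markers) >= min_hits


def to_vector(domain_hits, vocab):
    """Encode a sequence's domain hits as a 0/1 presence vector over `vocab`."""
    hits = set(domain_hits)
    return [1 if d in hits else 0 for d in vocab]


def gini(labels):
    """Gini impurity of a list of class labels."""
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def build_tree(X, y, depth, max_depth, n_feats, rng):
    """Grow a CART-style tree on binary features; each split considers only
    a random subset of `n_feats` features, as in a random forest."""
    if depth >= max_depth or len(set(y)) == 1:
        return majority(y)  # leaf: a class label string
    best = None
    for f in rng.sample(range(len(X[0])), n_feats):
        left = [i for i, row in enumerate(X) if row[f] == 0]
        right = [i for i, row in enumerate(X) if row[f] == 1]
        if not left or not right:
            continue  # feature is constant in this node, so it cannot split
        score = (len(left) * gini([y[i] for i in left])
                 + len(right) * gini([y[i] for i in right])) / len(y)
        if best is None or score < best[0]:
            best = (score, f, left, right)
    if best is None:
        return majority(y)
    _, f, left, right = best

    def subtree(idx):
        return build_tree([X[i] for i in idx], [y[i] for i in idx],
                          depth + 1, max_depth, n_feats, rng)

    # Internal node: (feature index, subtree if absent, subtree if present)
    return (f, subtree(left), subtree(right))


def predict_tree(node, row):
    while isinstance(node, tuple):
        f, absent, present = node
        node = present if row[f] == 1 else absent
    return node


def fit_forest(X, y, n_trees=25, max_depth=3, seed=0):
    """Stage 2: bag `n_trees` randomized trees, each on a bootstrap sample."""
    rng = random.Random(seed)
    n_feats = max(1, int(len(X[0]) ** 0.5))  # sqrt(#features) per split
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_feats, rng))
    return forest


def predict_forest(forest, row):
    """Majority vote over all trees."""
    return majority([predict_tree(t, row) for t in forest])


if __name__ == "__main__":
    vocab = ["core_1", "core_2", "alpha_specific", "beta_specific"]
    # Invented toy examples, for illustration only.
    train = ([({"core_1", "core_2", "alpha_specific"}, "Alphacoronavirus")] * 4
             + [({"core_1", "core_2", "beta_specific"}, "Betacoronavirus")] * 4)
    X = [to_vector(hits, vocab) for hits, _ in train]
    y = [label for _, label in train]
    forest = fit_forest(X, y)
    query = {"core_1", "core_2", "beta_specific"}
    if triage(query):
        print(predict_forest(forest, to_vector(query, vocab)))
```

Encoding each sequence as domain presence/absence keeps every feature binary, so every split reduces to "is this domain encoded or not". A practical classifier would presumably be trained on domain profiles from reference genomes of known genus rather than on toy labels like these.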
Issue Date: 1-Jul-2018
Date of Acceptance: 1-Jul-2018
ISSN: 2057-1577
Publisher: Oxford University Press (OUP)
Journal / Book Title: Virus Evolution
Volume: 4
Issue: 2
Copyright Statement: © 2018 The Author(s). This is an Open Access article distributed under the terms of the Creative Commons Attribution License, which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Sponsor/Funder: Wellcome Trust
Funder's Grant Number: 093724/E/10/Z
Keywords: Science & Technology
Life Sciences & Biomedicine
virus classification
machine learning
random forest
protein domains
profile Hidden Markov model
Publication Status: Published
Article Number: vey035
Online Publication Date: 2018-12-15
Appears in Collections: Department of Medicine
