Scalable Open Science Approach for Mutation Calling of Tumor Exomes Using Multiple Genomic Pipelines
Author(s)
Type
Journal Article
Abstract
The Cancer Genome Atlas (TCGA) cancer genomics dataset includes over 10,000 tumor-normal exome pairs across 33 different cancer types, in total >400 TB of raw data files requiring analysis. Here we describe the Multi-Center Mutation Calling in Multiple Cancers project, our effort to generate a comprehensive encyclopedia of somatic mutation calls for the TCGA data to enable robust cross-tumor-type analyses. Our approach accounts for variance and batch effects introduced by the rapid advancement of DNA extraction, hybridization-capture, sequencing, and analysis methods over time. We present best practices for applying an ensemble of seven mutation-calling algorithms with scoring and artifact filtering. The dataset created by this analysis includes 3.5 million somatic variants and forms the basis for PanCan Atlas papers. The results have been made available to the research community along with the methods used to generate them. This project is the result of collaboration from a number of institutes and demonstrates how team science drives extremely large genomics projects.
Date Issued
2018-03-28
Date of Acceptance
2018-03-01
Citation
Cell Systems, 2018, 6 (3), pp.271-281
ISSN
2405-4712
Publisher
Elsevier (Cell Press)
Start Page
271
End Page
281
Journal / Book Title
Cell Systems
Volume
6
Issue
3
Copyright Statement
© 2018 The Authors. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/by-nc-nd/4.0/).
Sponsor
SAIC-F-Frederick, Inc.
Leidos Biomedical Research, Inc.
Identifier
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000428798600014&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=1ba7043ffcc86c417c072aa74d649202
Grant Number
TCGA Pilot Program
15Y011ST
Subjects
Science & Technology
Life Sciences & Biomedicine
Biochemistry & Molecular Biology
Cell Biology
SOMATIC POINT MUTATIONS
CANCER
PanCanAtlas project
TCGA
large-scale
open science
pan-cancer
reproducible computing
somatic mutation calling
MC3 Working Group
Cancer Genome Atlas Research Network
Publication Status
Published
Date Published Online
2018-03-28