
bayNorm: Bayesian gene expression recovery, imputation and normalisation for single cell RNA-sequencing data

File: btz726.pdf
Description: Published version
Size: 1.89 MB
Format: Adobe PDF
Title: bayNorm: Bayesian gene expression recovery, imputation and normalisation for single cell RNA-sequencing data
Authors: Tang, W
Bertaux, F
Thomas, P
Stefanelli, C
Saint, M
Marguerat, S
Shahrezaei, V
Item Type: Journal Article
Abstract: Motivation: Normalisation of single cell RNA sequencing (scRNA-seq) data is a prerequisite to their interpretation. The marked technical variability, high amounts of missing observations and batch effect typical of scRNA-seq datasets make this task particularly challenging. There is a need for an efficient and unified approach for normalisation, imputation and batch effect correction. Results: Here, we introduce bayNorm, a novel Bayesian approach for scaling and inference of scRNA-seq counts. The method’s likelihood function follows a binomial model of mRNA capture, while priors are estimated from expression values across cells using an empirical Bayes approach. We first validate our assumptions by showing this model can reproduce different statistics observed in real scRNA-seq data. We demonstrate using publicly-available scRNA-seq datasets and simulated expression data that bayNorm allows robust imputation of missing values generating realistic transcript distributions that match single molecule FISH measurements. Moreover, by using priors informed by dataset structures, bayNorm improves accuracy and sensitivity of differential expression analysis and reduces batch effect compared to other existing methods. Altogether, bayNorm provides an efficient, integrated solution for global scaling normalisation, imputation and true count recovery of gene expression measurements from scRNA-seq data. Availability: The R package “bayNorm” is available at https://github.com/WT215/bayNorm. The code for analysing data in this paper is available at https://github.com/WT215/bayNorm_papercode. Contact: samuel.marguerat@imperial.ac.uk or v.shahrezaei@imperial.ac.uk Supplementary information: Supplementary data are available at Bioinformatics online.
Issue Date: 15-Feb-2020
Date of Acceptance: 27-Sep-2019
URI: http://hdl.handle.net/10044/1/73640
DOI: 10.1093/bioinformatics/btz726
ISSN: 1367-4803
Publisher: Oxford University Press (OUP)
Start Page: 1174
End Page: 1181
Journal / Book Title: Bioinformatics
Volume: 36
Issue: 4
Copyright Statement: © The Author(s) 2019. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Sponsor/Funder: The Leverhulme Trust
Engineering & Physical Science Research Council (EPSRC)
Funder's Grant Number: RPG-2014-408
EP/N014529/1
Keywords: Bioinformatics
01 Mathematical Sciences
06 Biological Sciences
08 Information and Computing Sciences
Publication Status: Published
Online Publication Date: 2019-10-04
Appears in Collections: Institute of Clinical Sciences
Applied Mathematics and Mathematical Physics
Faculty of Natural Sciences
Mathematics