A workflow for integrated processing of multi-cohort untargeted 1H NMR metabolomics data in large scale metabolic epidemiology
File | Description | Size | Format |
---|---|---|---|
NMR Preprocessing Paper - 20160720 noref.docx | Accepted version | 7.18 MB | Microsoft Word |
Title: | A workflow for integrated processing of multi-cohort untargeted 1H NMR metabolomics data in large scale metabolic epidemiology |
Authors: | Karaman, I; Ferreira, DL; Boulange, CL; Kaluarachchi, MR; Herrington, D; Dona, AC; Castagné, R; Moayyeri, A; Lehne, B; Loh, M; De Vries, PS; Dehghan, A; Franco, O; Hofman, A; Evangelou, E; Tzoulaki, I; Elliott, P; Lindon, JC; Ebbels, TM |
Item Type: | Journal Article |
Abstract: | Large-scale metabolomics studies involving thousands of samples present multiple challenges in data analysis, particularly when an untargeted platform is used. Studies with multiple cohorts and analysis platforms exacerbate existing problems such as peak alignment and normalization. Therefore, there is a need for robust processing pipelines which can ensure reliable data for statistical analysis. The COMBI-BIO project incorporates serum from approximately 8000 individuals, in 3 cohorts, profiled by 6 assays in 2 phases using both 1H-NMR and UPLC-MS. Here we present the COMBI-BIO NMR analysis pipeline and demonstrate its fitness for purpose using representative quality control (QC) samples. NMR spectra were first aligned and normalized. After eliminating interfering signals, outliers identified using Hotelling’s T² were removed and a cohort/phase adjustment was applied, resulting in two NMR datasets (CPMG and NOESY). Alignment of the NMR data was shown to increase the correlation-based alignment quality measure from 0.319 to 0.391 for CPMG and from 0.536 to 0.586 for NOESY, showing that the improvement was present across both large and small peaks. End-to-end quality assessment of the pipeline was achieved using Hotelling’s T² distributions. For CPMG spectra, the interquartile range decreased from 1.425 in raw QC data to 0.679 in processed spectra, while the corresponding change for NOESY spectra was from 0.795 to 0.636, indicating an improvement in precision following processing. PCA indicated that gross phase and cohort differences were no longer present. These results illustrate that the pipeline produces robust and reproducible data, successfully addressing the methodological challenges of this large multi-faceted study. |
Issue Date: | 15-Sep-2016 |
Date of Acceptance: | 1-Sep-2016 |
URI: | http://hdl.handle.net/10044/1/40307 |
DOI: | https://dx.doi.org/10.1021/acs.jproteome.6b00125 |
ISSN: | 1535-3907 |
Publisher: | American Chemical Society |
Start Page: | 4188 |
End Page: | 4194 |
Journal / Book Title: | Journal of Proteome Research |
Volume: | 15 |
Issue: | 12 |
Copyright Statement: | © 2016 American Chemical Society. This document is the Accepted Manuscript version of a Published Work that appeared in final form in Journal of Proteome Research, after peer review and technical editing by the publisher. To access the final edited and published work see http://dx.doi.org/10.1021/acs.jproteome.6b00125. |
Sponsor/Funder: | National Institute for Health Research; Commission of the European Communities; Medical Research Council (MRC); Commission of the European Communities; Medical Research Council (MRC); Medical Research Council (MRC); National Institute for Health Research; European Molecular Biology Laboratory |
Funder's Grant Number: | NF-SI-0611-10136; 305422; MC_PC_12025; 312941; MR/L01632X/1; MR/L01341X/1; RTJ6219303-1; 654241 |
Keywords: | NMR; alignment; epidemiology; large scale; metabolomics; multicohort; normalization; preprocessing; quality control; Biochemistry & Molecular Biology; 06 Biological Sciences; 03 Chemical Sciences |
Publication Status: | Published |
Appears in Collections: | Department of Surgery and Cancer |
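
The abstract describes identifying outliers with Hotelling's T² and summarising precision by the interquartile range of the T² distribution. The sketch below illustrates that general idea only and is not the COMBI-BIO implementation: it assumes T² is computed on the leading PCA scores of a samples-by-variables matrix, and the function names (`hotelling_t2`, `iqr`, `flag_outliers`), the number of components, and the percentile cutoff are all hypothetical choices made for illustration.

```python
import numpy as np


def hotelling_t2(X, n_components=2):
    """Hotelling's T^2 for each row of X, using its leading PCA scores.

    X is a (samples x variables) array, e.g. one spectrum per row.
    n_components is an illustrative choice, not a value from the paper.
    """
    Xc = X - X.mean(axis=0)  # mean-centre every variable
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # PCA scores
    score_var = s[:n_components] ** 2 / (X.shape[0] - 1)  # variance per component
    return np.sum(scores ** 2 / score_var, axis=1)


def iqr(values):
    """Interquartile range (75th minus 25th percentile)."""
    q1, q3 = np.percentile(values, [25, 75])
    return q3 - q1


def flag_outliers(t2, percentile=99.0):
    """Flag samples whose T^2 exceeds an empirical percentile cutoff.

    The empirical cutoff is an assumption made here for simplicity;
    a distribution-based critical value could be used instead.
    """
    return t2 > np.percentile(t2, percentile)
```

Under these assumptions, a narrower T² interquartile range for repeated QC samples after processing than before would indicate tighter clustering, which is how the abstract interprets the reported decrease.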