Applications of generative probabilistic models for information recovery in 1H NMR metabolomics

File: Zukowski-E-2018-PhD-Thesis.pdf
Description: Thesis
Size: 3.31 MB
Format: Adobe PDF
Title: Applications of generative probabilistic models for information recovery in 1H NMR metabolomics
Authors: Zukowski, Edward Leon
Item Type: Thesis or dissertation
Abstract: Metabolomics is a well-established approach for investigating the metabolic state of an organism. It is usually conducted via high-throughput methods and focuses on the identification and quantification of small molecules. A popular analytical technique in metabolomics is 1H NMR spectroscopy. The data obtained in NMR experiments contain a wealth of information on the metabolites in a sample and their chemical structure. To help uncover this information and find patterns in the data, statistical and machine learning methods must be applied. The work presented in this thesis demonstrates applications of probabilistic generative modelling, with particular focus on Latent Dirichlet Allocation (LDA), as a tool for information recovery in 1H NMR data sets obtained in metabolomics research. LDA is an example of a topic model. The model is based on a generative process which can be thought of as the source of the data. Topics are latent variables which select co-occurring metabolites in a sample. In turn, NMR spectra can be represented in the latent variable space. We present applications of LDA in three scenarios. (1) Simulation of NMR spectra with LDA; such spectra demonstrate that LDA is a valid model for NMR data and also provide synthetic data for the evaluation of statistical models. (2) Unsupervised learning with LDA to uncover patterns in NMR data; we use synthetic and real NMR data with knowledge of key biomarkers from a prior study and conclude that LDA was successful in recovering useful topics. (3) Supervised learning with SLDA and with latent variable models combined with ElasticNet regression, in which we investigate NMR data from the Multi-Ethnic Study of Atherosclerosis (MESA) that are paired with clinical variables such as BMI. The goal was to examine whether topics can be informative about clinical outcomes.
Content Version: Open Access
Issue Date: Aug-2018
Date Awarded: Feb-2019
URI: http://hdl.handle.net/10044/1/78483
DOI: https://doi.org/10.25560/78483
Copyright Statement: Creative Commons Attribution NonCommercial NoDerivatives Licence
Supervisor: Ebbels, Timothy; Chayen, Naomi
Department: Department of Surgery & Cancer
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Department of Surgery and Cancer PhD Theses