Applications of generative probabilistic models for information recovery in 1H NMR metabolomics
File | Description | Size | Format |
---|---|---|---|
Zukowski-E-2018-PhD-Thesis.pdf | Thesis | 3.31 MB | Adobe PDF |
Title: | Applications of generative probabilistic models for information recovery in 1H NMR metabolomics |
Authors: | Zukowski, Edward Leon |
Item Type: | Thesis or dissertation |
Abstract: | Metabolomics is a well-established approach for investigating the metabolic state of an organism, usually conducted via high-throughput methods and focusing on the identification and quantification of small molecules. A popular analytical technique used in metabolomics is 1H NMR spectroscopy. The data obtained in NMR experiments contains a wealth of information on the metabolites in a sample and their chemical structure. To help uncover this information and find patterns in the data, statistical and machine learning methods must be applied. The work presented in this thesis demonstrates applications of probabilistic generative modelling, with a particular focus on Latent Dirichlet Allocation (LDA), as a tool for information recovery in 1H NMR data sets obtained in metabolomics research. LDA is an example of a topic model. The model is based on a generative process which can be thought of as the source of the data. Topics are latent variables which select co-occurring metabolites in a sample. In turn, NMR spectra can be represented in the latent variable space. We present applications of LDA in three scenarios. (1) How LDA can be used to simulate NMR spectra; such spectra demonstrate that LDA is a valid model for NMR data and also provide synthetic data for the evaluation of statistical models. (2) Unsupervised learning with LDA to uncover patterns in NMR data; we use synthetic and real NMR data with knowledge of key biomarkers from a prior study and conclude that LDA was successful in the recovery of useful topics. (3) Supervised learning with SLDA and combined latent variable models with ElasticNet regression, where we investigate NMR data from the Multi-Ethnic Study of Atherosclerosis (MESA) paired with clinical variables such as BMI. The goal was to examine whether topics can be informative about clinical outcomes. |
Content Version: | Open Access |
Issue Date: | Aug-2018 |
Date Awarded: | Feb-2019 |
URI: | http://hdl.handle.net/10044/1/78483 |
DOI: | https://doi.org/10.25560/78483 |
Copyright Statement: | Creative Commons Attribution NonCommercial NoDerivatives Licence |
Supervisor: | Ebbels, Timothy; Chayen, Naomi |
Department: | Department of Surgery & Cancer |
Publisher: | Imperial College London |
Qualification Level: | Doctoral |
Qualification Name: | Doctor of Philosophy (PhD) |
Appears in Collections: | Department of Surgery and Cancer PhD Theses |