Statistical approaches for metabolomics and omics data integration
Author(s)
Jendoubi, Takoua
Type
Thesis or dissertation
Abstract
Biological processes are the result of multiple interactions between various
omic entities and are inherently complex. Metabolomics profiling plays a key
role in deciphering the mechanisms of biological functions in living organisms
and is hence gaining popularity. In the last twenty years, the parallel acquisition
of high-throughput datasets from the genome, metabolome, proteome, and
transcriptome has seen a tremendous boost. The integrative analysis of these
datasets promises to enhance the understanding of biological functions and
uncover their underlying mechanisms.
The main objectives of this thesis are i) to develop and investigate
novel statistical models for the integrative analysis of metabolomics data with
other omics technologies, ii) to expand the range of probabilistic models tailored
to metabolomics data and iii) to improve the interpretability of results.
This thesis is mainly concerned with designing models that can be introduced
in different steps of a typical analysis pipeline of metabolomics data. Chapters
2 and 3 motivate the importance of data integration and review popular statistical
techniques used in metabolomics. Chapter 4 covers two steps of the analysis
pipeline simultaneously, by building a single integrative Bayesian model
that both performs cross-omics biomarker discovery and infers potentially
perturbed pathways. Chapter 5 focuses solely on integrative statistical analysis
by uncovering hidden associations between multi-omics datasets. Finally, in Chapter
6 we investigate the incorporation of pathway information into a Bayesian
nonparametric clustering model and its potential to help metabolite annotation.
Where possible, simulation studies are used to better understand
our methods and test their applicability. These simulations are always followed
by analysis of real data and comparison with competing methods. In most instances,
our methods have produced plausible biological findings when applied
to real data and are, to our knowledge, among the first applications of
such probabilistic models to the integrative analysis of metabolomics data.
Version
Open Access
Date Issued
2018-11
Date Awarded
2019-12
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Strimmer, Korbinian
Ebbels, Timothy
Glen, Robert
Dumas, Marc-Emmanuel
Sponsor
Wellcome Trust (London, England)
Grant Number
WPEA_PS2441
Publisher Department
Department of Epidemiology and Biostatistics
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)