Unsupervised learning with graph theoretical algorithms and its applications to transcriptomic data analysis
Author(s)
Liu, Zijing
Type
Thesis or dissertation
Abstract
High-throughput sequencing technologies generate large amounts of complex, high-dimensional data in genomic research. With the aim of extracting meaningful knowledge from a simplified representation, we develop graph-based methods for analysing high-dimensional data, focusing on clustering analysis and dimensionality reduction.
We first study the problem of graph partitioning, which is closely connected with clustering analysis. Using spectral methods, we reformulate a dynamics-based multiscale graph partitioning framework as a max-sum vector partitioning problem. The graph nodes are embedded as vectors that vary with the time of a Markov process running on the graph, which leads to multi-resolution graph partitions. Our derivation also clarifies the quantity optimised by k-means in graph partitioning and establishes its connection to spectral clustering.
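The reformulation as vector partitioning can be illustrated with a minimal sketch. It assumes the dynamical framework is continuous-time Markov stability with the random-walk Laplacian on a connected, undirected graph (the abstract does not name the framework), and all function names are hypothetical. The autocovariance matrix B(t) is written as X(t) X(t)^T using the eigenvectors of the normalised Laplacian, so the stability of a partition equals the sum, over communities, of the squared norm of the summed node vectors.

```python
import numpy as np
from scipy.linalg import expm


def markov_stability(A, labels, t):
    """Continuous-time Markov stability r(t, H) = trace(H^T B(t) H)."""
    d = A.sum(axis=1)
    pi = d / d.sum()                          # stationary distribution d / 2m
    L_rw = np.eye(len(d)) - A / d[:, None]    # random-walk Laplacian I - D^{-1} A
    B = np.diag(pi) @ expm(-t * L_rw) - np.outer(pi, pi)  # autocovariance B(t)
    H = np.eye(labels.max() + 1)[labels]      # one-hot community indicator matrix
    return np.trace(H.T @ B @ H)


def node_vectors(A, t):
    """Hypothetical embedding X(t) with X(t) X(t)^T = B(t); assumes a connected graph."""
    d = A.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    L_sym = np.eye(len(d)) - d_isqrt[:, None] * A * d_isqrt[None, :]
    lam, V = np.linalg.eigh(L_sym)            # eigenvalues in ascending order
    lam, V = lam[1:], V[:, 1:]                # drop the trivial eigenpair (lambda = 0)
    # Row i: sqrt(d_i / 2m) * v_{k,i} * exp(-t * lambda_k / 2) for each k >= 1
    return np.sqrt(d / d.sum())[:, None] * V * np.exp(-t * lam / 2)[None, :]


def vector_partition_score(X, labels):
    """Max-sum vector partitioning objective: sum_c || sum_{i in c} x_i ||^2."""
    return sum(np.sum(X[labels == c].sum(axis=0) ** 2) for c in np.unique(labels))
```

For two triangles joined by a single edge, both functions return the same value for the natural two-community split at any Markov time t. Because the node vectors shrink as exp(-t * lambda_k / 2), larger t suppresses the fast-decaying directions, which is one way to see why sweeping t yields partitions at different resolutions.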
Clustering analysis with multiscale graph partitioning is then investigated. Different methods for estimating a graph from vector data are compared empirically on real datasets. The advantage of using multiscale graph partitioning for clustering is illustrated with both synthetic and real data. We further propose a similarity measure for time-dependent data based on a Gaussian process model. An RNA-sequencing time-course dataset is analysed as an example application.
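To make the step of estimating a graph from vector data concrete, the following sketches one common construction, a symmetrised k-nearest-neighbour graph. It is not necessarily one of the methods compared in the thesis. The function name `knn_graph`, the Euclidean distance, and the use of unweighted edges are assumptions made for illustration.

```python
import numpy as np


def knn_graph(X, k):
    """Hypothetical helper: unweighted, symmetric k-nearest-neighbour adjacency matrix.

    X has one sample per row; Euclidean distance is assumed.
    """
    n = X.shape[0]
    sq = np.sum(X ** 2, axis=1)
    D2 = sq[:, None] + sq[None, :] - 2 * X @ X.T   # pairwise squared distances
    np.fill_diagonal(D2, np.inf)                   # a point is not its own neighbour
    nbrs = np.argsort(D2, axis=1)[:, :k]           # indices of the k nearest points
    A = np.zeros((n, n))
    A[np.repeat(np.arange(n), k), nbrs.ravel()] = 1.0
    # Symmetrise: connect i and j if either is among the other's k nearest neighbours
    return np.maximum(A, A.T)
```

On two well-separated groups of points, a small k produces no edges between the groups, so the resulting graph can be passed directly to a graph-partitioning method. The choice of k controls the trade-off between sparsity and connectivity of the estimated graph.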
Finally, we integrate graph-theoretical clustering and a graph-based dimensionality reduction method with Gaussian processes. We exemplify our approach through the analysis of a transcriptomic dataset of cellular reprogramming from B-cells to iPSCs. We extract a landscape that describes the reprogramming process and identify associated genes for clustering analysis. We also reconstruct another landscape from an integrated transcriptomic dataset characterising the hematopoietic differentiation process from stem cells to somatic cells. The differences between the forward and backward processes are then studied by integrating the two landscapes.
Version
Open Access
Date Issued
2018-09
Date Awarded
2019-06
Copyright Statement
Creative Commons Attribution Non-Commercial No Derivatives licence
Advisor
Barahona, Mauricio
Sponsor
European Commission
Grant Number
FP7 no. 607466
Publisher Department
Chemistry
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)