Unsupervised learning with graph theoretical algorithms and its applications to transcriptomic data analysis
File | Description | Size | Format | |
---|---|---|---|---|
Liu-Z-2019-Phd-Thesis.pdf | | 9.7 MB | Adobe PDF | View/Open |
Title: | Unsupervised learning with graph theoretical algorithms and its applications to transcriptomic data analysis |
Authors: | Liu, Zijing |
Item Type: | Thesis or dissertation |
Abstract: | High-throughput sequencing technologies generate large amounts of genomic data that are high-dimensional and complex in structure. With the aim of extracting meaningful knowledge from a simplified representation, we develop graph-based methods for analysing high-dimensional data, focusing on clustering analysis and dimensionality reduction. We first study the problem of graph partitioning, which is closely connected with clustering analysis. Using spectral methods, we reformulate a dynamics-based multiscale graph partitioning framework as a max-sum vector partitioning problem. The graph nodes are embedded as vectors that vary with the time of a Markov process running on the graph, which leads to multi-resolution graph partitions. Our derivation also clarifies the quantity optimised by k-means in graph partitioning and establishes its connection to spectral clustering. Clustering analysis with multiscale graph partitioning is then investigated. Different methods for estimating a graph from vector data are compared empirically on real datasets. The advantage of using multiscale graph partitioning for clustering is illustrated with both synthetic and real data. We further propose a similarity measure for time-dependent data based on a Gaussian process model, and analyse an RNA-sequencing time-course dataset as an example application. Finally, we integrate the graph theoretical clustering and a graph-based dimensionality reduction method with Gaussian processes. We exemplify our approach through the analysis of a transcriptomic dataset of cellular reprogramming from B-cells to iPSCs. We extract a landscape that describes the reprogramming process and identify associated genes for clustering analysis. We also reconstruct another landscape from an integrated transcriptomic dataset characterising the hematopoietic differentiation process from stem cells to somatic cells. The differences between the forward and backward processes are then studied by integrating the two landscapes. |
Content Version: | Open Access |
Issue Date: | Sep-2018 |
Date Awarded: | Jun-2019 |
URI: | http://hdl.handle.net/10044/1/70845 |
DOI: | https://doi.org/10.25560/70845 |
Copyright Statement: | Creative Commons Attribution Non-Commercial No Derivatives licence |
Supervisor: | Barahona, Mauricio |
Sponsor/Funder: | European Commission |
Funder's Grant Number: | FP7 no. 607466 |
Department: | Chemistry |
Publisher: | Imperial College London |
Qualification Level: | Doctoral |
Qualification Name: | Doctor of Philosophy (PhD) |
Appears in Collections: | Chemistry PhD theses |