Unsupervised learning with graph theoretical algorithms and its applications to transcriptomic data analysis

File: Liu-Z-2019-Phd-Thesis.pdf (Adobe PDF, 9.7 MB)
Title: Unsupervised learning with graph theoretical algorithms and its applications to transcriptomic data analysis
Authors: Liu, Zijing
Item Type: Thesis or dissertation
Abstract: High-throughput sequencing technologies generate large amounts of complex, high-dimensional data in genomic research. With the aim of extracting meaningful knowledge from a simplified representation, we develop graph-based methods for analysing high-dimensional data, focusing on clustering analysis and dimensionality reduction. We first study the problem of graph partitioning, which is closely connected with clustering analysis. Using spectral methods, we reformulate a dynamics-based multiscale graph partitioning framework as a max-sum vector partitioning problem. The graph nodes are embedded as vectors that vary with the time of a Markov process running on the graph, which leads to multi-resolution graph partitions. Our derivation also clarifies the quantity optimised by k-means in graph partitioning and establishes its connection to spectral clustering. We then investigate clustering analysis with multiscale graph partitioning. Different methods for estimating a graph from vector data are compared empirically on real datasets, and the advantage of using multiscale graph partitioning for clustering is illustrated with both synthetic and real data. We further propose a similarity measure for time-dependent data based on a Gaussian process model, and analyse a time-course RNA sequencing dataset as an example application. Finally, we integrate graph-theoretical clustering and a graph-based dimensionality reduction method with Gaussian processes. We exemplify our approach through the analysis of a transcriptomic dataset of cellular reprogramming from B cells to iPSCs: we extract a landscape that describes the reprogramming process and identify associated genes for clustering analysis. We also reconstruct another landscape from an integrated transcriptomic dataset characterising the hematopoietic differentiation process from stem cells to somatic cells.
The differences between the forward and backward processes are then studied by integrating the two landscapes.
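As a rough illustration of how node vectors can turn a dynamics-based partition quality into a max-sum vector partitioning problem, the sketch below assumes the framework is Markov stability with a continuous-time random walk on an undirected, connected graph. The abstract does not name the framework, and all function names here are hypothetical, not taken from the thesis. The sketch computes the stability r(t, H) = trace(H^T B(t) H), with autocovariance B(t) = Pi exp(-t L_rw) - pi pi^T and L_rw = I - D^{-1} A, in two ways: directly from B(t), and as sum_c ||sum_{i in c} x_i(t)||^2, where the node vectors x_i(t) are eigenvectors of the normalised Laplacian scaled by sqrt(d_i / 2m) and exp(-t lambda_k / 2).

```python
import numpy as np


def _normalised_laplacian_eig(A):
    """Eigendecomposition of L_norm = I - D^{-1/2} A D^{-1/2} (ascending eigenvalues)."""
    d = A.sum(axis=1)
    d_isqrt = 1.0 / np.sqrt(d)
    L_norm = np.eye(len(d)) - d_isqrt[:, None] * A * d_isqrt[None, :]
    lam, V = np.linalg.eigh(L_norm)
    return d, lam, V


def autocovariance(A, t):
    """B(t) = Pi exp(-t L_rw) - pi pi^T for a continuous-time random walk (assumed model)."""
    d, lam, V = _normalised_laplacian_eig(A)
    pi = d / d.sum()                               # stationary distribution d_i / 2m
    # exp(-t L_rw) = D^{-1/2} exp(-t L_norm) D^{1/2}
    exp_norm = (V * np.exp(-t * lam)) @ V.T
    P = (1.0 / np.sqrt(d))[:, None] * exp_norm * np.sqrt(d)[None, :]
    return pi[:, None] * P - np.outer(pi, pi)


def node_vectors(A, t):
    """Embed node i as x_i(t) so that B(t) = X X^T."""
    d, lam, V = _normalised_laplacian_eig(A)
    # Column 0 is the zero-eigenvalue eigenvector; its term equals pi pi^T, so drop it.
    # (Assumes a connected graph, i.e. a single zero eigenvalue.)
    scale = np.sqrt(d / d.sum())[:, None]
    return scale * V[:, 1:] * np.exp(-t * lam[1:] / 2)[None, :]


def stability(A, t, labels):
    """r(t, H) = trace(H^T B(t) H), computed directly from B(t)."""
    H = np.eye(labels.max() + 1)[labels]           # one-hot partition matrix
    return np.trace(H.T @ autocovariance(A, t) @ H)


def stability_from_vectors(A, t, labels):
    """Same quantity as a max-sum vector partition: sum_c ||sum_{i in c} x_i(t)||^2."""
    X = node_vectors(A, t)
    return sum(np.sum(X[labels == c].sum(axis=0) ** 2) for c in np.unique(labels))


# Two triangles (0-1-2 and 3-4-5) joined by the single edge 2-3.
A = np.zeros((6, 6))
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

good = np.array([0, 0, 0, 1, 1, 1])
for t in (0.1, 1.0, 10.0):
    print(t, stability(A, t, good), stability_from_vectors(A, t, good))
```

Increasing the Markov time t damps the contribution of eigenvectors with large eigenvalues, which is why the quality of a given partition changes with scale. The search for the partition that maximises r(t, H) at each t is not shown here.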
Content Version: Open Access
Issue Date: Sep-2018
Date Awarded: Jun-2019
URI: http://hdl.handle.net/10044/1/70845
DOI: https://doi.org/10.25560/70845
Copyright Statement: Creative Commons Attribution Non-Commercial No Derivatives licence
Supervisor: Barahona, Mauricio
Sponsor/Funder: European Commission
Funder's Grant Number: FP7 no. 607466
Department: Chemistry
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Chemistry PhD theses