Bayesian latent feature modelling for unstructured data
Author(s)
Zhang, Xinyu
Type
Thesis or dissertation
Abstract
Bayesian latent mixture modelling has a wide range of applications, including natural language processing, image classification, social network analysis and bioinformatics. The idea is to infer the structural relationships in data through unobserved features, which provide a low-dimensional representation of the data for different tasks. Posterior inference is computationally expensive, since the number of hidden parameters increases with the size of the data set. In addition, model parameters are not identifiable, because the likelihood function has many genuine modes beyond those arising from label switching of the mixture components.
To address these problems, this thesis develops a complete framework for learning latent relational structures in data. We fully utilise the bag-of-words assumption and directly generate the Poisson counting variables associated with each feature. As a result, the sampling complexity of Markov chain Monte Carlo estimation methods is significantly reduced. Several extensions of the framework are also proposed to accommodate different assumptions about the data, including sparsity, time dynamics and correlation. Our approach is straightforward and flexible, and is shown empirically to outperform most existing topic models.
Version
Open Access
Date Issued
2020-11
Date Awarded
2021-03
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Heard, Nicholas
Battey, Heather
Publisher Department
Mathematics
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)