Probabilistic machine learning for aggregated and multivariate data
Author(s)
Zhu, Harrison
Type
Thesis or dissertation
Abstract
There has been rapid progress in probabilistic machine learning in recent years, spanning new models, inference algorithms and machine learning hardware. In particular, Bayesian methods that model functions with stochastic process priors, such as Gaussian processes, have proven to be versatile tools for complex problems. In this thesis, we leverage these developments to tackle problems where the quantities of interest are aggregated or multivariate.
We first discuss a classical problem, the estimation of integrals, which has been studied extensively by the numerical analysis community across many applications. We propose a novel Bayesian probabilistic numerical integration method based on Bayesian additive regression trees and demonstrate its effectiveness against popular existing approaches on a variety of numerical integration problems.
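The property that makes tree ensembles convenient for integration can be sketched as follows, assuming integration against the uniform measure on [0, 1]^d. A regression tree is piecewise constant on axis-aligned boxes, so its integral is the sum of its leaf values weighted by leaf volumes; by linearity, the integral of a sum of trees is the sum of the per-tree integrals. The functions `tree_integral` and `sum_of_trees_integral` and the (lower, upper, value) leaf encoding are hypothetical illustrations, not the thesis's implementation. In a Bayesian setting, applying this to each posterior draw of the ensemble would yield samples of the integral.

```python
import numpy as np

def tree_integral(leaves):
    """Integrate a piecewise-constant tree w.r.t. the uniform measure on [0, 1]^d.

    Each leaf is a hypothetical (lower, upper, value) triple: an axis-aligned box
    [lower, upper] inside the unit hypercube and the constant the tree predicts there.
    """
    total = 0.0
    for lower, upper, value in leaves:
        volume = np.prod(np.asarray(upper, dtype=float) - np.asarray(lower, dtype=float))
        total += value * volume
    return total

def sum_of_trees_integral(trees):
    # Integration is linear, so a sum of trees can be integrated tree by tree.
    return sum(tree_integral(leaves) for leaves in trees)

# A 2-D tree: split on x0 at 0.5, then split the left half on x1 at 0.25.
tree_a = [
    ([0.0, 0.0], [0.5, 0.25], 1.0),
    ([0.0, 0.25], [0.5, 1.0], 2.0),
    ([0.5, 0.0], [1.0, 1.0], 4.0),
]
tree_b = [([0.0, 0.0], [1.0, 1.0], 0.5)]  # a single-leaf stump

print(sum_of_trees_integral([tree_a, tree_b]))  # 3.375
```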
We then discuss a problem in which the observed data are aggregated, as is common in survey data such as malaria cases or crop yields. These observations are recorded at the county level, yet can draw on higher-resolution covariates from satellite imagery, which are available at the pixel level. We propose a novel aggregated Gaussian process model that can leverage covariates of varying resolutions and demonstrate its effectiveness on a variety of real-world datasets.
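One standard way to link pixel-level covariates to region-level observations, sketched here under the assumption of a single latent GP with an RBF kernel averaged uniformly over the pixels of each region, is to treat each observation as the noisy average of the latent function over its region. The covariance between two regional averages is then the average of the pixel-level kernel over all pixel pairs, and the same averaging gives the cross-covariance used to predict at individual pixels. The names `rbf`, `aggregated_gram`, `pixel_region_cross` and `predict_pixels`, and the toy 1-D regions, are hypothetical; the thesis's model, which handles covariates at several resolutions, may differ in its details.

```python
import numpy as np

def rbf(X1, X2, lengthscale=1.0, variance=1.0):
    # Squared-exponential kernel between covariate arrays of shape (n, d) and (m, d).
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
    return variance * np.exp(-0.5 * sq / lengthscale**2)

def aggregated_gram(regions, **kw):
    # Cov(avg_A f, avg_B f): mean of k(x, x') over pixels x in A and x' in B.
    n = len(regions)
    K = np.empty((n, n))
    for i, A in enumerate(regions):
        for j, B in enumerate(regions):
            K[i, j] = rbf(A, B, **kw).mean()
    return K

def pixel_region_cross(pixels, regions, **kw):
    # Cov(f(x), avg_A f): mean of k(x, x') over pixels x' in A, one column per region.
    return np.stack([rbf(pixels, A, **kw).mean(axis=1) for A in regions], axis=1)

def predict_pixels(pixels, regions, y, noise=1e-2, **kw):
    # GP posterior mean of the latent f at pixel level, given noisy regional averages y.
    K = aggregated_gram(regions, **kw) + noise * np.eye(len(regions))
    return pixel_region_cross(pixels, regions, **kw) @ np.linalg.solve(K, y)

# Three toy "counties", each covering two pixels with a 1-D covariate.
regions = [np.array([[0.0], [0.5]]), np.array([[2.0], [2.5]]), np.array([[4.0], [4.5]])]
y = np.array([1.0, -1.0, 0.5])
grid = np.linspace(0.0, 4.5, 10)[:, None]
print(predict_pixels(grid, regions, y).round(2))
```

Averaging the pixel-level predictions over a region's pixels recovers the region-level GP fit, which is a useful sanity check when adapting such a sketch.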
Finally, we discuss a problem in which the observed data are multivariate, such as video or climate data. We propose a novel Markovian Gaussian process variational autoencoder that uses neural networks to move the probabilistic modelling task into a latent space, where scalable inference is achieved with Kalman filtering and smoothing. We demonstrate its effectiveness on a variety of popular datasets and compare it with existing Bayesian deep learning approaches.
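The scalability argument rests on the fact that some GP priors over time are Markovian and can be written as linear-Gaussian state-space models, so exact inference costs O(T) via Kalman filtering and Rauch-Tung-Striebel smoothing instead of O(T^3) for a dense GP. Below is a minimal scalar sketch, assuming a single latent dimension with a Matérn-1/2 (Ornstein-Uhlenbeck) kernel and Gaussian observation noise; in the model described above the observations would be high-dimensional frames linked to the latent GP through neural networks, which this sketch omits. The function name `ou_kalman_smoother` and its parameters are hypothetical.

```python
import numpy as np

def ou_kalman_smoother(t, y, variance=1.0, lengthscale=1.0, noise=0.1):
    """Exact GP inference in O(T) for the Matern-1/2 kernel
    k(t, t') = variance * exp(-|t - t'| / lengthscale), with Gaussian noise on y.

    Returns smoothed means, smoothed variances and the log marginal likelihood.
    Assumes t is sorted in increasing order.
    """
    T = len(t)
    A = np.ones(T)                        # transition coefficient into step k
    m_p, P_p = np.zeros(T), np.zeros(T)   # one-step predictions
    m_f, P_f = np.zeros(T), np.zeros(T)   # filtered estimates
    loglik = 0.0
    for k in range(T):
        if k == 0:
            m, P = 0.0, variance          # stationary prior at the first time point
        else:
            A[k] = np.exp(-(t[k] - t[k - 1]) / lengthscale)
            m = A[k] * m_f[k - 1]
            P = A[k] ** 2 * P_f[k - 1] + variance * (1.0 - A[k] ** 2)
        m_p[k], P_p[k] = m, P
        S = P + noise                     # predictive variance of y[k]
        v = y[k] - m                      # innovation
        loglik -= 0.5 * (np.log(2.0 * np.pi * S) + v**2 / S)
        gain = P / S                      # Kalman gain
        m_f[k] = m + gain * v
        P_f[k] = (1.0 - gain) * P
    # Rauch-Tung-Striebel backward pass.
    m_s, P_s = m_f.copy(), P_f.copy()
    for k in range(T - 2, -1, -1):
        G = P_f[k] * A[k + 1] / P_p[k + 1]
        m_s[k] = m_f[k] + G * (m_s[k + 1] - m_p[k + 1])
        P_s[k] = P_f[k] + G**2 * (P_s[k + 1] - P_p[k + 1])
    return m_s, P_s, loglik

rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, size=200))
y = np.sin(t) + 0.3 * rng.standard_normal(200)
mean, var, ll = ou_kalman_smoother(t, y, variance=1.0, lengthscale=2.0, noise=0.09)
```

Comparing `mean`, `var` and `ll` with a dense GP built from the same kernel is a quick way to check such a sketch; the two should agree up to floating-point error while the state-space version scales linearly in the number of time points.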
Version
Open Access
Date Issued
2023-09
Online Publication Date
2024-10-17T08:31:32Z
Date Awarded
2024-09
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Flaxman, Seth
Li, Yingzhen
Sponsor
Cervest Limited
Engineering and Physical Sciences Research Council
Grant Number
EP/S023151/1
Publisher Department
Department of Mathematics
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)