Joint probabilistic modelling of images and non-imaging covariates: a causal perspective
Author(s)
Coelho de Castro, Daniel
Type
Thesis or dissertation
Abstract
Typical machine-learning tasks in imaging applications involve training a model to predict some target annotation (e.g. a class label) from an image. However, by neglecting valuable side-information that is often available in real-world settings, such models are vulnerable to confounding and may not generalise outside the lab environment. This thesis argues for the integration of causality into image analysis workflows, supported by the development of methodologies for jointly modelling images and non-imaging data, especially for healthcare applications. We first demonstrate that causal reasoning can shed new light on some of the greatest challenges in medical image analysis: data scarcity, i.e. the limited availability of high-quality annotated data; and data mismatch, whereby exogenous domain differences or sample selection may threaten the external validity of a predictive model. Turning to practical aspects of probabilistic modelling, we verify the suitability of nonparametric mixtures for modelling image intensities, and employ them in a novel intensity normalisation method. We then leverage similar techniques to develop a unified probabilistic model for a (benign) face recognition application, incorporating face features, unlimited identities, sparse and noisy name labels, and contextual side-information. To facilitate more general research into learning with heterogeneous data, we additionally introduce an experimental sandbox based on images of handwritten digits, including measurable shape attributes and controllable image perturbations. Finally, bridging causality theory and deep probabilistic modelling, we propose the framework of deep structural causal models, which offer tractable generative, interventional, and counterfactual capabilities for high-dimensional, heterogeneous data. Along with the increased robustness of models that can naturally integrate multiple sources of data, further considerations surrounding transparency, interpretability, explainability, and fairness make it clear that a causal perspective contributes towards building machine-learning systems that are not just highly performant, but also safe and reliable.
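For readers unfamiliar with the three query types named in the abstract (generative/observational, interventional, and counterfactual), the sketch below works through them on a toy two-variable linear-Gaussian structural causal model. This is a minimal illustrative example only, not the deep structural causal models developed in the thesis; the mechanisms, noise scales, and function names (sample_observational, sample_interventional, counterfactual) are assumptions chosen purely for exposition.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy structural causal model (illustrative assumption, not the thesis's model):
#   t := u_t,          u_t ~ N(0, 1)        (a scalar "cause")
#   x := 2*t + u_x,    u_x ~ N(0, 0.5^2)    (its "effect")

def sample_observational(n):
    # Sample both variables from their unaltered mechanisms.
    u_t = rng.normal(0.0, 1.0, n)
    u_x = rng.normal(0.0, 0.5, n)
    t = u_t
    x = 2.0 * t + u_x
    return t, x

def sample_interventional(n, t_do):
    # do(t = t_do): replace t's mechanism with a constant,
    # while keeping x's mechanism and exogenous noise unchanged.
    u_x = rng.normal(0.0, 0.5, n)
    t = np.full(n, t_do)
    x = 2.0 * t + u_x
    return t, x

def counterfactual(t_obs, x_obs, t_cf):
    # Abduction: invert x's mechanism to recover its exogenous noise.
    u_x = x_obs - 2.0 * t_obs
    # Action + prediction: set t to its counterfactual value and re-run the model.
    return 2.0 * t_cf + u_x

t, x = sample_observational(5)
print(counterfactual(t[0], x[0], t_cf=t[0] + 1.0))
```

The counterfactual query follows the standard abduction-action-prediction recipe; the tractability claim in the abstract concerns doing the same with learned, high-dimensional mechanisms (e.g. for images) rather than the hand-written linear ones used here.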
Version
Open Access
Date Issued
2020-09
Date Awarded
2021-03
Copyright Statement
Creative Commons Attribution Licence
Advisor
Glocker, Benjamin
Deisenroth, Marc
Sponsor
CAPES (Organization: Brazil)
European Research Council
Grant Number
Finance Code 001, Full Doctorate Abroad Scholarship BEX 1500/2015-05
Horizon 2020 grant agreement No 757173, Project MIRA, ERC-2017-STG
Publisher Department
Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)