Evaluation of machine learning algorithms for predictive modelling of antibiotic resistance using genomic and spatiotemporal data
Author(s)
Jeffrey, Benjamin
Type
Thesis
Abstract
The discovery of antibiotics transformed public health, as previously life-threatening
bacterial diseases became treatable. However, the selection pressure created by the
widespread use of antibiotics promoted the expansion of many antibiotic-resistant
bacterial lineages. Additionally, since the "golden age of antibiotic discovery" (c. 1950-1970),
the rate of discovery of novel antibiotics has declined steeply. This compounds
the major public health threat posed by antibiotic resistance.
To limit the further spread of antibiotic-resistant bacteria and improve patient
health outcomes, predictive models could be developed to support clinicians and public
health agencies in decision-making. Here, I consider two distinct applications of machine
learning for predictive modelling in this field:
• Predicting the antibiotic resistance phenotype of bacterial lineages from
genomic data.
• Forecasting the future prevalence of antibiotic resistance from surveillance
data.
Across these two related areas, I make the following contributions to the field. First, I
establish robust protocols for evaluating these models and demonstrate that, without
such protocols, it is difficult to evaluate their performance fairly. Second, I present
alternative methods for designing predictive models of bacterial phenotype that
generalise accurately across populations, and I investigate the utility of conditioning
these models on a graph representation of the bacterial pangenome. Third, I compare
approaches for forecasting the future prevalence of antibiotic resistance in different
species, including different models and data types.
I conclude that machine learning has the potential to be highly impactful in this
setting. However, to maximise this impact, more appropriate datasets for training
and evaluating predictive models of antibiotic resistance need to be collated. In the final
chapter, I describe what I believe to be the key features of an appropriate data collection
protocol to support the development of these models.
Version
Open Access
Date Issued
2023-04
Date Awarded
2024-01
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Croucher, Nicholas
Bhatt, Samir
Wheeler, Nicole
Publisher Department
School of Public Health
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)