Development of artificial intelligence models to classify pulmonary nodules and improve lung cancer early diagnosis
Author(s)
Hunter, Benjamin
Type
Thesis or dissertation
Abstract
The aims of this thesis were to develop machine-learning models to identify lung nodules,
predict the risk of cancer and provide clinical decision support.
A structured-query language model was developed at The Royal Marsden Hospital to generate
a database of 14,586 patients with lung nodules. Lung (39%), neuro-endocrine (38%) and skin
(35%) cancers were most commonly associated with nodules. Patients with nodules had more
metastatic diagnoses (45% vs 23%, p < 0.001) and a higher mean scan number (6.56 vs 1.93,
p < 0.001) at shorter intervals (4.1 vs 5.9 months, p < 0.001). The model was externally
validated with high performance (Krippendorff’s alpha > 0.98).
Scans from the LUCADI and LIBRA studies were used to develop small (< 15 mm) and large
(> 15 mm) nodule radiomics predictive vectors (SN-RPV and LN-RPV, respectively). Features were
extracted using TexLab 2.0, and models were developed using LASSO logistic regression.
The SN-RPV had an AUC of 0.78 in the test (95% CI 0.70–0.86) and external test (95% CI
0.71–0.83) sets. For the two-feature LN-RPV, the test set AUC was 0.87 (95% CI 0.80–0.93),
compared to 0.67 (95% CI 0.55–0.76, DeLong p = 0.002) for the Brock score and 0.83 (95%
CI 0.75–0.90, DeLong p = 0.4) for the Herder score. The external test set AUC was 0.75 (95%
CI 0.63–0.85). The developed decision-support tool identified 18/22 (82%) malignant nodules
in the Herder 10–70% category, and may have led to earlier investigation.
Finally, a model was developed to predict nodule spiculation in the LIBRA and NSCLC
Radiogenomics studies. The test set AUC for the seven-feature model was 0.90 (95% CI
0.82–0.96), and spiculation was associated with worse overall survival (HR 2.0, 95% CI
1.00–4.01, p = 0.04), the differential expression of 11 genes and suppression of inflammation.
Version
Open Access
Date Issued
2023-02
Date Awarded
2023-08
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Aboagye, Eric
Lee, Richard
Blackledge, Matthew
Sponsor
Cancer Research UK
RM Partners
Royal Marsden Cancer Charity
National Institute for Health Research (Great Britain)
Grant Number
C309/A31316
Publisher Department
Department of Surgery & Cancer
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)