Machine learning framework to extract the biomarker potential of plasma IgG N-glycans towards disease risk stratification
File(s)
1-s2.0-S2001037024000618-main.pdf (2.91 MB)
Published version
Author(s)
Flevaris, Konstantinos
Davies, Joseph
Nakai, Shoh
Vuckovic, Frano
Lauc, Gordan
et al.
Type
Journal Article
Abstract
Effective management of chronic diseases and cancer can greatly benefit from disease-specific biomarkers that enable informative screening and timely diagnosis. IgG N-glycans found in human plasma have the potential to be minimally invasive disease-specific biomarkers for all stages of disease development due to their plasticity in response to various genetic and environmental stimuli. Data analysis and machine learning (ML) approaches can assist in harnessing the potential of IgG glycomics towards biomarker discovery and the development of reliable predictive tools for disease screening. This study proposes an ML-based N-glycomic analysis framework that can be employed to build, optimise, and evaluate multiple ML pipelines to stratify patients based on disease risk in an interpretable manner. To design and test this framework, a published colorectal cancer (CRC) dataset from the Study of Colorectal Cancer in Scotland (SOCCS) cohort (1999–2006) was used. In particular, among the different pipelines tested, an XGBoost-based ML pipeline, which was tuned using multi-objective optimisation, calibrated using an inductive Venn-Abers predictor (IVAP), and evaluated via a nested cross-validation (NCV) scheme, achieved a mean area under the Receiver Operating Characteristic Curve (AUC-ROC) of 0.771 when classifying between age- and sex-matched healthy controls and CRC patients. This performance suggests the potential of using the relative abundance of IgG N-glycans to define populations at elevated CRC risk who merit investigation or surveillance. Finally, the IgG N-glycans that highly impact CRC classification decisions were identified using a global model-agnostic interpretability technique, namely Accumulated Local Effects (ALE). We envision that open-source computational frameworks, such as the one presented herein, will be useful in supporting the translation of glycan-based biomarkers into clinical applications.
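The abstract names an inductive Venn-Abers predictor (IVAP) as the calibration step of the XGBoost pipeline. The authors' code is not reproduced in this record, so the following is only a minimal pure-Python sketch of what an IVAP computes in general. It assumes a binary classifier has already produced uncalibrated scores for a held-out calibration set; the function names `ivap` and `_isotonic_fit` are hypothetical. Isotonic regression (pool-adjacent-violators) is fitted twice, once with the test score provisionally labelled 0 and once labelled 1, giving a probability interval (p0, p1). The two values are then merged into a single probability with p = p1 / (1 - p0 + p1), a commonly used merging rule for Venn-Abers outputs.

```python
from typing import Dict, List, Sequence, Tuple


def _isotonic_fit(scores: Sequence[float], labels: Sequence[int]) -> Dict[float, float]:
    """Fit a non-decreasing step function by pool-adjacent-violators (PAV).

    Returns a mapping from each distinct score to its fitted value.
    """
    # Tied scores are pooled first, so each distinct score is one weighted point.
    grouped: Dict[float, Tuple[float, int]] = {}
    for s, y in zip(scores, labels):
        total, count = grouped.get(s, (0.0, 0))
        grouped[s] = (total + y, count + 1)

    # Each block holds [sum of labels, weight, member scores].
    blocks: List[list] = []
    for x in sorted(grouped):
        total, count = grouped[x]
        blocks.append([total, count, [x]])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            total, count, members = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
            blocks[-1][2].extend(members)

    return {x: total / count for total, count, members in blocks for x in members}


def ivap(cal_scores: Sequence[float], cal_labels: Sequence[int],
         test_score: float) -> Tuple[float, float, float]:
    """Hypothetical IVAP helper: return (p0, p1, merged p) for one test score."""
    bounds = []
    for hypothetical_label in (0, 1):
        fit = _isotonic_fit(list(cal_scores) + [test_score],
                            list(cal_labels) + [hypothetical_label])
        bounds.append(fit[test_score])
    p0, p1 = bounds
    # p0 <= p1 always holds, so the denominator is at least 1.
    return p0, p1, p1 / (1.0 - p0 + p1)


if __name__ == "__main__":
    # Toy calibration data, invented purely for illustration.
    scores = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
    labels = [0, 0, 1, 0, 1, 0, 1, 1]
    print(ivap(scores, labels, 0.5))
```

With the toy data above, a test score of 0.5 gives p0 = 1/3 and p1 = 2/3, which merge to p = 0.5. The width of the interval (p1 - p0) shrinks as the calibration set grows, which is one reason Venn-Abers calibration is attractive for small clinical cohorts; whether this matches the authors' exact configuration is not stated in this record.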
Date Issued
2024-12
Date Acceptance
2024-03-09
Citation
Computational and Structural Biotechnology Journal, 2024, 23, pp.1234-1243
URI
http://hdl.handle.net/10044/1/110504
URL
https://www.sciencedirect.com/science/article/pii/S2001037024000618
DOI
https://www.dx.doi.org/10.1016/j.csbj.2024.03.008
ISSN
2001-0370
Publisher
Elsevier
Start Page
1234
End Page
1243
Journal / Book Title
Computational and Structural Biotechnology Journal
Volume
23
Copyright Statement
© 2024 The Author(s). Published by Elsevier B.V. on behalf of Research Network of Computational and Structural Biotechnology. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
License URL
https://creativecommons.org/licenses/by/4.0/
Publication Status
Published
Date Publish Online
2024-03-11