Conformal prediction of molecule-induced cancer cell growth inhibition challenged by strong distribution shifts
File(s)
1-s2.0-S0031320325011641-main.pdf (6.42 MB)
Published version
Author(s)
Hernandez-Hernandez, Saiveth
Guo, Qianrong
Ballester, Pedro
Type
Journal Article
Abstract
The drug discovery process often employs phenotypic and target-based virtual screening to identify potential drug candidates. Despite the longstanding dominance of target-based approaches, phenotypic virtual screening is undergoing a resurgence now that its potential is better understood. In the context of cancer cell lines, a well-established experimental system for phenotypic screens, molecules are tested to identify their whole-cell activity, as summarized by their half-maximal inhibitory concentrations. Machine learning has emerged as a potent tool for computationally guiding such screens, yet important research gaps persist, including generalization and uncertainty quantification. To address these gaps, we leverage a clustering-based validation approach called Leave Dissimilar Molecules Out (LDMO), which enables a more rigorous assessment of model generalization to structurally novel compounds. This study focuses on applying Conformal Prediction (CP), a model-agnostic framework, to predict the activities of novel molecules on specific cancer cell lines. A total of 4320 independent models were evaluated across 60 cell lines, 5 CP variants, 2 feature sets, and training-test splits, providing strong and consistent results. From this comprehensive evaluation, we concluded that, regardless of the cell line or model, novel molecules with smaller CP-calculated confidence intervals tend to have smaller prediction errors once measured activities are revealed. It was also possible to anticipate the activities of dissimilar test molecules across 50 or more cell lines. These outcomes demonstrate the robust efficacy that LDMO-based models can achieve in realistic and challenging scenarios, thereby providing valuable insights for enhancing decision-making processes in drug discovery.
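As a rough illustration of the kind of interval the abstract refers to, the sketch below shows split (inductive) conformal prediction for regression, one of the simplest CP variants. It is not the authors' implementation, and the abstract does not say which five CP variants were used. The function names, the calibration values, and the 80% coverage level are assumptions made for this example only.

```python
import math


def conformal_quantile(abs_residuals, alpha):
    """Half-width of a split-conformal interval at miscoverage level alpha.

    abs_residuals: absolute errors |y - y_hat| on a held-out calibration set.
    """
    n = len(abs_residuals)
    if n == 0:
        raise ValueError("need at least one calibration residual")
    # Finite-sample-corrected rank of the (1 - alpha) quantile.
    k = math.ceil((n + 1) * (1 - alpha))
    if k > n:
        # Too few calibration points to guarantee this coverage level.
        return math.inf
    return sorted(abs_residuals)[k - 1]


def prediction_interval(y_pred, half_width):
    """Symmetric interval around a point prediction."""
    return (y_pred - half_width, y_pred + half_width)


# Hypothetical calibration set: measured vs. predicted activities.
cal_true = [5.1, 6.3, 4.8, 7.0, 5.5, 6.1, 4.9, 5.8, 6.6]
cal_pred = [5.0, 6.0, 5.2, 6.5, 5.6, 6.4, 4.7, 5.8, 6.1]
residuals = [abs(t - p) for t, p in zip(cal_true, cal_pred)]

# Target 80% coverage (alpha = 0.2) for a new, hypothetical prediction of 5.9.
q = conformal_quantile(residuals, alpha=0.2)
low, high = prediction_interval(5.9, q)
print(f"interval: [{low:.2f}, {high:.2f}]")
```

This basic variant gives every molecule the same interval width. Variants that normalize the residuals by a per-molecule difficulty estimate produce molecule-specific widths, which is what makes it possible to compare interval size against prediction error as the abstract describes.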
Date Issued
2026-04-01
Date Acceptance
2025-09-23
Citation
Pattern Recognition, 2026, 172 (Part B)
ISSN
0031-3203
Publisher
Elsevier
Journal / Book Title
Pattern Recognition
Volume
172
Issue
Part B
Copyright Statement
© 2025 The Author(s). Published by Elsevier Ltd. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
License URL
http://creativecommons.org/licenses/by/4.0/
Publication Status
Published
Article Number
112501
Date Publish Online
2025-09-24