Consensus clustering with missing labels (ccml): a consensus clustering tool for multi-omics integrative prediction in cohorts with unequal sample coverage

Li, C-X; Chen, H; Zounemat-Kermani, N; Adcock, IM; Sköld, CM; Zhou, M; Wheelock, ÅM; U-BIOPRED study group

File	Description	Size	Format
Consensus clustering with missing labels (ccml) a consensus clustering tool for multi-omics integrative prediction in cohort.pdf	Published version	799.89 kB	Adobe PDF

Title:	Consensus clustering with missing labels (ccml): a consensus clustering tool for multi-omics integrative prediction in cohorts with unequal sample coverage
Authors:	Li, C-X; Chen, H; Zounemat-Kermani, N; Adcock, IM; Sköld, CM; Zhou, M; Wheelock, ÅM; U-BIOPRED study group
Item Type:	Journal Article
Abstract:	Multi-omics data integration is a complex and challenging task in biomedical research. Consensus clustering, also known as meta-clustering or cluster ensembles, has become an increasingly popular downstream tool for phenotyping and endotyping using multiple omics and clinical data. However, current consensus clustering methods typically rely on ensembling clustering outputs with similar sample coverages (mathematical replicates), which may not reflect real-world data with varying sample coverages (biological replicates). To address this issue, we propose consensus clustering with missing labels (ccml), an R protocol for two-step consensus clustering that can handle unequal missing labels (i.e. multiple predictive labels with different sample coverages). First, the regular consensus weights are adjusted (normalized) by sample coverage; then, a regular consensus clustering is performed to predict the optimal final cluster. We applied the ccml method to predict molecularly distinct groups based on 9-omics integration in the Karolinska COSMIC cohort, which investigates chronic obstructive pulmonary disease, and to the 24-omics handprint integrative subgrouping of adult asthma patients in the U-BIOPRED cohort. We propose ccml as a downstream toolkit for multi-omics integration analysis algorithms such as Similarity Network Fusion and for robust clustering of clinical data, to overcome the limitations posed by missing data, which are inevitable in human cohorts consisting of multiple data modalities. The ccml tool is available in the R language (https://CRAN.R-project.org/package=ccml, https://github.com/pulmonomics-lab/ccml, or https://github.com/ZhoulabCPH/ccml).
Issue Date:	Jan-2024
Date of Acceptance:	1-Dec-2023
URI:	http://hdl.handle.net/10044/1/108947
DOI:	10.1093/bib/bbad501
ISSN:	1467-5463
Publisher:	Oxford University Press
Journal / Book Title:	Briefings in Bioinformatics
Volume:	25
Issue:	1
Copyright Statement:	© The Author(s) 2024. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
Publication Status:	Published
Conference Place:	England
Article Number:	bbad501
Online Publication Date:	2024-01-10
Appears in Collections:	National Heart and Lung Institute Faculty of Medicine
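
The two-step procedure summarized in the abstract (coverage-normalized consensus weights, followed by a regular consensus clustering) can be illustrated with a minimal Python sketch. This is not the ccml package's R API. The function names `consensus_matrix` and `average_linkage` are hypothetical. The representation of a missing label as `None` is an assumption, as is the exact normalization: each pair's co-clustering count is divided by the number of label sets in which both samples are labelled. The choice of average-linkage hierarchical clustering for the second step is also an assumption made only for illustration.

```python
from itertools import combinations
from typing import List, Optional, Sequence

# A label set is one clustering result (e.g. one omics layer); None marks
# a sample that this layer did not cover (assumption of this sketch).
LabelSet = Sequence[Optional[int]]


def consensus_matrix(label_sets: Sequence[LabelSet]) -> List[List[float]]:
    """Step 1 (hypothetical): coverage-normalized consensus weights.

    C[i][j] = (# label sets that put i and j in the same cluster)
              / (# label sets in which both i and j have a label).
    Pairs never labelled together get 0.0, another assumption.
    """
    n = len(label_sets[0])
    together = [[0] * n for _ in range(n)]
    covered = [[0] * n for _ in range(n)]
    for labels in label_sets:
        if len(labels) != n:
            raise ValueError("every label set must index the same samples")
        for i, j in combinations(range(n), 2):
            if labels[i] is None or labels[j] is None:
                continue  # a missing label carries no evidence for this pair
            covered[i][j] += 1
            if labels[i] == labels[j]:
                together[i][j] += 1
    cons = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for i, j in combinations(range(n), 2):
        if covered[i][j]:
            cons[i][j] = cons[j][i] = together[i][j] / covered[i][j]
    return cons


def average_linkage(sim: List[List[float]], k: int) -> List[int]:
    """Step 2 (hypothetical): average-linkage clustering of the consensus
    matrix, merging the most similar pair of clusters until k remain."""
    clusters = [[i] for i in range(len(sim))]
    while len(clusters) > k:
        best = None
        for a, b in combinations(range(len(clusters)), 2):
            total = sum(sim[i][j] for i in clusters[a] for j in clusters[b])
            score = total / (len(clusters[a]) * len(clusters[b]))
            if best is None or score > best[0]:
                best = (score, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])  # a < b, so index a stays valid
        del clusters[b]
    labels = [0] * len(sim)
    # Number clusters by their smallest sample index for stable output.
    for c, members in enumerate(sorted(clusters, key=min)):
        for i in members:
            labels[i] = c
    return labels


if __name__ == "__main__":
    # Three hypothetical layers over six samples with unequal coverage.
    layers = [
        [0, 0, 0, 1, 1, 1],
        [0, 0, None, 1, 1, None],
        [None, 2, 2, 3, 3, 3],
    ]
    cons = consensus_matrix(layers)
    print(average_linkage(cons, k=2))  # prints [0, 0, 0, 1, 1, 1]
```

Only co-membership matters, so the third layer's cluster IDs (2 and 3) need not match the others. In this toy example, samples 0 and 2 are both labelled only in the first layer. Dividing by the total number of layers (3) would give that pair a weight of 1/3 purely because labels are missing. Normalizing by co-coverage keeps the weight at 1.0.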

This item is licensed under a Creative Commons License