LAVASET: An ensemble method for correlated datasets with spatial, spectral, and temporal dependencies
File(s)
btae101.pdf (1.81 MB)
Published version
Author(s)
Kasapi, Melpi
Xu, Kexin
Ebbels, Tim
O'Regan, Declan
Ware, James
et al.
Type
Journal Article
Abstract
Motivation: Random Forests (RFs) can deal with a large number of variables, achieve reasonable prediction scores, and yield highly interpretable feature importance values. As such, RFs are appropriate models for feature selection and further dimension reduction (DR). However, RFs are often not appropriate for correlated datasets due to their mode of selecting individual features for splitting. Addressing correlation relationships in high dimensional datasets is imperative for reducing the number of variables that are assigned high importance, hence making the DR most efficient. Here, we propose the LAtent VAriable Stochastic Ensemble of Trees (LAVASET) method that derives latent variables based on the distance characteristics of each feature and aims to incorporate the correlation factor in the splitting step.
Results: Without compromising on performance in the majority of examples, LAVASET outperforms RF by accurately determining feature importance across all correlated variables and ensuring proper distribution of importance values. LAVASET yields mostly non-inferior prediction accuracies to traditional RFs when tested in simulated and real 1D datasets, as well as more complex and high-dimensional 3D datatypes. Unlike traditional RFs, LAVASET is unaffected by single 'important' noisy features (false positives), as it considers the local neighbourhood. LAVASET, therefore, highlights neighbourhoods of features, reflecting real signals that collectively impact the model's predictive ability.
Availability: LAVASET is freely available as a standalone package from https://github.com/melkasapi/LAVASET.
Editor(s)
Wren, Jonathan
Date Issued
2024-03
Date Acceptance
2024-02-20
Citation
Bioinformatics, 2024, 40 (3), pp.1-9
URI
http://hdl.handle.net/10044/1/110224
URL
https://academic.oup.com/bioinformatics/article/40/3/btae101/7612229
DOI
https://www.dx.doi.org/10.1093/bioinformatics/btae101
ISSN
1367-4811
Publisher
Oxford University Press
Start Page
1
End Page
9
Journal / Book Title
Bioinformatics
Volume
40
Issue
3
Copyright Statement
© The Author(s) 2024. Published by Oxford University Press. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited.
License URL
http://creativecommons.org/licenses/by/4.0/
Identifier
https://academic.oup.com/bioinformatics/article/40/3/btae101/7612229
Publication Status
Published
Article Number
btae101
Date Publish Online
2024-02-21