A framework for generalized subspace pattern mining in high-dimensional datasets

File: A framework for generalized subspace pattern mining in high-dimensional datasets.pdf
Description: Published version
Size: 1.07 MB
Format: Adobe PDF
Title: A framework for generalized subspace pattern mining in high-dimensional datasets
Author(s): Curry, EWJ
Item Type: Journal Article
Abstract:
Background: A generalized notion of biclustering involves the identification of patterns across subspaces within a data matrix. This approach is particularly well-suited to analysis of heterogeneous molecular biology datasets, such as those collected from populations of cancer patients. Different definitions of biclusters will offer different opportunities to discover information from datasets, making it pertinent to tailor the desired patterns to the intended application. This paper introduces ‘GABi’, a customizable framework for subspace pattern mining suited to large heterogeneous datasets. Most existing biclustering algorithms discover biclusters of only a few distinct structures. However, by enabling definition of arbitrary bicluster models, the GABi framework enables the application of biclustering to tasks for which no existing algorithm could be used.
Results: First, a series of artificial datasets were constructed to represent three clearly distinct scenarios for applying biclustering. With a bicluster model created for each distinct scenario, GABi is shown to recover the correct solutions more effectively than a panel of alternative approaches, where the bicluster model may not reflect the structure of the desired solution. Secondly, the GABi framework is used to integrate clinical outcome data with an ovarian cancer DNA methylation dataset, leading to the discovery that widespread dysregulation of DNA methylation associates with poor patient prognosis, a result that has not previously been reported. This illustrates a further benefit of the flexible bicluster definition of GABi, which is that it enables incorporation of multiple sources of data, with each data source treated in a specific manner, leading to a means of intelligent integrated subspace pattern mining across multiple datasets.
Conclusions: The GABi framework enables discovery of biologically relevant patterns of any specified structure from large collections of genomic data. An R implementation of the GABi framework is available through CRAN (http://cran.r-project.org/web/packages/GABi/index.html).
Publication Date: 21-Nov-2014
Date of Acceptance: 22-Oct-2014
URI: http://hdl.handle.net/10044/1/51155
DOI: https://dx.doi.org/10.1186/s12859-014-0355-5
ISSN: 1471-2105
Publisher: BioMed Central
Journal / Book Title: BMC Bioinformatics
Volume: 15
Copyright Statement: © Curry; licensee BioMed Central Ltd. 2014. This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly credited. The Creative Commons Public Domain Dedication waiver (http://creativecommons.org/publicdomain/zero/1.0/) applies to the data made available in this article, unless otherwise stated.
Sponsor/Funder: Ovarian Cancer Action
Imperial College Healthcare NHS Trust- BRC Funding
Funder's Grant Number: N/A
RDB01 79560
Keywords: Science & Technology
Life Sciences & Biomedicine
Biochemical Research Methods
Biotechnology & Applied Microbiology
Mathematical & Computational Biology
Biochemistry & Molecular Biology
GENE-EXPRESSION DATA
OVARIAN-CANCER
BICLUSTERING ALGORITHMS
BREAST-TUMORS
MANAGEMENT
SUBGROUPS
REVEALS
Algorithms
Cluster Analysis
DNA Methylation
Female
Gene Expression Profiling
Genome-Wide Association Study
Humans
Ovarian Neoplasms
Software
06 Biological Sciences
08 Information And Computing Sciences
01 Mathematical Sciences
Bioinformatics
Publication Status: Published
Article Number: ARTN 355
Appears in Collections: Division of Surgery
Division of Cancer
Faculty of Medicine