A novel algorithm for simultaneous SNP selection in high-dimensional genome-wide association studies
File(s)1471-2105-13-284.pdf (376.93 KB)
Published version
Author(s)
Zuber, Verena
Pedro Duarte Silva, A
Strimmer, Korbinian
Type
Journal Article
Abstract
Background
Identification of causal SNPs in most genome wide association studies relies on approaches that consider each SNP individually. However, there is a strong correlation structure among SNPs that needs to be taken into account. Hence, increasingly modern computationally expensive regression methods are employed for SNP selection that consider all markers simultaneously and thus incorporate dependencies among SNPs.
Results
We develop a novel multivariate algorithm for large scale SNP selection using CAR score regression, a promising new approach for prioritizing biomarkers. Specifically, we propose a computationally efficient procedure for shrinkage estimation of CAR scores from high-dimensional data. Subsequently, we conduct a comprehensive comparison study including five advanced regression approaches (boosting, lasso, NEG, MCP, and CAR score) and a univariate approach (marginal correlation) to determine the effectiveness in finding true causal SNPs.
Conclusions
Simultaneous SNP selection is a challenging task. We demonstrate that our CAR score-based algorithm consistently outperforms all competing approaches, both uni- and multivariate, in terms of correctly recovered causal SNPs and SNP ranking. An R package implementing the approach as well as R code to reproduce the complete study presented here is available fromhttp://strimmerlab.org/software/care/.
Identification of causal SNPs in most genome wide association studies relies on approaches that consider each SNP individually. However, there is a strong correlation structure among SNPs that needs to be taken into account. Hence, increasingly modern computationally expensive regression methods are employed for SNP selection that consider all markers simultaneously and thus incorporate dependencies among SNPs.
Results
We develop a novel multivariate algorithm for large scale SNP selection using CAR score regression, a promising new approach for prioritizing biomarkers. Specifically, we propose a computationally efficient procedure for shrinkage estimation of CAR scores from high-dimensional data. Subsequently, we conduct a comprehensive comparison study including five advanced regression approaches (boosting, lasso, NEG, MCP, and CAR score) and a univariate approach (marginal correlation) to determine the effectiveness in finding true causal SNPs.
Conclusions
Simultaneous SNP selection is a challenging task. We demonstrate that our CAR score-based algorithm consistently outperforms all competing approaches, both uni- and multivariate, in terms of correctly recovered causal SNPs and SNP ranking. An R package implementing the approach as well as R code to reproduce the complete study presented here is available fromhttp://strimmerlab.org/software/care/.
Date Issued
2012-10-31
Date Acceptance
2012-10-20
Citation
BMC Bioinformatics, 2012, 13, pp.1-8
ISSN
1471-2105
Publisher
BioMed Central
Start Page
1
End Page
8
Journal / Book Title
BMC Bioinformatics
Volume
13
Copyright Statement
© 2012 Zuber et al.; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and
reproduction in any medium, provided the original work is properly cited.
Identifier
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000314180800001&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=1ba7043ffcc86c417c072aa74d649202
Subjects
Science & Technology
Life Sciences & Biomedicine
Biochemical Research Methods
Biotechnology & Applied Microbiology
Mathematical & Computational Biology
Biochemistry & Molecular Biology
VARIABLE SELECTION
REGRESSION
DISCOVERY
REGULARIZATION
SHRINKAGE
GENE
RARE
Publication Status
Published
Article Number
ARTN 284
Date Publish Online
2012-10-31