
Shrinkage methods for variable selection and prediction with applications to genetic data

File: Cule-E-2013-PhD-Thesis.pdf
Size: 845.21 kB
Format: Adobe PDF
Title: Shrinkage methods for variable selection and prediction with applications to genetic data
Authors: Cule, Erika
Item Type: Thesis or dissertation
Abstract: Identifying genotypes using genetic material was at first a painstaking laboratory task. In the decades since the first gene was sequenced, techniques have progressed through milestones requiring massive international collaboration. Today’s genotype sequencing facilities use high-throughput technology to sequence entire genomes within days. Despite these technological improvements, and the resultant volume of genetic data, the identification of meaningful genotype-phenotype associations has not been as straightforward as was anticipated in the pre-genome era. The genetic architecture of many common diseases is complex, and heritability often cannot be explained when simple statistical tests are used. This thesis addresses a clinically important problem in statistical genetics: that of predicting disease risk based on genotype information. First, we review progress and current limitations in genetic risk prediction. We then introduce penalised regression. This thesis focusses on ridge regression, a penalised regression approach that has shown promise in risk prediction for high-dimensional data. The choice of the ridge parameter, which controls the amount of penalisation in ridge regression, has not been addressed in the literature with the specific aim of analysing genetic data. We present a method for automatically choosing the ridge parameter based on genome-wide SNP data. Software implementing the method is available to the community. We evaluate the method using simulation studies and a real data example. A ridge regression model does not indicate the strength of association of individual variants with the outcome, a property that is often of interest to geneticists. To this end, we extend a previously proposed test of significance in ridge regression models to high-dimensional data and to the logistic model, which commonly occurs in the biomedical context. This test is evaluated by comparison to a permutation test, which we view as a benchmark. This test is integrated into the software package mentioned above.
Issue Date: Jan-2013
Date Awarded: Jul-2013
URI: http://hdl.handle.net/10044/1/12811
DOI: https://doi.org/10.25560/12811
Supervisor: De Iorio, Maria; Vineis, Paolo
Sponsor/Funder: Wellcome Trust (London, England)
Department: School of Public Health
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: School of Public Health PhD Theses

Unless otherwise indicated, items in Spiral are protected by copyright and are licensed under a Creative Commons Attribution NonCommercial NoDerivatives License.
