Multivariate Analysis of Tumour Gene Expression Profiles Applying Regularisation and Bayesian Variable Selection Techniques

File: Zucknick-M-2009-PhD-Thesis.pdf
Size: 23.49 MB
Format: Adobe PDF
Title: Multivariate Analysis of Tumour Gene Expression Profiles Applying Regularisation and Bayesian Variable Selection Techniques
Authors: Zucknick, Manuela
Item Type: Thesis or dissertation
Abstract: High-throughput microarray technology is here to stay, for example in oncology, where it is used for tumour classification and for gene expression profiling to predict cancer pathology and clinical outcome. The global objective of this thesis is to investigate multivariate methods that are suitable for this task. After an introduction to the problem and the biological background, Chapter 3 gives an overview of multivariate regularisation methods and Chapter 4 outlines the binary classification problem. The applications presented in Chapters 5 to 7 focus on sparse binary classifiers that are both parsimonious and interpretable, with particular emphasis on sparse penalised likelihood and Bayesian variable selection models, all in the context of logistic regression. The thesis concludes with a final discussion chapter.
The variable selection problem is particularly challenging here because the number of variables is much larger than the sample size, which results in an ill-conditioned problem with many equally good solutions. One open problem is therefore the stability of gene expression profiles. In a resampling study, various characteristics, including stability, are compared across a variety of classifiers that are applied to five gene expression data sets and validated on two independent data sets. Bayesian variable selection provides an alternative to resampling for estimating the uncertainty in the selection of genes. MCMC methods are used for model space exploration, but because of the high dimensionality, standard algorithms are computationally expensive, result in poor Markov chain mixing, or both. A novel MCMC algorithm is presented that uses the dependence structure between input variables to find blocks of variables that are updated together. This drastically improves mixing while keeping the computational burden acceptable. Several algorithms are compared in a simulation study.
In an ovarian cancer application in Chapter 7, the best-performing MCMC algorithms are combined with parallel tempering and compared with an alternative method.
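The idea of assessing selection stability by resampling can be illustrated with a small, self-contained sketch. It is not taken from the thesis: the simulated data, the penalty value, the function names (`fit_l1_logistic`, `selection_frequencies`, `simulate`) and the choice of a plain proximal-gradient solver are all assumptions made for illustration. The sketch fits an L1-penalised logistic regression on bootstrap resamples of a data set with more variables than samples and records how often each variable receives a non-zero coefficient.

```python
import math
import random


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink a value towards zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def sigmoid(t):
    # Numerically stable logistic function.
    if t >= 0:
        return 1.0 / (1.0 + math.exp(-t))
    e = math.exp(t)
    return e / (1.0 + e)


def fit_l1_logistic(X, y, penalty, n_iter=150):
    """L1-penalised logistic regression (no intercept) by proximal gradient descent."""
    n, p = len(X), len(X[0])
    # Conservative step size: the squared Frobenius norm bounds the
    # largest eigenvalue of X'X, so this is at most 1 / Lipschitz constant.
    step = 4.0 * n / sum(v * v for row in X for v in row)
    beta = [0.0] * p
    for _ in range(n_iter):
        resid = [sigmoid(sum(b * v for b, v in zip(beta, row))) - yi
                 for row, yi in zip(X, y)]
        grad = [sum(r * row[j] for r, row in zip(resid, X)) / n
                for j in range(p)]
        beta = [soft_threshold(b - step * g, step * penalty)
                for b, g in zip(beta, grad)]
    return beta


def selection_frequencies(X, y, penalty, n_resamples=15, seed=0):
    """Fraction of bootstrap resamples in which each variable is selected."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    counts = [0] * p
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]  # sample rows with replacement
        beta = fit_l1_logistic([X[i] for i in idx], [y[i] for i in idx], penalty)
        for j, b in enumerate(beta):
            if b != 0.0:
                counts[j] += 1
    return [c / n_resamples for c in counts]


def simulate(n=20, p=40, seed=1):
    """Hypothetical data: two correlated informative variables, the rest noise."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        signal = (1.0 if label else -1.0) + rng.gauss(0.0, 0.5)
        row = [signal, signal + rng.gauss(0.0, 0.3)]
        row += [rng.gauss(0.0, 1.0) for _ in range(p - 2)]
        X.append(row)
        y.append(label)
    return X, y


X, y = simulate()
freq = selection_frequencies(X, y, penalty=0.2)
# List the five most frequently selected variables (index, frequency).
top = sorted(enumerate(freq), key=lambda item: -item[1])[:5]
print(top)
```

Because the two informative variables are strongly correlated, a sparse fit may pick either of them in a given resample, which is one simple way to see why selected gene sets can be unstable when many solutions are nearly equally good. The fixed iteration count means the fits are only approximate.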
Issue Date: Dec-2008
Date Awarded: Mar-2009
URI: http://hdl.handle.net/10044/1/4397
Supervisor: Gabra, Hani; Richardson, Sylvia
Sponsor/Funder: Wellcome Trust
Department: Epidemiology and Public Health
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Medicine PhD theses