Patient-Specific Data Fusion for Cancer Stratification and Personalised Treatment

According to Cancer Research UK, cancer is a leading cause of death, accounting for more than one in four of all deaths in 2011. Recent advances in experimental technologies in cancer research have resulted in the accumulation of large amounts of patient-specific data, which provide complementary information on the same cancer type. We introduce a versatile data fusion (integration) framework that can effectively integrate somatic mutation data, molecular interactions and drug chemical data to address three key challenges in cancer research: stratification of patients into groups having different clinical outcomes, prediction of driver genes whose mutations trigger the onset and development of cancers, and repurposing of drugs for treating particular cancer patient groups. Our new framework is based on graph-regularised non-negative matrix tri-factorization, a machine learning technique for co-clustering heterogeneous datasets. We apply our framework to ovarian cancer data to simultaneously cluster patients, genes and drugs by utilising all datasets. We demonstrate superior performance of our method over the state-of-the-art method, Network-based Stratification, in identifying three patient subgroups that have significant differences in survival outcomes and that are in good agreement with other clinical data. We also identify potential new driver genes by analysing the gene clusters enriched in known drivers of ovarian cancer progression. We validate the top-scoring genes identified as new drivers through database search and biomedical literature curation. Finally, we identify candidate drugs for repurposing that could be used in the treatment of the identified patient subgroups by targeting their mutated gene products. We validate a large percentage of our drug-target predictions by using other databases and through literature curation.


Introduction
Cancer is a leading cause of morbidity worldwide. It is a complex genetic disease in which the genomes of normal cells accumulate somatic mutations and other alterations that eventually perturb vital cellular functions. Recent advances in DNA sequencing technologies have enabled identification of somatic mutations across tumor genomes and exomes of individual patients 1,2 . These somatic mutations provide a new and rich source of data for addressing many challenges in cancer research, such as identifying driver genes (i.e., genes whose mutations lead to the progression of oncogenesis), stratifying patients into biologically meaningful classes with different clinical outcomes and creating new opportunities for the development of successful personalized treatment strategies 3 . Cancer is also a highly heterogeneous disease with large genetic diversity even between tumors of the same cancer type. Namely, two clinically identical tumors rarely share a large set of mutated genes. Moreover, very few genes are frequently mutated across tumor samples. This makes the use of somatic mutations for identification of driver genes, as well as for patient stratification into subtypes, much harder 1,4,5 . However, despite this genetic diversity between tumor samples, the perturbed pathways are often similar 1 . Therefore, integration of somatic mutations with other genomic data, such as molecular networks that contain pathways, is a promising direction for addressing these problems.
Development of computational methodologies that can efficiently integrate genome-scale molecular information and address the above-mentioned challenges in cancer research is of foremost importance. The majority of previous studies do not utilise data on somatic mutations; instead, they are mainly based on mRNA expression and methylation data. Because these data are noisy, patient stratification studies often do not produce patient subgroups that agree well with any clinical or survival data 6 . Therefore, a recent study proposed the use of somatic mutation data in combination with biological networks as a new source of information for tumor stratification 5 . However, the proposed methodology cannot account for additional data types (e.g., drug data) and cannot be used for identifying novel driver genes, nor for predicting a personalised therapy. Moreover, each of the previous data integration methods addresses only one of these tasks: cancer patient stratification 5 , driver gene prediction 7 or drug repurposing 8 .
Here, we present a versatile patient-specific data integration (fusion) methodology capable of: 1) uncovering patient subgroups (stratification) with prognostic survival outcome, 2) predicting novel driver genes and 3) repurposing drugs, i.e., predicting new candidate drugs for targeting mutated gene products in individual patients and that can be used in treatment of identified patient subgroups. To our knowledge, this is the first method that can address all three challenges simultaneously. Our methodology is based on Non-negative Matrix Tri-Factorization (NMTF) technique, initially proposed for dimensionality reduction and co-clustering problems in machine learning 9 . It approximates (factorises) a high-dimensional data matrix, representing relations between two data types, as a product of three non-negative, low-dimensional matrices 9 . The clustering interpretation of low-dimensional matrices and their previously established relatedness to the k-means clustering has enabled the use of NMTF in co-clustering problems 10,11 . Recently, there has been a significant development in the use of NMTF in data fusion because of its ability to extend to any number of interrelated data types by simultaneously decomposing their relation matrices. This has provided us with a valuable framework for fusion (integration) of any number and type of interrelated heterogeneous datasets 12,13 . NMTF has demonstrated a great potential in addressing various biological problems, such as disease association prediction 12 , disease gene discovery 14 , protein-protein interaction prediction 15 and gene function prediction 16 .
We use NMTF to integrate somatic mutation profile (SMP) data of serous ovarian cancer patients from TCGA 4 with molecular networks (MNs) from BioGRID 17 and KEGG 18 , and with drug-target interaction (DTI) and drug chemical similarity (DCS) data from DrugBank 19 (detailed in Sec. 2.3). We perform consensus clustering by using NMTF to simultaneously cluster patients, genes and drugs based on the evidence from all datasets. First, we stratify patients into three groups that we assess by using clinical data. We show a significant difference in survival outcomes between these groups, as well as a good agreement with other clinical data. Second, from the clusters of genes, we identify those enriched in known driver mutations; we postulate genes strongly related to known driver genes in these clusters as potential driver genes, i.e., genes responsible for ovarian cancer progression. Finally, we use the matrix completion property of NMTF to predict new drug-target relations and to identify new drug candidates that could be used for repurposing and treatment of the identified ovarian cancer patient groups. Furthermore, we evaluate the influence of all combinations of datasets on the accuracy of drug-target predictions by performing a 5-fold cross validation. We show that the highest accuracy is achieved when all datasets are taken into account, which supports the utility of integrating all considered datasets (detailed in Sec. 3).

Patient-specific data fusion framework
We consider three different data types: patients, genes and drugs. Patients and genes are related to each other by somatic mutation profiles (SMPs), constructed for $n_1$ patients over $n_2$ genes and encoded in a high-dimensional relation matrix, $R_{12}^{n_1 \times n_2}$. Its entries are binary values, with $R_{12}[p][g] = 1$ if gene $g$ is found to be mutated in patient $p$, and zero otherwise. Genes and drugs are related to each other according to drug-target interactions (DTIs). DTIs between $n_2$ genes and $n_3$ drugs are encoded in a relation matrix, $R_{23}^{n_2 \times n_3}$. Its entries are also binary values, with $R_{23}[g][d] = 1$ if the product of gene $g$ is a target of drug $d$, and zero otherwise. See Fig. 1 for details of the construction of the relation matrices and for an illustration of these datasets.
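As a minimal sketch of how the two binary relation matrices could be assembled from lists of observed pairs, consider the following NumPy example. The helper name `build_relation_matrix` and the toy identifiers (`p1`, `g1`, `d1`, ...) are hypothetical and chosen only for illustration; they are not data from this study.

```python
import numpy as np

def build_relation_matrix(pairs, row_ids, col_ids):
    """Return a binary matrix R with R[i, j] = 1 for every observed (row, col) pair."""
    row_index = {r: i for i, r in enumerate(row_ids)}
    col_index = {c: j for j, c in enumerate(col_ids)}
    R = np.zeros((len(row_ids), len(col_ids)), dtype=float)
    for r, c in pairs:
        # Pairs that mention unknown identifiers are skipped.
        if r in row_index and c in col_index:
            R[row_index[r], col_index[c]] = 1.0
    return R

# Toy identifiers (hypothetical).
patients = ["p1", "p2"]
genes = ["g1", "g2", "g3"]
drugs = ["d1", "d2"]

mutations = [("p1", "g1"), ("p2", "g2"), ("p2", "g1")]  # gene mutated in patient
targets = [("g3", "d1"), ("g1", "d2")]                  # gene product targeted by drug

R12 = build_relation_matrix(mutations, patients, genes)  # patients x genes
R23 = build_relation_matrix(targets, genes, drugs)       # genes x drugs
```

Both matrices share the gene dimension ($n_2$), which is what later allows a single gene factor to be shared between the two decompositions.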

We use NMTF to simultaneously decompose both relation matrices into a product of three non-negative low-dimensional matrices as follows: $R_{12} \approx G_1 H_{12} G_2^T$ and $R_{23} \approx G_2 H_{23} G_3^T$. The low-dimensional matrices can be obtained by solving the following optimisation problem:

$$\min_{G_1 \ge 0,\, G_2 \ge 0,\, G_3 \ge 0} J = \| R_{12} - G_1 H_{12} G_2^T \|_F^2 + \| R_{23} - G_2 H_{23} G_3^T \|_F^2,$$

where $\| \cdot \|_F$ denotes the Frobenius norm and $J$ is the objective function. The non-negativity constraints imposed on the $G_i$ matrices for $1 \le i \le 3$ provide easier interpretation of their values in the clustering assignment. Many of the data types are characterised by additional, internal connectivity structure represented by graphs (networks). In this study, genes are connected by molecular networks (MNs), while drugs are connected based on the similarity of their chemical structures, i.e., by the drug chemical similarity (DCS) network (illustrated in Fig. 1). We incorporate these networks (MN and DCS) into the above objective function by adding two regularisation terms to constrain the construction of the $G_2$ and $G_3$ matrices. This approach is also known as Graph-regularized NMTF (or GNMTF) 20 . Namely, the aim is to enforce two interacting genes to belong to the same cluster (and similarly for drugs); a violation of these constraints results in a penalty to our objective function. Hence, the final objective function has the following form:

$$\min_{G_1 \ge 0,\, G_2 \ge 0,\, G_3 \ge 0} J = \| R_{12} - G_1 H_{12} G_2^T \|_F^2 + \| R_{23} - G_2 H_{23} G_3^T \|_F^2 + \mathrm{tr}(G_2^T L_2 G_2) + \mathrm{tr}(G_3^T L_3 G_3), \tag{1}$$

where $\mathrm{tr}$ denotes the trace of a matrix, and $L_2$ and $L_3$ are the graph Laplacians of the MN and DCS networks, respectively.

The key idea of our GNMTF-based data fusion approach is in sharing the low-dimensional matrix $G_2$ whilst simultaneously learning from (i.e., decomposing) the relation matrices $R_{12}$ and $R_{23}$. Such a decomposition accounts for the collective influence of all datasets (along with molecular and chemical constraints effectively integrated within the same framework) on the resulting clustering of patients, genes and drugs. This approach corresponds to intermediate data fusion, in which the structure of the data is preserved during the model inference. Such an approach has been shown to result in the best accuracy among all data fusion approaches 12 .
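To make the role of the two graph-regularisation terms concrete, the sketch below evaluates an objective of the form described above: two squared Frobenius reconstruction errors plus the two trace terms. It assumes unweighted regularisation terms (no trade-off hyperparameters), and the function names are hypothetical. The toy example illustrates that two interacting genes placed in the same cluster incur no penalty, whereas splitting them does.

```python
import numpy as np

def laplacian(A):
    """Graph Laplacian L = D - A of a symmetric adjacency matrix A."""
    return np.diag(A.sum(axis=1)) - A

def gnmtf_objective(R12, R23, G1, G2, G3, H12, H23, L2, L3):
    """Reconstruction errors of both relation matrices plus graph penalties.

    Assumption: both trace terms enter with unit weight.
    """
    err12 = np.linalg.norm(R12 - G1 @ H12 @ G2.T, "fro") ** 2
    err23 = np.linalg.norm(R23 - G2 @ H23 @ G3.T, "fro") ** 2
    reg2 = np.trace(G2.T @ L2 @ G2)  # penalises linked genes in different clusters
    reg3 = np.trace(G3.T @ L3 @ G3)  # penalises similar drugs in different clusters
    return err12 + err23 + reg2 + reg3

# Two genes joined by a single edge.
A = np.array([[0.0, 1.0], [1.0, 0.0]])
L = laplacian(A)

same = np.array([[1.0, 0.0], [1.0, 0.0]])   # both genes in cluster 0
split = np.array([[1.0, 0.0], [0.0, 1.0]])  # genes in different clusters

penalty_same = np.trace(same.T @ L @ same)
penalty_split = np.trace(split.T @ L @ split)
```

For a symmetric adjacency matrix, the trace term equals half the sum over ordered node pairs of the edge weight times the squared distance between the corresponding rows of the factor, which is why it vanishes when linked nodes have identical cluster indicators.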
Minimisation of the objective function, $J$, is done by multiplicative update rules used to compute all low-dimensional matrices; under these multiplicative rules, the objective function is non-increasing 11 . The minimisation starts by randomly initialising the $G_i$ matrices for $1 \le i \le 3$ and then iteratively updating their values until the convergence criterion is reached. In all our runs, we use the Random Acol initialisation strategy 21 , and the convergence criterion is reached when $|J_{n+1} - J_n| / |J_n| < 10^{-5}$. The multiplicative update rules, their derivation and the proof of convergence can be found in Wang et al. 11 .
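The stopping rule can be sketched independently of the update rules themselves. In the example below, `update` and `objective` are hypothetical callables standing in for one sweep of the multiplicative updates of Wang et al. and for the evaluation of $J$; the toy functions at the bottom are not related to GNMTF and only exercise the relative-change criterion.

```python
def minimise(update, objective, state, tol=1e-5, max_iter=1000):
    """Iterate `update` until the relative change of the objective drops below `tol`.

    Returns the final state, the final objective value and the number of iterations.
    """
    j_prev = objective(state)
    for n in range(1, max_iter + 1):
        state = update(state)
        j_next = objective(state)
        # Relative change |J_{n+1} - J_n| / |J_n|; guard against division by zero.
        if j_prev == 0 or abs(j_next - j_prev) / abs(j_prev) < tol:
            return state, j_next, n
        j_prev = j_next
    return state, j_prev, max_iter

# Toy stand-ins (hypothetical): the update contracts x towards 2,
# and the objective is non-increasing along the iterates.
toy_update = lambda x: x / 2 + 1
toy_objective = lambda x: (x - 2) ** 2 + 1

x_final, j_final, n_iter = minimise(toy_update, toy_objective, 10.0)
```

A cap on the number of iterations (`max_iter`, an assumption not stated in the text) is a common safeguard, since a relative criterion alone does not bound the running time.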
Co-clustering of patients, genes and drugs. Matrices $G_1^{n_1 \times k_1}$, $G_2^{n_2 \times k_2}$ and $G_3^{n_3 \times k_3}$ from Equation 1 above are cluster membership indicator matrices for patients, genes and drugs, respectively; based on their entries, $n_1$ patients are assigned to $k_1$ patient clusters, $n_2$ genes are assigned to $k_2$ gene clusters and $n_3$ drugs are assigned to $k_3$ drug clusters. In particular, following the hard clustering procedure of Brunet et al. 22 , matrix $G_1^{n_1 \times k_1}$ is used to assign patient $p$ to cluster $j$ if $G_1[p][j]$ is the largest entry in row $p$. This assignment procedure results in the binary connectivity matrix for patients, $C_1^{n_1 \times n_1}$, with entry $C_1[p_1][p_2] = 1$ if patients $p_1$ and $p_2$ belong to the same cluster and $C_1[p_1][p_2] = 0$ otherwise. We apply this procedure to all cluster membership indicator matrices. The numbers of clusters (also called rank parameters) for each data type are chosen such that $k_1 \ll n_1$, $k_2 \ll n_2$ and $k_3 \ll n_3$, which provides dimensionality reduction of the relation matrices. Matrices $H_{12}^{k_1 \times k_2}$ and $H_{23}^{k_2 \times k_3}$ in Equation 1 above represent compressed, low-dimensional versions of $R_{12}$ and $R_{23}$, respectively.
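The hard assignment and the resulting connectivity matrix can be sketched in a few lines of NumPy. The function names and the toy membership matrix are hypothetical.

```python
import numpy as np

def hard_assign(G):
    """Assign each row (e.g. patient) to the cluster holding its largest entry."""
    return np.argmax(G, axis=1)

def connectivity(labels):
    """Binary matrix with C[i, j] = 1 if items i and j share a cluster label."""
    labels = np.asarray(labels)
    return (labels[:, None] == labels[None, :]).astype(float)

# Toy membership matrix for three patients and two clusters.
G1 = np.array([[0.9, 0.1],
               [0.2, 0.7],
               [0.6, 0.3]])

labels = hard_assign(G1)    # patients 0 and 2 fall in cluster 0, patient 1 in cluster 1
C1 = connectivity(labels)
```

Note that the connectivity matrix does not depend on how clusters are numbered, which is what makes it comparable across factorisation runs.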
An important step in our methodology is estimating the rank parameters, which are the numbers of clusters of patients, genes and drugs, $k_1$, $k_2$ and $k_3$, respectively. These parameters need to be known before factorisation is performed. The usual procedure for obtaining them is to vary their values between runs and to estimate cluster stability 22,23 . We take the values of the parameters for which the most stable clustering is achieved. In particular, multiplicative update rules converge to a different solution in each run, depending on the random matrix initialisations (i.e., the initial clustering assignment given by the initial values in the $G_i$ matrices). For example, if a clustering of patients into $k_1$ classes is stable, we expect small variations in the assignment to clusters from run to run. To measure this, we perform multiple factorisation runs with the same values of the rank parameters. Each time, a connectivity matrix is computed (e.g., $C_1$ for patients); based on these, an averaged connectivity matrix (also called the consensus matrix) over all runs, $\hat{C}_1$, is computed. If the clustering is stable, then the entries in $\hat{C}_1$ (also referred to as the cluster association scores) will be either close to zero or close to one. Otherwise, the entries will be scattered in the interval $[0, 1]$. We use the dispersion coefficient, $\rho_{k_1}(\hat{C}_1)$, introduced by Kim et al. 23 , as a measure of cluster stability. The values of the dispersion coefficient range in $0 \le \rho_{k_1}(\hat{C}_1) \le 1$, where 1 denotes a stable clustering. In our study, for each rank parameter, we perform a grid search in intervals of 1 for $1 \le k_1 \le 5$, $5 \le k_2 \le 30$ and $5 \le k_3 \le 30$, and compute the dispersion coefficients $\rho_{k_1}(\hat{C}_1)$, $\rho_{k_2}(\hat{C}_2)$ and $\rho_{k_3}(\hat{C}_3)$ for patients, genes and drugs, respectively. We choose the values of $k_1$, $k_2$ and $k_3$ for which the dispersion coefficients are the highest.
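The stability measure can be illustrated as follows. The sketch assumes the usual form of the dispersion coefficient, $\rho = \frac{1}{n^2} \sum_{i,j} 4 (\hat{C}[i][j] - \tfrac{1}{2})^2$, and takes the hard cluster labels from each run as input; the function names and toy label vectors are hypothetical.

```python
import numpy as np

def consensus_matrix(label_runs):
    """Average the connectivity matrices obtained from several clustering runs."""
    mats = []
    for labels in label_runs:
        labels = np.asarray(labels)
        mats.append((labels[:, None] == labels[None, :]).astype(float))
    return np.mean(mats, axis=0)

def dispersion(C_hat):
    """Dispersion coefficient: 1 for a binary consensus matrix, lower otherwise."""
    n = C_hat.shape[0]
    return float(np.sum(4.0 * (C_hat - 0.5) ** 2) / n ** 2)

# Two runs that agree up to a relabelling of the clusters: perfectly stable.
stable = consensus_matrix([[0, 0, 1], [1, 1, 0]])

# Two runs that disagree on the middle item: less stable.
unstable = consensus_matrix([[0, 0, 1], [0, 1, 1]])

rho_stable = dispersion(stable)
rho_unstable = dispersion(unstable)
```

In a grid search, such a coefficient would be computed for every candidate rank and the rank with the highest value retained.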
Matrix completion property. In addition to co-clustering of patients, genes and drugs, we model the existing and predict new drug-target interactions by using the matrix completion property of GNMTF. Namely, after obtaining the low-dimensional matrices, the reconstructed drug-target matrix, $\hat{R}_{23} = G_2 H_{23} G_3^T$, is more complete than the initial matrix, $R_{23}$, and it can be used for extracting new, unobserved drug-target relations and, therefore, for finding new drug candidates for repurposing.

Drug repurposing, patient stratification and driver gene prediction
Drug repurposing. We use the reconstructed drug-target relation matrix, $\hat{R}_{23}$, to extract new, previously unobserved drug-gene interactions and to postulate new candidates for drug repurposing in the treatment of ovarian cancer patients. We apply a combination of row-centric and column-centric rules to extract new, strongly associated drug-gene pairs 13 . Namely, a drug-gene pair, $(d, g)$, is considered to be predicted if the estimated association score, $\hat{R}_{23}[g][d]$, is greater than the mean association score of all relations of gene $g$, as well as greater than the mean association score of all relations of drug $d$.
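The combined row-centric and column-centric rule can be sketched as below. The reconstructed matrix is formed as the product of the gene factor, the compressed matrix and the transposed drug factor; restricting the output to pairs absent from the input matrix is our reading of "new" interactions. The function name and the toy factors are hypothetical.

```python
import numpy as np

def predict_new_pairs(G2, H23, G3, R23):
    """Return the reconstructed genes x drugs matrix and the newly predicted (gene, drug) pairs."""
    R_hat = G2 @ H23 @ G3.T
    above_gene_mean = R_hat > R_hat.mean(axis=1, keepdims=True)  # row-centric rule
    above_drug_mean = R_hat > R_hat.mean(axis=0, keepdims=True)  # column-centric rule
    candidate = above_gene_mean & above_drug_mean & (R23 == 0)   # keep unobserved pairs only
    pairs = [(int(g), int(d)) for g, d in np.argwhere(candidate)]
    return R_hat, pairs

# Toy factors: genes 0 and 1 share a cluster, gene 2 is in another one.
G2 = np.array([[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
H23 = np.array([[0.9, 0.1], [0.1, 0.8]])
G3 = np.eye(2)

# Known interactions: gene 0 with drug 0, gene 2 with drug 1.
R23 = np.array([[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]])

R_hat, new_pairs = predict_new_pairs(G2, H23, G3, R23)
```

In this toy case gene 1 inherits a high score for drug 0 because it is clustered with gene 0, which is the intuition behind matrix completion by factorisation.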
Patient stratification. We stratify ovarian cancer patients into groups according to the consensus matrix, $\hat{C}_1$. We use the approach of Brunet et al. 22 : we use the off-diagonal entries of $\hat{C}_1$ as a measure of patient similarity and apply average linkage hierarchical clustering to group patients into $k_1$ classes. Results and validations are shown in Sec. 3.1 below.
Cancer driver gene prediction. As with the patient consensus matrix, we use the gene consensus matrix, $\hat{C}_2$, to extract gene clusters and identify those that are enriched in mutations and known driver genes by using the standard test based on sampling without replacement (i.e., the hypergeometric test). In clusters that are enriched in known drivers, we identify genes that are highly associated with known driver genes based on the cluster association scores from the gene consensus matrix. We postulate that these genes are new driver genes for ovarian cancer. Results and validations are presented in Sec. 3.2 below.
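For reference, the enrichment p-value of a hypergeometric test is the upper tail probability of drawing at least the observed number of annotated genes when sampling a cluster without replacement. A minimal standard-library sketch follows; the function name and the toy numbers are hypothetical and unrelated to the results reported here.

```python
from math import comb

def hypergeom_enrichment_pvalue(population, successes, draws, observed):
    """P(X >= observed) for X ~ Hypergeometric(population, successes, draws).

    population: total number of genes
    successes:  number of annotated genes (e.g. known drivers) in the population
    draws:      cluster size
    observed:   number of annotated genes found in the cluster
    """
    upper = min(successes, draws)
    total = comb(population, draws)
    tail = sum(
        comb(successes, k) * comb(population - successes, draws - k)
        for k in range(observed, upper + 1)
    )
    return tail / total

# Toy case: 10 genes, 3 of them drivers, a cluster of 3 genes containing all 3 drivers.
p_all_three = hypergeom_enrichment_pvalue(10, 3, 3, 3)
p_at_least_zero = hypergeom_enrichment_pvalue(10, 3, 3, 0)
```

When many clusters are tested, a multiple-testing correction would normally be considered; the text does not state whether one was applied.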

Datasets, pre-processing and matrix construction
We downloaded high-grade serous ovarian cancer somatic mutation data from the TCGA data portal 4 on the 2nd of July 2015. We only consider data generated by using the Illumina GAIIx platform, which has the largest number of patients. Following the same procedure for data filtering as in Hofree et al. 5 , we retain only the patients with more than 10 mutated genes. This results in $n_1 = 353$ serous ovarian cancer patients with mutations in a total of 11,148 genes. Mutated genes are mapped onto the Molecular Network (MN) that we obtain by merging three different biological networks: the protein-protein interaction (PPI) and genetic interaction (GI) networks from the BioGRID database (version 3.4.126) 17 , and the metabolic interaction (MI) network from the KEGG database 18 . This results in an MN of 236,751 interactions among $n_2 = 19{,}118$ genes (mutated and normal). We represent these interactions by the Laplacian matrix, $L_2^{n_2 \times n_2}$, computed as $L_2 = D_2 - A_2$, where $A_2$ is the adjacency matrix of the MN and $D_2$ is the diagonal degree matrix of the MN (i.e., its diagonal entries are the row sums of $A_2$ and all other entries are zeros). For each patient, we create an $n_2$-long binary (0, 1) somatic mutation profile (SMP) vector, where "1" indicates the existence of a mutated gene in the patient and all other entries are "0". These mutation profiles for all $n_1$ patients are captured in a binary relation matrix $R_{12}^{n_1 \times n_2}$ consisting of these SMP vectors. Due to the sparsity of matrix $R_{12}$, we apply a network propagation technique as a pre-processing step to smooth the patient profiles by spreading the influence of each mutation over its neighbours in the MN. We use the network propagation method proposed by Vanunu et al. 24 , based on which the new patient-gene relation matrix is computed iteratively as follows: $R_{12}^{t+1} = \alpha R_{12}^{t} \bar{A}_2 + (1 - \alpha) R_{12}^{0}$, where $\bar{A}_2$ is the normalised adjacency matrix of the MN computed as $\bar{A}_2 = A_2 D_2^{-1}$, $R_{12}^{0} = R_{12}$ is the initial patient-gene matrix and $\alpha$ is a tuning parameter that controls the distance of diffusion through the MN. In all our runs, we set $\alpha = 0.6$ (as it produced the best results), and we took the final network-smoothed patient-gene matrix (after convergence, $\| R_{12}^{t+1} - R_{12}^{t} \| < 10^{-6}$, is achieved) as input to GNMTF. This pre-processing step has been shown to lead to much better and more robust patient stratification results in previous studies 5 , hence we use it as well.
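The Laplacian construction and the smoothing step can be sketched together. The sketch assumes the random-walk-with-restart form of the update, in which the current profile matrix is multiplied by the degree-normalised adjacency matrix, scaled by $\alpha$, and mixed with $(1 - \alpha)$ times the initial profiles; it also assumes the Frobenius norm for the convergence check. The function names and the three-node toy network are hypothetical.

```python
import numpy as np

def graph_laplacian(A):
    """L = D - A, with D the diagonal matrix of row sums of A."""
    A = np.asarray(A, dtype=float)
    return np.diag(A.sum(axis=1)) - A

def propagate(R0, A, alpha=0.6, tol=1e-6, max_iter=1000):
    """Smooth the profiles in R0 (rows) over the network with adjacency matrix A."""
    A = np.asarray(A, dtype=float)
    R0 = np.asarray(R0, dtype=float)
    degree = A.sum(axis=0)
    degree[degree == 0] = 1.0       # avoid division by zero for isolated nodes
    A_norm = A / degree             # A D^{-1}: column j is divided by the degree of node j
    R = R0.copy()
    for _ in range(max_iter):
        R_next = alpha * R @ A_norm + (1 - alpha) * R0
        if np.linalg.norm(R_next - R) < tol:   # Frobenius norm (assumption)
            return R_next
        R = R_next
    return R

# Path network g0 - g1 - g2 and one patient with a mutation in g0 only.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]])
L = graph_laplacian(A)
R_smooth = propagate(np.array([[1.0, 0.0, 0.0]]), A)
```

After smoothing, the unmutated neighbours of the mutated gene receive non-zero scores that decay with network distance, which is what makes sparse mutation profiles comparable between patients.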

Drug-target interactions (DTIs) are obtained from the DrugBank database 19 and encoded in the binary relation matrix $R_{23}^{n_2 \times n_3}$. SMILES chemical representations of the $n_3$ drugs are also retrieved from the DrugBank database. The two-dimensional chemical similarity between drugs is computed by using the Tanimoto similarity coefficient 25 . We retain only the top 5% most similar drug pairs, which results in 1,069,393 links in the drug chemical similarity (DCS) network. We represent these links by the Laplacian matrix, $L_3^{n_3 \times n_3}$ (computed in the same way as for the MN, described above).
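A minimal sketch of the similarity and thresholding steps is given below. It assumes that each drug has already been converted from SMILES into a binary fingerprint, represented here as the set of its "on" bit positions; the fingerprint type and the toolkit used for that conversion are not specified in the text, so this step is omitted. All names and toy fingerprints are hypothetical.

```python
from itertools import combinations

def tanimoto(a, b):
    """Tanimoto coefficient of two fingerprints given as sets of 'on' bit positions."""
    a, b = set(a), set(b)
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def top_similar_pairs(fingerprints, fraction=0.05):
    """Keep the given fraction of drug pairs with the highest Tanimoto similarity."""
    scored = [
        (tanimoto(fingerprints[x], fingerprints[y]), x, y)
        for x, y in combinations(sorted(fingerprints), 2)
    ]
    scored.sort(reverse=True)
    keep = max(1, int(round(fraction * len(scored))))
    return [(x, y) for _, x, y in scored[:keep]]

# Toy fingerprints (hypothetical bit positions).
fps = {"d1": {1, 2, 3}, "d2": {1, 2, 3, 4}, "d3": {7, 8}}

sim_d1_d2 = tanimoto(fps["d1"], fps["d2"])
links = top_similar_pairs(fps, fraction=0.34)  # keeps one of the three pairs
```

The retained pairs would then define the edges of the similarity network from which the Laplacian is built.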

Clinical and biological validation of results
For all patients, we also downloaded clinical follow-up data from the TCGA database, including the overall survival information (days to the last follow-up and vital status), age, tumor grade, size and tumor position. We used these data to assess the clinical relevance of the patient clusters that we obtain after data fusion. We used Kaplan-Meier survival curves, as well as the log-rank p-value, to measure the significance of the difference in survival profiles between different patient clusters. The log-rank test evaluates the null hypothesis that patients in each cluster are drawn from the same underlying survival distribution; a small p-value is evidence against this hypothesis 26 . From the TCGA database, we also retrieved a list of 83 known ovarian cancer driver genes, out of which 76 are present in our set of genes mutated in patients. We use this set of genes to assess the gene clusters obtained after fusion and to identify clusters enriched in drivers.
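For readers unfamiliar with the Kaplan-Meier estimator, the sketch below computes the survival curve of one patient cluster from follow-up times and vital status, using the standard product-limit formula. The function name and the toy follow-up data are hypothetical; the log-rank test itself is not implemented here.

```python
def kaplan_meier(times, events):
    """Product-limit estimate of the survival function.

    times:  follow-up time of each patient (e.g. days)
    events: 1 if the patient died at that time, 0 if censored
    Returns a list of (time, survival probability) at each distinct death time.
    """
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        removed = 0
        # Group all patients sharing the same follow-up time.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1.0 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censored patients both leave the risk set
    return curve

# Toy cluster of four patients; the second one is censored at day 8.
curve = kaplan_meier([5, 8, 8, 12], [1, 0, 1, 1])
```

Censored patients contribute to the risk set up to their last follow-up but never trigger a drop in the curve, which is why vital status is needed in addition to follow-up time.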
To assess the prognostic capabilities of our patient-specific data fusion approach on ovarian cancer patients, we perform clinical validation of the three obtained patient clusters. The Kaplan-Meier survival curves, shown in Fig. 2(A), reveal the low-survival group (Cluster 2) with 56% of death cases and the good-outcome group (Cluster 1) with 38% of death cases. We observe that the identified clusters are highly discriminative, with a log-rank p-value of $5.3 \times 10^{-3}$. The same number of clusters has also been reported in previous studies based on somatic mutation and molecular interaction data 5 , and in a study based only on miRNA expression data 4 . Furthermore, the identified clusters display a good agreement with the median age of patients in the clusters, with Cluster 2 having the oldest patients. In addition, Cluster 2 has the largest proportion of patients with abnormal growth of tissue (tumor), 78%, as compared to Cluster 1 with 60% of such patients.
We compare the performance of our method with the state-of-the-art somatic mutation-based stratification method called Network-based Stratification (NBS) 5 , which takes as input a patient-gene post-smoothing relation matrix and a molecular network matrix. We apply it to the same set of data described in Section 2.3, excluding drug data, which only our framework can take into account. We test NBS for different numbers of patient clusters (i.e., $k \in \{2, 3, 4, 5\}$) and compute the Kaplan-Meier survival curves 26 for the obtained patient clusters. We compare the survival results of NBS with those of our method for the same number of patient clusters (Fig. 2(A,B)). Unlike our method, which produces clusters with significantly different survival outcomes (p-value $= 5.3 \times 10^{-3}$), NBS does not (p-value $\ge 0.74$ for all $k \in \{2, 3, 4, 5\}$). Thus, on these data, only our framework is able to extract personalised knowledge from somatic mutation profiles.

Identification of driver genes
We performed a biological assessment of the $k_2 = 25$ gene clusters that we obtain from the gene consensus matrix, $\hat{C}_2$. We identify 9 gene clusters that are significantly enriched in mutations and 5 gene clusters that are significantly enriched in known drivers (p-value $\le 0.05$, see Fig. 3). Out of these clusters, cluster number 8 has the largest number of driver genes (26) and the highest enrichment in driver genes (p-value $= 2.06 \times 10^{-4}$). To identify new driver genes, we further analyse this cluster as follows. First, based on the cluster association scores in the gene consensus matrix, we extract the mutated genes that are strongly associated with the known driver genes. In particular, we focus only on genes associated with the known driver genes with a cluster association score $\ge 0.9$ (as explained below). That is, out of 20 restarts of GNMTF, we extract genes that appear at least 18 times in cluster 8 together with other driver genes. Then, for each of these genes, we compute the average cluster score based on its associations with all driver genes. We provide the list of the top 20 genes (out of 809 predicted drivers in total) that we postulate as new driver genes of ovarian cancer progression, sorted according to the average cluster association score, as shown in Table 1. This procedure is motivated by the observation that, out of the 76 known driver genes, 67 are strongly related among themselves (with cluster association score $\ge 0.9$) through all gene clusters.
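The selection and ranking step can be sketched as follows. The sketch assumes that a gene qualifies when its association score with at least one known driver in the cluster reaches the threshold, and that the ranking score is the mean association with all known drivers of that cluster; both are our reading of the procedure rather than a confirmed specification. Genes are referred to by integer index, and all names and toy values are hypothetical.

```python
import numpy as np

def candidate_drivers(C_hat, cluster_genes, known_drivers, threshold=0.9):
    """Rank non-driver genes of a cluster by their mean association with known drivers."""
    drivers = [g for g in cluster_genes if g in known_drivers]
    results = []
    for g in cluster_genes:
        if g in known_drivers:
            continue
        scores = C_hat[g, drivers]
        # Assumption: one strong association with a known driver is sufficient.
        if np.any(scores >= threshold):
            results.append((g, float(scores.mean())))
    return sorted(results, key=lambda item: -item[1])

# Toy consensus matrix for four genes; genes 0 and 1 are known drivers.
C_hat = np.array([[1.00, 0.95, 0.95, 0.50],
                  [0.95, 1.00, 0.90, 0.40],
                  [0.95, 0.90, 1.00, 0.30],
                  [0.50, 0.40, 0.30, 1.00]])

ranked = candidate_drivers(C_hat, cluster_genes=[0, 1, 2, 3], known_drivers={0, 1})
```

With 20 factorisation runs, consensus entries are multiples of 0.05, so a threshold of 0.9 corresponds to co-clustering in at least 18 runs.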
We assess our predicted driver genes against two cancer driver gene databases, the COSMIC Cancer Gene Census 29 and IntOGen 30 , as well as against a database of putative cancer driver genes, the Candidate Cancer Gene Database (CCGD) 31 . Our results show that ∼40% of our 809 predicted driver genes (with scores $\ge 0.9$) have either been already proposed as drivers (in CCGD), or validated by experts (Census or IntOGen). The list of our 20 top-scoring predicted cancer driver genes is presented in Table 1. We also investigated the literature to assess the relevance of our two top-scoring predictions that are not found in other databases and found evidence that they are biologically relevant. Our top-scoring cancer driver gene prediction is ADAM32, which is strongly clustered with driver gene BMPR2 (Table 1). The association between the two genes is biologically relevant, because both are involved with transforming growth factors (TGFs). Our prediction of ADAM32 as a cancer driver gene is also relevant, because ADAM genes are known to be responsible for cancer cell proliferation and progression 32 . The second best prediction is REG1P (from the REG family of proteins), which is strongly clustered with driver gene CLASP2. Our prediction of REG1P as a cancer driver gene is also relevant, because the REG family plays different roles in proliferation, migration and anti-apoptosis through activating different signalling pathways; their dysregulation is closely associated with cancer, and REG proteins have been proposed as markers for the prognosis of cancers 33 .

Drug-target interaction prediction
To demonstrate the predictive power of our data fusion approach and to assess the contribution of each dataset on the drug-target interaction prediction, we perform a 5-fold cross validation for each combination of the datasets shown in Fig. 1. In all our experiments, true positives are correctly predicted DTIs, while false positives are predicted DTIs that are not present in the initial dataset.
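One way to set up such a cross validation is to hide one fifth of the known interactions in each fold and to check whether they are recovered from the reconstructed matrix. The text does not specify how the folds were formed, so the sketch below is only one plausible arrangement; the function name, the seed and the toy matrix are hypothetical.

```python
import numpy as np

def dti_folds(R23, n_folds=5, seed=0):
    """Yield (training matrix, held-out index pairs) for each fold.

    In every fold a different subset of the known interactions is set to zero.
    """
    rng = np.random.default_rng(seed)
    known = np.argwhere(R23 == 1)
    rng.shuffle(known)  # shuffles the (gene, drug) index pairs in place, row-wise
    for held_out in np.array_split(known, n_folds):
        R_train = R23.copy()
        R_train[held_out[:, 0], held_out[:, 1]] = 0
        yield R_train, held_out

# Toy genes x drugs matrix with ten known interactions.
R = np.zeros((5, 4))
R[np.arange(5), np.arange(5) % 4] = 1
R[np.arange(5), (np.arange(5) + 1) % 4] = 1

folds = list(dti_folds(R))
```

Each training matrix would then be factorised together with the other datasets, and the scores of the held-out pairs in the reconstructed matrix compared against those of the unobserved pairs.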
We compute the average Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) and Precision-Recall (PR) curves (over 20 repetitions) to evaluate the performance of our methodology for each combination of datasets included in the integration process. The results are shown in Fig. 4. The lowest values of average AUC ROC and AUC PR are observed when only the DTI dataset is used. The values increase with the inclusion of other datasets, resulting in the highest average AUC ROC when all datasets are taken into account. With all datasets taken into account by GNMTF, we use the reconstructed DTI relation matrix, $\hat{R}_{23}$, to extract new drug-target interactions, as described in Sec. 2.2. We assess our prediction accuracy against two different large drug-target interaction databases, MATADOR 28 and CTD 27 . Out [...] Table 2, out of which 17 are confirmed in the CTD or MATADOR database. Second, we investigated the literature to assess the relevance of our two top-scoring predicted DTIs that are not found in other databases and found evidence that they are biologically relevant. The top-scoring target gene KIT (C-Kit) is particularly relevant. It is a receptor tyrosine kinase (e.g., it catalyses ATP/ADP reactions). It has been shown that unregulated activity of this gene leads to the occurrence of tumors and thus, it has been proposed as a potential drug target in cancer 34 . Interestingly, we predict the drug candidate for targeting this gene to be Adenosine triphosphate (or ATP), for which a precise role in cancer is still under investigation. Increasing ATP intake is known to improve cancer patient conditions 35 . The reason could be that ATP is linked to cancer cell metabolism and either activates cell death mediated by restoration of normal mitochondrial function, or alters the cytosolic ATP/ADP ratio, which is postulated to deactivate glycolysis (Warburg effect) in a cancer cell 36 . Another drug target in Table 2 whose predicted drug is not present in the CTD and MATADOR databases is GRIN3A. GRIN3A (NMDAR-l) is a sub-unit of the NMDA receptor (a glutamate-regulated ion channel). The NMDA receptor has been proposed as a target for cancer chemotherapy 37 . It has been proposed that glutamate antagonist molecules should be used as potential drugs 37 . Interestingly, our predicted drug, Pethidine (also known as Meperidine), is a glutamate antagonist that is known to bind NMDA receptors 38 , which provides evidence that our prediction of Pethidine as a drug for targeting GRIN3A is biologically relevant. However, evidence that Pethidine can bind to GRIN3A in particular has not yet been established. Furthermore, based on the mutated genes of particular patients, we propose these newly discovered drugs (see column four in Table 2) for treatment of the three patient groups described in Sec. 3.1.

Table 1. The list of the top-scoring proposed driver genes (1st column) and their associated known driver genes (2nd column), with the association score (3rd column) and the confirmation of their presence in the CCGD database (4th column).

Conclusions
In this paper, we propose a data fusion framework that can effectively integrate somatic mutation data along with molecular networks and drug chemical data. It is based on the GNMTF method for co-clustering heterogeneous data and it can be further extended to incorporate any number and type of data. One important advantage of our framework is that, when applied to a specific cancer, it can simultaneously perform three different tasks: patient stratification into clinically different groups, novel driver gene identification and drug repurposing predictions for treating cancer. We apply the GNMTF-based data fusion framework to ovarian cancer patients and identify three groups of patients with significantly different survival outcomes. In addition, from the obtained gene clusters, we identify a list of genes that we postulate as potential drivers of ovarian cancer progression due to their strong cluster associations with known ovarian cancer driver genes. We perform biomedical literature curation for the top-scoring predictions, ADAM32 and REG1P, and show that they are related to cancer cell proliferation and tumor progression, while we validate about 40% of the other predictions in other databases. Moreover, our framework is capable of predicting new drugs that could be used for targeting mutated genes and thus, for treatment of the identified groups of ovarian cancer patients. We provide a list of predicted drug-target interactions, a good number of which match those reported in other databases. Other, non-validated predictions for driver genes and drug-target interactions could be true, awaiting experimental validation.
Our analysis also suggests that somatic mutation data is a valuable complement to other molecular data, whose integration with those data could lead to an improvement in the performance of data fusion methods. Our approach has a potential to enable the derivation of new hypotheses, improve drug selection and lead to improvement in patient genomics-tailored therapeutics for cancer.
Education and Science Project III44006.