
Predicting and elucidating the etiology of fatty liver disease: A machine learning modeling and validation study in the IMI DIRECT cohorts

Authors: Atabaki-Pasdar, N
Ohlsson, M
Vinuela, A
Frau, F
Pomares-Millan, H
Haid, M
Jones, AG
Thomas, EL
Koivula, RW
Kurbasic, A
Mutie, PM
Fitipaldi, H
Fernandez, J
Dawed, AY
Giordano, GN
Forgie, IM
McDonald, TJ
Rutters, F
Cederberg, H
Chabanova, E
Dale, M
Masi, FD
Thomas, CE
Allin, KH
Hansen, TH
Heggie, A
Hong, M-G
Elders, PJM
Kennedy, G
Kokkola, T
Pedersen, HK
Mahajan, A
McEvoy, D
Pattou, F
Raverdy, V
Haussler, RS
Sharma, S
Thomsen, HS
Vangipurapu, J
Vestergaard, H
't Hart, LM
Adamski, J
Musholt, PB
Brage, S
Brunak, S
Dermitzakis, E
Frost, G
Hansen, T
Laakso, M
Pedersen, O
Ridderstrale, M
Ruetten, H
Hattersley, AT
Walker, M
Beulens, JWJ
Mari, A
Schwenk, JM
Gupta, R
McCarthy, MI
Pearson, ER
Bell, JD
Pavo, I
Franks, PW
Item Type: Journal Article
Abstract: Background: Non-alcoholic fatty liver disease (NAFLD) is highly prevalent and causes serious health complications in individuals with and without type 2 diabetes (T2D). Early diagnosis of NAFLD is important because it can help prevent irreversible liver damage and, ultimately, hepatocellular carcinoma. We sought to expand etiological understanding of NAFLD and to develop a diagnostic tool for it using machine learning.
Methods and findings: We used baseline data from IMI DIRECT, a multicenter prospective cohort study of 3,029 European-ancestry adults recently diagnosed with T2D (n = 795) or at high risk of developing the disease (n = 2,234). Multi-omics (genetic, transcriptomic, proteomic, and metabolomic) and clinical (liver enzymes and other serological biomarkers, anthropometry, measures of beta-cell function, insulin sensitivity, and lifestyle) data comprised the key input variables. The models were trained on MRI-derived liver fat content (<5% or ≥5%), which was available for 1,514 participants. We applied LASSO (least absolute shrinkage and selection operator) to select features from the different layers of omics data and random forest analysis to develop the models. The prediction models included clinical and omics variables separately or in combination. A model including all omics and clinical variables yielded a cross-validated receiver operating characteristic area under the curve (ROCAUC) of 0.84 (95% CI 0.82, 0.86; p < 0.001), compared with a ROCAUC of 0.82 (95% CI 0.81, 0.83; p < 0.001) for a model including 9 clinically accessible variables. The IMI DIRECT prediction models outperformed existing noninvasive NAFLD prediction tools. One limitation is that these analyses were performed in adults of European ancestry residing in northern Europe; it is unknown how well these findings will translate to people of other ancestries or to people exposed to environmental risk factors that differ from those of the present cohort. Another key limitation is that prediction was performed on a binary outcome of liver fat quantity (<5% or ≥5%) rather than on a continuous one.
Conclusions: In this study, we developed several models with different combinations of clinical and omics data and identified biological features that appear to be associated with liver fat accumulation. In general, the clinical variables showed better predictive ability than the complex omics variables. However, the combination of omics and clinical variables yielded the highest accuracy. We have incorporated the developed clinical models into a web interface (see: https://www.predictliverfat.org/) and made it available to the community.
Trial registration: ClinicalTrials.gov NCT03814915.
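As an illustration of the evaluation set-up described in the abstract, the minimal Python sketch below binarizes MRI-derived liver fat at the 5% cut-off and computes a ROC AUC as the probability that a randomly chosen participant with liver fat ≥5% receives a higher predicted score than a randomly chosen participant below 5% (ties count as one half). It is not the authors' code. The function names, the toy liver fat values, and the predicted scores are hypothetical, and the sketch omits the LASSO feature selection, random forest models, cross-validation, and confidence intervals used in the study.

```python
from typing import Sequence

# Cut-off taken from the abstract: <5% vs >=5% MRI-derived liver fat.
LIVER_FAT_CUTOFF = 5.0


def binarize_liver_fat(fat_percent: Sequence[float],
                       cutoff: float = LIVER_FAT_CUTOFF) -> list[int]:
    """Label 1 (fatty liver) when liver fat >= cutoff, else 0."""
    return [1 if f >= cutoff else 0 for f in fat_percent]


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison (Mann-Whitney formulation).

    Counts, over all (positive, negative) pairs, how often the positive
    case scores higher; ties contribute 0.5. O(n_pos * n_neg), which is
    fine for a toy example but not for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Hypothetical liver fat percentages and model scores (not study data).
    fat = [2.1, 4.9, 5.0, 12.3, 3.3, 7.8]
    scores = [0.10, 0.40, 0.35, 0.90, 0.20, 0.60]
    labels = binarize_liver_fat(fat)      # [0, 0, 1, 1, 0, 1]
    print(round(roc_auc(labels, scores), 3))  # prints 0.889
```

In this toy example, eight of the nine positive/negative pairs are ranked correctly, giving an AUC of 8/9. In practice, a library routine such as scikit-learn's `roc_auc_score` would typically be used instead of the quadratic loop.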
Issue Date: 1-Jun-2020
Date of Acceptance: 22-May-2020
URI: http://hdl.handle.net/10044/1/82912
DOI: 10.1371/journal.pmed.1003149
ISSN: 1549-1277
Publisher: Public Library of Science (PLoS)
Start Page: 1
End Page: 27
Journal / Book Title: PLoS Medicine
Volume: 17
Issue: 6
Copyright Statement: © 2020 The Author(s). This is an open access article, free of all copyright, and may be freely reproduced, distributed, transmitted, modified, built upon, or otherwise used by anyone for any lawful purpose. The work is made available under the Creative Commons CC0 public domain dedication https://creativecommons.org/publicdomain/zero/1.0/.
Sponsor/Funder: IMI
Funder's Grant Number: 115317
Keywords: Science & Technology
Life Sciences & Biomedicine
Medicine, General & Internal
General & Internal Medicine
ALCOHOLIC STEATOHEPATITIS
INSULIN SENSITIVITY
GLOBAL EPIDEMIOLOGY
NAFLD
BIOMARKERS
Diabetes Complications
Fatty Liver
Female
Humans
Machine Learning
Male
Middle Aged
Models, Statistical
Prospective Studies
Reproducibility of Results
Risk Assessment
11 Medical and Health Sciences
Publication Status: Published
Article Number: ARTN e1003149
Online Publication Date: 2020-06-19
Appears in Collections: Department of Metabolism, Digestion and Reproduction



This item is licensed under a Creative Commons License.