Prediction of bioconcentration factors in fish and invertebrates using machine learning
File(s)
1-s2.0-S0048969718330869-main.pdf (1.5 MB)
Published version
Author(s)
Type
Journal Article
Abstract
The application of machine learning has recently gained interest in ecotoxicology for its ability to model and predict chemical and/or biological processes, such as bioconcentration. However, different models have not previously been compared, and the prediction of bioconcentration in invertebrates has not previously been evaluated. A comparison of 24 linear and machine learning models is presented herein for the prediction of bioconcentration in fish, and important factors that influenced accumulation are identified. R² and root mean square error (RMSE) for the test data (n = 110 cases) ranged from 0.23–0.73 and 0.34–1.20, respectively. Model performance was critically assessed, with neural networks and tree-based learners showing the best performance. An optimised 4-layer multi-layer perceptron (14 descriptors) was selected for further testing. The model was applied for cross-species prediction of bioconcentration in a freshwater invertebrate, Gammarus pulex. The model for G. pulex showed good performance, with R² of 0.99 and 0.93 for the verification and test data, respectively. Important molecular descriptors determined to influence bioconcentration included molecular mass (MW), octanol-water distribution coefficient (logD), topological polar surface area (TPSA) and number of nitrogen atoms (nN). Modelling of hazard criteria such as PBT showed potential to replace the need for animal testing. However, the use of machine learning models in the regulatory context has been minimal to date and is critically discussed herein. The movement away from experimental estimations of accumulation to in silico modelling would enable rapid prioritisation of contaminants that may pose a risk to environmental health and the food chain.
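The two performance metrics quoted in the abstract can be illustrated with a minimal sketch. It assumes R² is the coefficient of determination (1 minus the ratio of residual to total sum of squares); the abstract does not say which definition the authors used, and a squared correlation coefficient is another common choice. The function names and the numeric values are hypothetical and are not data from the article.

```python
import math


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    squared_errors = [(o - p) ** 2 for o, p in zip(observed, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical observed and predicted values, for illustration only.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(f"R2 = {r_squared(observed, predicted):.2f}")
print(f"RMSE = {rmse(observed, predicted):.2f}")
```

An R² closer to 1 and an RMSE closer to 0 both indicate closer agreement between predicted and measured values, which is how the ranges reported for the 24 models would be read.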
Date Issued
2019-01-15
Date Acceptance
2018-08-09
Citation
Science of the Total Environment, 2019, 648, pp.80-89
ISSN
0048-9697
Publisher
Elsevier
Start Page
80
End Page
89
Journal / Book Title
Science of the Total Environment
Volume
648
Copyright Statement
© 2018 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY license (http://creativecommons.org/licenses/by/4.0/).
License URL
http://creativecommons.org/licenses/by/4.0/
Identifier
https://www.sciencedirect.com/science/article/pii/S0048969718330869?via%3Dihub
Subjects
BCF
Bioconcentration
Machine learning
Modelling
PBT
Pharmaceutical
Amphipoda
Animals
Carps
Ecotoxicology
Environmental Exposure
Machine Learning
Models, Biological
Pharmaceutical Preparations
Water Pollutants, Chemical
Environmental Sciences
Notes
10.1016/j.scitotenv.2018.08.122
Publication Status
Published
Date Publish Online
2018-08-10