Modelling the free energy of solvation: from data-driven to statistical mechanical approaches
Author(s)
Nur Jazlan, Nur Redzuan
Type
Thesis or dissertation
Abstract
The Gibbs free energy of solvation of a solute in a solvent, usually considered at infinite dilution, provides a simple thermodynamic description of the solution and is related to numerous solvation properties. In solution chemistry, it provides a route to understanding the effect of solvents on equilibrium constants and reaction rates. In drug discovery, the effectiveness of a drug depends in part on its solubility and permeability, so predicted Gibbs free energies of solvation are frequently used in quantitative drug design. Given this importance, many predictive tools have been developed, spanning quantum mechanical (QM) methods, empirical methods, and classical methods. Empirical methods, in particular, are data-driven approaches based on statistical learning.
In this work, we assembled a database of experimental Gibbs free energies of solvation and a corresponding set of 9 quantum mechanical (QM) solute descriptors and 12 bulk solvent descriptors. We also partitioned the Gibbs free energy of solvation into an electrostatic term and a nonelectrostatic term. The electrostatic term is the difference between the electronic energies of a solute in vacuum and in solvent, obtained using the X3LYP/6-31G(d,p) electronic structure method and the Polarizable Continuum Model (PCM). This yields a separate database of derived nonelectrostatic energies alongside the database of Gibbs free energies of solvation. Both databases are used to develop models using statistical and regression methodologies such as partial least squares (PLS), quadratic partial least squares (QPLS) and automatic learning of algebraic models for optimisation (ALAMO).
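The partition described above can be illustrated with a minimal Python sketch. It assumes that the electrostatic term is taken as the solvent-phase electronic energy minus the vacuum electronic energy, and that the nonelectrostatic term is the remainder of the experimental Gibbs free energy of solvation once the electrostatic term is subtracted. The sign convention, the record layout, the function names and the numerical values are all hypothetical and are not taken from the thesis.

```python
from dataclasses import dataclass

# Approximate conversion factor from hartree to kcal/mol.
HARTREE_TO_KCAL_PER_MOL = 627.509


@dataclass
class SolvationRecord:
    """One solute/solvent pair (hypothetical layout, for illustration only)."""
    solute: str
    solvent: str
    dg_solv_exp: float  # experimental Gibbs free energy of solvation, kcal/mol
    e_vacuum: float     # electronic energy of the solute in vacuum, hartree
    e_solvent: float    # electronic energy of the solute in the continuum solvent, hartree


def electrostatic_term(rec: SolvationRecord) -> float:
    """Electrostatic term in kcal/mol (assumed convention: solvent minus vacuum)."""
    return (rec.e_solvent - rec.e_vacuum) * HARTREE_TO_KCAL_PER_MOL


def nonelectrostatic_term(rec: SolvationRecord) -> float:
    """Derived nonelectrostatic term: experimental value minus the electrostatic term."""
    return rec.dg_solv_exp - electrostatic_term(rec)


# Made-up numbers, chosen only to show the arithmetic.
example = SolvationRecord(
    solute="solute_A",
    solvent="solvent_B",
    dg_solv_exp=-5.0,
    e_vacuum=-100.0,
    e_solvent=-100.01,
)
dg_el = electrostatic_term(example)
dg_nonel = nonelectrostatic_term(example)
print(f"electrostatic: {dg_el:.3f} kcal/mol, nonelectrostatic: {dg_nonel:.3f} kcal/mol")
```

Under this convention the two terms sum back to the experimental value by construction, so a regression model fitted to the derived nonelectrostatic energies can be combined with a computed electrostatic term to give a prediction of the full Gibbs free energy of solvation.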
We then carry out a systematic comparison of various activity coefficient models, data-driven models, an equation of state, and a hybrid QM/activity coefficient model. Notable models include the Dortmund version of the UNIFAC model (modUNIFAC (Do)), the statistical associating fluid theory (SAFT-γ Mie), and the conductor-like screening model with segment activity coefficients (COSMO-SAC). We calculate the free energy of solvation for a common data set of 404 solute/solvent pairs, with solutes such as alcohols, alkanes, and aromatic molecules and solvents such as alkanes, alcohols and water. We also assess the strengths and weaknesses of each method on the overall data set and on specific subsets of solute/solvent pairs (e.g., aqueous/nonaqueous pairs).
Version
Open Access
Date Issued
2021-12
Date Awarded
2022-10
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Adjiman, Claire
Galindo, Amparo
Publisher Department
Chemical Engineering
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)