
Exploring data mining for hydrological modelling

File: Vitolo-C-2015-PhD-Thesis.pdf (Description: Thesis; Size: 9.56 MB; Format: Adobe PDF)
Title: Exploring data mining for hydrological modelling
Authors: Vitolo, Claudia
Item Type: Thesis or dissertation
Abstract: Technological advances in computer science, namely cloud computing and data mining, are reshaping the way the world looks at data. Data are becoming the drivers of discoveries and strategic developments. In environmental sciences, for instance, large volumes of information are produced by monitoring networks, satellites and model simulations, and are processed to uncover hidden patterns, correlations and trends that ultimately support policy and decision making. Hydrologists, in particular, use models to simulate river discharges and to estimate pollutant concentrations as well as the risk of floods and droughts. The very first step of any hydrological modelling exercise is to select an appropriate model. However, this choice is often made by the modeller on the basis of his/her expertise rather than on the model's suitability to reproduce the most important processes in the area under study. Because this approach lacks reproducibility and consistency across experts and locations, it undermines the "scientific method", and a shift towards a data-driven selection process is deemed necessary. This work presents the design, development and test results of a novel data mining algorithm, called AMCA, which automatically identifies the most suitable model configurations for a given catchment using minimal data requirements and an inventory of model structures. In the design phase, a transdisciplinary approach was adopted, borrowing techniques from the fields of machine learning, signal processing and marketing. The algorithm was tested on the Severn at Plynlimon flume catchment, in the Plynlimon study area (Wales, UK). This area was selected because of its reliable measurements and the homogeneity of its soils and vegetation. The Framework for Understanding Structural Errors (FUSE) was used as the sample model inventory, but the methodology can easily be adapted to other inventories, including more sophisticated model structures.
The model configuration problem that AMCA attempts to solve can be categorised as "fully unsupervised" if there is no prior knowledge of the interactions and relationships between the data observed at a given location and the available model structures and parameters. Therefore, the first set of tests was run on a synthetic dataset to evaluate the algorithm's performance against known outcomes. AMCA clearly identified most of the components of the synthetic model structure, which allowed testing to proceed with observed data. Using real observations, AMCA efficiently selected the most suitable model structures and, when coupled with association rule mining techniques, could also identify optimal parameter ranges. The ensemble suggested by the combination of AMCA and association rules was calibrated and validated, and its performance was compared against four widely used models (Topmodel, ARNO/VIC, PRMS and Sacramento). The ensemble configuration always returned the best average efficiency, with the narrowest spread and, therefore, the lowest uncertainty. As a final application, the full set of FUSE models was used to predict the effect of land use changes on catchment flows. The predictive uncertainty improved significantly when the prior distributions of model structures and parameters were conditioned using the AMCA approach. This improvement was found to result from constraints applied to both the model space and the parameter space, although the parameter space appears to contribute more. These results confirm that a considerable part of the predictive uncertainty stems from the prior choice of model configuration, and that more objective ways of constraining this prior with formal data-driven techniques are needed. AMCA, however, can only be applied to gauged catchments.
Future experiments could test whether AMCA configurations can be regionalised or transferred to ungauged catchments on the basis of catchment characteristics.
Content Version: Open Access
Issue Date: May-2015
Date Awarded: Feb-2016
URI: http://hdl.handle.net/10044/1/30773
DOI: https://doi.org/10.25560/30773
Supervisor: Buytaert, Wouter
Onof, Christian
Department: Civil and Environmental Engineering
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections: Civil and Environmental Engineering PhD theses



Unless otherwise indicated, items in Spiral are protected by copyright and are licensed under a Creative Commons Attribution NonCommercial NoDerivatives License.
