Methods for efficient storage, distribution and scalable analysis of spatial data
File(s)
Evagorou-G-2021-PhD-Thesis.pdf (2.16 MB)
Thesis
Author(s)
Evagorou, Giannis
Type
Thesis or dissertation
Abstract
Spatial data management has been the focus of the database research community for over three decades. The research community has proposed a wealth of techniques to efficiently store and process spatial data, but new approaches are becoming increasingly crucial, not only because of the growing number of applications with spatial data at their core, but also because of the different kinds of spatial data (i.e. data diversity) and the various specialized queries (i.e. query diversity) performed on them. Data and query diversity make it especially challenging to develop an indexing method that can accommodate every possible application of spatial data. In the following, we explore how spatial data manifests in different forms in various applications and how each application requires different queries to facilitate insight extraction.


For instance, in mechanical engineering, product designers use computer-aided design (CAD) tools to virtually engineer a product before building the actual physical prototype. In these applications, geometric data (characterizing, for example, engine parts) are stored in CAD files on a file system and, in large product development projects such as plane construction, require terabytes of secondary storage. Nevertheless, engineers require efficient access to these data in order to execute queries that will ultimately reduce development cost and time. Example operations include nearest neighbour queries, such as finding all neighbouring parts of a disk brake, and queries used to cope with late engineering changes, such as fit issues with car parts.


To deliver more effective diagnosis and treatment of different diseases, medical imaging requires the analysis of high-resolution images of human tissue samples. Scientists use high-resolution scanned images to extract small anatomic objects (e.g. blood vessels, cells or nuclei) and their features, and perform spatial operations between them. Ultimately, these anatomic objects are spatial objects and hence spatial proximity queries between them can be executed. This, for example, enables the discovery of abnormal regions that are closest to a stem cell.

Likewise, scientists from the Blue Brain project build biologically accurate brain models of mammals and perform spatial operations to improve their understanding of the mammalian brain. Brain models are accurately represented as spatial data and include objects such as neocortices and their constituent neurons. These models are characterized by an unusually high density: neurons are tightly packed in a very small volume of space.


Clearly, indexing methods need to provide efficient access to datasets that exhibit dissimilar properties, from sparse datasets with poor space coverage to high-density datasets, and from CAD files to medical images. Arguably, a one-size-fits-all index is not plausible or realistic. This diversity is not the only challenge: despite the tremendous amount of research on spatial data, multiple open challenges remain to be addressed. In this thesis, we propose indexing methods and algorithms to address many of these challenges.

More precisely, we optimize the execution of spatial queries and, where necessary, we introduce novel indexing methods that organize data and facilitate the efficient execution of queries. We additionally invent algorithms that work in conjunction with these indexing methods and aim to improve query performance. To cope with the unprecedented collection and production rates of spatial data, we propose partitioning methods to distribute spatial data across clusters. As with indexing methods, one partitioning method that works in every case is not feasible. We thus propose methods for partitioning data according to query type, data distribution, and efficiency or load-balancing goals. We show the efficiency of our methods across real and synthetic datasets and, where necessary, we introduce models to fine-tune them for optimal performance.
Version
Open Access
Date Issued
2021-03
Date Awarded
2021-12
URI
http://hdl.handle.net/10044/1/97987
DOI
https://doi.org/10.25560/97987
Copyright Statement
Creative Commons Attribution NonCommercial Licence
License URL
http://creativecommons.org/licenses/by-nc/4.0/
Advisor
Heinis, Thomas
Publisher Department
Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)