Enhancing stream data processing: system optimizations and learned indexes
File(s)
Liang-L-2025-PhD-Thesis.pdf (6.14 MB)
Thesis
Author(s)
Liang, Liang
Type
Thesis or dissertation
Abstract
This thesis aims to optimize stream data processing, which is important for real-time data analysis and decision-making. Stream data’s inherent properties, including unbounded size, high volume, and variable velocities, impose significant challenges on processing systems. These systems must continuously evolve to meet the requirements of modern stream data processing. This thesis presents optimizations to enhance the functionality, scalability, and performance of stream data processing at both the system and algorithmic levels.

At the system level, we focus on dispel4py, a stream processing system designed for scientific workloads. We enhance its scalability and state management by developing three optimizations: dynamic allocation, dynamic auto-scaling, and hybrid. Dynamic allocation allows dispel4py to scale each task according to workload demand, and dynamic auto-scaling lets the entire workload scale to fewer or more resources, maintaining performance while remaining cost-efficient. The hybrid optimization enables dispel4py to support stateful tasks and scaling simultaneously. Comprehensive experiments validate the scalability, portability, and performance of these three optimizations.

At the algorithm level, our focus shifts to Index-Based Window Processing (IBWP). Learned indexes, which integrate machine learning models to enhance query performance, have recently emerged as a promising alternative to traditional index structures. Motivated by this trend, we explore how learned indexes can support efficient search while sustaining updates on high-velocity data streams. The challenge lies in the limitations of current updatable learned indexes, which are often inherited from the traditional tree-based structures they build on; these structures are cumbersome and impede update performance. To overcome these limitations, we pioneer the use of queue-style flat structures, which significantly enhance update efficiency and reduce the index footprint. Based on these flat structures, we propose FLIRT and SWIX, designed for sequential IBWP and generic IBWP, respectively. Our experiments demonstrate that both handle their respective IBWP settings effectively and outperform all baselines.
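The idea of a queue-style flat learned index can be illustrated with a small sketch. This is not the FLIRT or SWIX implementation from the thesis; it is a hypothetical toy that assumes keys arrive in strictly increasing order (for example, timestamps), fits one fixed linear model per segment, and evicts only whole segments. The names `Segment`, `FlatWindowIndex`, `epsilon`, and `evict_below` are illustrative assumptions.

```python
from bisect import bisect_left, bisect_right
from collections import deque
from math import ceil, floor


class Segment:
    """A run of sorted keys approximated by one linear model (assumed design)."""

    def __init__(self, first_key):
        self.keys = [first_key]
        self.slope = 0.0  # predicted positions per unit of key

    def predict(self, key):
        return (key - self.keys[0]) * self.slope

    def try_append(self, key, epsilon):
        pos = len(self.keys)
        if pos == 1:
            # The second key fixes the slope of this segment's model.
            self.slope = 1.0 / (key - self.keys[0])
        elif abs(self.predict(key) - pos) > epsilon:
            return False  # model error too large: caller opens a new segment
        self.keys.append(key)
        return True

    def contains(self, key, epsilon):
        # Search only within +/- epsilon of the predicted position.
        guess = self.predict(key)
        lo = max(0, floor(guess - epsilon))
        hi = min(len(self.keys), ceil(guess + epsilon) + 1)
        i = bisect_left(self.keys, key, lo, hi)
        return i < hi and self.keys[i] == key


class FlatWindowIndex:
    """Flat queue of segments: append at the tail, evict at the head."""

    def __init__(self, epsilon=4):
        self.epsilon = epsilon
        self.segments = deque()
        self.starts = deque()  # first key of each segment, kept in step
        self.low = None        # current lower bound of the window
        self.last = None       # most recently inserted key

    def insert(self, key):
        if self.last is not None and key <= self.last:
            raise ValueError("keys must arrive in strictly increasing order")
        self.last = key
        if self.segments and self.segments[-1].try_append(key, self.epsilon):
            return
        self.segments.append(Segment(key))
        self.starts.append(key)

    def evict_below(self, low):
        # Simplification: drop whole segments only; a partly expired
        # head segment stays and is filtered at query time.
        self.low = low
        while self.segments and self.segments[0].keys[-1] < low:
            self.segments.popleft()
            self.starts.popleft()

    def contains(self, key):
        if self.low is not None and key < self.low:
            return False
        i = bisect_right(self.starts, key) - 1
        return i >= 0 and self.segments[i].contains(key, self.epsilon)
```

Because inserts touch only the tail and eviction touches only the head, no tree rebalancing or node splitting is needed as the window slides. Note that indexing into the middle of a `deque` is not constant time, so a production design would likely use a different container for the segment directory.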
Version
Open Access
Date Issued
2024-09-20
Date Awarded
01/02/2025
URI
https://hdl.handle.net/10044/1/122516
DOI
https://doi.org/10.25560/122516
License URL
https://creativecommons.org/licenses/by-nc/4.0/
Advisor
Heinis, Thomas
Publisher Department
Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)