Stochastic performance modeling of distributed data processing systems
Author(s)
Gao, Yicheng
Type
Thesis or dissertation
Abstract
Microservice architectures are widely adopted in distributed data processing systems due to their scalability, flexibility, and fault tolerance. Different architectural designs can significantly influence key performance metrics, such as response time and throughput. Therefore, developing accurate performance models during the design phase is crucial to ensure that microservice architectures can meet performance requirements while operating efficiently. However, predicting performance in such distributed systems is challenging because data flows through multiple components of a layered architecture, producing intricate latency interactions and workload-driven variability.
To address these challenges, we introduce a series of novel stochastic performance modeling techniques tailored for microservices operating within distributed data processing systems. First, we present LN, an open-source meta-solver for layered queueing network analysis, to more accurately capture the layered interactions and dependencies in microservice architectures. Building upon this, we propose ICQ, a novel class of stochastic models that enables joint analysis of data storage access workflows, caching, and queueing contention. Our evaluation, using real-world Azure traces, shows that ICQ can improve response times by up to 35% in edge computing systems compared to baseline heuristics. To further enhance data locality in distributed microservice architectures, we design a hierarchical caching model to optimize data placement across in-memory cache nodes and provide analytical solutions for various caching policies. Simulations using real-world Alibaba traces highlight the scalability and effectiveness of this approach, demonstrating up to a 21% reduction in miss ratio and a 37% decrease in mean response time in data mesh scenarios. Lastly, we address the challenges posed by batch arrivals in serverless edge computing by developing a deep surrogate model. Extensive experiments with AI-SPRINT traces validate the accuracy of our proposed model, achieving a mean absolute percentage error of 5.33%, thus offering a scalable and efficient approach to performance prediction for serverless applications.
Date Issued
2024-10-04
Date Awarded
2025-09-01
Copyright Statement
Attribution-NonCommercial 4.0 International Licence (CC BY-NC)
Advisor
Casale, Giuliano
Publisher Department
Department of Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)