Enhanced training of response time anomaly detectors using diffusion models
File(s): main.pdf (1.56 MB)
Accepted version
Author(s)
Luo, Wenxiang
Casale, Giuliano
Type
Conference Paper
Abstract
Machine learning (ML) approaches have grown in popularity in recent years as a way to identify anomalies in microservice-based applications. However, training ML models may be hampered by time constraints and by the limited availability of real-world failure data. In this paper, we propose a method to mitigate this issue by using diffusion models for training data augmentation. Response time data collected from distributed traces are used to train diffusion models built on a customized UNet proposed in the paper. The resulting diffusion models can then generate new data to improve the training of response time anomaly detectors. Experiments on the DeathStarBench microservices benchmark demonstrate that the proposed approach increases the post-training accuracy of anomaly detection models by 20%. We further show that classic discriminators cannot distinguish the response time data generated by our diffusion models from real data, which confirms that the generated data are of high quality.
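As an illustration of the training and augmentation steps summarized above, the following is a minimal sketch assuming the standard denoising diffusion probabilistic model (DDPM) formulation, which the abstract does not itself specify. It shows how a window of response times could be turned into a denoiser training example (noisy input x_t, timestep t, target noise eps) and how synthetic windows could be appended to a training set. The customized UNet from the paper is not reproduced; the linear noise schedule, the log-scaling preprocessing, the example response times, and the helper names (`normalize`, `training_example`, `augment`) are all hypothetical choices made for illustration.

```python
import math
import random

def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (common DDPM defaults; requires T >= 2)."""
    return [beta_start + (beta_end - beta_start) * i / (T - 1) for i in range(T)]

def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s <= t} (1 - beta_s), used for closed-form noising."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out

def normalize(window):
    """Hypothetical preprocessing: log-scale response times, then standardize."""
    logs = [math.log1p(x) for x in window]
    mu = sum(logs) / len(logs)
    sd = math.sqrt(sum((v - mu) ** 2 for v in logs) / len(logs)) or 1.0
    return [(v - mu) / sd for v in logs]

def q_sample(x0, t, alpha_bars, rng):
    """Draw x_t ~ q(x_t | x_0) = N(sqrt(a) x_0, (1 - a) I); return (x_t, eps)."""
    a = alpha_bars[t]
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps

def training_example(window, alpha_bars, rng):
    """Hypothetical helper: one denoiser example with input (x_t, t) and target eps."""
    x0 = normalize(window)
    t = rng.randrange(len(alpha_bars))
    xt, eps = q_sample(x0, t, alpha_bars, rng)
    return xt, t, eps

def augment(real_windows, sampler, n_synthetic, rng):
    """Hypothetical: append n_synthetic windows from a trained sampler, then shuffle."""
    data = list(real_windows) + [sampler(rng) for _ in range(n_synthetic)]
    rng.shuffle(data)
    return data

if __name__ == "__main__":
    rng = random.Random(0)
    alpha_bars = cumulative_alphas(linear_beta_schedule(T=1000))
    # Made-up response times in ms, for illustration only (not data from the paper).
    window = [12.0, 15.5, 11.2, 240.0, 13.1, 14.8, 12.9, 16.0]
    xt, t, eps = training_example(window, alpha_bars, rng)
    print("timestep:", t, "window length:", len(xt))
    # Stand-in for a trained reverse-diffusion sampler; a real one would run the denoiser.
    sampler = lambda r: [r.gauss(0.0, 1.0) for _ in window]
    train_set = augment([normalize(window)], sampler, n_synthetic=4, rng=rng)
    print("training set size:", len(train_set))
```

The closed-form `q_sample` avoids simulating every intermediate noising step, which is why DDPM training can pick a random timestep per example. A real pipeline would train the denoiser to regress `eps` from `(xt, t)` and replace the stand-in `sampler` with the learned reverse process.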
Date Acceptance
2025-08-05
Publisher
IEEE
Source
MASCOTS 2025
Publication Status
Accepted
Start Date
2025-10-21
Finish Date
2025-10-23
Coverage Spatial
Paris, France