Online collection and forecasting of resource utilization in large-scale distributed systems
File(s)
Cluster_Prediction_ICDCS_2019_submitted.pdf (1.2 MB)
Accepted version
Author(s)
Tuor, Tiffany
Wang, Shiqiang
Leung, Kin K
Ko, Bong Jun
Type
Conference Paper
Abstract
Large-scale distributed computing systems often contain thousands of distributed nodes (machines). Monitoring the conditions of these nodes is important for system management purposes, which, however, can be extremely resource demanding as this requires collecting local measurements of each individual node and constantly sending those measurements to a central controller. Meanwhile, it is often useful to forecast the future system conditions for various purposes such as resource planning/allocation and anomaly detection, but it is usually too resource-consuming to have one forecasting model running for each node, which may also neglect correlations in observed metrics across different nodes. In this paper, we propose a mechanism for collecting and forecasting the resource utilization of machines in a distributed computing system in a scalable manner. We present an algorithm that allows each local node to decide when to transmit its most recent measurement to the central node, so that the transmission frequency is kept below a given constraint value. Based on the measurements received from local nodes, the central node summarizes the received data into a small number of clusters. Since the cluster partitioning can change over time, we also present a method to capture the evolution of clusters and their centroids. As an effective way to reduce the amount of computation, time-series forecasting models are trained on the time-varying centroids of each cluster, to forecast the future resource utilizations of a group of local nodes. The effectiveness of our proposed approach is confirmed by extensive experiments using multiple real-world datasets.
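The pipeline outlined in the abstract (budgeted transmission at each node, clustering at the central node, forecasting on centroids) can be illustrated with a minimal sketch. This is not the paper's algorithm: the abstract does not specify the transmission rule, the clustering method, or the forecasting model, so the drift-threshold rule, plain k-means on scalar values, and simple exponential smoothing below are all stand-in assumptions, and every function and parameter name is hypothetical.

```python
from statistics import mean


def simulate_node(trace, threshold, max_freq):
    """Replay one node's measurements and return (central_view, sent_count).

    Assumed rule (not from the paper): transmit when the measurement has
    drifted more than `threshold` from the last value the central node holds,
    but only if doing so keeps transmissions / elapsed steps <= max_freq.
    """
    stored = trace[0]  # assumed initial sync, not counted against the budget
    sent = 0
    view = [stored]
    for t, x in enumerate(trace[1:], start=1):
        if abs(x - stored) > threshold and sent + 1 <= max_freq * t:
            stored = x
            sent += 1
        view.append(stored)  # central node keeps the last received value
    return view, sent


def kmeans_1d(values, k, iters=20):
    """Plain Lloyd's k-means on scalars; returns (centroids, labels)."""
    srt = sorted(values)
    # Initialise centroids at evenly spaced quantiles of the data.
    centroids = [srt[(2 * i + 1) * len(srt) // (2 * k)] for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: abs(v - centroids[j]))
                  for v in values]
        for j in range(k):
            members = [v for v, lab in zip(values, labels) if lab == j]
            if members:  # leave an empty cluster's centroid unchanged
                centroids[j] = mean(members)
    return centroids, labels


def ses_forecast(history, alpha=0.5):
    """One-step-ahead forecast by simple exponential smoothing."""
    level = history[0]
    for x in history[1:]:
        level = alpha * x + (1 - alpha) * level
    return level


if __name__ == "__main__":
    # Hypothetical CPU-utilisation traces for four nodes (fractions of 1.0).
    traces = [
        [0.10, 0.12, 0.30, 0.32, 0.31],
        [0.11, 0.13, 0.29, 0.33, 0.30],
        [0.80, 0.78, 0.60, 0.62, 0.61],
        [0.82, 0.79, 0.61, 0.60, 0.63],
    ]
    views = [simulate_node(tr, threshold=0.05, max_freq=0.5)[0] for tr in traces]
    # Cluster the central node's view at every time step, then forecast
    # each cluster's centroid series instead of each node's own series.
    centroid_series = [[], []]
    for t in range(len(traces[0])):
        cents, _ = kmeans_1d([v[t] for v in views], k=2)
        for j, c in enumerate(sorted(cents)):
            centroid_series[j].append(c)
    for j, series in enumerate(centroid_series):
        print(f"cluster {j}: next-step forecast {ses_forecast(series):.3f}")
```

Sorting the centroids at each step is a crude way to keep cluster identities aligned over time; it does not model the cluster evolution the abstract refers to. The sketch only shows why forecasting a few centroid series is cheaper than fitting one model per node.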
Date Issued
2019-10-31
Date Acceptance
2019-10-01
Citation
2019 39TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2019), 2019, pp.133-143
ISSN
1063-6927
Publisher
IEEE COMPUTER SOC
Start Page
133
End Page
143
Journal / Book Title
2019 39TH IEEE INTERNATIONAL CONFERENCE ON DISTRIBUTED COMPUTING SYSTEMS (ICDCS 2019)
Copyright Statement
© 2019 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Sponsor
IBM United Kingdom Ltd
Identifier
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000565234200013&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=1ba7043ffcc86c417c072aa74d649202
Grant Number
4603317662
Source
39th IEEE International Conference on Distributed Computing Systems (ICDCS)
Subjects
Science & Technology
Technology
Computer Science, Hardware & Architecture
Computer Science, Information Systems
Computer Science, Software Engineering
Computer Science, Theory & Methods
Computer Science
EVOLUTIONARY CLUSTERING-ALGORITHM
Publication Status
Published
Start Date
2019-07-07
Finish Date
2019-07-09
Coverage Spatial
Richardson, TX
Date Publish Online
2019-10-31