A Quality-Centric Data Model for Distributed Stream Management Systems
File(s)
DISSP-QCM-QDB08.pdf (355.86 KB)
Published version
Author(s)
Pietzuch, PR
Fiscato, M
Vu, QH
Type
Conference Paper
Abstract
It is challenging for large-scale stream management systems to always return perfect results when processing data streams originating from distributed sources. Data sources and intermediate processing nodes may fail during the lifetime of a stream query. In addition, individual nodes may become overloaded due to processing demands. In practice, users have to accept incomplete or inaccurate query results because of failure or overload. In this case, stream processing systems would benefit from knowing the impact of imperfect processing on data quality when making decisions about query optimisation and fault recovery. In addition, users would want to know how much the result quality was degraded.
In this paper, we propose a quality-centric relational stream data model that can be used together with existing query processing methods over distributed data streams. Besides giving useful feedback about the quality of tuples to users, the model provides the distributed stream management system with information on how to optimise query processing and enhance fault tolerance. We demonstrate how our data model can be applied to an existing distributed stream management system. Our evaluation shows that it enables quality-aware load-shedding, while introducing only a small per-tuple overhead.
Date Issued
2009-08-01
Citation
7th International Workshop on Quality in Databases (QDB’09), 2009, pp.1-10
Start Page
1
End Page
10
Journal / Book Title
7th International Workshop on Quality in Databases (QDB’09)
Copyright Statement
© 2009 Universität Tübingen and University of Rennes. Permission to make digital or hard copies of all or part of this work for personal, academic, or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page.
Description
21.10.14 KB Ok to add published version to spiral.
Source
QDB 2009
Publication Status
Published
Start Date
2009-08-24
Finish Date
2009-08-24
Coverage Spatial
Lyon, France