Towards batch-processing on cold storage devices
File(s)
ColdPack_HardBD___Camera_Ready.pdf (2.2 MB)
Accepted version
Author(s)
Hadian, Ali
Heinis, Thomas
Type
Conference Paper
Abstract
A large amount of the data in storage systems is cold, i.e., Written Once and Read Occasionally (WORO). The rapid growth of massive-scale archival and historical data increases the demand for cheap, petabyte-scale storage for such cold data. A Cold Storage Device (CSD) is a disk-based storage system designed to trade off performance for cost and power efficiency. Inevitably, the design restrictions used in CSDs result in performance limitations. These limitations are not a concern for WORO workloads; however, the very low price/performance characteristics of CSDs make them interesting for other applications too, e.g., batch processes. Applications, however, can be very slow on CSDs if they do not take the devices' characteristics into account. In this paper we design two strategies for data partitioning in CSDs, a crucial operation in many batch analytics tasks such as hash-join, near-duplicate detection, and data localization. We show that our strategies can efficiently use CSDs for batch processing of terabyte-scale data, accelerating data partitioning by 3.5x in our experiments.
Date Issued
2018-07-05
Date Acceptance
2018-04-16
Citation
34th IEEE International Conference on Data Engineering Workshops, ICDE Workshops 2018, Paris, France, April 16-20, 2018, pp. 134-139
ISBN
9781538663073
ISSN
2473-3490
Publisher
IEEE
Start Page
134
End Page
139
Journal / Book Title
34th IEEE International Conference on Data Engineering Workshops, ICDE Workshops 2018, Paris, France, April 16-20, 2018
Copyright Statement
© 2018 IEEE.
Sponsor
Engineering & Physical Science Research Council (E
European Research Office
Identifier
https://doi.org/10.1109/ICDEW.2018.00028
Grant Number
EP/N023242/1
720270
Source
34th International Conference on Data Engineering Workshops (ICDEW)
Subjects
Science & Technology
Technology
Computer Science, Information Systems
Computer Science, Theory & Methods
Engineering, Electrical & Electronic
Computer Science
Engineering
Notes
timestamp: Tue, 31 Jul 2018 01:00:00 +0200; biburl: https://dblp.org/rec/bib/conf/icde/HadianH18; bibsource: dblp computer science bibliography, https://dblp.org
Publication Status
Published
Start Date
2018-04-16
Finish Date
2018-04-20
Coverage Spatial
Paris, France
Date Publish Online
2018-07-05