Evaluating end-to-end optimization for data analytics applications in Weld
Accepted version
Type
Journal Article
Abstract
Modern analytics applications use a diverse mix of libraries and functions. Unfortunately, there is no optimization across these libraries, resulting in performance penalties as high as an order of magnitude in many applications. To address this problem, we previously proposed Weld, a common runtime for existing data analytics libraries that performs key physical optimizations such as pipelining under existing, imperative library APIs. In this work, we further develop the Weld vision by designing an automatic adaptive optimizer for Weld applications and evaluating its impact on realistic data science workloads. Our optimizer eliminates multiple forms of overhead that arise when composing imperative libraries like Pandas and NumPy, and uses lightweight measurements to make data-dependent decisions at run-time in ad-hoc workloads where no statistics are available, with sub-second overhead. We also evaluate which optimizations have the largest impact in practice and whether Weld can be integrated into libraries incrementally. Our results are promising: using our optimizer, Weld accelerates data science workloads by up to 23X on one thread and 80X on eight threads, and its adaptive optimizations provide up to a 3.75X speedup over rule-based optimization. Moreover, Weld provides benefits even if only 4–5 operators in a library are ported to use it. Our results show that common runtime designs like Weld may be a viable approach to accelerate analytics.
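To make the two ideas in the abstract concrete (pipelining across library calls, and run-time decisions driven by lightweight measurements), here is a minimal sketch in plain Python. It is not Weld's IR, API, or optimizer: the function names, the sample size, and the 0.1/0.9 selectivity thresholds are hypothetical and chosen only for illustration. `unfused` mimics composing separate library calls, each of which materializes a full intermediate result; the two `fused_*` variants pipeline filter, square, and sum into one pass; `choose_plan` estimates filter selectivity on a small random sample and picks a branch-free (predicated) loop only when the branch outcome looks unpredictable.

```python
import random
from typing import Callable, Sequence


def unfused(values: Sequence[int], threshold: int) -> float:
    # Mimics composing separate library calls: each step materializes
    # a full intermediate list before the next step runs.
    filtered = [v for v in values if v > threshold]
    squared = [v * v for v in filtered]
    return sum(squared)


def fused_branching(values: Sequence[int], threshold: int) -> float:
    # Pipelined: filter, square, and sum in a single pass with no intermediates.
    total = 0.0
    for v in values:
        if v > threshold:
            total += v * v
    return total


def fused_predicated(values: Sequence[int], threshold: int) -> float:
    # Same pipeline without a data-dependent branch: the comparison result
    # (False/True, i.e. 0/1) scales each contribution instead.
    total = 0.0
    for v in values:
        total += (v > threshold) * (v * v)
    return total


def choose_plan(
    values: Sequence[int], threshold: int, sample_size: int = 1000, seed: int = 0
) -> Callable[[Sequence[int], int], float]:
    # Hypothetical adaptive rule: estimate the filter's selectivity on a small
    # random sample (no precomputed statistics needed), then use predication
    # only when selectivity is mid-range and branches are hard to predict.
    if not values:
        return fused_branching
    rng = random.Random(seed)
    sample = rng.sample(list(values), min(sample_size, len(values)))
    selectivity = sum(v > threshold for v in sample) / len(sample)
    return fused_predicated if 0.1 < selectivity < 0.9 else fused_branching
```

All three pipelines compute the same result; they differ only in how much intermediate data they create and whether the inner loop branches. For example, `choose_plan(list(range(-50, 51)), 0)` samples a filter that keeps about half the rows and returns `fused_predicated`, while a threshold that keeps no rows returns `fused_branching`.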
Date Issued
2018-05-01
Date Acceptance
2018-04-01
Citation
Proceedings of the VLDB Endowment, 2018, 11 (9), pp.1002-1015
ISSN
2150-8097
Publisher
VLDB Endowment
Start Page
1002
End Page
1015
Journal / Book Title
Proceedings of the VLDB Endowment
Volume
11
Issue
9
Copyright Statement
© 2018 VLDB Endowment. Published by ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in Proceedings of the VLDB Endowment, Volume 11 Issue 9, May 2018, https://dl.acm.org/citation.cfm?doid=3213880.3232245
Identifier
http://gateway.webofknowledge.com/gateway/Gateway.cgi?GWVersion=2&SrcApp=PARTNER_APP&SrcAuth=LinksAMR&KeyUT=WOS:000452532800007&DestLinkType=FullRecord&DestApp=ALL_WOS&UsrCustomerID=1ba7043ffcc86c417c072aa74d649202
Subjects
Science & Technology
Technology
Computer Science, Information Systems
Computer Science
Query
Performance
Publication Status
Published