Medea: scheduling of long running applications in shared production clusters
File(s)
paper223-CR.pdf (1.22 MB)
Accepted version
Author(s)
Garefalakis, Panagiotis
Karanasos, Konstantinos
Pietzuch, Peter
Suresh, Arun
Rao, Sriram
Type
Conference Paper
Abstract
The rise in popularity of machine learning, streaming, and latency-sensitive
online applications in shared production clusters has
raised new challenges for cluster schedulers. To optimize their
performance and resilience, these applications require precise control
of their placements, by means of complex constraints, e.g., to
collocate or separate their long-running containers across groups
of nodes. In the presence of these applications, the cluster scheduler
must attain global optimization objectives, such as maximizing
the number of deployed applications or minimizing the violated
constraints and the resource fragmentation, but without affecting
the scheduling latency of short-running containers.
We present Medea, a new cluster scheduler designed for the
placement of long- and short-running containers. Medea introduces
powerful placement constraints with formal semantics to capture
interactions among containers within and across applications. It
follows a novel two-scheduler design: (i) for long-running containers,
it applies an optimization-based approach that accounts for
constraints and global objectives; (ii) for short-running containers,
it uses a traditional task-based scheduler for low placement latency.
Evaluated on a 400-node cluster, our implementation of Medea on
Apache Hadoop YARN achieves placement of long-running applications
with significant performance and resilience benefits compared
to state-of-the-art schedulers.
Date Issued
2018-04-23
Date Acceptance
2018-01-30
Citation
EuroSys '18 Proceedings of the Thirteenth EuroSys Conference, 2018, (Article No. 4)
ISBN
978-1-4503-5584-1
Publisher
ACM
Journal / Book Title
EuroSys '18 Proceedings of the Thirteenth EuroSys Conference
Issue
Article No. 4
Copyright Statement
© 2018 ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in EuroSys '18 Proceedings of the Thirteenth EuroSys Conference, (23 April 2018) https://dl.acm.org/citation.cfm?id=3190549
Sponsor
Engineering & Physical Science Research Council (EPSRC)
Engineering & Physical Science Research Council (EPSRC)
Grant Number
EP/I012036/1
EP/P010040/1
Source
EuroSys 2018
Publication Status
Published
Start Date
2018-04-23
Finish Date
2018-04-26
Coverage Spatial
Porto, Portugal
Date Publish Online
2018-03-15