Scheduling weakly consistent C concurrency for reconfigurable hardware

File Description SizeFormat 
NadeshTC17.pdfAccepted version612.2 kBAdobe PDFView/Open
Title: Scheduling weakly consistent C concurrency for reconfigurable hardware
Authors: Ramanathan, N
Wickerson, J
Constantinides, GA
Item Type: Journal Article
Abstract: Lock-free algorithms, in which threads synchronise not via coarse-grained mutual exclusion but via fine-grained atomic operations (`atomics'), have been shown empirically to be the fastest class of multi-threaded algorithms in the realm of conventional processors. This article explores how these algorithms can be compiled from C to reconfigurable hardware via high-level synthesis(HLS). We focus on the scheduling problem, in which software instructions are assigned to hardware clock cycles. We first show that typical HLS scheduling constraints are insufficient to implement atomics, because they permit some instruction reorderings that, though sound in a single-threaded context, demonstrably cause erroneous results when synthesising multi-threaded programs. We then show that correct behaviour can be restored by imposing additional intra-thread constraints among the memory operations. In addition, we show that we can support the pipelining of loops containing atomics by injecting further inter-iteration constraints. We implement our approach on two constraint-based scheduling HLS tools: LegUp 4.0 and LegUp 5.1. We extend both tools to support two memory models that are capable of synthesising atomics correctly. The first memory model only supports sequentially consistent (SC) atomics and the second supports weakly consistent (`weak') atomics as defined by the 2011 revision of the C standard. Weak atomics necessitate fewer constraints than SC atomics, but suffice for many multi-threaded algorithms. We confirm, via automatic model-checking, that we correctly implement the semantics in accordance with the C standard. A case study on a circular buffer suggests that on average circuits synthesised from programs that schedule atomics correctly can be 6× faster than an existing lock-based implementation of atomics, that weak atomics can yield a further 1.3× speedup, and that pipelining can yield a further 1.3× speedup.
Issue Date: 1-Jul-2018
Date of Acceptance: 13-Dec-2017
URI: http://hdl.handle.net/10044/1/55594
DOI: https://dx.doi.org/10.1109/TC.2017.2786249
ISSN: 0018-9340
Publisher: Institute of Electrical and Electronics Engineers
Start Page: 992
End Page: 1006
Journal / Book Title: IEEE Transactions on Computers
Volume: 67
Issue: 7
Copyright Statement: © 2017 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Sponsor/Funder: Engineering & Physical Science Research Council (EPSRC)
Engineering & Physical Science Research Council (EPSRC)
Engineering & Physical Science Research Council (E
Royal Academy Of Engineering
Imagination Technologies Ltd
Funder's Grant Number: EP/I012036/1
EP/I020357/1
11908 (EP/K034448/1)
Prof Constantinides Chair
Prof Constantinides Chair
Keywords: Science & Technology
Technology
Computer Science, Hardware & Architecture
Engineering, Electrical & Electronic
Computer Science
Engineering
High-level synthesis
HLS
lock-free algorithms
atomic operations
FPGA
EFFICIENT
PROGRAMS
0803 Computer Software
0805 Distributed Computing
1006 Computer Hardware
Computer Hardware & Architecture
Publication Status: Published
Online Publication Date: 2017-12-29
Appears in Collections:Faculty of Engineering
Electrical and Electronic Engineering



Items in DSpace are protected by copyright, with all rights reserved, unless otherwise indicated.

Creative Commonsx