54
IRUS Total
Downloads
  Altmetric

High-level synthesis of fine-grained weakly consistent C concurrency

File Description SizeFormat 
Ramanathan-N-2019-PhD-Thesis.pdfThesis1.95 MBAdobe PDFView/Open
Title: High-level synthesis of fine-grained weakly consistent C concurrency
Authors: Ramanathan, Nadesh
Item Type: Thesis or dissertation
Abstract: High-level synthesis (HLS) is the process of automatically compiling high-level programs into a netlist (collection of gates). Given an input program, HLS tools exploit its inherent parallelism and pipelining opportunities to generate efficient customised hardware. C-based programs are the most popular input for HLS tools, but these tools historically only synthesise sequential C programs. As the appeal for software concurrency rises, HLS tools are beginning to synthesise concurrent C programs, such as C/C++ pthreads and OpenCL. Although supporting software concurrency leads to better hardware parallelism, shared memory synchronisation is typically serialised to ensure correct memory behaviour, via locks. Locks are safety resources that ensure exclusive access of shared memory, eliminating data races and providing synchronisation guarantees for programmers.  As an alternative to lock-based synchronisation, the C memory model also defines the possibility of lock-free synchronisation via fine-grained atomic operations (`atomics'). However, most HLS tools either do not support atomics at all or implement atomics using locks. Instead, we treat the synthesis of atomics as a scheduling problem. We show that we can augment the intra-thread memory constraints during memory scheduling of concurrent programs to support atomics. On average, hardware generated by our method is 7.5x faster than the state-of-the-art, for our set of experiments. Our method of synthesising atomics enables several unique possibilities. Chiefly, we are capable of supporting weakly consistent (`weak') atomics, which necessitate fewer ordering constraints compared to sequentially consistent (SC) atomics. However, implementing weak atomics is complex and error-prone and hence we formally verify our methods via automated model checking to ensure our generated hardware is correct. Furthermore, since the C memory model defines memory behaviour globally, we can globally analyse the entire program to generate its memory constraints. Additionally, we can also support loop pipelining by extending our methods to generate inter-iteration memory constraints. On average, weak atomics, global analysis and loop pipelining improve performance by 1.6x, 3.4x and 1.4x respectively, for our set of experiments. Finally, we present a case study of a real-world example via an HLS-based Google PageRank algorithm, whose performance improves by 4.4x via lock-free streaming and work-stealing.
Content Version: Open Access
Issue Date: Jun-2019
Date Awarded: Nov-2019
URI: http://hdl.handle.net/10044/1/78891
DOI: https://doi.org/10.25560/78891
Copyright Statement: Creative Commons Attribution NonCommercial NoDerivatives Licence
Supervisor: Constantinides, George
Wickerson, John
Sponsor/Funder: Engineering and Physical Sciences Research Council
Funder's Grant Number: EP/L016796/1
EP/I020357/1
EP/K034448/1
EP/K015168/1
Department: Electrical and Electronic Engineering
Publisher: Imperial College London
Qualification Level: Doctoral
Qualification Name: Doctor of Philosophy (PhD)
Appears in Collections:Electrical and Electronic Engineering PhD theses