54
IRUS TotalDownloads
Altmetric
High-level synthesis of fine-grained weakly consistent C concurrency
File | Description | Size | Format | |
---|---|---|---|---|
Ramanathan-N-2019-PhD-Thesis.pdf | Thesis | 1.95 MB | Adobe PDF | View/Open |
Title: | High-level synthesis of fine-grained weakly consistent C concurrency |
Authors: | Ramanathan, Nadesh |
Item Type: | Thesis or dissertation |
Abstract: | High-level synthesis (HLS) is the process of automatically compiling high-level programs into a netlist (collection of gates). Given an input program, HLS tools exploit its inherent parallelism and pipelining opportunities to generate efficient customised hardware. C-based programs are the most popular input for HLS tools, but these tools historically only synthesise sequential C programs. As the appeal for software concurrency rises, HLS tools are beginning to synthesise concurrent C programs, such as C/C++ pthreads and OpenCL. Although supporting software concurrency leads to better hardware parallelism, shared memory synchronisation is typically serialised to ensure correct memory behaviour, via locks. Locks are safety resources that ensure exclusive access of shared memory, eliminating data races and providing synchronisation guarantees for programmers. As an alternative to lock-based synchronisation, the C memory model also defines the possibility of lock-free synchronisation via fine-grained atomic operations (`atomics'). However, most HLS tools either do not support atomics at all or implement atomics using locks. Instead, we treat the synthesis of atomics as a scheduling problem. We show that we can augment the intra-thread memory constraints during memory scheduling of concurrent programs to support atomics. On average, hardware generated by our method is 7.5x faster than the state-of-the-art, for our set of experiments. Our method of synthesising atomics enables several unique possibilities. Chiefly, we are capable of supporting weakly consistent (`weak') atomics, which necessitate fewer ordering constraints compared to sequentially consistent (SC) atomics. However, implementing weak atomics is complex and error-prone and hence we formally verify our methods via automated model checking to ensure our generated hardware is correct. Furthermore, since the C memory model defines memory behaviour globally, we can globally analyse the entire program to generate its memory constraints. Additionally, we can also support loop pipelining by extending our methods to generate inter-iteration memory constraints. On average, weak atomics, global analysis and loop pipelining improve performance by 1.6x, 3.4x and 1.4x respectively, for our set of experiments. Finally, we present a case study of a real-world example via an HLS-based Google PageRank algorithm, whose performance improves by 4.4x via lock-free streaming and work-stealing. |
Content Version: | Open Access |
Issue Date: | Jun-2019 |
Date Awarded: | Nov-2019 |
URI: | http://hdl.handle.net/10044/1/78891 |
DOI: | https://doi.org/10.25560/78891 |
Copyright Statement: | Creative Commons Attribution NonCommercial NoDerivatives Licence |
Supervisor: | Constantinides, George Wickerson, John |
Sponsor/Funder: | Engineering and Physical Sciences Research Council |
Funder's Grant Number: | EP/L016796/1 EP/I020357/1 EP/K034448/1 EP/K015168/1 |
Department: | Electrical and Electronic Engineering |
Publisher: | Imperial College London |
Qualification Level: | Doctoral |
Qualification Name: | Doctor of Philosophy (PhD) |
Appears in Collections: | Electrical and Electronic Engineering PhD theses |