Custom Multi-Cache Architectures for Heap Manipulating Programs
File(s)
FelixTCAD2016.pdf (1.79 MB)
Accepted version
Author(s)
Winterstein, F
Fleming, K
Yang, H-J
Constantinides, GA
Type
Journal Article
Abstract
Memory-intensive implementations often require access to an external, off-chip memory which can substantially slow down an FPGA
accelerator due to memory bandwidth limitations. Buffering frequently reused data on chip is a common approach to address this
problem and the optimization of the cache architecture introduces yet another complex design space. This paper presents a high-level
synthesis (HLS) design aid that automatically generates parallel multi-cache systems which are tailored to the specific requirements of
the application. Our program analysis identifies non-overlapping memory regions, supported by private caches, and regions which
are shared by parallel units after parallelization, which are supported by coherent caches and synchronization primitives. It also
decides whether the parallelization is legal with respect to data dependencies. The novelty of this work is the focus on programs using
dynamically allocated, pointer-based data structures which, while common in software engineering, remain difficult to analyze and
are beyond the scope of the overwhelming majority of HLS techniques to date. Secondly, we devise a high-level cache performance
estimation to find a heterogeneous configuration of cache sizes that maximizes the performance of the multi-cache system subject to
an on-chip memory resource constraint. We demonstrate our technique with three case studies of applications using dynamic data
structures and use Xilinx Vivado HLS as an exemplary HLS tool. We show up to 15× speed-up after parallelization of the HLS
implementations and the insertion of the application-specific distributed hybrid multi-cache architecture.
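As a rough illustration of the cache-sizing step the abstract describes (choosing a heterogeneous set of cache sizes under an on-chip memory budget), the sketch below exhaustively searches candidate configurations. It is not the paper's algorithm: the function name best_cache_config, the use of total estimated misses as the performance proxy, and all numbers are hypothetical assumptions made only for this example.

```python
from itertools import product

def best_cache_config(miss_estimates, budget):
    """Pick one size per cache so that total estimated misses are minimal.

    miss_estimates: one dict per cache, mapping a candidate size (in memory
        blocks) to an estimated miss count (hypothetical input format).
    budget: total number of on-chip memory blocks available.
    Returns (sizes, total_misses), or None if no configuration fits.
    """
    best = None
    # Enumerate every combination of candidate sizes (exhaustive search;
    # fine for a handful of caches, an assumption of this sketch).
    for sizes in product(*(sorted(m) for m in miss_estimates)):
        if sum(sizes) > budget:
            continue  # violates the on-chip memory constraint
        cost = sum(m[s] for m, s in zip(miss_estimates, sizes))
        if best is None or cost < best[1]:
            best = (sizes, cost)
    return best

# Hypothetical estimates: the first cache benefits strongly from more
# capacity, the second barely at all.
estimates = [
    {1: 900, 2: 400, 4: 100},
    {1: 300, 2: 250, 4: 240},
]
print(best_cache_config(estimates, budget=5))
```

With these made-up numbers the search gives most of the budget to the first cache, which is the kind of uneven (heterogeneous) allocation the abstract refers to.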
Date Issued
2016-09-13
Date Acceptance
2016-08-29
Citation
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, 2016, 36 (5), pp.761-774
ISSN
0278-0070
Publisher
Institute of Electrical and Electronics Engineers (IEEE)
Start Page
761
End Page
774
Journal / Book Title
IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems
Volume
36
Issue
5
Copyright Statement
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Sponsor
Engineering & Physical Science Research Council (EPSRC)
Engineering & Physical Science Research Council (EPSRC)
Engineering & Physical Science Research Council (EPSRC)
European Space Agency / Estec
Royal Academy Of Engineering
Imagination Technologies Ltd
Grant Number
EP/I012036/1
EP/I020357/1
11908 (EP/K034448/1)
Cntrct No. 4000106443/12/D/JR
Prof Constantinides Chair
Prof Constantinides Chair
Subjects
Science & Technology
Technology
Computer Science, Hardware & Architecture
Computer Science, Interdisciplinary Applications
Engineering, Electrical & Electronic
Computer Science
Engineering
Caching schemes
dynamic data structures
field-programmable gate array (FPGA)
high-level synthesis (HLS)
memory system
separation logic
commutativity analysis
0906 Electrical And Electronic Engineering
1006 Computer Hardware
Computer Hardware & Architecture
Publication Status
Published