Customisable processing of neural networks for FPGAs
File(s)
3597031.3597041.pdf (1.08 MB)
Published version
Author(s)
Denholm, Stewart
Luk, Wayne
Type
Conference Paper
Abstract
When implementing neural networks on FPGAs, existing methods for resource optimisation are closely tied to the design and performance of the neural network itself. We wish to independently control the individual Processing Elements (PEs) responsible for processing neural network data. We introduce a new framework that more stringently defines neural networks as a series of successive layers. By isolating layers, we can create stand-alone compute pools to process each layer, with an initial focus on activation functions. A pool with P PEs serves the N neurons of that layer, with the entire compute pool, not individual PEs, charged with performing the activation function. This means the number of PEs in a layer, their implementation, functionality, and range of inputs they serve can all be configured and specialised independently of the higher-level neural network. We can now tailor a neural network's implementation to specific FPGA devices, adding PEs to make use of all the heterogeneous processing elements present on the FPGA. In addition to customising the resource footprint of a neural network, this greater range of control over each PE's functionality allows performance optimisations arising from the distribution of the input data itself. More PEs can be added to the compute pool to serve more common inputs, or removed for less used inputs to free up resources. We manage inter-layer data flow to support non-deterministic processing times, a key requirement for decoupling the design of neural networks from that of the underlying PEs. We demonstrate our framework by showing that (1) large PEs can be efficiently segmented into many smaller PEs, (2) latency reduction of 1.47x or greater is achievable for activation function layers in existing neural networks, and (3) PEs serving more than 50% of all possible function inputs see vastly diminishing returns on performance as they scale to the more traditional 100% support.
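To make the compute-pool idea from the abstract concrete, the following Python sketch simulates a pool of P PEs jointly serving the activation function for a layer of N neurons, with PE allocation skewed toward the most common input region. This is a behavioural illustration only, not the paper's implementation: the class names (PE, ComputePool), the use of tanh as the activation, the input ranges, and the cycle counts are all assumptions chosen for the example.

```python
# Behavioural sketch (hypothetical): a compute pool of P PEs serves the
# activation function for a whole layer. Each PE covers only a sub-range
# of inputs, so routing, not any single PE, performs the activation.
import math
import random

class PE:
    """A PE that evaluates the activation over a limited input range."""
    def __init__(self, lo, hi, cycles):
        self.lo, self.hi = lo, hi   # supported input interval
        self.cycles = cycles        # per-evaluation latency in cycles
        self.busy_until = 0         # cycle at which this PE is next free

    def supports(self, x):
        return self.lo <= x <= self.hi

class ComputePool:
    """The pool as a whole is charged with the activation function."""
    def __init__(self, pes):
        self.pes = pes

    def evaluate(self, x, now=0):
        # Route to the earliest-free PE whose range covers x. Because PEs
        # differ in coverage and speed, completion time is non-deterministic
        # from the layer's point of view.
        candidates = [pe for pe in self.pes if pe.supports(x)]
        pe = min(candidates, key=lambda p: max(p.busy_until, now))
        start = max(pe.busy_until, now)
        pe.busy_until = start + pe.cycles
        return math.tanh(x), pe.busy_until  # (activation value, finish cycle)

# Example: skew the pool toward common inputs near zero, as the abstract
# suggests, with one slower PE covering the rare tails.
pool = ComputePool([
    PE(-1.0, 1.0, cycles=2),   # fast PEs for the frequent input range
    PE(-1.0, 1.0, cycles=2),
    PE(-8.0, 8.0, cycles=5),   # slower PE covering all remaining inputs
])

random.seed(0)
inputs = [random.gauss(0.0, 1.0) for _ in range(16)]  # N = 16 neurons
finish = 0
for x in inputs:
    x = max(-8.0, min(8.0, x))            # clamp into the covered range
    _, done = pool.evaluate(x)
    finish = max(finish, done)
print(f"layer of {len(inputs)} activations finished by cycle {finish}")
```

Under this sketch, adding another fast PE for the [-1, 1] range shortens the layer's completion time for Gaussian-distributed inputs, while widening a PE's coverage past the commonly seen inputs buys little, mirroring the diminishing-returns observation in the abstract.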
Date Issued
2023-06
Date Acceptance
2023-05-14
Citation
HEART '23: Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies, 2023, pp.69-77
ISBN
9798400700439
Publisher
ACM
Start Page
69
End Page
77
Journal / Book Title
HEART '23: Proceedings of the 13th International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies
Copyright Statement
© 2023 Copyright held by the owner/author(s).
This work is licensed under a Creative Commons Attribution 4.0 International License (https://creativecommons.org/licenses/by/4.0/).
License URL
https://creativecommons.org/licenses/by/4.0/
Identifier
https://dl.acm.org/doi/abs/10.1145/3597031.3597041
Source
International Symposium on Highly Efficient Accelerators and Reconfigurable Technologies 2023
Place of Publication
New York, NY, United States
Publication Status
Published
Start Date
2023-06-15
Finish Date
2023-06-16
Coverage Spatial
Lake Biwa, Japan
Date Publish Online
2023-07-19