Artificial neural networks acceleration on field-programmable gate arrays considering model redundancy

Su, Jiang

File	Description	Size	Format
Su-J-2018-PhD-Thesis.pdf	Thesis	4.83 MB	Adobe PDF

Title:	Artificial neural networks acceleration on field-programmable gate arrays considering model redundancy
Authors:	Su, Jiang
Item Type:	Thesis or dissertation
Abstract:	Artificial Neural Networks (ANNs) have developed dramatically over the last ten years and have been successfully applied in many important areas. A natural follow-up topic is to deploy ANNs on a wider range of hardware platforms. However, modern ANN models may target millisecond- or even nanosecond-level latency per input, while commonly requiring millions of operations and gigabyte-scale data access to compute each input. This intrinsically high computational complexity poses hardware challenges for system implementation. Meanwhile, the integration of computing resources on hardware platforms is hampered by the slowing of Moore’s Law. It is therefore important to study new design methods for ANN hardware systems that deliver high model accuracy with low resource usage. The Field-Programmable Gate Array (FPGA) is a natural fit for this topic because of its reconfigurability and flexibility. These features allow customised data paths and data representations to be implemented in hardware, which makes the FPGA the primary platform in this research. The main topics discussed in this thesis are neural network redundancy and its impact on hardware systems. The main goal is to reduce hardware complexity by reducing neural network redundancy while maintaining accuracy. To achieve this, redundancy is first categorised into two types: model-level and data-level. Each type is then studied in isolation before both are combined in a single system design. First, to study model-level redundancy, an algorithm called dropout, which reduces model-level redundancy during training, is implemented and used here to reduce hardware cost. Our proposed system achieves a 50% reduction in DSP usage and 33% to 47% less on-chip memory usage compared to conventional implementations. Second, in terms of data-level redundancy, we study how data precision affects hardware cost and system throughput. Our experiments show that reduced-precision data incur negligible or even no accuracy loss relative to full-precision data on the tested benchmarks. In particular, 4-bit fixed point offers a good trade-off between model accuracy and hardware cost compared to the other tested data representations. Third, we study the combined effect of reducing both model- and data-level redundancy and propose an FPGA accelerator design for Redundancy-Reduced (RR-) MobileNet [Hea17]. Our proposed RR-MobileNet system achieves a state-of-the-art latency of 7.85 ms for single-image processing in ImageNet inference. Finally, a design guideline is proposed as step-by-step guidance for redundancy-reduced neural network system design.
Content Version:	Open Access
Issue Date:	Feb-2018
Date Awarded:	Dec-2018
URI:	http://hdl.handle.net/10044/1/66261
DOI:	https://doi.org/10.25560/66261
Supervisor:	Cheung, Peter Y. K.; Thomas, David B.
Department:	Electrical and Electronic Engineering
Publisher:	Imperial College London
Qualification Level:	Doctoral
Qualification Name:	Doctor of Philosophy (PhD)
Appears in Collections:	Electrical and Electronic Engineering PhD theses
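
The abstract's data-level redundancy result (4-bit fixed point as a good accuracy/cost trade-off) can be illustrated with a minimal sketch of signed fixed-point quantisation. This is not the thesis's implementation: the function name `quantize_fixed_point`, the split into 2 fractional bits, and the sample weights are all assumptions chosen for illustration, and the thesis may use a different format, rounding mode, or scaling scheme.

```python
def quantize_fixed_point(x, total_bits=4, frac_bits=2):
    """Map a real value to the nearest signed fixed-point value, saturating
    at the representable range.

    Hypothetical helper: total_bits and frac_bits are illustrative
    assumptions, not parameters taken from the thesis.
    """
    scale = 1 << frac_bits                    # 2**frac_bits steps per unit
    q_min = -(1 << (total_bits - 1))          # smallest integer code, e.g. -8
    q_max = (1 << (total_bits - 1)) - 1       # largest integer code, e.g. 7
    code = round(x * scale)                   # Python rounds exact ties to even
    code = max(q_min, min(q_max, code))       # saturate instead of wrapping
    return code / scale


# Illustrative (made-up) weights, not data from the thesis.
weights = [0.3, -0.6, 1.1, -2.5, 5.0]
quantized = [quantize_fixed_point(w) for w in weights]
errors = [abs(w - q) for w, q in zip(weights, quantized)]

print(quantized)  # [0.25, -0.5, 1.0, -2.0, 1.75]
print(max(errors))
```

With 4 total bits and 2 fractional bits, only 16 values between -2.0 and 1.75 are representable, so out-of-range weights saturate. In hardware, narrower operands of this kind are what allow smaller multipliers and less on-chip memory per weight, at the cost of the quantisation error printed above.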

Unless otherwise indicated, items in Spiral are protected by copyright and are licensed under a Creative Commons Attribution NonCommercial NoDerivatives License.