Hardware-aware Convolutional Neural Network (CNN) training acceleration
Author(s)
Vink, Diederik
Type
Thesis or dissertation
Abstract
Convolutional Neural Networks (CNNs) have emerged as a powerful deep learning tool, revolutionizing domains such as computer vision. From self-flying drones to self-driving cars, CNNs have demonstrated their effectiveness in enabling autonomous systems. As the demand for higher accuracy grows, CNN models are increasing in complexity and in the time required to train, demanding ever more sophisticated hardware to handle the computational and memory requirements of training and inference. At the same time, CNN architectures are becoming more intricate, with increasingly specialized layers [3, 4, 5, 2, 1, 6, 7], and the variety of workloads and number representations they employ is growing accordingly [8, 9, 10, 11].
One approach to addressing the issue of long training times is the use of low-precision data representations and computations. In this thesis, a novel training strategy called MuPPET (Multi-Precision Policy Enforced Training) is proposed that encompasses multiple precisions, including low-precision fixed-point representations.
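The idea of multi-precision training can be illustrated with a minimal sketch: train with fixed-point quantization emulated in software, and switch to a higher precision when progress stalls. Everything below is an illustrative assumption for exposition, not the thesis's actual MuPPET policy or implementation: the toy objective, the switching heuristic (loss stagnation with a patience counter), and all function and parameter names are hypothetical.

```python
def quantize(x, frac_bits):
    """Round x to a fixed-point grid with `frac_bits` fractional bits."""
    scale = 1 << frac_bits
    return round(x * scale) / scale

def train(precisions=(8, 12, 16), steps_per_check=20, patience=2):
    # Toy objective: fit w to minimize (w - 1.5)^2 by gradient descent,
    # with both the gradient and the weight update quantized.
    w, lr = 0.0, 0.1
    level, stall, best = 0, 0, float("inf")
    for step in range(200):
        frac_bits = precisions[level]
        grad = quantize(2.0 * (w - 1.5), frac_bits)  # quantized gradient
        w = quantize(w - lr * grad, frac_bits)       # quantized update
        if (step + 1) % steps_per_check == 0:
            loss = (w - 1.5) ** 2
            if loss >= best:          # no improvement since last check
                stall += 1
            else:
                best, stall = loss, 0
            if stall >= patience and level < len(precisions) - 1:
                level, stall = level + 1, 0  # switch to higher precision
    return w, precisions[level]
```

At low precision the update eventually rounds to the same grid point, the loss stagnates, and the policy promotes training to the next precision; the real strategy in the thesis replaces this stagnation test with its own precision-switching policy.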
Beyond long training times, CNNs present a large variety of workloads throughout training. Current accelerators struggle to provide a hardware architecture that efficiently utilizes the available resources across all of these workloads [12, 13]. Field-programmable gate arrays (FPGAs) provide a high degree of flexibility, unlocking the potential to adapt designs to the incoming workload. Caffe Barista integrates FPGAs into CNN training frameworks, providing a custom convolution kernel to accelerate training. Building on Caffe Barista, FPGPT is a complete toolflow built around a state-of-the-art, high-performance FPGA convolution unit. This unit supports runtime workload adaptation, allowing a variety of workloads to execute on the same compiled design.
Finally, the thesis culminates in the creation of an acceleration policy building on MuPPET. This work exploits the synergy between MuPPET and FPGPT, addressing the issues each targets more effectively than either could individually.
Version
Open Access
Date Issued
2023-08
Date Awarded
2024-06
Copyright Statement
Creative Commons Attribution NonCommercial Licence
Advisor
Bouganis, Christos-Savvas
Sponsor
Engineering and Physical Sciences Research Council
Publisher Department
Electrical and Electronic Engineering
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)