Multi-Precision Policy Enforced Training (MuPPET): A precision-switching
strategy for quantised fixed-point training of CNNs
File(s)
2006.09049v1.pdf (1.02 MB)
Working paper
Author(s)
Rajagopal, Aditya
Vink, Diederik Adriaan
Venieris, Stylianos I
Bouganis, Christos-Savvas
Type
Working Paper
Abstract
Large-scale convolutional neural networks (CNNs) suffer from very long
training times, spanning from hours to weeks, limiting the productivity and
experimentation of deep learning practitioners. As networks grow in size and
complexity, training time can be reduced through low-precision data
representations and computations. However, in doing so the final accuracy
suffers due to the problem of vanishing gradients. Existing state-of-the-art
methods combat this issue by means of a mixed-precision approach utilising two
different precision levels, FP32 (32-bit floating-point) and FP16/FP8
(16-/8-bit floating-point), leveraging the hardware support of recent GPU
architectures for FP16 operations to obtain performance gains. This work pushes
the boundary of quantised training by employing a multilevel optimisation
approach that utilises multiple precisions including low-precision fixed-point
representations. The novel training strategy, MuPPET, combines the use of
multiple number representation regimes together with a precision-switching
mechanism that decides at run time the transition point between precision
regimes. Overall, the proposed strategy tailors the training process to the
hardware-level capabilities of the target hardware architecture and yields
improvements in training time and energy efficiency compared to
state-of-the-art approaches. Applying MuPPET on the training of AlexNet,
ResNet18 and GoogLeNet on ImageNet (ILSVRC12) and targeting an NVIDIA Turing
GPU, MuPPET achieves the same accuracy as standard full-precision training with
training-time speedup of up to 1.84$\times$ and an average speedup of
1.58$\times$ across the networks.
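
The abstract describes the core mechanism only at a high level: training begins in a low-precision fixed-point regime, and a run-time policy decides when to transition to progressively higher precisions, ending in FP32. The sketch below illustrates that control flow; the regime list, the plateau-based switching criterion and the simulated training signal are illustrative assumptions, not the quantisation levels or switching policy actually used in the paper.

```python
# Minimal control-flow sketch, not the paper's implementation. The regime
# list, the plateau-based switching criterion and the simulated training
# signal are illustrative assumptions for exposition only.

# Hypothetical precision regimes, ordered from lowest to highest precision;
# the final regime is full-precision FP32.
PRECISION_REGIMES = ["fixed-point-8", "fixed-point-12", "fixed-point-16", "fp32"]


def train_one_epoch(precision, epoch):
    """Stand-in for one epoch of quantised training in the given regime.

    Returns a scalar progress signal (e.g. a validation metric). Here it is
    simulated as a curve that improves quickly and then plateaus.
    """
    return 1.0 - 1.0 / (epoch + 1)


def should_switch(history, patience=3, tolerance=0.02):
    """Placeholder run-time switching criterion.

    Switch when the progress signal has changed by less than `tolerance`
    over the last `patience + 1` epochs; the actual MuPPET policy is
    defined in the paper.
    """
    if len(history) < patience + 1:
        return False
    recent = history[-(patience + 1):]
    return max(recent) - min(recent) < tolerance


def muppet_style_training(total_epochs=30):
    regime_idx, history = 0, []
    for epoch in range(total_epochs):
        precision = PRECISION_REGIMES[regime_idx]
        history.append(train_one_epoch(precision, epoch))
        print(f"epoch {epoch:02d}: trained in {precision}")
        # Decide at run time whether to move to the next precision regime.
        if regime_idx < len(PRECISION_REGIMES) - 1 and should_switch(history):
            regime_idx += 1
            history = []  # restart the signal history in the new regime
            print(f"  -> switching to {PRECISION_REGIMES[regime_idx]}")


if __name__ == "__main__":
    muppet_style_training()
```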
Date Issued
2020-06-16
Citation
2020
Publisher
arXiv
Copyright Statement
© 2020 The Author(s)
Identifier
http://arxiv.org/abs/2006.09049v1
Subjects
cs.CV
cs.LG
Notes
Accepted at the 37th International Conference on Machine Learning (ICML), 2020
Publication Status
Published