Optimizing CNN-based segmentation with deeply customized convolutional and deconvolutional architectures on FPGA
File(s)
[Final] Optimizing CNN-based Segmentation with Deeply Customized Convolutional and Deconvolutional Architectures on FPGA.pdf (1.33 MB)
Accepted version
Author(s)
Liu, S
Fan, Hongxiang
Niu, Xinyu
Ng, Ho-Cheung
Chu, Yang
Type
Journal Article
Abstract
Algorithms based on Convolutional Neural Networks (CNNs) have been successful in solving image recognition
problems, delivering large accuracy improvements. In recent years, deconvolution layers have been widely used
as key components in state-of-the-art CNNs for end-to-end training and in models that support tasks such
as image segmentation and super resolution. However, deconvolution algorithms are computationally
intensive, which limits their applicability to real-time applications. In particular, there has been little research
on efficient implementations of deconvolution algorithms on FPGA platforms, which have been widely
used to accelerate CNN algorithms by practitioners and researchers due to their high performance and power
efficiency. In this work, we propose and develop a deconvolution architecture for efficient FPGA implementation.
FPGA-based accelerators are proposed for both deconvolution and CNN algorithms. In addition, memory sharing
between the computation modules, along with other optimization techniques, is proposed for the FPGA-based
CNN accelerator. A non-linear optimization model based on the performance model is introduced to
efficiently explore the design space in order to achieve the optimal processing speed of the system and improve
power efficiency. Furthermore, a hardware mapping framework is developed to automatically generate a
low-latency hardware design for any given CNN model on the target device. Finally, we implement our
designs on a Xilinx Zynq ZC706 board. The deconvolution accelerator achieves a performance of 90.1 GOPS
at a 200 MHz working frequency and a performance density of 0.10 GOPS/DSP using 32-bit quantization,
which significantly outperforms previous designs on FPGAs. A real-time scene segmentation application
on the Cityscapes dataset is used to evaluate our CNN accelerator on the Zynq ZC706 board; the system achieves
a performance of 107 GOPS and 0.12 GOPS/DSP using 16-bit quantization, and supports up to 17 frames per
second for 512x512 image inputs with a power consumption of only 9.6 W.
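The headline figures in the abstract can be related to one another with simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, the inputs are the rounded values quoted in the abstract, and the derived quantities (implied DSP count, GOPS per watt, operations per frame) are back-of-the-envelope estimates rather than numbers reported by the authors. In particular, the per-frame figure assumes the quoted frame rate is reached at the quoted throughput, which the abstract does not state.

```python
# Figures quoted in the abstract (rounded as published, not re-measured).
DECONV_GOPS = 90.1           # deconvolution accelerator throughput
DECONV_GOPS_PER_DSP = 0.10   # deconvolution performance density
CNN_GOPS = 107.0             # segmentation CNN accelerator throughput
CNN_GOPS_PER_DSP = 0.12      # CNN performance density
CNN_FPS = 17.0               # frames per second at 512x512 input
CNN_POWER_W = 9.6            # reported power consumption in watts


def implied_dsp_count(gops: float, gops_per_dsp: float) -> float:
    """DSP blocks implied by throughput divided by performance density."""
    return gops / gops_per_dsp


def power_efficiency(gops: float, watts: float) -> float:
    """Throughput per watt (GOPS/W)."""
    return gops / watts


def giga_ops_per_frame(gops: float, fps: float) -> float:
    """Rough upper bound on work per frame, assuming the frame rate
    is sustained at the quoted throughput (an assumption)."""
    return gops / fps


if __name__ == "__main__":
    print("Deconv implied DSPs:", round(implied_dsp_count(DECONV_GOPS, DECONV_GOPS_PER_DSP)))
    print("CNN implied DSPs:   ", round(implied_dsp_count(CNN_GOPS, CNN_GOPS_PER_DSP)))
    print("CNN GOPS per watt:  ", round(power_efficiency(CNN_GOPS, CNN_POWER_W), 2))
    print("CNN GOP per frame:  ", round(giga_ops_per_frame(CNN_GOPS, CNN_FPS), 2))
```

Because the density values are rounded to two decimal places, the implied DSP counts are only approximate and should not be read as exact resource utilization.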
Date Issued
2018-12-01
Date Acceptance
2018-07-25
Citation
ACM Transactions on Reconfigurable Technology and Systems, 2018, 11 (3)
URI
http://hdl.handle.net/10044/1/62876
DOI
https://www.dx.doi.org/10.1145/3242900
ISSN
1936-7406
Publisher
Association for Computing Machinery
Journal / Book Title
ACM Transactions on Reconfigurable Technology and Systems
Volume
11
Issue
3
Copyright Statement
© 2018 ACM. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in ACM Transactions on Reconfigurable Technology and Systems, Vol. 11, Iss. 3 (Dec 2018), http://doi.org/10.1145/3242900.
Subjects
Science & Technology
Technology
Computer Science, Hardware & Architecture
Computer Science
FPGA
convolutional neural networks (CNNs)
deconvolution
hardware acceleration
segmentation
1006 Computer Hardware
Publication Status
Published
Article Number
19
Date Publish Online
2018-12