Stream processing dual-track CGRA for object inference
File(s)
tvls17xf5.pdf (2.3 MB)
Accepted version
Author(s)
Fan, X
Wu, D
Cao, W
Luk, W
Wang, L
Type
Journal Article
Abstract
With the development of machine learning technology, the exploration of energy-efficient and flexible architectures for object inference algorithms has attracted growing interest in recent years. However, few publications concentrate on a coarse-grained reconfigurable architecture (CGRA) for object inference algorithms. This paper provides a stream processing, dual-track programming CGRA-based approach to address the inherent computing characteristics of algorithms in object inference. Based on the proposed approach, an architecture called stream dual-track CGRA (SDT-CGRA) is presented as an implementation prototype. To evaluate the performance, the SDT-CGRA is realized in Verilog HDL and implemented in the Semiconductor Manufacturing International Corporation 55-nm process, with a footprint of 5.19 mm² at 450 MHz. Seven object inference algorithms, including convolutional neural network (CNN), k-means, principal component analysis (PCA), spatial pyramid matching (SPM), linear support vector machine (SVM), Softmax, and Joint Bayesian, are selected as benchmarks. The experimental results show that the SDT-CGRA can gain on average 343.8 times and 17.7 times higher energy efficiency for Softmax, PCA, and CNN, and 621.0 times and 1261.8 times higher energy efficiency for k-means, SPM, linear-SVM, and Joint-Bayesian algorithms, when compared with the Intel Xeon E5-2637 CPU and the Nvidia TitanX graphics processing unit, respectively. When compared with the state-of-the-art solutions of AlexNet on field-programmable gate array and CGRA, the proposed SDT-CGRA can achieve a 1.78 times increase in energy efficiency and a 13 times speedup, respectively.
Date Issued
2018-06-01
Date Acceptance
2018-01-16
Citation
IEEE Transactions on Very Large Scale Integration (VLSI) Systems, 2018, 26 (6), pp.1098-1111
ISSN
1063-8210
Publisher
IEEE
Start Page
1098
End Page
1111
Journal / Book Title
IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Volume
26
Issue
6
Copyright Statement
© 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Sponsor
Engineering & Physical Science Research Council (EPSRC)
Commission of the European Communities
Engineering & Physical Science Research Council (EPSRC)
Engineering & Physical Science Research Council (EPSRC)
Grant Number
EP/I012036/1
671653
516075101 (EP/N031768/1)
EP/P010040/1
Subjects
Science & Technology
Technology
Computer Science, Hardware & Architecture
Engineering, Electrical & Electronic
Computer Science
Engineering
Acceleration
coarse-grained reconfigurable architecture (CGRA)
deep learning
domain-specific computing
object inference
ARCHITECTURES
0805 Distributed Computing
0906 Electrical And Electronic Engineering
1006 Computer Hardware
Computer Hardware & Architecture
Publication Status
Published
Date Publish Online
2018-02-12