Stream processing dual-track CGRA for object inference

File: tvls17xf5.pdf (Accepted version, 2.35 MB, Adobe PDF)
Title: Stream processing dual-track CGRA for object inference
Authors: Fan, X
Wu, D
Cao, W
Luk, W
Wang, L
Item Type: Journal Article
Publication Date: 2018-06-01
Abstract: With the development of machine learning technology, the exploration of energy-efficient and flexible architectures for object inference algorithms has attracted growing interest in recent years. However, few publications concentrate on a coarse-grained reconfigurable architecture (CGRA) for object inference algorithms. This paper provides a stream processing, dual-track programming CGRA-based approach to address the inherent computing characteristics of algorithms in object inference. Based on the proposed approach, an architecture called stream dual-track CGRA (SDT-CGRA) is presented as an implementation prototype. To evaluate the performance, the SDT-CGRA is realized in Verilog HDL and implemented in a Semiconductor Manufacturing International Corporation 55-nm process, with a footprint of 5.19 mm² at 450 MHz. Seven object inference algorithms, including convolutional neural network (CNN), k-means, principal component analysis (PCA), spatial pyramid matching (SPM), linear support vector machine (SVM), Softmax, and Joint Bayesian, are selected as benchmarks. The experimental results show that the SDT-CGRA can gain on average 343.8 times and 17.7 times higher energy efficiency for Softmax, PCA, and CNN, and 621.0 times and 1261.8 times higher energy efficiency for the k-means, SPM, linear-SVM, and Joint-Bayesian algorithms, when compared with the Intel Xeon E5-2637 CPU and the Nvidia TitanX graphics processing unit, respectively. When compared with the state-of-the-art solutions of AlexNet on field-programmable gate array and CGRA, the proposed SDT-CGRA can achieve a 1.78 times increase in energy efficiency and a 13 times speedup, respectively.
Issue Date: 1-Jun-2018
Date of Acceptance: 16-Jan-2018
URI: http://hdl.handle.net/10044/1/58712
DOI: https://dx.doi.org/10.1109/TVLSI.2018.2797600
ISSN: 1063-8210
Publisher: IEEE
Start Page: 1098
End Page: 1111
Journal / Book Title: IEEE Transactions on Very Large Scale Integration (VLSI) Systems
Volume: 26
Issue: 6
Copyright Statement: © 2018 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Sponsor/Funder: Engineering & Physical Science Research Council (EPSRC)
Commission of the European Communities
Engineering & Physical Science Research Council (EPSRC)
Engineering & Physical Science Research Council (EPSRC)
Funder's Grant Number: EP/I012036/1
671653
516075101 (EP/N031768/1)
EP/P010040/1
Keywords: Science & Technology
Technology
Computer Science, Hardware & Architecture
Engineering, Electrical & Electronic
Computer Science
Engineering
Acceleration
coarse-grained reconfigurable architecture (CGRA)
deep learning
domain-specific computing
object inference
ARCHITECTURES
0805 Distributed Computing
0906 Electrical And Electronic Engineering
1006 Computer Hardware
Computer Hardware & Architecture
Publication Status: Published
Online Publication Date: 2018-02-12
Appears in Collections:Faculty of Engineering
Computing


