An efficient implementation of online arithmetic
File(s)
AaronFPT16.pdf (697.93 KB)
Accepted version
Author(s)
Zhao, Y
Wickerson, J
Constantinides, GA
Type
Conference Paper
Abstract
We propose the first hardware implementation of
standard arithmetic operators – addition, multiplication, and
division – that utilises constant compute resource but allows
numerical precision to be adjusted arbitrarily at run-time.
Traditionally, precision must be set at design-time so that addition
and multiplication, which calculate the least significant digit
(LSD) of their results first, and division, which calculates the
most significant digit (MSD) first, can be chained together. To
get around this, we employ online operators, which are always
MSD-first, and thus allow successive operations to be pipelined.
Even online operators require precision to be fixed at design-time
because multiplication and division traditionally involve parallel
adders. To avoid this, we propose an architecture, which we have
implemented on an FPGA, that reuses a fixed-precision adder and
stores residues in on-chip RAM. As such, we can use a single piece
of hardware to perform calculations to any precision, limited only
by the availability of on-chip RAM. For instance, we obtain an
8x speed-up, compared to the parallel-in-serial-out (PISO) fixed-point
method, when executing 100 iterations of Newton’s method
at a precision of 64 digits, while the product of circuit area and
latency stays comparable.
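The abstract describes online operators as producing the most significant digit first, so that one operator's output can feed the next without waiting for the full result. As a hedged illustration (not the authors' design), the following Python sketch implements a textbook-style radix-2 signed-digit online addition using exact `Fraction` arithmetic. The names `online_add` and `value`, the digit encoding, the halving of the sum, and the default online delay `delta=2` are all assumptions made for this example. Each step shifts a residual `w`, adds the newly arrived digit pair, and selects one output digit by rounding. The residual is loosely the kind of per-operation state the abstract describes keeping in on-chip RAM. A hardware unit would more likely select digits from a few leading bits of the residual rather than comparing it exactly, as this sketch does.

```python
from fractions import Fraction


def value(digits):
    """Value of a radix-2 signed-digit fraction 0.d1 d2 ... (MSD first)."""
    return sum((Fraction(d, 2 ** (i + 1)) for i, d in enumerate(digits)), Fraction(0))


def online_add(x, y, delta=2):
    """Sketch (assumed design) of MSD-first online radix-2 signed-digit addition.

    x and y are equal-length lists of digits in {-1, 0, 1}, most significant
    first, each denoting a value in (-1, 1). Returns len(x) + 1 digits of
    z = (x + y) / 2 (halved so the result stays in (-1, 1)). Output digit j
    is produced after reading only input digits 1 .. j + delta.
    """
    assert delta >= 1 and len(x) == len(y)
    n = len(x)
    # Zero-pad so that digits "beyond the end" of the operands read as 0.
    xs = list(x) + [0] * (delta + 1)
    ys = list(y) + [0] * (delta + 1)
    bound = 1 - Fraction(1, 2 ** delta)  # invariant: |w| <= bound
    # Initial residual: scaled contribution of the first delta digit pairs.
    w = sum((Fraction(xs[i] + ys[i], 2 ** (i + 2)) for i in range(delta)), Fraction(0))
    z = []
    for j in range(1, n + 2):
        k = j + delta  # 1-based index of the digit pair consumed this step
        # Shift the residual and add the newly arrived digit pair.
        v = 2 * w + Fraction(xs[k - 1] + ys[k - 1], 2 ** (delta + 1))
        # Round-to-nearest digit selection over {-1, 0, 1}.
        d = 1 if v >= Fraction(1, 2) else (-1 if v < -Fraction(1, 2) else 0)
        w = v - d
        assert abs(w) <= bound  # bounded residual: digits never overflow
        z.append(d)
    return z


x = [1, 0, -1]  # 1/2 - 1/8 = 3/8
y = [1, 1, 0]   # 1/2 + 1/4 = 3/4
print(value(online_add(x, y)))  # 9/16
```

Because the residual stays within 1 - 2^-delta, one extra output digit makes the result exact: after n + 1 steps the scaled residual is an integer of magnitude below 1, hence zero.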
Date Issued
2017-05-18
Date Acceptance
2016-09-18
Citation
2016 International Conference on Field-Programmable Technology (FPT), 2017
Publisher
IEEE
Journal / Book Title
2016 International Conference on Field-Programmable Technology (FPT)
Copyright Statement
© 2016 IEEE. Personal use of this material is permitted. Permission from IEEE must be obtained for all other uses, in any current or future media, including reprinting/republishing this material for advertising or promotional purposes, creating new collective works, for resale or redistribution to servers or lists, or reuse of any copyrighted component of this work in other works.
Sponsor
Royal Academy of Engineering
Imagination Technologies Ltd
Engineering & Physical Science Research Council (E
Engineering & Physical Science Research Council (EPSRC)
Grant Number
Prof Constantinides Chair
Prof Constantinides Chair
11908 (EP/K034448/1)
EP/I020357/1
Source
IEEE International Conference on Field Programmable Technology (FPT)
Subjects
Science & Technology
Technology
Computer Science, Theory & Methods
Engineering, Electrical & Electronic
Computer Science
Engineering
Publication Status
Published
Start Date
2016-12-07
Finish Date
2016-12-09
Coverage Spatial
Xi'an, China