Profile-directed specialisation of custom floating-point hardware
Author(s)
Brown, Ashley W.
Type
Thesis or dissertation
Abstract
We present a methodology for generating
floating-point arithmetic hardware
designs which are, for suitable applications, much reduced in size, while still
retaining performance and IEEE-754 compliance. Our system uses three
key parts: a profiling tool, a set of customisable
floating-point units and a
selection of system integration methods.
We use a profiling tool for
floating-point behaviour to identify arithmetic
operations where fundamental elements of IEEE-754
floating-point may be
compromised, without generating erroneous results in the common case.
In the uncommon case, we use simple detection logic to determine when
operands lie outside the range of capabilities of the optimised hardware.
Out-of-range operations are handled by a separate, fully capable,
floatingpoint
implementation, either on-chip or by returning calculations to a host
processor. We present methods of system integration to achieve this errorcorrection.
Thus the system suffers no compromise in IEEE-754 compliance,
even when the synthesised hardware would generate erroneous results.
In particular, we identify from input operands the shift amounts required
for input operand alignment and post-operation normalisation. For operations
where these are small, we synthesise hardware with reduced-size
barrel-shifters. We also propose optimisations to take advantage of other
profile-exposed behaviours, including removing the hardware required to
swap operands in a floating-point adder or subtractor, and reducing the
exponent range to fit observed values.
We present profiling results for a range of applications, including a selection
of computational science programs, Spec FP 95 benchmarks and the
FFMPEG media processing tool, indicating which would be amenable to
our method. Selected applications which demonstrate potential for optimisation
are then taken through to a hardware implementation. We show up
to a 45% decrease in hardware size for a
floating-point datapath, with a
correctable error-rate of less then 3%, even with non-profiled datasets.
floating-point arithmetic hardware
designs which are, for suitable applications, much reduced in size, while still
retaining performance and IEEE-754 compliance. Our system uses three
key parts: a profiling tool, a set of customisable
floating-point units and a
selection of system integration methods.
We use a profiling tool for
floating-point behaviour to identify arithmetic
operations where fundamental elements of IEEE-754
floating-point may be
compromised, without generating erroneous results in the common case.
In the uncommon case, we use simple detection logic to determine when
operands lie outside the range of capabilities of the optimised hardware.
Out-of-range operations are handled by a separate, fully capable,
floatingpoint
implementation, either on-chip or by returning calculations to a host
processor. We present methods of system integration to achieve this errorcorrection.
Thus the system suffers no compromise in IEEE-754 compliance,
even when the synthesised hardware would generate erroneous results.
In particular, we identify from input operands the shift amounts required
for input operand alignment and post-operation normalisation. For operations
where these are small, we synthesise hardware with reduced-size
barrel-shifters. We also propose optimisations to take advantage of other
profile-exposed behaviours, including removing the hardware required to
swap operands in a floating-point adder or subtractor, and reducing the
exponent range to fit observed values.
We present profiling results for a range of applications, including a selection
of computational science programs, Spec FP 95 benchmarks and the
FFMPEG media processing tool, indicating which would be amenable to
our method. Selected applications which demonstrate potential for optimisation
are then taken through to a hardware implementation. We show up
to a 45% decrease in hardware size for a
floating-point datapath, with a
correctable error-rate of less then 3%, even with non-profiled datasets.
Date Issued
2010-03
Date Awarded
2010-05
Advisor
Kelly, Paul
Luk, Wayne
Sponsor
EPSRC
Creator
Brown, Ashley W.
Publisher Department
Computing
Publisher Institution
Imperial College London
Qualification Level
Doctoral
Qualification Name
Doctor of Philosophy (PhD)