I searched the google and with no luck to find the answer of what I'm looking for.
Imagine these 2 numbers in base 2:
A = 1.0001 * e-4
B = 1.001 * e-6
So to subtract these 2 numbers we need to shift the 2end one 2 bits right, to have the same exponents. So now we have:
A = 1.0001 * e-4
B = 0.001001 * e-4
Now our exponents are the same and we should do subtraction for significands, which means:
1.000100
- 0.001001
----------
0.111011
Then we do the normalization and rounding process on the result. As a human we know how to deal with this subtraction, but what about HW? Does it use any special algorithm for making negative the B
number (algorithm like 2's complement just like what we do for Integers)? This question is also valid when we want to do A+B
but the B
is a negative number.
how to implement IEEE 754 floating point subtraction in hardware?
To be clear: IEEE 754 does not dictate how HW must perform a subtraction, only that given 2 inputs and rounding mode what the results must be. The HW is a black box in the middle where a miracle occurs.
Sample HW subtract algorithm:
Assume N=8
bit significand, same sign for a b
(else use addition), and |a| >= |b| > 0
(else swap operands). Bit x
is 0 or 1.
1.xxxxxxx e AA
- 1.xxxxxxx e BB
Use N+2
wide register. Find shift needed AA - BB
. The shift will leave some of the b
bits in the N+2
register, and some to the "right".
1.xxxxxxx_00 e AA
- 0.00001xx_xx xxx e AA
From the xxx
shifted out, set a "borrow bit c" to are any of them 1?
c <-- Initial borrow bit
1.xxxxxxx_00
- 0.00001xx_xx
Now perform the subtraction in the usual fashion.
To ease the function explanation, consider 2 cases: no initial shift/shift, even though HW would use a single common path.
// Result with no shift. Initial borrow bit 'c' is then zero.
0.1111111_00 Max value
0.0000000_00 Min-value (a == b)
The result is shifted left and the exponent decremented until the lead bit is 1. The subtraction is exact. A zero result is handled specially (not shown).
In case 2, with shifting, 'c' is 0 or 1.
// Result with shift
1.1111110_11 Max value.
0.1111111_1x Lowest-values.
If the lead bit is 0, the result is shifted left and exponent decremented. It is the lowest-values sub-cases that oblige a N+2
instead of a N+1
register to get the rounding (below) correct.
Now rounding occurs. First the s
and c
bit are or'ed (is either one?) to form the new C
. Various rounding modes like up, down, truncate, toward infinity and the popular "round to nearest with ties to even" can be deduced solely from the sign, o
, r
and C
. The result is exact when r
and C
are zero.
o rs c
1.xxxxxxx_xx c
-->
1.xxxxxxx_x C
Now add the round bit R
.
1.xxxxxxx
0.000000R
This sum may result in 10.0000000
. In which case the result is shifted right and the exponent incremented.