Tags: precision, numeric, ieee-754

When is (x==(x+y)-y) or (x==(x-y)+y) guaranteed for IEEE floats?


In C or another language which uses IEEE floats, I have two variables x and y which are both guaranteed to be finite, non-NaN, basically normal numbers.

I have some code which assumes, in essence, that the following code has no effect:

float x = get_x ();
float y = get_y ();

float old_x = x;
x += y;
x -= y;
assert (old_x == x);
x -= y;
x += y;
assert (old_x == x);

I know that this will be true for certain classes of values, i.e. those which do not have "many" significant figures in the mantissa, but I would like to be clear about the edge cases.

For example, the binary expression of 1.3 will have significant figures all the way down the mantissa, and so will 1.7, and I should not assume that 1.3+1.7==3 exactly, but can I assume that if I add such numbers together and then subtract them, or vice versa, I will get the first value back again?

What are the formal edge conditions for this?


Solution

  • The precision used for intermediate results in the floating-point pipeline is not fixed by the standard.

    From Wikipedia:

    The standard also recommends extended format(s) to be used to perform internal computations at a higher precision than that required for the final result, to minimise round-off errors: the standard only specifies minimum precision and exponent requirements for such formats. The x87 80-bit extended format is the most commonly implemented extended format that meets these requirements.

    Because intermediate results may be held in an extended format, and you do not know when they are rounded back to the standard format or which rounding mode is in effect, the standard does not guarantee that adding a value and then subtracting it again yields the original value.

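    For the trivial case you posted, it will probably work most of the time, provided x and y are of similar magnitude. When |y| is much larger than |x|, the low-order bits of x are rounded away in x + y and cannot be recovered by subtracting y.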

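    Then there is the matter of NaN handling: a NaN never compares equal to anything, including itself, so the assertions would fail if either value were NaN.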

    You may be able to determine the edge cases for the architecture you are currently using, but it's probably easier to just check whether the current value is within a margin of error of the original value.