
Floating-point multiplication compared to multiple additions


Let's suppose that a is a normalized floating-point number in base 2 (the binary system). Is the following equality correct?

fl(a+a+a)=fl(3*a)
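A quick empirical check of the question, as a sketch. It assumes Python floats are IEEE 754 binary64 values with round-to-nearest, ties-to-even (true on common platforms). The helper names random_normal_double and count_mismatches are hypothetical; random bit patterns are used so that values of every sign and exponent, including ones large enough to overflow, get sampled. A random test is evidence, not a proof.

```python
import random
import struct


def random_normal_double(rng: random.Random) -> float:
    """Return a random finite, normal double by sampling raw 64-bit patterns."""
    while True:
        bits = rng.getrandbits(64)
        exponent_field = (bits >> 52) & 0x7FF
        # Field 0 means zero/subnormal and 0x7FF means infinity/NaN: skip both.
        if 0 < exponent_field < 0x7FF:
            return struct.unpack("<d", struct.pack("<Q", bits))[0]


def count_mismatches(trials: int, seed: int = 0) -> int:
    """Count sampled values a for which a + a + a and 3 * a differ."""
    rng = random.Random(seed)
    mismatches = 0
    for _ in range(trials):
        a = random_normal_double(rng)
        if a + a + a != 3 * a:  # Python evaluates a + a + a as (a + a) + a
            mismatches += 1
    return mismatches


print(count_mismatches(100_000))  # prints 0 on IEEE 754 binary64 hardware
```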


Solution

  • Notation and Prerequisites

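    Here a denotes a floating-point value and also the real number it represents (they are the same number). Expressions written without an operator symbol, such as 2a and 3a, denote exact real-number arithmetic. Expressions written with operators, such as a+a, a+a+a, and 3*a, denote floating-point arithmetic, in which the result of each operation is rounded.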

    A fundamental property of arithmetic in typical floating-point systems is that the result of each floating-point operation is defined to be the exact mathematical result rounded to a representable value according to the rounding rule in effect (most often round-to-nearest, with ties going to the representable value whose lowest significand digit is even, but other rounding directions may be selected).1
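    The "exact result, then round" rule can be checked directly, as a minimal sketch assuming Python floats are IEEE 754 binary64 with round-to-nearest, ties-to-even. Python's fractions.Fraction holds the exact rational value of a double, and float(Fraction) is computed by a correctly rounded integer division, so it stands in for "round the exact result". The function name correctly_rounded is hypothetical, and the sample range is kept moderate so that float(Fraction) never has to convert an overflowing value.

```python
import random
from fractions import Fraction


def correctly_rounded(x: float, y: float) -> bool:
    """True if x + y and x * y equal the exact results rounded to the nearest double."""
    fx, fy = Fraction(x), Fraction(y)  # exact rational values of the two doubles
    # float(Fraction) performs a correctly rounded int/int division
    # (round-to-nearest, ties-to-even), modelling "exact result, then round".
    return x + y == float(fx + fy) and x * y == float(fx * fy)


rng = random.Random(42)
samples = [rng.uniform(-1.0, 1.0) * 2.0 ** rng.randint(-60, 60) for _ in range(10_000)]
print(all(correctly_rounded(x, y) for x, y in zip(samples, samples[1:])))  # True
```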

    Finite, Normal Values

    In binary floating-point, if a is representable and 2a is within the finite range, then 2a is also representable, since the two differ only in the exponent. Therefore, for a floating-point number a, the result of a+a is exactly 2a, with no rounding. Then a+a+a, which is evaluated as (a+a)+a, takes the exact sum 2a + a = 3a and rounds it to a representable value. The floating-point result of 3*a is likewise the exact product 3a rounded by the same rule. Both expressions therefore round the same real number 3a in the same way, so a+a+a and 3*a have the same floating-point result, and the equality holds whichever rounding direction is in effect.
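    The argument does not depend on the rounding direction, which can be illustrated with a small simulation. This is a sketch of a hypothetical toy binary format with a p-bit significand and an unbounded exponent range (no overflow, no subnormals), with four rounding directions modelled after IEEE 754's; the names round_to_precision, check_all, and MODES are illustrative only. Because the exponent range is unbounded, scaling a by a power of two changes nothing, so checking every p-bit significand at one exponent, with both signs, covers every case of the toy format.

```python
from fractions import Fraction

MODES = ("nearest_even", "toward_zero", "toward_pos_inf", "toward_neg_inf")


def round_to_precision(x: Fraction, p: int, mode: str) -> Fraction:
    """Round x to the toy format: p-bit significand, unbounded exponent."""
    if x == 0:
        return Fraction(0)
    sign = 1 if x > 0 else -1
    scaled, e = abs(x), 0
    # Scale by powers of two until 2**(p-1) <= scaled < 2**p.
    while scaled >= 2 ** p:
        scaled, e = scaled / 2, e + 1
    while scaled < 2 ** (p - 1):
        scaled, e = scaled * 2, e - 1
    low = scaled.numerator // scaled.denominator  # floor of the scaled magnitude
    rest = scaled - low
    if rest == 0:
        m = low
    elif mode == "nearest_even":
        tie_to_odd = rest == Fraction(1, 2) and low % 2 == 1
        m = low + 1 if rest > Fraction(1, 2) or tie_to_odd else low
    elif mode == "toward_zero":
        m = low
    elif mode == "toward_pos_inf":
        m = low + 1 if sign > 0 else low
    elif mode == "toward_neg_inf":
        m = low if sign > 0 else low + 1
    else:
        raise ValueError(f"unknown mode: {mode}")
    return sign * m * Fraction(2) ** e


def check_all(p: int) -> int:
    """Compare fl(fl(a+a)+a) with fl(3a) for every p-bit significand, both
    signs, and every mode; return the number of cases checked."""
    checked = 0
    for mode in MODES:
        for m in range(2 ** (p - 1), 2 ** p):
            for a in (Fraction(m), Fraction(-m)):
                twice = round_to_precision(a + a, p, mode)
                assert twice == 2 * a, "a + a should be exact"
                lhs = round_to_precision(twice + a, p, mode)
                rhs = round_to_precision(3 * a, p, mode)
                assert lhs == rhs, (p, mode, a)
                checked += 1
    return checked


print(check_all(8))  # 1024 = 4 modes * 128 significands * 2 signs
```

    The check is not vacuous: with p = 8, the exact value 3a = 387 (for a = 129) needs nine bits, so it is rounded to 388 under round-to-nearest-even and to 386 toward zero. The only real work is the assertion that a + a is exact; once it holds, both sides round the same number 3a.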

    Special Cases

    It remains to consider special cases.

    If a is finite, but 2a exceeds the range for which the floating-point result is finite, then with the usual round-to-nearest rule a+a produces an infinity and a+a+a produces the same infinity. So does 3*a, since 3a is even larger in magnitude than 2a. The equality holds.
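    A concrete instance of the overflow case, assuming IEEE 754 binary64 floats with round-to-nearest. The value 0.75 times sys.float_info.max is an arbitrary choice: it is finite, but twice it exceeds the largest double, and Python float addition and multiplication return inf on overflow rather than raising an exception.

```python
import sys

a = 0.75 * sys.float_info.max   # finite, but the exact 2a exceeds the largest double
print(a + a, a + a + a, 3 * a)  # inf inf inf
print(a + a + a == 3 * a)       # True: both sides are the same infinity
```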

    If a is an infinity, then a+a, a+a+a, and 3*a produce the same infinity, and the equality holds.
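    The infinite case, sketched with Python's math.inf under the same binary64 assumption; the sign of the infinity is preserved by both the additions and the multiplication by 3.

```python
import math

for a in (math.inf, -math.inf):
    results = (a + a, a + a + a, 3 * a)
    print(results)  # (inf, inf, inf), then (-inf, -inf, -inf)
    assert all(r == a for r in results)  # each is the same signed infinity as a
```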

    If a is a NaN, then a+a+a and 3*a are both NaNs, and they do not compare equal, because a NaN compares unequal to every value, including another NaN and even itself. In this case the equality, read as a comparison, fails.
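    The NaN case in Python, as a sketch: both sides are NaN, as math.isnan confirms, yet the == comparison is False. Testing with math.isnan, rather than ==, is the usual way to ask whether both results are NaN.

```python
import math

a = math.nan
lhs = a + a + a
rhs = 3 * a
print(math.isnan(lhs), math.isnan(rhs))  # True True
print(lhs == rhs)  # False: a NaN compares unequal to every value
print(lhs == lhs)  # False: not even equal to itself
```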

    Footnote

    1 The question does not specify the floating-point system used. Certainly one can define a floating-point system in which 1+1 produces 0 and 0+1 produces 0 while 3*1 produces 5. However, for the purposes of this answer, we assume a typical floating-point system.