Please confirm my understanding of IEEE floats.
I believe that if I have two numbers:
a = 2^ea × ma        (here × is the "mathematical ×")
b = 2^eb × mb

Then their product a * b is bit-identical to ((a * p) * b) / p, where p is a power of two (caveat: the intermediate values don't overflow to infinity or lose precision around exponent -308; that is, it's a p that satisfies (a * p) / p = a).
According to my understanding, a * b (IEEE *) is defined to be "the closest float to the mathematical value 2^(ea+eb) × ma × mb" (closest, or whatever the rounding method does). That involves multiplying the mantissas, then shifting the result as necessary (e.g. 1.11111 × 1.11111 = 11.something, so the final result will have exponent ea+eb+1), and rounding the mantissa to the number of bits in the float.
All of that is basically independent of the values of ea and eb; we can increase/decrease them however we want, and then undo it afterwards, and it won’t affect the final product.
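For example, I'd expect a check along these lines to pass (the specific values and the memcmp-based helper are just my illustration):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Compare two doubles bit for bit. */
static int bits_equal(double x, double y) {
    return memcmp(&x, &y, sizeof x) == 0;
}

int main(void) {
    double a = 1.1e30, b = -3.7e-12;
    double p = 0x1p10;              /* 2^10, an exact power of two */

    double direct = a * b;
    double scaled = ((a * p) * b) / p;

    /* Expected to hold as long as a*p neither overflows nor underflows. */
    assert(bits_equal(direct, scaled));
    printf("bit-identical: %d\n", bits_equal(direct, scaled));
    return 0;
}
```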
Corollary: when multiplying a long list of doubles, I can strip off the exponents and accumulate them separately in an int. Then I can take the product of the doubles reduced to the range ±[1, 2) and finally bit-twiddle the exponent back in (and error if it's out of range ±1024). This would help cases like [1E300, 1E300, 1E-200, 1E-200] where a naive product would fail. And for cases where the naive product would work, the bit-twiddling version gives identical results.

TLDR: You are correct.
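A rough sketch of the corollary from your question, using the standard frexp/ldexp functions instead of raw bit-twiddling (the function name, the renormalization step, and the lack of zero/NaN handling are my own simplifications):

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* Product of an array of doubles with the exponents accumulated
 * separately, so huge or tiny intermediate products cannot overflow
 * or underflow. */
static double scaled_product(const double *xs, size_t n) {
    double mantissa = 1.0;   /* kept close to 1 at all times        */
    long exponent = 0;       /* running sum of the stripped exponents */

    for (size_t i = 0; i < n; ++i) {
        int e;
        mantissa *= frexp(xs[i], &e);   /* frexp returns a value in ±[0.5, 1) */
        exponent += e;

        /* Renormalize the running mantissa so it stays near 1. */
        int f;
        mantissa = frexp(mantissa, &f);
        exponent += f;
    }
    /* ldexp overflows to ±inf / underflows toward 0 if the final
     * exponent is out of range, mirroring "error if out of range". */
    return ldexp(mantissa, (int)exponent);
}

int main(void) {
    double xs[] = { 1e300, 1e300, 1e-200, 1e-200 };
    printf("%g\n", scaled_product(xs, 4));   /* prints 1e+200, no intermediate overflow */
    return 0;
}
```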
Some assumptions first. Every normalized floating-point number is encoded as 1.x * 2^y with a binary mantissa 1.x and an integer exponent y. An integer power of 2 is therefore encoded as 1.0 * 2^p. If we disregard the bit shifts for normalization, a multiplication is implemented as
(1.x * 2^y) * (1.u * 2^w) == (1.x * 1.u) * 2^(y+w)
Conversely, division is (1.x / 1.u) * 2^(y-w). Following this pattern, a multiplication with a power of two contains the mantissa multiplication 1.x * 1.0, which is the identity function and cannot introduce issues with associativity. The addition y + p for the exponent is an integer addition, which is associative. Therefore, multiplication and division by a power of two are associative.
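As a concrete illustration (my own example, assuming an IEEE 754 binary64 double), scaling by a power of two leaves the 52 mantissa bits untouched and only changes the exponent field:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Extract the 52 mantissa (significand) bits of an IEEE 754 double. */
static uint64_t mantissa_bits(double x) {
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);
    return bits & ((UINT64_C(1) << 52) - 1);
}

int main(void) {
    double x = 3.141592653589793;
    /* Multiplying or dividing by 2^20 changes only the exponent field;
       the mantissa bits are identical. */
    assert(mantissa_bits(x) == mantissa_bits(x * 0x1p20));
    assert(mantissa_bits(x) == mantissa_bits(x / 0x1p20));
    return 0;
}
```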
The argument above breaks down if an intermediate result's exponent falls outside the representable range: the result then either overflows to infinity or underflows to a denormal, where bits of the mantissa are rounded off.
This means that you can replace any (relatively expensive) division by a power of 2 with a multiplication by its inverse without loss of precision. For compile-time constants, compilers will do this automatically.
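For example (my own illustration): since 0.125 is an exactly representable power of two, dividing by 8.0 and multiplying by 0.125 give identical doubles:

```c
#include <assert.h>

int main(void) {
    /* 1/8 is a power of two, so it is exactly representable and
       x / 8.0 rounds to the same double as x * 0.125. */
    double xs[] = { 3.0, 1e-300, 7.123456789e17, -0.1 };
    for (int i = 0; i < 4; ++i)
        assert(xs[i] / 8.0 == xs[i] * 0.125);
    return 0;
}
```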
It is often useful to normalize numbers. For example, when computing the L2 vector norm (sqrt(x^2 + y^2 + …)) it is useful to determine the scaling factor s = max(abs({x, y, …})) and compute s * sqrt((x/s)^2 + (y/s)^2 + …) to avoid over- or underflowing the intermediate squaring.
If we instead round s to a nearby power of 2, no rounding errors are introduced and the division can be turned into a multiplication. The standard math functions frexp and ldexp can be used to pick the factor.
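A rough sketch of such a scaled norm (the function name and the exact way the scale is picked are my own choices, not a reference implementation):

```c
#include <math.h>
#include <stddef.h>
#include <stdio.h>

/* L2 norm with power-of-two scaling: the intermediate squares stay near 1,
 * and because the scale factor is a power of two, applying and undoing it
 * introduces no rounding error.  Sketch only: zeros, NaNs and infinities
 * are not handled. */
static double scaled_norm(const double *xs, size_t n) {
    double biggest = 0.0;
    for (size_t i = 0; i < n; ++i)
        if (fabs(xs[i]) > biggest)
            biggest = fabs(xs[i]);

    int e;
    frexp(biggest, &e);                 /* biggest == m * 2^e with m in [0.5, 1) */
    double inv_scale = ldexp(1.0, -e);  /* 2^-e, an exact power of two */

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) {
        double t = xs[i] * inv_scale;   /* exact: power-of-two scaling */
        sum += t * t;
    }
    return ldexp(sqrt(sum), e);         /* undo the scaling; this step is exact */
}

int main(void) {
    double xs[] = { 3e200, 4e200 };     /* naive x^2 + y^2 would overflow */
    printf("%g\n", scaled_norm(xs, 2)); /* prints 5e+200 */
    return 0;
}
```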
Goldberg's "What every computer scientist should know about floating-point arithmetic" contains Theorem 7, which is related. It also contains the simple note "Scaling by a power of 2 is harmless, since it changes only the exponent, not the significand," which supports our assessment.