ieee-754 floating-point precision

Reciprocal representation of integers in floating point numbers


I need to store a lot of different values as doubles between 0 and 1, to have a uniform representation. For example, an ARGB value, which is a 32-bit integer. Can doubles uniquely represent every integer value if I store it as a reciprocal? I know there are enough bits to do it, but I'm not sure whether the exponential spacing will prevent this.


Solution

  • A standard double has a 52-bit mantissa (53 significant bits counting the implicit leading one), so yes, it can hold and exactly reproduce any 32-bit integer. The other problem is the requirement that the values lie between 0 and 1. The reciprocal is not the way to do that! Counterexample: 1/3 is not exactly representable by a double. Instead, divide the values to bring them into range. To preserve exact accuracy, divide or multiply only by powers of two. So, given unsigned 32-bit values, convert them to double and then divide by 2^32. If you reverse that when reading (multiply by 2^32), the values are reproduced exactly. In C and C++ there are also standard library functions that manipulate the exponent and mantissa of a float or double directly, such as ldexp and frexp; these may be more efficient and safer.
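
The scaling described in the answer can be sketched in C roughly as follows. This is a minimal illustration, not a definitive implementation: the function names encode_u32, decode_u32 and check_roundtrip are hypothetical, and it assumes double is an IEEE 754 binary64. It uses ldexp from <math.h>, which multiplies by a power of two by adjusting the exponent only, so the mantissa bits are left unchanged.

```c
#include <assert.h>
#include <math.h>
#include <stdint.h>

/* Hypothetical helper: map a 32-bit unsigned value into [0, 1).
 * The conversion to double is exact (32 bits fit in 53 significant bits),
 * and ldexp(x, -32) computes x * 2^-32 exactly by changing the exponent. */
double encode_u32(uint32_t v) {
    return ldexp((double)v, -32);
}

/* Hypothetical helper: undo encode_u32 by scaling back up with 2^32.
 * The result is an exact integer in [0, 2^32 - 1], so the cast is lossless
 * for any value that was produced by encode_u32. */
uint32_t decode_u32(double d) {
    return (uint32_t)ldexp(d, 32);
}

/* Self-check: aborts if v does not survive the round trip
 * or if the encoded value falls outside [0, 1). */
void check_roundtrip(uint32_t v) {
    double d = encode_u32(v);
    assert(d >= 0.0 && d < 1.0);
    assert(decode_u32(d) == v);
}
```

Calling check_roundtrip(0xFFFFFFFFu), for instance, exercises the largest input, which encodes to a value just below 1. Plain division by 4294967296.0 (2^32) gives the same results, since dividing by a power of two is also exact here; ldexp just makes the intent explicit.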