I've read a lot about floats, but it's all unnecessarily involved. I think I've got it pretty much understood, but there's just one thing I'd like to know for sure:
I know that fractions of the form 1/pow(2,n), with n an integer, can be represented exactly in floating point numbers. This means that if I add 1/32 to itself 32 million times, I would get exactly 1,000,000.
What about something like 1/(32+16)? It's one over the sum of two powers of two, does this work? Or is it 1/32+1/16 that works? This is where I'm confused, so if anyone could clarify that for me I would appreciate it.
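The rule can be summed up as this:
A number can be exactly represented in binary if the prime factorization of its denominator (in lowest terms) consists of only 2's, i.e. the denominator is a power of two.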
So 1/(32 + 16) is not representable in binary because it has a factor of 3 in the denominator. But 1/32 + 1/16 = 3/32 is.
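As a quick check of the two cases above, here is a minimal Python sketch. The helper name is_exact is made up for this illustration; it relies on the fact that Fraction(float) converts the stored binary value exactly, so comparing it with the true fraction shows whether any rounding happened. It assumes Python floats are IEEE doubles, which is the case on common platforms.

```python
from fractions import Fraction

def is_exact(numerator, denominator):
    """Return True if numerator/denominator is stored in a double without rounding.

    Hypothetical helper for illustration only.
    """
    # Fraction(float) recovers the exact value the double holds.
    return Fraction(numerator / denominator) == Fraction(numerator, denominator)

print(is_exact(1, 32))  # denominator is 2^5
print(is_exact(3, 32))  # 1/32 + 1/16
print(is_exact(1, 48))  # denominator is 2^4 * 3

# Repeated addition of 1/32 stays exact because every partial sum is a
# small multiple of 1/32. A smaller count than 32 million keeps this quick.
total = 0.0
for _ in range(32 * 1000):
    total += 1 / 32
print(total == 1000.0)
```

The first two calls and the final comparison should print True, and the 1/48 case should print False.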
That said, there are more restrictions to be representable in a floating-point type. For example, you only have 53 bits of mantissa in an IEEE double, so 1/2 + 1/2^500 is not representable.
So a sum of powers of two is representable as long as the exponents don't span more than 53 consecutive powers (and each exponent is within the range the type supports).
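The 53-bit limit can be observed directly. The following is only a sketch, assuming Python floats are IEEE doubles: starting from 1/2 = 2^-1, the lowest bit that still fits is 2^-53, so adding 2^-53 is exact while 2^-54 and 2^-500 are rounded away.

```python
import sys
from fractions import Fraction

# Number of significand bits in a Python float (53 for IEEE doubles).
print(sys.float_info.mant_dig)

half = 0.5

# 2^-1 + 2^-53 spans exactly 53 bit positions, so it is stored exactly.
fits = half + 2.0**-53
print(Fraction(fits) == Fraction(1, 2) + Fraction(1, 2**53))

# 2^-1 + 2^-54 would need 54 bits; the sum rounds back to 0.5.
print(half + 2.0**-54 == half)

# 2^-1 + 2^-500 is far outside the window; the small term is lost entirely.
print(half + 2.0**-500 == half)
```

Note that 2^-500 on its own is perfectly representable; it is only the sum with 1/2 that does not fit.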
To generalize this to other bases:
A rational number can be exactly represented in base 10 if the prime factorization of its denominator (in lowest terms) consists of only 2's and 5's.
A rational number X can be exactly represented in base N if the prime factorization of the denominator of X contains only primes found in the factorization of N.
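The general rule can be sketched in a few lines of Python. The function name terminates_in_base is invented for this example, and it only answers whether the expansion is finite given unlimited digits; it does not account for the fixed mantissa width discussed earlier. It works by repeatedly dividing out of the denominator any factor shared with the base, which avoids factoring either number explicitly.

```python
from fractions import Fraction
from math import gcd

def terminates_in_base(x, base):
    """Return True if the rational x has a finite expansion in the given base.

    Hypothetical helper: pass x as a Fraction (or int), not a float.
    """
    d = Fraction(x).denominator  # Fraction keeps x in lowest terms
    g = gcd(d, base)
    while g > 1:
        # Strip every copy of the shared factor from the denominator.
        while d % g == 0:
            d //= g
        g = gcd(d, base)
    # Whatever remains consists of primes that do not divide the base.
    return d == 1

print(terminates_in_base(Fraction(3, 32), 2))   # only 2's in the denominator
print(terminates_in_base(Fraction(1, 48), 2))   # factor of 3 remains
print(terminates_in_base(Fraction(1, 40), 10))  # 40 = 2^3 * 5
print(terminates_in_base(Fraction(1, 48), 12))  # 12 = 2^2 * 3 covers both primes
```

For instance, 1/48 has no finite expansion in base 2 or base 10, but it does in base 12.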