
Gaps between successive floating point numbers


(all numbers discussed are in decimal)

Let's say we have a floating point data type of the form:

m * 10 ^ e

where m is the mantissa, limited to a single digit (0 <= m <= 9),

and e is the exponent, limited to -1 <= e <= 1.

We say our data type's max value is 90 and its min value is 0.

BUT: that does not mean we can represent all numbers within these limits. We can only represent 27 numbers (9 * 3), excluding zero.

Specifically, we can't represent 89 this way, since it needs a two-digit mantissa (and none of the digits is zero).

So, by analogy with the above, in a float data type (in any programming language) there must be some integers between the max and min values that we cannot store in a float.

Is the above argument sound? If it is, please give an example of how to show this in Java or C.


Solution

  • Your reasoning is perfectly sound. The easiest way to show it is by example, as you did.
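    The toy decimal format from the question is small enough to enumerate completely. Here is a minimal sketch of that check; the class and method names are made up for illustration, and values are kept as integer tenths so that the arithmetic stays exact.

```java
import java.util.TreeSet;

// Enumerates every non-zero value of the toy format m * 10^e
// with 1 <= m <= 9 and -1 <= e <= 1 (names are hypothetical).
// Values are stored as integer tenths (value * 10) to avoid rounding.
class ToyDecimalFloat {

    static TreeSet<Integer> representableTenths() {
        TreeSet<Integer> tenths = new TreeSet<>();
        for (int e = -1; e <= 1; e++) {
            int scale = 1;                      // becomes 10^(e + 1)
            for (int i = 0; i < e + 1; i++) {
                scale *= 10;
            }
            for (int m = 1; m <= 9; m++) {
                tenths.add(m * scale);
            }
        }
        return tenths;
    }

    // True if the given integer is one of the toy format's values.
    static boolean isRepresentable(int integerValue) {
        return representableTenths().contains(integerValue * 10);
    }

    public static void main(String[] args) {
        System.out.println("count (excluding zero): " + representableTenths().size());
        System.out.println("80 representable? " + isRepresentable(80));
        System.out.println("89 representable? " + isRepresentable(89));
    }
}
```

    The set holds 27 values, 80 is among them and 89 is not, which matches the counting argument in the question.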

    A non-representable example

    Consider the "usual single floating point" format, as defined by IEEE-754. It has 8 exponent bits, thus a range beyond [-2^127, 2^127].

    It also has 24 mantissa bits (23 stored, plus an implicit leading 1), so let's consider 67108864, 67108865 and 67108872. Those numbers are respectively 2^26, 2^26+1 and 2^26+8.

    Try to normalize them to write them in the floating point format, and you'll see that

    • the exponent gets value 26
    • the first bit disappears, because it is implicit in the IEEE-754 format that the first bit is always* 1, so you're left with 26 bits for each number
    • only the next 23 of those bits fit in the mantissa, so the 3 lowest bits have to be dropped...
      • 67108864 has only zeroes in its mantissa; since its 3 lowest bits are 0, you can remove them without losing information.
      • 67108872 has a 1 in its mantissa's last stored position (the bit worth 8); since its 3 lowest bits are also 0, you can still remove them without losing information.
      • 67108865 has only zeroes and a 1 as its lowest bit, which is beyond the 24 bits! So the number will be rounded to either 2^26 or 2^26+8 (here to 2^26, the nearest of the two).

    Thus you have an example, like 89 : 67108865 is not representable in a float.
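    Since the question asks how to show this in Java, here is a small sketch. The class name and the helper `isExactlyRepresentable` are made up for illustration; `Math.ulp` and `Math.nextUp` are standard library methods. It converts 67108865 to float and back, prints the gap between adjacent floats at 2^26, and also checks 2^24+1 = 16777217, which is the smallest positive integer a float cannot hold exactly.

```java
// Shows that some integers inside float's range cannot be stored exactly.
class FloatGaps {

    // True if converting the int to float and back gives the same number.
    // The comparison is done in long so that values near Integer.MAX_VALUE
    // are not hidden by the saturating float -> int cast.
    static boolean isExactlyRepresentable(int n) {
        return (long) (float) n == n;
    }

    public static void main(String[] args) {
        int n = 67108865;                  // 2^26 + 1
        float f = n;                       // int -> float conversion rounds to nearest
        System.out.println(n + " stored as float: " + (long) f);

        // Spacing between adjacent floats starting at 2^26
        System.out.println("gap at 2^26: " + Math.ulp(67108864f));
        System.out.println("next float after 2^26: " + (long) Math.nextUp(67108864f));

        // 2^24 is exact, 2^24 + 1 is the first integer that is not
        System.out.println("16777216 exact? " + isExactlyRepresentable(16777216));
        System.out.println("16777217 exact? " + isExactlyRepresentable(16777217));
    }
}
```

    Note that Java performs the int to float conversion silently, without any warning, so the rounding is easy to miss.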


    * except for subnormals, see below (expanding on the comment)

    Bias

    Indeed I skipped a part here. The exponent is not directly encoded in the bits that are reserved to it, it is biased. In the case of single floating points, the bias is 127.

    So our 26 is actually represented by 26+127, thus 153. Stealing the following image from Wikipedia:

    [Image: bit layout of a single-precision float (sign, exponent, mantissa), illustrated with negative zero]

    If you take those numbers (sign, exponent and mantissa) as they are written and want to express a non-subnormal number, you get: (-1)^sign * 2^(exponent-127) * 1.mantissa
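    To see the three fields and the bias in practice, one can pull them out of the raw bits. This is only a sketch: the class and method names are hypothetical, while `Float.floatToIntBits` and `Math.scalb` are standard Java. The rebuild method applies the formula for normal numbers only.

```java
// Splits a float into its three bit fields and rebuilds a normal value from them.
class FloatFields {

    static int sign(float f)          { return Float.floatToIntBits(f) >>> 31; }
    static int exponentField(float f) { return (Float.floatToIntBits(f) >>> 23) & 0xFF; }
    static int mantissaField(float f) { return Float.floatToIntBits(f) & 0x7FFFFF; }

    // (-1)^sign * 2^(exponent-127) * 1.mantissa, valid only for normal numbers
    // (exponent field between 1 and 254).
    static double rebuildNormal(int sign, int exponentField, int mantissaField) {
        double significand = 1.0 + mantissaField / (double) (1 << 23);
        double value = Math.scalb(significand, exponentField - 127);
        return sign == 0 ? value : -value;
    }

    public static void main(String[] args) {
        float f = 67108864f;               // 2^26
        System.out.println("sign     = " + sign(f));
        System.out.println("exponent = " + exponentField(f));   // biased: 26 + 127
        System.out.println("mantissa = " + mantissaField(f));
        System.out.println("rebuilt  = "
                + rebuildNormal(sign(f), exponentField(f), mantissaField(f)));
    }
}
```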

    Subnormals

    Once we reach the smallest possible exponent field, that is once we write it 0, we stop supposing the initial 1, and the exponent is fixed at -126 (the same as for the smallest normal numbers, not -127 as the bias alone would suggest). This way, we can represent numbers smaller than 2^-126 (by sacrificing precision, because we will have leading 0's in the mantissa).

    We then have: (-1)^sign * 2^-126 * 0.mantissa
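    A short sketch of the subnormal case, using the IEEE-754 rule that an exponent field of 0 stands for a fixed factor of 2^-126 with no implicit leading 1. The class and method names are invented for the example; `Float.MIN_NORMAL` and `Float.MIN_VALUE` are the standard Java constants for the smallest positive normal and subnormal floats.

```java
// Rebuilds subnormal floats from their raw bits.
class Subnormals {

    // (-1)^sign * 2^-126 * 0.mantissa, valid only when the exponent field is 0.
    static double rebuildSubnormal(int bits) {
        int sign = bits >>> 31;
        int mantissaField = bits & 0x7FFFFF;
        double fraction = mantissaField / (double) (1 << 23);   // 0.mantissa
        double value = Math.scalb(fraction, -126);
        return sign == 0 ? value : -value;
    }

    public static void main(String[] args) {
        // Smallest normal number: exponent field 1, mantissa 0
        System.out.println("MIN_NORMAL bits: 0x"
                + Integer.toHexString(Float.floatToIntBits(Float.MIN_NORMAL)));
        // Smallest subnormal number: exponent field 0, mantissa 1
        System.out.println("MIN_VALUE bits:  0x"
                + Integer.toHexString(Float.floatToIntBits(Float.MIN_VALUE)));

        System.out.println("rebuilt from bits 0x1: " + rebuildSubnormal(0x00000001));
        System.out.println("Float.MIN_VALUE:       " + (double) Float.MIN_VALUE);
    }
}
```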

    In particular, when the mantissa is all 0's, we have 0, and this is intended : now a number that has only 0's in its binary representation is read as 0. In some way, 0 is the smallest of subnormal numbers (though in practice people consider it just a special case on its own).

    Other special cases are when the exponent is all 1's. If the mantissa is all 0's then you have +/- infinity (depending on the sign), and if some mantissa bits are set you have a NaN.
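    All these cases can be told apart just by looking at the exponent and mantissa fields. A minimal sketch, with a hypothetical `classify` helper that takes the raw 32 bits; `Float.intBitsToFloat` is the standard method that turns such a bit pattern back into a float.

```java
// Classifies a raw 32-bit pattern according to the IEEE-754 single format.
class FloatSpecials {

    static String classify(int bits) {
        boolean negative = (bits >>> 31) == 1;
        int exponentField = (bits >>> 23) & 0xFF;
        int mantissaField = bits & 0x7FFFFF;

        if (exponentField == 0xFF) {            // exponent all 1's
            if (mantissaField != 0) return "NaN";
            return negative ? "-infinity" : "+infinity";
        }
        if (exponentField == 0) {               // exponent all 0's
            if (mantissaField == 0) return negative ? "-0" : "+0";
            return "subnormal";
        }
        return "normal";
    }

    public static void main(String[] args) {
        int[] patterns = { 0x00000000, 0x80000000, 0x00000001,
                           0x3F800000, 0x7F800000, 0xFF800000, 0x7FC00000 };
        for (int bits : patterns) {
            System.out.println("0x" + Integer.toHexString(bits)
                    + " -> " + classify(bits)
                    + " (Java reads it as " + Float.intBitsToFloat(bits) + ")");
        }
    }
}
```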