floating-point, ieee-754, theorem

Why are some numbers (not) representable in IEEE 754 double precision?


I am confused about IEEE 754 double precision. I have two questions:
1. Why is every number in the sequence -2^54, -2^54 + 2, -2^54 + 4, ..., 2^54 representable?

2. Why is 2^54 + 2 not representable?

Can you help me? I understand how IEEE 754 works, but I have trouble seeing why this holds.


Solution

  • There are 53 bits in the significand (or mantissa) of an IEEE 754 double: 52 stored bits plus an implicit leading 1. −2^54 can be represented exactly, as

    mantissa: 1.00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 00 (bin)
    exponent: 54
    sign:     1
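
To check these three fields on a real machine, here is a minimal Python sketch. The helper name `decode_double` is made up for this illustration, and it only handles finite, normal values; it simply reinterprets the 8 bytes of the double and splits them into sign, unbiased exponent, and the 52 stored fraction bits.

```python
import struct

def decode_double(x: float):
    """Split a finite, normal double into (sign, unbiased exponent, fraction bits).

    Hypothetical helper for illustration only; zeros, subnormals,
    infinities and NaNs are not handled.
    """
    # Reinterpret the 8 bytes of the double as an unsigned 64-bit integer.
    (bits,) = struct.unpack(">Q", struct.pack(">d", x))
    sign = bits >> 63                          # 1 means negative
    exponent = ((bits >> 52) & 0x7FF) - 1023   # remove the exponent bias of 1023
    fraction = bits & ((1 << 52) - 1)          # the 52 stored bits after "1."
    return sign, exponent, format(fraction, "052b")

sign, exponent, fraction = decode_double(-2.0**54)
print("sign:    ", sign)
print("exponent:", exponent)
print("mantissa: 1." + fraction)
```

For −2^54 this should report sign 1, exponent 54, and 52 zero bits after the point, matching the layout above.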
    

    Now let's forget the sign bit for a moment. It is irrelevant for this explanation. So assume we have +2^54.

    With this exponent, the lowest (rightmost) bit of the significand has the value 2^-52 * 2^54 = 2^2 = 4. So 2^54 + 4 is encoded as:

    mantissa: 1.00000 00000 00000 00000 00000 00000 00000 00000 00000 00000 01 (bin)
    exponent: 54                                                             ^
                                                                    lowest bit
    

    But there is no representable value in between. So you cannot encode 2^54 + 2.
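
The gap above 2^54 can be observed directly. The following is a rough sketch, assuming Python 3.9 or later (for `math.ulp`); `is_exact` is a made-up helper that checks whether an integer survives a round trip through a double.

```python
import math

def is_exact(n: int) -> bool:
    """True if the integer n converts to a double without rounding (illustrative helper)."""
    return int(float(n)) == n

x = 2.0**54
print(math.ulp(x))           # → 4.0, the value of the lowest significand bit at this exponent
print(is_exact(2**54 + 4))   # → True
print(is_exact(2**54 + 2))   # → False
print(x + 2 == x)            # → True: the sum is rounded back to 2^54
```

The last line shows the practical consequence: 2^54 + 2 lies exactly halfway between two doubles, and round-to-nearest, ties-to-even picks 2^54.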

    Why is this not a problem for −2^54 + 2? Because that is the same as −(2^54 − 2), and that is represented as:

    mantissa: 1.11111 11111 11111 11111 11111 11111 11111 11111 11111 11111 11
    exponent: 53 !!
    sign:     1
    

    And the exponent 53 means you have steps of 2^-52 * 2^53 = 2. The next value toward 0 is then:

    mantissa: 1.11111 11111 11111 11111 11111 11111 11111 11111 11111 11111 10
    exponent: 53
    sign:     1
    

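    which is −2^54 + 4, or actually −(2^54 − 4). And you can go on like that until you reach −2^53. Closer to 0 the exponent is smaller still, so the steps only shrink (1, 1/2, ...), which is why every even integer between −2^54 and 2^54 is representable.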
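
The two encodings just inside −2^54 can be inspected with `float.hex()`, which prints the significand in hexadecimal and the exponent as a power of two. This is only a small sketch; the variable names are arbitrary.

```python
a = -(2.0**54) + 2   # exactly -(2^54 - 2)
b = -(2.0**54) + 4   # exactly -(2^54 - 4)

print(a.hex())       # → -0x1.fffffffffffffp+53  (all 52 fraction bits set, exponent 53)
print(b.hex())       # → -0x1.ffffffffffffep+53  (lowest fraction bit cleared)
print(b - a)         # → 2.0, the step size at exponent 53

# An odd integer in this range falls between two doubles and gets rounded.
odd = -(2**54) + 1
print(int(float(odd)) == odd)   # → False
```

Thirteen hex digits after the point correspond to the 52 stored bits, so `f...f` is the all-ones significand shown above and `f...e` is the same with the lowest bit cleared.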