floating-point, ieee-754

Why does the exponent IEEE754 (single) limit between 2^{-126}<= e <=2^{127}?


I am referring to this question but I'm not satisfied with the answer: IEEE-754 32 Bit (single precision) exponent -126 instead of -127

Does the answer imply that the exponent can actually be [formula image missing from the source], but that, because of the representation in denormalized form, the smallest positive number $1\cdot2^{-127}\cdot1.\cdots$ is equal to $1\cdot2^{-126}\cdot0.1\cdots$?

Does this mean, then, that the smallest representable positive number is:

[first option: formula image missing from the source]

or $1\cdot2^{-126}0.1(21\cdot0)1$

Which of the last two options?

Thank you very much in advance.

Thank you again to everyone, you all gave great explanations!


Solution

  • Why does the exponent IEEE754 (single) limit between 2^{-126}<= e <=2^{127}?
    Reworded:
    Why is the IEEE 754 (single) exponent limited to -126 <= exponent <= 127?

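    binary32 has an 8-bit biased exponent field, allowing for 256 values. A wider or narrower exponent field could have been selected, yet it is 8 bits here. Two of those values, 0 and 255, have special meaning (0 encodes zero and sub-normals, 255 encodes infinities and NaNs), leaving 254 for normal numbers.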

    In the 1970s, it was decided to distribute these exponent values roughly symmetrically about zero by employing an offset of 127: the 254 biased values 1 to 254 map to the exponents -126 to +127.

    Normal values have the form: sign * (1.xxx... total 23 x's...xxx) * 2^(biased exponent - offset), providing 24-bit binary precision.
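The form above can be checked with a minimal sketch. Python and its `struct` module are my choice for illustration here, and the helper names `decode_binary32` and `normal_value` are hypothetical, not part of any standard API.

```python
import struct

BIAS = 127  # binary32 exponent offset


def decode_binary32(x):
    """Round x to binary32 and split it into (sign, biased_exponent, fraction)."""
    (bits,) = struct.unpack(">I", struct.pack(">f", x))
    sign = bits >> 31                      # 1 bit
    biased_exponent = (bits >> 23) & 0xFF  # 8 bits
    fraction = bits & 0x7FFFFF             # 23 explicitly stored bits
    return sign, biased_exponent, fraction


def normal_value(sign, biased_exponent, fraction):
    """Rebuild a normal value: sign * 1.fraction * 2^(biased_exponent - BIAS)."""
    assert 1 <= biased_exponent <= 254, "0 and 255 are reserved encodings"
    significand = 1 + fraction / 2**23     # implied leading 1
    return (-1) ** sign * significand * 2.0 ** (biased_exponent - BIAS)


print(decode_binary32(1.0))                   # (0, 127, 0)
print(decode_binary32(-6.5))                  # (1, 129, 5242880)
print(normal_value(*decode_binary32(-6.5)))   # -6.5
```

Biased exponents 1 and 254 are the two ends of the normal range, which is where the limits -126 and +127 come from.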

    For various numerical reasons (and after much debate), it was decided that values with a magnitude below the smallest normal positive number, 1.0 * 2^(1 - 127) = 2^-126, should lose precision gradually rather than flush to zero. These are sub-normal (or denormal) numbers. They are encoded with a biased exponent of 0 but use the same resultant exponent, -126, as the smallest normal number, and their implied leading bit is 0 instead of 1.

             v--------------------------------------- Implied leading bit
             | v--------------------------v---------- Significand explicitly encoded
             | |                          |     v---- Biased exponent
             | |                          |     | v-v Implied offset  
    2^-126 = 1.000 0000 0000 0000 0000 0000 * 2^1-127 // smallest normal
             0.111 1111 1111 1111 1111 1111 * 2^0-126 // largest sub-normal
    2^-127 = 0.100 0000 0000 0000 0000 0000 * 2^0-126
    2^-128 = 0.010 0000 0000 0000 0000 0000 * 2^0-126
    2^-129 = 0.001 0000 0000 0000 0000 0000 * 2^0-126
    ...
    2^-149 = 0.000 0000 0000 0000 0000 0001 * 2^0-126 // smallest sub-normal
    0.0f   = 0.000 0000 0000 0000 0000 0000 * 2^0-126 // zero
    

    So values down to 2^-126 have 24-bit precision, and values from there down to 2^-149 have steadily decreasing precision.
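The rows of the table above can be reproduced by building values directly from their bit fields. This is only a sketch in Python; the helper name `from_fields` is hypothetical, and it covers positive values only.

```python
import struct


def from_fields(biased_exponent, fraction):
    """Build a positive binary32 value from raw exponent and fraction fields."""
    bits = (biased_exponent << 23) | fraction
    (value,) = struct.unpack(">f", struct.pack(">I", bits))
    return value


smallest_normal = from_fields(1, 0)            # 1.000...0 * 2^(1-127)
largest_subnormal = from_fields(0, 0x7FFFFF)   # 0.111...1 * 2^-126
smallest_subnormal = from_fields(0, 1)         # 0.000...1 * 2^-126

print(smallest_normal == 2.0**-126)                       # True
print(largest_subnormal == (1 - 2.0**-23) * 2.0**-126)    # True
print(from_fields(0, 0x400000) == 2.0**-127)              # True
print(smallest_subnormal == 2.0**-149)                    # True
print(from_fields(0, 0) == 0.0)                           # True
```

So the smallest representable positive number is 2^-149, stored with a biased exponent of 0 and only the last fraction bit set.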