Suppose we have a binary Intel processor and a double-precision float represented by 64 bits: 1 bit for the sign, 52 bits for the mantissa, and 11 bits for the exponent.
I do not understand why e_max = 2^10 - 1 = 1023.
Shouldn't it be 2^11, since 11 bits are dedicated to it?
How does it follow from this that the smallest represented float is on the order of 10^(-308) and the largest on the order of 10^(308)?
Thanks for any clarification or explanation!
... why the e_max = 2^10 - 1 = 1023; shouldn't it be 2^11 since 11 bits are dedicated to it.
No. Consider negative exponents.
With binary64 encoding, the encoded number has a biased exponent, an 11-bit unsigned integer from 0 to 2047. (Biased exponents of 0 and 2047 have special meaning.)
After applying an offset of -1023, the encoding has an (unbiased) exponent in the range [-1023 ... 1024]. (Again, the endpoints have additional special meaning.) This encodes very large values like 10^308 and tiny ones like 10^-308.
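To see the biased exponent in action, here is a minimal sketch (not from the original answer; it assumes a C implementation with IEEE-754 binary64 doubles) that pulls out the sign bit, the 11-bit biased exponent, and the 52-bit fraction of a double and applies the -1023 offset:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

int main(void) {
    double x = 1.0;                    /* encoded exponent should be 1023 */
    uint64_t bits;
    memcpy(&bits, &x, sizeof bits);    /* reinterpret the 64 bits */

    uint64_t sign     = bits >> 63;                 /* 1 bit             */
    uint64_t biased   = (bits >> 52) & 0x7FF;       /* 11 bits: 0..2047  */
    int      unbiased = (int)biased - 1023;         /* apply -1023 offset */
    uint64_t fraction = bits & 0xFFFFFFFFFFFFFull;  /* 52 bits           */

    printf("sign=%llu biased=%llu unbiased=%d fraction=%llu\n",
           (unsigned long long)sign, (unsigned long long)biased,
           unbiased, (unsigned long long)fraction);
    /* For x = 1.0 this prints: sign=0 biased=1023 unbiased=0 fraction=0 */
    return 0;
}
```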
The original design of the binary64 encoding could have used an 11-bit signed integer for the exponent, in some signed-integer encoding, yet that had disadvantages. A biased exponent has several advantages:
At the time, machines using two's complement, ones' complement, and sign-magnitude signed-integer encodings all existed. Selecting one of those three for the exponent would disadvantage the other two.
If we view the entire floating point encoding as a sign-magnitude integer, the order of floating point values from least to greatest is preserved.
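As a rough illustration of that ordering property, here is a sketch under the assumption of IEEE-754 binary64 and non-negative finite values (the helper to_bits is mine, purely for illustration): comparing the raw bit patterns as unsigned integers gives the same order as comparing the doubles.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Copy the 64 bits of a double into an unsigned integer. */
static uint64_t to_bits(double d) {
    uint64_t u;
    memcpy(&u, &d, sizeof u);
    return u;
}

int main(void) {
    double a = 1.5, b = 2.75;   /* two arbitrary non-negative values */
    printf("%d\n", a < b);                    /* prints 1 */
    printf("%d\n", to_bits(a) < to_bits(b));  /* also prints 1 */
    return 0;
}
```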
The bias selected (-1023) provides about as many finite absolute values greater than 1.0 as less than 1.0, with slightly more values > 1.0 than < 1.0. This allows 1.0/max to be represented as non-zero (some subnormal value). It also allows 1.0/normal_values to not overflow. Using one of the three popular signed-integer encodings would not provide this unless it, too, had a bias.
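A small sketch of those last two points, assuming <float.h>'s DBL_MAX and DBL_MIN on an IEEE-754 binary64 implementation:

```c
#include <stdio.h>
#include <float.h>

int main(void) {
    double r1 = 1.0 / DBL_MAX;  /* about 5.56e-309: subnormal, but not 0 */
    double r2 = 1.0 / DBL_MIN;  /* about 4.49e+307: finite, no overflow  */
    printf("1/DBL_MAX = %g (non-zero: %d)\n", r1, r1 > 0.0);
    printf("1/DBL_MIN = %g (below DBL_MAX: %d)\n", r2, r2 < DBL_MAX);
    return 0;
}
```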