assembly, binary, floating-point, ieee-754

How to convert binary floating points to decimal fractions?


I'm stuck on a homework assignment; I need to convert a binary float to a decimal fraction. I feel like I understand the process, but I'm not getting the right answer. Here's my thought process.

I have the binary float: 0 000 101

  • The bias for a 3-bit exponent field is 3: 2^(3-1)-1 = 3
  • The mantissa becomes 1.101 (base 2)
  • The value of the exponent bits, 0, minus the number of exponent bits, 3, is -3, so the binary point of the mantissa gets moved left 3 places:
    0.001101
  • In base-10, that is 2^-3 + 2^-4 + 2^-6, which equals 0.203125 or 13/64.

However, 13/64 is not the correct answer; the auto-grader doesn't accept it. If my answer is wrong, I don't understand why, and I'm hoping someone can point me in the right direction.

By pure luck I guessed 5/32 as the answer and got it correct; I have no idea why that's the case.


Solution

  • In IEEE-754 floating-point formats, an exponent field of 0 marks a denormal (subnormal) number, where the implied leading bit of the mantissa is 0 instead of 1.

    Wikipedia has a good detailed article on the single-precision float (binary32) format, with lots of examples. For binary32 float, the formulas are (from the wiki article):

    (−1)^signbit × 2^(−126)        × 0.significandbits   ; denormal, expbits=0
    (−1)^signbit × 2^(expbits−127) × 1.significandbits   ; normal
    Inf or NaN (depending on the mantissa, aka significand) ; expbits = all 1s
    

    (Note that 0.0 uses the same encoding, with exponent field 0 and mantissa 0, but is not actually considered a denormal.)

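    Anyway, notice that with a zero exponent field the effective exponent is no longer expbits - bias; it's one higher, 1 - bias. For binary32 that gives 2^(-126) rather than 2^(-127), and for your format with bias 3 it gives 2^(1-3) = 2^-2.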
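
To see the binary32 formulas in action, here is a minimal Python sketch that applies them to a raw 32-bit pattern and compares the result with what the standard `struct` module decodes. The function names `decode_binary32` and `hardware_value` are made up for this illustration, and the Inf/NaN case is simply rejected rather than handled.

```python
import struct

def decode_binary32(bits: int) -> float:
    """Apply the binary32 formulas by hand (finite values only)."""
    sign = -1.0 if (bits >> 31) & 1 else 1.0
    expbits = (bits >> 23) & 0xFF           # 8-bit exponent field
    frac = (bits & 0x7FFFFF) / 2**23        # 0.significandbits as a fraction
    if expbits == 0xFF:
        raise ValueError("Inf or NaN, not handled in this sketch")
    if expbits == 0:
        # Denormal: exponent is fixed at -126, implied leading bit is 0.
        return sign * 2.0**-126 * frac
    # Normal: exponent is expbits - 127, implied leading bit is 1.
    return sign * 2.0**(expbits - 127) * (1.0 + frac)

def hardware_value(bits: int) -> float:
    """Reinterpret the same 32 bits as a float using struct."""
    return struct.unpack("<f", struct.pack("<I", bits))[0]

# Smallest denormal, largest denormal, smallest normal, and 1.0
for pattern in (0x00000001, 0x007FFFFF, 0x00800000, 0x3F800000):
    assert decode_binary32(pattern) == hardware_value(pattern)
```

The largest denormal (`0x007FFFFF`) and the smallest normal (`0x00800000`) both use 2^(-126), which is why the denormal exponent is one higher than expbits - bias would suggest: it keeps the values continuous across that boundary.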


    Back to your case: your mantissa is 0.101 binary, 0.625 decimal (I plugged 0b101 / 8 into calc).

    2^-2 * 0.101(binary) = 2^-2 * 0.625(decimal) = 0.15625 = 5/32
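
The same calculation can be sketched in Python with exact fractions, so the result comes out as 5/32 rather than a rounded decimal. This assumes the layout from the question (1 sign bit, then a 3-bit exponent field, then the mantissa bits) and the bias formula 2^(k-1)-1; `decode_minifloat` is a hypothetical helper name, not a library function.

```python
from fractions import Fraction

def decode_minifloat(bits: str, exp_width: int = 3) -> Fraction:
    """Decode a 'sign exponent mantissa' bit string such as '0 000 101'."""
    bits = bits.replace(" ", "")
    sign = -1 if bits[0] == "1" else 1
    exp_field = bits[1:1 + exp_width]
    man_field = bits[1 + exp_width:]

    bias = 2 ** (exp_width - 1) - 1                    # 3 for a 3-bit field
    expbits = int(exp_field, 2)
    frac = Fraction(int(man_field, 2), 2 ** len(man_field))  # 0.mantissa

    if expbits == 2 ** exp_width - 1:
        raise ValueError("Inf or NaN, not handled in this sketch")
    if expbits == 0:
        # Denormal: exponent is 1 - bias, implied leading bit is 0.
        return sign * Fraction(2) ** (1 - bias) * frac
    # Normal: exponent is expbits - bias, implied leading bit is 1.
    return sign * Fraction(2) ** (expbits - bias) * (1 + frac)

print(decode_minifloat("0 000 101"))  # 5/32
```

For comparison, `decode_minifloat("0 001 101")` takes the normal branch with the same 2^-2 scale but an implied leading 1, giving 13/32.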


    Wikipedia also has a Minifloat article (https://en.wikipedia.org/wiki/Minifloat), which describes (with examples) an 8-bit IEEE-style format, as well as some other less-than-32-bit formats used in real life on computer-graphics hardware (e.g. 24-bit or 16-bit). Fun fact: x86 can load/store vectors of 16-bit half-precision floats, converting to/from single precision in registers on the fly, with the F16C ISA extension.

    See also this online converter with check-boxes for bits: https://www.h-schmidt.net/FloatConverter/IEEE754.html