
32-bit IEEE 754 single precision floating point to hexadecimal


I have learnt how to convert numbers to floating point (on top of binary, octal and hexadecimal).

However, while looking through a worksheet I have been given, I have encountered the following question:

Using 32-bit IEEE 754 single precision floating point show the representation of -12.13 in Hexadecimal.

I have tried looking at the resources I have and still can't figure out how to answer the above. The answer given is 0xc142147b.

Edit: Sorry for not clarifying but I wanted to know how to get this done by hand instead of coding it.


Solution

  • -12.13 must be converted to binary and then hex. Let's do that more or less like the glibc library does it, using just pen and paper and the Windows calculator.

    Remove the sign, but remember we had one: 12.13

    Significand (or mantissa)

    The integer part, 12 is easy: C (hex)

    The fractional part, 0.13 is a little trickier. 0.13 is 13/100. I use the Windows calculator (Programmer mode, hex) and shift 13 (hex D) by 32(*) bits to the left: D00000000. Divide that by 100 (hex 64) to get: 2147AE14 hex.

    Since we need a value below 1, we shift right by 32 bits again, and get: 0.2147AE14
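The calculator steps for the fractional part can be reproduced with plain integer arithmetic. This is only a sketch of the same idea (scale 13/100 by 2^32, then read the quotient as 32 fractional bits); the variable names are made up for illustration.

```python
# Scale 13/100 by 2**32 so the division can be done on integers,
# mirroring the calculator steps: 13 << 32, then divide by 100.
scaled = (13 << 32) // 100        # integer division truncates the quotient

# Eight hex digits correspond to the 32 fractional bits.
frac_hex = format(scaled, "08X")

print("0." + frac_hex)            # the hex fraction of 0.13, truncated to 32 bits
```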

    Now add the integer part on the left: C.2147AE14

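    We only need 24 bits (6 hex digits) for the mantissa, so we round to one hex digit before the point and five after it: C.2147AE14 --> C.2147B. Without the point, that is C2147B.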

    Now this must be normalized. C is 1100 in binary, so the binary point is moved 3 bits to the left to leave a single 1 in front of it (but the bits remain the same, of course). The exponent (originally 0) is raised accordingly, by 3, so now it is 3.

    The hidden bit can now be removed: 42147B (now the 23 low bits)

    This can be turned into a 32 bit value for now: 0x0042147B
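As a cross-check of the rounding and hidden-bit steps, here is a small sketch that treats C.2147AE14 as a fixed-point integer. It rounds half up, which is an assumption on my part: IEEE 754 rounds ties to even, but the dropped bits here (hex E14) are not a tie, so both rules give the same result. All names are illustrative.

```python
# 12.13 as a fixed-point integer with 32 fractional bits: C.2147AE14
fixed = (12 << 32) | 0x2147AE14

# 12 is 1100 in binary, so the value occupies 4 + 32 = 36 bits.
# Keeping 24 significant bits means dropping the low 12 bits.
drop = fixed.bit_length() - 24

# Round to nearest by adding half of the dropped range before shifting.
# Assumption: exact ties are rounded up, not to even (no tie occurs here).
significand = (fixed + (1 << (drop - 1))) >> drop

# Remove the hidden (leading) bit, leaving the 23 stored fraction bits.
fraction_field = significand & 0x7FFFFF

print(format(significand, "06X"), format(fraction_field, "08X"))
```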

    Exponent and sign

    Now let's take on the exponent: 3 + bias of hex 7F = hex 82, or 1000 0010 binary.

    Add the sign bit on the left: 1 1000 0010. Regrouped into nibbles: 1100 0001 0, which is hex C1 followed by one more 0 bit (the lowest exponent bit).

    Of course these are top bits, so we turn that into 0xC1000000 for the full 32 bits
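The sign and exponent placement can be sketched with shifts, assuming the standard single-precision layout (1 sign bit, 8 exponent bits, 23 fraction bits). The names below are only for illustration.

```python
SIGN = 1            # the number is negative
EXPONENT = 3        # from normalizing 1100.xxx to 1.100xxx
BIAS = 0x7F         # single-precision exponent bias (127)

biased = EXPONENT + BIAS              # hex 82, binary 1000 0010

# The sign goes in bit 31, the biased exponent in bits 30..23.
top_bits = (SIGN << 31) | (biased << 23)

print(format(biased, "08b"), format(top_bits, "08X"))
```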

    "Bitwise-Or" both parts

    0xC1000000 | 0x0042147B = 0xC142147B
    

    And that is the value you want.
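If you want to check a hand calculation like this afterwards, one option is Python's standard `struct` module, which can pack a number as a big-endian single-precision float and reinterpret the same four bytes as an unsigned integer. This is just a verification aid, not part of the pen-and-paper method.

```python
import struct

# Result of the manual steps: sign/exponent bits OR-ed with the fraction bits.
by_hand = 0xC1000000 | 0x0042147B

# Let the machine do the conversion: pack as float32, read back as uint32.
packed = struct.unpack(">I", struct.pack(">f", -12.13))[0]

print(format(by_hand, "08X"), format(packed, "08X"))
```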


    (*) I use 32 bits so that I have more than enough bits to round properly later on.