Tags: c, matlab, floating-point, precision, ieee-754

Why am I losing precision when reading an IEEE-754 floating-point value from a 32-bit big-endian binary file?


I am rewriting some Matlab file-processing code in pure C, and I've implemented the following function to read 4 bytes from a big-endian binary file that should represent an IEEE-754 single-precision floating-point value. I verified that I'm able to pull the relevant 32-bit data out of the file as an unsigned integer with the following code.

#include <stdint.h>
#include <stdio.h>

/* Read a big-endian 32-bit unsigned integer from *fp.
   Returns 0 on success, -1 on error or short read. */
int fread_uint32_be(uint32_t *result, FILE **fp)
{
    uint8_t data[sizeof(uint32_t)];
    if (!result || !fp || !*fp ||
        sizeof(uint32_t) != fread((void *) data, 1, sizeof(uint32_t), *fp))
    {
        return -1;
    }
    /* Assemble the bytes MSB-first, independent of host endianness. */
    *result = ((uint32_t)(data[0]) << 24 | (uint32_t)(data[1]) << 16 |
               (uint32_t)(data[2]) << 8  | (uint32_t)(data[3]));
    return 0;
}

The data I'm expecting has a hex value of 0x1acba506, as returned from this function, and I verified it against a hex dump of the data file in big-endian format. Now here comes my problem...

When I cast this value from uint32_t to float, I get a single-precision floating-point value of 449553664.000000, which is close to, but not exactly, what the Matlab code had: 449553670.000000. I've verified that when Matlab reads the binary file, it also gets the same hex value 0x1acba506 that my C code does.

When I cast back from float to uint32_t and print the hex value, I end up with 0x1acba500, which shows that I'm losing precision in the simple cast, i.e. float ans = (float)result; but I don't really understand why. I'm using gcc 7.4 on an x86 machine, and I've verified that sizeof(float) == sizeof(uint32_t). Am I making a poor assumption that the compiler is using IEEE-754 single-precision floating point?
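
For completeness, here is a minimal standalone program that reproduces what I'm seeing, with the file value hard-coded instead of read via fread_uint32_be:

#include <stdio.h>
#include <stdint.h>

int main(void)
{
    uint32_t result = 0x1acba506u;   /* 449553670, the value read from the file */
    float    ans    = (float)result; /* value conversion, not a reinterpretation of the bits */
    uint32_t back   = (uint32_t)ans; /* convert back to see what survived */

    printf("%f\n", ans);                 /* prints 449553664.000000 */
    printf("0x%08x\n", (unsigned)back);  /* prints 0x1acba500 */
    return 0;
}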

In debugging, I found an online floating-point calculator that makes it seem like the precision is hopelessly lost, but then the question becomes: how is Matlab retaining it?

(Screenshot: the value I get by casting to float in my C code, 449553664.0)

(Screenshot: the value shown in the Matlab code, 449553670.0)


Solution

  • The significand (mantissa) of an IEEE 754 single-precision float is 24 bits, where the leading bit is an implicit 1.

    Let's look at your two integers; Python is a handy tool for inspecting them. Their bit representations are

    >>> format(449553664, '032b')
    '00011010110010111010010100000000'
    

    and

    >>> format(449553670, '032b')
    '00011010110010111010010100000110'
    

    Now, if we look at the latter number and ask how it fits into a single-precision significand: the first 1 bit is the 4th bit from the left, and counting 24 bits from there (inclusive) we get

    >>> format(449553670, '032b').lstrip('0')[:24]
    '110101100101110100101000'
    

    The trailing bits 00110, including the final 110, did not fit into the 24-bit significand; the discarded part is less than half a unit in the last place, so round-to-nearest rounds the value down (the C sketch at the end of this answer shows the same rounding). Therefore the value of (float)449553670 is represented as

    1.10101100101110100101000b * 10b ^ 11100b
    

    i.e. in decimal

    1.67471790313720703125 * 2 ^ 28
    

    which equals 449553664.0.


    Matlab most likely retains the precision by not using single-precision floats but doubles, just like JavaScript does. Any integer of up to 53 bits (magnitude up to 2^53) can be represented exactly in an IEEE 754 double-precision float, so 449553670 survives the conversion unchanged; a short C check of this is sketched below.
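
    To see the same rounding from C, here is a small sketch (the value is hard-coded from the question, and nextafterf from <math.h> is used to show the neighbouring representable float; link with -lm if needed):

    #include <stdio.h>
    #include <stdint.h>
    #include <math.h>

    int main(void)
    {
        uint32_t v = 449553670u;    /* 0x1acba506 */
        float    f = (float)v;      /* rounds to the nearest representable float */

        /* Single-precision floats near 2^28 are spaced 2^(28-23) = 32 apart, so
           449553670 falls between 449553664 and 449553696 and rounds to the
           nearer neighbour, 449553664. */
        printf("rounded to : %.1f\n", f);                        /* 449553664.0 */
        printf("next float : %.1f\n", nextafterf(f, INFINITY));  /* 449553696.0 */
        return 0;
    }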
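
    And here is a quick check of the double-precision point, assuming the fix on the C side is simply to convert the integer to double rather than float after reading it:

    #include <stdio.h>
    #include <stdint.h>

    int main(void)
    {
        uint32_t v = 449553670u;    /* 0x1acba506, as read by fread_uint32_be */
        double   d = (double)v;     /* a 53-bit significand holds any 32-bit integer exactly */

        printf("%f\n", d);          /* prints 449553670.000000, matching the Matlab result */
        return 0;
    }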