Tags: c++, ieee-754, long-double

What is the correct way to get the binary representation of long double?


Here's my attempt:

#include <cstdint>
#include <iostream>

union newType {
        long double firstPart;
        unsigned char secondPart[sizeof(firstPart)];
} lDouble;

int main() {
    lDouble.firstPart = -16.5;

    for (int_fast16_t i { sizeof(lDouble) - 1 }; i >= 0; --i)
        std::cout << (int)lDouble.secondPart[i] << " ";

    return 0;
}
Output:  0 0 0 0 0 0 192 3 132 0 0 0 0 0 0 0  
Hex:     0 0 0 0 0 0  c0 3  84 0 0 0 0 0 0 0

And I almost agree with the part "c0 3 84", which is "1100 0000 0000 0011 1000 0100".

-16.5 = -1.03125 * 2^4 = (-1 + (-0.5) * 2^-4) * 2^4
Thus, the 5th bit of my fraction part (counting from the most significant) must be 1, and every bit after it is "0".

sign(-):       1  
exponent(2^4): 4 + 16383 = 16387 = 100 0000 0000 0011  
fraction:      0000 1000 and 104 '0'

Result:   1| 100 0000 0000 0011| 0000 1000 and 104 '0'
Hex:           c    0    0    3     0    8 and 26 '0'

Or: c0 3 8 0 0 0 0 0 0 0 0 0 0 0 0 0

I don't get two things:

  1. "c0 3 84" - where did I lose the 4 in my calculations? My guess is that it somehow stores the leading 1 (the 113th bit), which shouldn't be stored. Then there's 1000 0100 instead of 0000 1000 (after "c0 3"), and that's exactly "84". But we always store 112 bits and the 1 is always implicit.
  2. Why doesn't my output start with 192? Why does it start with 0? I thought that the first bit is the sign bit, followed by the exponent (15 bits) and the fraction (112 bits).

I've managed to represent other data types (double, float, unsigned char, etc.). With double I used a similar approach and got the expected result (e.g. double -16.5 outputs 192 48 128 0 0 0 0 0, or c0 30 80 0 0 0 0 0).

Of course, I've tested the solution from "How to print binary representation of a long double as in computer memory?"

Values for my -16.5 are: 0 0 0 0 0 0 0 0x84 0x3 0xc0 0xe2 0x71 0xf 0x56 0 0  
If I reverse this I get: 0 0 56 f 71 e2 c0 3 84 0 0 0 0 0 0 0

And I don't understand why (again) the sequence doesn't start with the sign bit. What are those "56 f 71 e2 c0"? Where do they come from? And why (again) is there a "4" after the "8"?


Solution

  • What is the correct way to get the binary representation of long double?

    The same way as for any trivial type. Reinterpreting the object as an array of unsigned char and iterating over each byte is a typical and well-defined solution.

    std::bitset helps with the binary representation:

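    #include <bitset>
    #include <climits>
    #include <cstddef>
    #include <iostream>

    int main() {
        long double ld = -16.5;
        // Any object may be inspected through a pointer to unsigned char.
        const unsigned char* it = reinterpret_cast<const unsigned char*>(&ld);
        // Walk the bytes from the lowest address to the highest.
        for (std::size_t i = 0; i < sizeof(ld); i++) {
            std::cout
                << "byte "
                << i
                << '\t'
                << std::bitset<CHAR_BIT>(it[i])
                << '\t'
                << std::hex << int(it[i])
                << '\t'
                << std::dec << int(it[i])
                << '\n';
        }
    }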
    

    Example output on some system:

    byte 0  00000000    0   0
    byte 1  00000000    0   0
    byte 2  00000000    0   0
    byte 3  00000000    0   0
    byte 4  00000000    0   0
    byte 5  00000000    0   0
    byte 6  00000000    0   0
    byte 7  10000100    84  132
    byte 8  00000011    3   3
    byte 9  11000000    c0  192
    byte 10 01000000    40  64
    byte 11 00000000    0   0
    byte 12 00000000    0   0
    byte 13 00000000    0   0
    byte 14 00000000    0   0
    byte 15 00000000    0   0
    

    Note that your example has undefined behaviour in C++ due to reading an inactive member of a union.


    Why doesn't my output start from 192?

    Probably because the bytes at the highest addresses are padding, and your loop prints from the highest address down.

    Why does it start from 0?

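    Because padding bytes have no defined value: they can happen to be zero, as in your output, or hold leftover garbage such as the e2 71 f 56 in your second experiment.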

    I thought that first bit is sign bit, then exponent (15 bits) and fraction (112 bits).

    Not so much the "first" bit as the "most significant" bit, excluding the padding. Evidently, your assumed bit counts are off, because some of the 128 bits are padding rather than part of the value.

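    Note that C++ doesn't guarantee that the floating-point representation is IEEE-754, and in fact long double is often not the 128-bit "quadruple" precision float but rather the 80-bit "extended" precision float. This is the case, for example, in the x86 CPU architecture family. Unlike the 128-bit format, that 80-bit format stores the leading 1 of the significand explicitly (1 sign bit, 15 exponent bits, 64 significand bits), which would explain the 84 (1000 0100) where you expected 08.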