Search code examples
c++unionsreinterpret-cast

C++ unions vs. reinterpret_cast


It appears from other StackOverflow questions and reading §9.5.1 of the ISO/IEC draft C++ standard standard that the use of unions to do a literal reinterpret_cast of data is undefined behavior.

Consider the code below. The goal is to take the integer value of 0xffff and literally interpret it as a series of bits in IEEE 754 floating point. (Binary convert shows visually how this is done.)

#include <iostream>
using namespace std;

union unionType {
    int myInt;
    float myFloat;
};

int main() {

    int i = 0xffff;

    unionType u;
    u.myInt = i;

    cout << "size of int    " << sizeof(int) << endl;
    cout << "size of float  " << sizeof(float) << endl;

    cout << "myInt          " << u.myInt << endl;
    cout << "myFloat        " << u.myFloat << endl;

    float theFloat = *reinterpret_cast<float*>(&i);
    cout << "theFloat       " << theFloat << endl;

    return 0;
}

The output of this code, using both GCC and clang compilers is expected.

size of int    4
size of float  4
myInt          65535
myFloat        9.18341e-41
theFloat       9.18341e-41

My question is, does the standard actually preclude the value of myFloat from being deterministic? Is the use of a reinterpret_cast better in any way to perform this type of conversion?

The standard states the following in §9.5.1:

In a union, at most one of the non-static data members can be active at any time, that is, the value of at most one of the non-static data members can be stored in a union at any time. [...] The size of a union is sufficient to contain the largest of its non-static data members. Each non-static data member is allocated as if it were the sole member of a struct. All non-static data members of a union object have the same address.

The last sentence, guaranteeing that all non-static members have the same address, seems to indicate the use of a union is guaranteed to be identical to the use of a reinterpret_cast, but the earlier statement about active data members seems to preclude this guarantee.

So which construct is more correct?

Edit: Using Intel's icpc compiler, the above code produces even more interesting results:

$ icpc union.cpp
$ ./a.out
size of int    4
size of float  4
myInt          65535
myFloat        0
theFloat       0

Solution

  • The reason it's undefined is because there's no guarantee what exactly the value representations of int and float are. The C++ standard doesn't say that a float is stored as an IEEE 754 single-precision floating point number. What exactly should the standard say about you treating an int object with value 0xffff as a float? It doesn't say anything other than the fact it is undefined.

    Practically, however, this is the purpose of reinterpret_cast - to tell the compiler to ignore everything it knows about the types of objects and trust you that this int is actually a float. It's almost always used for machine-specific bit-level jiggery-pokery. The C++ standard just doesn't guarantee you anything once you do it. At that point, it's up to you to understand exactly what your compiler and machine do in this situation.

    This is true for both the union and reinterpret_cast approaches. I suggest that reinterpret_cast is "better" for this task, since it makes the intent clearer. However, keeping your code well-defined is always the best approach.