Tags: c++, language-lawyer, undefined-behavior, constexpr, unions

Undefined Behavior in Unions with Standard Layout structs


Take the following code:

#include <iostream>

union vec
{
    struct
    {
        float x, y, z;
    };

    float data[3];

    constexpr vec() : data{} {}
};

constexpr vec make_vec(float x, float y, float z)
{
    vec res;
    res.data[0] = x;
    res.data[1] = y;
    res.z = z;
    return res;
}

int main()
{
    constexpr vec out = make_vec(0, 1, 2);
    std::cout << out.z << '\n';
}

I use constexpr here to test whether the code has undefined behavior, since undefined behavior encountered during constant evaluation causes a compilation error.

§9.2/19:

If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them.
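As a minimal sketch of the situation the quoted rule does cover, consider a union of two standard-layout structs that start with the same member. The names IntMsg, FloatMsg, Msg, make_int_msg and tag_of are hypothetical and only serve the illustration. The check is done at run time with assert because, as far as I know, reading an inactive union member is still rejected inside a constant expression even when the common-initial-sequence rule makes it well defined.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical message types, invented for this illustration.
// Both are standard-layout structs whose first member is an int,
// so `tag` forms their common initial sequence.
struct IntMsg   { int tag; int   value; };
struct FloatMsg { int tag; float value; };

static_assert(std::is_standard_layout<IntMsg>::value,   "IntMsg must be standard-layout");
static_assert(std::is_standard_layout<FloatMsg>::value, "FloatMsg must be standard-layout");

union Msg {
    IntMsg   as_int;
    FloatMsg as_float;
};

// Builds a Msg whose active member is as_int.
inline Msg make_int_msg(int tag, int value) {
    Msg m{};                        // aggregate initialization activates the first member, as_int
    m.as_int = IntMsg{tag, value};  // as_int stays the active member
    return m;
}

// Reads `tag` through as_float even when as_int is the active member.
// This is the access the quoted rule permits, because `tag` lies in the
// common initial sequence of IntMsg and FloatMsg.
inline int tag_of(const Msg& m) {
    return m.as_float.tag;
}

// Run-time check of the two reads that are allowed here.
inline void check_common_initial_sequence() {
    const Msg m = make_int_msg(1, 42);
    assert(tag_of(m) == 1);         // inspects the common initial part
    assert(m.as_int.value == 42);   // reads the active member
    // Reading m.as_float.value is NOT covered: `value` has a different
    // type in each struct, so it is outside the common initial sequence.
}
```

By contrast, the union in the question pairs a struct with a bare array, and an array is not a struct, so there is no pair of structs for the rule to apply to.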

From this, I would assume that everything in the code would be defined behavior.

Compiling with g++ main.cpp -o out -std=c++17, I get the message error: change of the active member of a union from 'vec::data' to 'vec::<anonymous>'.

I thought that, to comply with the standard, I might have to wrap the array in a struct as well:

union vec
{
    struct
    {
        float x, y, z;
    };

    struct
    {
        float data[3];
    };

    constexpr vec() : data{} {}
};

But I get the same error.

Is this truly undefined behavior? Is there perhaps another part of the standard that I've missed, or am I simply misinterpreting the standard?


Solution

  • Yes, this is UB.

    After you write to the float data[3] member of the union, you are not allowed to read the struct { float x, y, z; } member.

    It is as simple as that. The phrase

    that share a common initial sequence

    does not cover these two members, because an array is not the same as a float followed by another float.

    Important edit

    The answer above assumed the code was UB because the .x and .y members would not be valid. As @user17732522 points out, it is a bit more subtle than that.

    .x and .y are returned uninitialized and have indeterminate values. But the write to the .z member does make the struct the active member of the union. As such, as long as the calling code only reads the .z member, everything is defined and correct.
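To illustrate the point about the active member, here is a minimal sketch that writes and reads through one member only, so the active member never changes. The names vec3, make_vec3 and xyz are hypothetical. The struct is given a member name because anonymous structs are a compiler extension rather than standard C++. As far as I know, the GCC error in the question comes from a separate rule: before C++20, an assignment that changes the active member of a union is not allowed inside a constant expression, even where the same assignment is well defined at run time.

```cpp
// Hypothetical variant of the question's union, for illustration only.
union vec3 {
    struct { float x, y, z; } xyz;  // named member instead of an anonymous struct
    float data[3];

    constexpr vec3() : data{} {}    // data is the active member after construction
};

// Fills the vector through data only, so no assignment changes the active member.
constexpr vec3 make_vec3(float x, float y, float z) {
    vec3 res;
    res.data[0] = x;
    res.data[1] = y;
    res.data[2] = z;                // still data, unlike `res.z = z` in the question
    return res;
}

// Evaluated at compile time: every read goes through the active member.
static_assert(make_vec3(0, 1, 2).data[0] == 0.0f, "data[0] should hold x");
static_assert(make_vec3(0, 1, 2).data[2] == 2.0f, "data[2] should hold z");
```

The cost of this design is that callers must also read through data. Reading xyz.z from the result would again touch a member that is not active.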