c++ language-lawyer unions

Implicitly changing the active union member between two members of the same primitive type


I am interested in whether it is defined behaviour in Standard C++ to implicitly change the active member of a union between two members of the same primitive type.

By 'primitive type' I mean something with a standard layout and trivial constructor/destructor, like an int, double, pointer, etc.

For example, suppose I have a union

union X
{
   int a;
   int b;
};

Then, is the following function legal?

int f(int i)
{
    X x;
    x.a = i;
    return x.b;
}

While I understand that the Standard says that every union has at most one non-static data member active at any time, there is also an exception for when two or more standard-layout structs inside the union share a common initial sequence. This seems to be specific to structs, and there is no mention of primitive types. It feels to me as though this should also hold for primitives (because I would've thought that every primitive shares a common initial sequence with itself), but I've not been able to find anything to confirm this.

Related questions:

  • This question considers whether you can implicitly change the active member when you have a struct and a primitive (not two primitives),
  • Similarly, this question deals with the case where you have one struct and one primitive,
  • This question affirms that I can do exactly what I'm suggesting here in C, but of course I would like to know whether this applies to C++.

Solution

  • The snippet you provide has undefined behaviour in C++11, C++14, C++17 and C++20 (I have not inspected other Standard versions, however).

    As you mentioned in your question, there are exceptions related to struct, though there is a nuance: you cannot "implicitly change" the active member of a union, but you are allowed to read from inactive members in restricted circumstances, which vary slightly between Standard versions. Since no such exception is provided for primitive types, and since the wording defines common initial sequence only for struct, primitive types are effectively excluded.
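For contrast, here is a minimal sketch of changing the active member explicitly, by writing to the new member before reading it. The union `X` mirrors the one in the question; the function name `g` and the temporary `tmp` are hypothetical, added only for illustration. C++17 spells out in [class.union] that such an assignment begins the lifetime of the member; earlier versions are generally read the same way for trivial types such as int.

```cpp
// Sketch only: X mirrors the union from the question; g is a hypothetical name.
union X
{
    int a;
    int b;
};

int g(int i)
{
    X x;
    x.a = i;              // a becomes the active member
    const int tmp = x.a;  // read only the active member
    x.b = tmp;            // writing to b makes b the active member
    return x.b;           // reads the active member, so no exception is needed
}
```

The difference from the snippet in the question is that `x.b` is never read while `a` is still the active member.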

    Here are the specifics per Standard version (emphasis and formatting mine).


    C++11

    C++11 has [class.mem]#19, which states (emphasis mine):

    If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.

    A "standard-layout union" is defined a bit earlier at [class]#9:

    A standard-layout struct is a standard-layout class defined with the class-key struct or the class-key class. A standard-layout union is a standard-layout class defined with the class-key union.

    With "standard-layout class" being defined at [class]#7:

    A standard-layout class is a class that:

    • has no non-static data members of type non-standard-layout class (or array of such types) or reference,
    • has no virtual functions (10.3) and no virtual base classes (10.1),
    • has the same access control (Clause 11) for all non-static data members,
    • has no non-standard-layout base classes,
    • either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
    • has no base classes of the same type as the first non-static data member.

    "Layout-compatible" struct and union types are defined at [class.mem]#17 and [class.mem]#18 respectively:

    (17) Two standard-layout struct (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types (3.9).

    (18) Two standard-layout union (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in any order) have layout-compatible types (3.9).

    C++11 doesn't seem to provide an exact definition of common initial sequence, but since your union does not contain structs, the exception does not apply. The rest of the behaviour of union is described at [class.union], but it does not specify much more (see [class.union]#1).


    C++14

    For C++14, this is specified under [class.mem]#19 with the following phrasing:

    In a standard-layout union with an active member (9.5) of struct type T1, it is permitted to read a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2. [ Note: Reading a volatile object through a non-volatile glvalue has undefined behavior (7.1.6.1). - end note ]

    The meaning of "standard-layout union" is this time defined at [class]#8, but the definition is the same as for C++11.

    The definition of "standard-layout class", however, gains some nuances, though this doesn't affect the snippet you provide:

    A class S is a standard-layout class if it:

    • (7.1) has no non-static data members of type non-standard-layout class (or array of such types) or reference,
    • (7.2) has no virtual functions (10.3) and no virtual base classes (10.1),
    • (7.3) has the same access control (Clause 11) for all non-static data members,
    • (7.4) has no non-standard-layout base classes,
    • (7.5) has at most one base class subobject of any given type,
    • (7.6) has all non-static data members and bit-fields in the class and its base classes first declared in the same class, and
    • (7.7) has no element of the set M(S) of types (defined below) as a base class.

    M(X) is defined as follows:

    • (7.8) If X is a non-union class type, the set M(X) is empty if X has no (possibly inherited (Clause 10)) non-static data members; otherwise, it consists of the type of the first non-static data member of X (where said member may be an anonymous union), X0, and the elements of M(X0).
    • (7.9) If X is a union type, the set M(X) is the union of all M(Ui) and the set containing all Ui, where each Ui is the type of the ith non-static data member of X.
    • (7.10) If X is a non-class type, the set M(X) is empty.

    [ Note: M(X) is the set of the types of all non-base-class subobjects that are guaranteed in a standard-layout class to be at a zero offset in X - end note ]

    No more exceptions are added (and, for that matter, the meaning of common initial sequence still isn't clearly defined). [class.union]#1 didn't change either, so the exception doesn't apply to your example.


    C++17

    C++17 goes a bit more into the specifics and provides an example at [class.mem]#24. This clause states:

    In a standard-layout union with an active member (12.3) of struct type T1, it is permitted to read a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2; the behavior is as if the corresponding member of T1 were nominated. [ Example:

    struct T1 { int a, b; };
    struct T2 { int c; double d; };
    union U { T1 t1; T2 t2; }; 
    int f() { 
        U u = { { 1, 2 } }; // active member is t1 
        return u.t2.c; // OK, as if u.t1.a were nominated 
    }
    
    - end example ] [ Note: Reading a volatile object through a non-volatile glvalue has undefined behavior (10.1.7.1). - end note ]

    C++17 explicitly defines what "common initial sequence" means at [class.mem]#21:

    The common initial sequence of two standard-layout struct (Clause 12) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types and either neither entity is a bit-field or both are bit-fields with the same width. [ Example:

    struct A { int a; char b; }; 
    struct B { const int b1; volatile char b2; };
    struct C { int c; unsigned : 0; char b; };
    struct D { int d; char b : 4; };
    struct E { unsigned int e; char b; };
    

    The common initial sequence of A and B comprises all members of either class. The common initial sequence of A and C and of A and D comprises the first member in each case. The common initial sequence of A and E is empty. — end example ]
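To make the partial case concrete, here is a small sketch that reuses `A` and `D` from the quoted example. The union `AD` and the function `first_via_d` are hypothetical names, not part of the Standard text. Only the first member lies in the common initial sequence of `A` and `D`, so only that member may be read through the inactive struct.

```cpp
// A and D are taken from the Standard's example; AD and first_via_d are
// hypothetical names used only for this sketch.
struct A { int a; char b; };
struct D { int d; char b : 4; };

union AD
{
    A a;
    D d;
};

int first_via_d(int v)
{
    AD u = { { v, 'x' } };  // aggregate initialization: u.a is the active member
    return u.d.d;           // allowed: d is in the common initial sequence of A and D
    // Reading u.d.b here would fall outside the common initial sequence,
    // because A::b is not a bit-field and D::b is.
}
```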

    C++17 gives an explicit example, and it shows that the member HAS to be wrapped in a struct. So your example can be given defined behaviour by simply changing it to something like

    union X
    {
        struct integer{int value;};
        
        integer a;
        integer b;
    };
    
      
    // Or the more succinct yet less expressive
    union Y
    {
        struct {int value;} a, b;
    };
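As a rough sketch of how the function from the question might look with the wrapped version, assuming the C++17 wording quoted above: the union `X` is the first variant shown, and `f` keeps the name used in the question. The initialization style follows the Standard's own example.

```cpp
// Sketch under the C++17 wording; X is the struct-wrapped union shown above.
union X
{
    struct integer { int value; };

    integer a;
    integer b;
};

int f(int i)
{
    X x = { { i } };    // aggregate initialization: a is the active member
    return x.b.value;   // value is in the common initial sequence of a and b,
                        // so this behaves as if x.a.value were nominated
}
```

Note that this still only reads through `b`; it does not change which member is active.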
    

    C++20

    C++20 does not change the behaviour, and has the same examples as C++17 but at [class.mem]#25, giving it the same behaviour as in C++17.