I am interested in whether it is defined behaviour in Standard C++ to implicitly change the active member of a union between two members of the same primitive type.
By 'primitive type' I mean something with a standard layout and a trivial constructor/destructor, like an int, double, pointer, etc.
For example, suppose I have a union
union X
{
    int a;
    int b;
};
Then, is the following function legal?
int f(int i)
{
    X x;
    x.a = i;
    return x.b;
}
While I understand that the Standard says that at most one non-static data member of a union can be active at any time, there is also an exception for when two or more standard-layout structs inside the union share a common initial sequence. This seems to be specific to structs, and there is no mention of primitive types. It feels to me as though this should also hold for primitives (because I would have thought that every primitive shares a common initial sequence with itself), but I have not been able to find anything to confirm this.
The snippet you provide has undefined behaviour in C++11, C++14, C++17 and C++20 (I have not inspected other Standard versions, however).
As you mentioned in your question, there are exceptions related to struct, though there is a nuance: you cannot "implicitly change" the active member of a union, but you are allowed to read from inactive members in restricted circumstances, which vary slightly between C++11 and C++17.
Since no such exception is provided for primitive types, and since the wording defines a common initial sequence only for struct, primitive types are effectively excluded.
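As a minimal sketch of that distinction (the helper name g is made up for illustration and is not from the question), the function becomes well-defined, as far as I understand the wording, once b is made the active member by an explicit assignment before it is read:

```cpp
union X
{
    int a;
    int b;
};

// Hypothetical variant of f from the question.
int g(int i)
{
    X x;
    x.a = i;           // a is now the active member
    const int v = x.a; // reading the active member is fine
    x.b = v;           // assigning to b makes b the active member
    return x.b;        // reads the active member, so no exception is needed
}
```

The cost is an explicit copy through a temporary, which a compiler is free to optimise away.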
Here are the specifics per Standard version (emphasis and formatting mine).
C++11
C++11 has [class.mem]#19, which states (emphasis mine):
If a standard-layout union contains two or more standard-layout structs that share a common initial sequence, and if the standard-layout union object currently contains one of these standard-layout structs, it is permitted to inspect the common initial part of any of them. Two standard-layout structs share a common initial sequence if corresponding members have layout-compatible types and either neither member is a bit-field or both are bit-fields with the same width for a sequence of one or more initial members.
A "standard-layout union" is defined a bit earlier at [class]#9:
A standard-layout struct is a standard-layout class defined with the class-key struct or the class-key class. A standard-layout union is a standard-layout class defined with the class-key union.
With "standard-layout class" being defined at [class]#7:
A standard-layout class is a class that:
- has no non-static data members of type non-standard-layout class (or array of such types) or reference,
- has no virtual functions (10.3) and no virtual base classes (10.1),
- has the same access control (Clause 11) for all non-static data members,
- has no non-standard-layout base classes,
- either has no non-static data members in the most derived class and at most one base class with non-static data members, or has no base classes with non-static data members, and
- has no base classes of the same type as the first non-static data member.
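A few of these conditions can be checked with the std::is_standard_layout trait from <type_traits>. The sketch below uses made-up type names (Plain, Mixed, Virt, Ref) and only illustrates the list above; note that the union from the question is itself a standard-layout union, even though its members are not structs:

```cpp
#include <type_traits>

struct Plain { int a; double b; };                // same access, no virtuals
class Mixed { public: int a; protected: int b; }; // mixed access control
struct Virt { virtual void f() {} int a; };       // has a virtual function
struct Ref { int& r; };                           // has a reference member
union X { int a; int b; };                        // the union from the question

static_assert(std::is_standard_layout<Plain>::value, "Plain is standard-layout");
static_assert(!std::is_standard_layout<Mixed>::value, "mixed access control");
static_assert(!std::is_standard_layout<Virt>::value, "virtual function");
static_assert(!std::is_standard_layout<Ref>::value, "reference member");
static_assert(std::is_standard_layout<X>::value, "X is a standard-layout union");
```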
And "layout-compatible" struct/union types are respectively defined at [class.mem]#17/[class.mem]#18:
(17) Two standard-layout struct (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in declaration order) have layout-compatible types (3.9).
(18) Two standard-layout union (Clause 9) types are layout-compatible if they have the same number of non-static data members and corresponding non-static data members (in any order) have layout-compatible types (3.9).
C++11 doesn't give common initial sequence a standalone definition; the [class.mem]#19 wording quoted earlier only describes when two standard-layout structs share one. Since your union is not made of struct members, the exception does not apply.
The rest of the behaviour of union is described at [class.union], but it does not specify much more (see [class.union]#1).
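For contrast, here is a sketch of the case the C++11 exception does cover. The types IntMsg, FloatMsg and Msg and the function kind_of are hypothetical; both structs are standard-layout and start with an int, so that first member can be inspected through either union member, whichever one is active:

```cpp
// Hypothetical message types; both start with an int member named kind.
struct IntMsg   { int kind; int   payload; };
struct FloatMsg { int kind; float payload; };

union Msg
{
    IntMsg   as_int;
    FloatMsg as_float;

    Msg(IntMsg v) : as_int(v) {}     // makes as_int the active member
    Msg(FloatMsg v) : as_float(v) {} // makes as_float the active member
};

int kind_of(const Msg& m)
{
    // kind belongs to the common initial part of IntMsg and FloatMsg,
    // so this read is permitted even when as_float is the active member.
    return m.as_int.kind;
}
```

Reading m.as_int.payload while as_float is active would not be covered, because int and float are not layout-compatible.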
C++14
For C++14, this is specified under [class.mem]#19 with the following phrasing:
In a standard-layout union with an active member (9.5) of struct type T1, it is permitted to read a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2. [ Note: Reading a volatile object through a non-volatile glvalue has undefined behavior (7.1.6.1). - end note ]
The meaning of "standard-layout union" is this time defined at [class]#8, but the definition is the same as for C++11.
The definition of "standard-layout class", however, gains some nuances, though this doesn't affect the snippet you provide:
A class S is a standard-layout class if it:
- (7.1) has no non-static data members of type non-standard-layout class (or array of such types) or reference,
- (7.2) has no virtual functions (10.3) and no virtual base classes (10.1),
- (7.3) has the same access control (Clause 11) for all non-static data members,
- (7.4) has no non-standard-layout base classes,
- (7.5) has at most one base class subobject of any given type,
- (7.6) has all non-static data members and bit-fields in the class and its base classes first declared in the same class, and
- (7.7) has no element of the set M(S) of types (defined below) as a base class.
M(X) is defined as follows:
- (7.8) If X is a non-union class type, the set M(X) is empty if X has no (possibly inherited (Clause 10)) non-static data members; otherwise, it consists of the type of the first non-static data member of X (where said member may be an anonymous union), X0, and the elements of M(X0).
- (7.9) If X is a union type, the set M(X) is the union of all M(Ui) and the set containing all Ui, where each Ui is the type of the ith non-static data member of X.
- (7.10) If X is a non-class type, the set M(X) is empty.
[ Note: M(X) is the set of the types of all non-base-class subobjects that are guaranteed in a standard-layout class to be at a zero offset in X. - end note ]
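The M(S) rule in (7.7) is easier to see on a small case. In this sketch the type names Empty, Bad and Good are made up for illustration; the checks reflect what I would expect from a conforming implementation rather than anything quoted from the Standard:

```cpp
#include <type_traits>

struct Empty {};

// The first non-static data member has type Empty, so Empty is in M(Bad);
// Empty is also a base class of Bad, which (7.7) forbids.
struct Bad : Empty { Empty first; int n; };

// Same members, but the first one is an int, so M(Good) only contains int.
struct Good : Empty { int n; Empty last; };

static_assert(!std::is_standard_layout<Bad>::value, "base class is in M(Bad)");
static_assert(std::is_standard_layout<Good>::value, "base class is not in M(Good)");
```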
No more exceptions are added (and the meaning of common initial sequence still isn't clearly defined, for that matter).
[class.union]#1 didn't change either, so the exception doesn't apply to your example.
C++17
C++17 goes a bit more into the specifics and provides an example at [class.mem]#24. This clause states:
In a standard-layout union with an active member (12.3) of struct type T1, it is permitted to read a non-static data member m of another union member of struct type T2 provided m is part of the common initial sequence of T1 and T2; the behavior is as if the corresponding member of T1 were nominated. [ Example:
struct T1 { int a, b; };
struct T2 { int c; double d; };
union U { T1 t1; T2 t2; };
int f() {
    U u = { { 1, 2 } }; // active member is t1
    return u.t2.c;      // OK, as if u.t1.a were nominated
}
- end example ] [ Note: Reading a volatile object through a non-volatile glvalue has undefined behavior (10.1.7.1). - end note ]
C++17 explicitly defines what "common initial sequence" means at [class.mem]#21:
The common initial sequence of two standard-layout struct (Clause 12) types is the longest sequence of non-static data members and bit-fields in declaration order, starting with the first such entity in each of the structs, such that corresponding entities have layout-compatible types and either neither entity is a bit-field or both are bit-fields with the same width. [ Example:
struct A { int a; char b; };
struct B { const int b1; volatile char b2; };
struct C { int c; unsigned : 0; char b; };
struct D { int d; char b : 4; };
struct E { unsigned int e; char b; };
The common initial sequence of A and B comprises all members of either class. The common initial sequence of A and C and of A and D comprises the first member in each case. The common initial sequence of A and E is empty. - end example ]
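To tie this definition back to unions, here is a small sketch that reuses A and C from the Standard's example; the union U, its member names and the function first_member_via_c are my own additions for illustration:

```cpp
struct A { int a; char b; };
struct C { int c; unsigned : 0; char b; };

union U
{
    A as_a;
    C as_c;
};

int first_member_via_c()
{
    U u = { { 7, 'x' } }; // aggregate initialization: as_a is the active member
    return u.as_c.c;      // OK: C::c corresponds to A::a in the common initial sequence
    // u.as_c.b lies outside the common initial sequence of A and C,
    // so reading it while as_a is active would be undefined.
}
```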
C++17 gives an explicit example. It HAS to be wrapped in a struct. So your example can be given defined behaviour by simply changing it to something like
union X
{
    struct integer { int value; };
    integer a;
    integer b;
};
// Or the more succinct yet less expressive
union Y
{
    struct { int value; } a, b;
};
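With the first form, the function from the question could then be written along these lines. This is only a sketch: it uses aggregate initialization to make a the active member, mirroring the Standard's own example:

```cpp
union X
{
    struct integer { int value; };
    integer a;
    integer b;
};

int f(int i)
{
    X x = { { i } };  // aggregate initialization: a is the active member
    return x.b.value; // b is inactive, but value is in the common initial sequence
}
```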
C++20
C++20 does not change the behaviour, and has the same examples as C++17 but at [class.mem]#25.