In the C standard (specifically n3220, i.e. the latest working draft of C23), which paragraphs would make the following code examples invoke undefined behaviour?
void f(int *a, char *b) {
    (*a)++;
    *a += *b;
}
int main() {
    union u {
        int a;
        char b;
    } x;
    x.a = 1;
    f(&x.a, &x.b);
}

union u {
    int a;
    char b;
};
void f(int *a, union u *x) {
    (*a)++;
    *a += x->b;
}
int main() {
    union u x;
    x.a = 1;
    f(&x.a, &x);
}
For the first example: I have read through the standard, and it seems that everything up to the function call should be strictly conformant to the standard, and that there should be a problem with accessing the object representation of a union with l-values of incompatible type where the l-values cannot be traced back to the given union. Or perhaps is the restriction that "type punning" (as referred to by footnote 93) only works when the l-value is the direct result of the . operator? This would be the case if the term "member" in footnote 93 is taken to not be referring to the associated member object; but the standard is not (to my knowledge) very clear about any such distinction. 6.5.1p7 might also support this conclusion since it talks about accessing the stored value, but then that would seem to contradict footnote 93 (since it would seem to prohibit type punning under all circumstances).
For the second example: if the latter reason given above is true, what would make this example invoke undefined behaviour? Or would it be strictly conforming, requiring the compiler to assume that the union's object representation may be aliased by the int *?
Both examples are undefined behavior since the memory locations overlap. None of that is related to strict aliasing. It is UB as per C23 6.5.17.2:
If the value being stored in an object is read from another object that overlaps in any way the storage of the first object, then the two objects shall occupy exactly the same storage and shall have qualified or unqualified versions of a compatible type; otherwise, the behavior is undefined.
Furthermore, it may also cause problems in case the representations do not match. char may or may not be unsigned, and may or may not overwrite the sign bit of the int. This is not really UB but implementation-defined behavior, both in terms of the signedness of char and in terms of endianness. There's also potentially unspecified behavior as per 6.2.6.1:
[Normative text]
Where an operator is applied to a value that has more than one object representation, which object representation is used shall not affect the value of the result.44) Where a value is stored in an object using a type that has more than one object representation for that value, it is unspecified which representation is used, but a non-value representation shall not be generated.
[Informative footnote:]
44) It is possible for objects x and y with the same effective type T to have the same value when they are accessed as objects of type T, but to have different values in other contexts. In particular, if == is defined for type T, then x == y does not imply that memcmp(&x, &y, sizeof(T))== 0. Furthermore, x == y does not necessarily imply that x and y have the same value; other operations on values of type T can distinguish between them.
But there aren't really any concerns about strict aliasing, since no effective type is referred to through an incorrect lvalue access. Furthermore, an int* and a char* may alias: the compiler cannot assume that a modification to what the int* points at did not also update what the char* points at, and vice versa. This is because a char* may point at any object.
(Also, a union that includes a type compatible with the effective type of the object may be used for lvalue access to said object, as far as strict aliasing is concerned. That doesn't apply to these examples.)
However, accessing a char[] through a dereferenced int* would be a strict aliasing violation and possibly also a misaligned access.
Specifically:
and that there should be a problem with accessing the object representation of a union with l-values of incompatible type where the l-values cannot be traced back to the given union
As per the quote from 6.2.6.1, there could be multiple object representations here and the requirement is that at least one should be valid. Whether it can get "traced back" or not doesn't matter; what matters is which effective type the actual data got. In this case, both int and char at the same time.