How can *i
and u.i
print different numbers in this code, even though i
is defined as int *i = &u.i;
? I can only assuming that I'm triggering UB here, but I can't see how exactly.
(ideone demo replicates if I select 'C' as the language. But as @2501 pointed out, not if 'C99 strict' is the language. But then again, I get the problem with gcc-5.3.0 -std=c99
!)
// gcc -fstrict-aliasing -std=c99 -O2
union
{
int i;
short s;
} u;
int * i = &u.i;
short * s = &u.s;
int main()
{
*i = 2;
*s = 100;
printf(" *i = %d\n", *i); // prints 2
printf("u.i = %d\n", u.i); // prints 100
return 0;
}
(gcc 5.3.0, with -fstrict-aliasing -std=c99 -O2
, also with -std=c11
)
My theory is that 100
is the 'correct' answer, because the write to the union member through the short
-lvalue *s
is defined as such (for this platform/endianness/whatever). But I think that the optimizer doesn't realize that the write to *s
can alias u.i
, and therefore it thinks that *i=2;
is the only line that can affect *i
. Is this a reasonable theory?
If *s
can alias u.i
, and u.i
can alias *i
, then surely the compiler should think that *s
can alias *i
? Shouldn't aliasing be 'transitive'?
Finally, I always had this assumption that strict-aliasing problems were caused by bad casting. But there is no casting in this!
(My background is C++, I'm hoping I'm asking a reasonable question about C here. My (limited) understanding is that, in C99, it is acceptable to write through one union member and then reading through another member of a different type.)
The disrepancy is issued by -fstrict-aliasing
optimization option. Its behavior and possible traps are described in GCC documentation:
Pay special attention to code like this:
union a_union { int i; double d; }; int f() { union a_union t; t.d = 3.0; return t.i; }
The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with
-fstrict-aliasing
, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected. See Structures unions enumerations and bit-fields implementation. However, this code might not:int f() { union a_union t; int* ip; t.d = 3.0; ip = &t.i; return *ip; }
Note that conforming implementation is perfectly allowed to take advantage of this optimization, as second code example exhibits undefined behaviour. See Olaf's and others' answers for reference.