Search code examples
cstrict-aliasing

Violating of strict-aliasing in C, even without any casting?


How can *i and u.i print different numbers in this code, even though i is defined as int *i = &u.i;? I can only assuming that I'm triggering UB here, but I can't see how exactly.

(ideone demo replicates if I select 'C' as the language. But as @2501 pointed out, not if 'C99 strict' is the language. But then again, I get the problem with gcc-5.3.0 -std=c99!)

// gcc       -fstrict-aliasing -std=c99   -O2
union
{   
    int i;
    short s;
} u;

int     * i = &u.i;
short   * s = &u.s;

int main()
{   
    *i  = 2;
    *s  = 100;

    printf(" *i = %d\n",  *i); // prints 2
    printf("u.i = %d\n", u.i); // prints 100

    return 0;
}

(gcc 5.3.0, with -fstrict-aliasing -std=c99 -O2, also with -std=c11)

My theory is that 100 is the 'correct' answer, because the write to the union member through the short-lvalue *s is defined as such (for this platform/endianness/whatever). But I think that the optimizer doesn't realize that the write to *s can alias u.i, and therefore it thinks that *i=2; is the only line that can affect *i. Is this a reasonable theory?

If *s can alias u.i, and u.i can alias *i, then surely the compiler should think that *s can alias *i? Shouldn't aliasing be 'transitive'?

Finally, I always had this assumption that strict-aliasing problems were caused by bad casting. But there is no casting in this!

(My background is C++, I'm hoping I'm asking a reasonable question about C here. My (limited) understanding is that, in C99, it is acceptable to write through one union member and then reading through another member of a different type.)


Solution

  • The disrepancy is issued by -fstrict-aliasing optimization option. Its behavior and possible traps are described in GCC documentation:

    Pay special attention to code like this:

          union a_union {
            int i;
            double d;
          };
    
          int f() {
            union a_union t;
            t.d = 3.0;
            return t.i;
          }
    

    The practice of reading from a different union member than the one most recently written to (called “type-punning”) is common. Even with -fstrict-aliasing, type-punning is allowed, provided the memory is accessed through the union type. So, the code above works as expected. See Structures unions enumerations and bit-fields implementation. However, this code might not:

          int f() {
            union a_union t;
            int* ip;
            t.d = 3.0;
            ip = &t.i;
            return *ip;
          }
    

    Note that conforming implementation is perfectly allowed to take advantage of this optimization, as second code example exhibits undefined behaviour. See Olaf's and others' answers for reference.