Is strict aliasing still something to think about in C?

I recently read a well-known article by Mike Acton about strict aliasing and how we should use it to increase performance significantly in C code.

The idea seems simple: in some cases, if you tell your compiler that there will not be two ways to access the same data, it can optimize the code better. However, to experiment with the subject and understand its subtleties, I used godbolt...

It turned out that, since gcc 4.7, the following code does exactly what we intuitively expect from it. Correct me if I'm wrong, but as of that version, adding -fstrict-aliasing or not with -O3 doesn't seem to change anything.

uint32_t
test(uint32_t arg)
{
  char*     const cp = (char*)&arg;
  uint16_t* const sp = (uint16_t*)cp;

  sp[0] = 0x1;
  sp[1] = 0x1;

  return (arg);
}

That example is taken directly from the article I mentioned. The article explains that gcc treats cp and sp as pointing to two different objects because of the strict aliasing rule, so it simply leaves arg unchanged. That is what happened in older versions of gcc, if I refer to godbolt. But not anymore. Has gcc changed something about the strict aliasing rule in its 4th version? Is it described somewhere? Or am I wrong?

I also checked the following code, and again, with or without strict aliasing, the result is the same, even when using the restrict keyword. I hope I understand correctly what this means.

void my_loop(int *n, int x)
{
    while (--x)
        printf("%d", *n);
}

From that piece of code, I was expecting the compiler to load *n once and reuse the value on each iteration. Instead, I noticed that n is dereferenced every time I print. Did I miss something?

Solution

That is what happened in older versions of gcc, if I refer to godbolt. But not anymore. Has gcc changed something about the strict aliasing rule in its 4th version? Is it described somewhere? Or am I wrong?

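No, nothing has changed. Writing to a uint32_t object through a uint16_t pointer is undefined behaviour (UB), so the compiler is not obliged to behave in any particular way, and different versions may produce different results. That is exactly what you observe.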
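To make the rule concrete, here is a minimal sketch of the kind of assumption strict aliasing lets the compiler make. store_both is a hypothetical function written for illustration; it is not taken from the article.

```c
/* Hypothetical example: i and f have incompatible types, so under the
   strict aliasing rule the compiler may assume they never point to the
   same object. */
int store_both(int *i, float *f)
{
    *i = 1;
    *f = 2.0f;   /* assumed not to modify *i */
    return *i;   /* may be compiled as "return 1" without reloading *i */
}
```

Calling it with two pointers into the same storage would be UB, which is why the result is allowed to differ between compiler versions. Note also that GCC already enables -fstrict-aliasing at -O2 and above, which would explain why adding the flag to -O3 changes nothing.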

You can achieve the same level of optimization without type punning through pointers, and therefore without invoking undefined behaviour:

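#include <stdint.h>
#include <string.h>

/* Type punning through a union, which C allows. */
uint32_t test1(uint32_t arg)
{
    union
    {
        uint32_t arg;
        uint16_t arg2[2];
    } c = {.arg = arg};

    c.arg2[0] = 0x1;
    c.arg2[1] = 0x1;
    return c.arg;
}

/* Copying the bytes with memcpy; character types may alias any object. */
uint32_t test2(uint32_t arg)
{
    unsigned char *ptr = (unsigned char *)&arg;   /* explicit cast required */
    memcpy(ptr, (uint16_t[]){1}, sizeof(uint16_t));
    memcpy(ptr + sizeof(uint16_t), (uint16_t[]){1}, sizeof(uint16_t));
    return arg;
}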

https://godbolt.org/z/nM3rEKocr

Your second example is valid C code.
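As for why *n is reloaded on every iteration, one likely explanation (my reading, not something established above) is that printf is an external call the compiler cannot see through, so it has to assume the pointed-to int may have changed in between. Strict aliasing concerns accesses through lvalues of different types, so it does not come into play here. If you want a single load, a minimal sketch is to copy the value into a local first; my_loop_hoisted is a hypothetical name.

```c
#include <stdio.h>

/* Hypothetical variant of my_loop with the load hoisted by hand. */
void my_loop_hoisted(const int *n, int x)
{
    const int value = *n;      /* read the pointed-to int exactly once */

    while (--x)
        printf("%d", value);   /* no dereference inside the loop */
}
```

Because value is a local whose address never escapes, the compiler knows no call can modify it and can keep it in a register for the whole loop.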