
Is having a buffer of unsigned char and treating it as a T* a violation of strict aliasing?


Some sources say the following is a violation of strict aliasing:

#include <stdlib.h>

typedef struct Foo {
    int a;
    double* b;
} Foo;
int main() {

    _Alignas(Foo) unsigned char buffer[2048];
    Foo* a = (Foo*)&buffer[0];
    a->a = 44;
    a->b = NULL;
}

GCC, at least, does not throw an error: https://godbolt.org/z/Tbzaodb8W

If this is undefined behaviour, how would any allocator be implemented, especially bump allocators that make use of such an unsigned char buffer?

I certainly know that

Foo MyFoo;
...
unsigned char* byteRepresentationOfFoo = (unsigned char*)&MyFoo;

is allowed, since unsigned char* is allowed to alias any type, but what about the reverse with the unsigned char buffer?


Solution

  • Yes, it is a strict aliasing violation and therefore undefined behavior. The declared type of buffer is unsigned char [2048], so its "effective type" is the same (C17 6.5 §6).

    Expressions like a->a = 44 access that object through an lvalue (a->a, of type int) whose type is not compatible with unsigned char and is not any of the other types permitted by C17 6.5 §7.

    As a side note, it would have been OK to do this if Foo were a struct or union containing a character type array among its members (C17 6.5 §7), but here it is not.

    GCC at least does not throw an error

    It's not gcc's job to report undefined behavior. The various strict aliasing warnings have always been pretty broken, and gcc is also known to be lax about warning for non-standard extensions in general. Clang doesn't emit any diagnostics either.

    Related: Why did gcc stop warning about strict aliasing violation from version 7.2? The answer is likely "because bugs". In recent years they have been rolling out a lot of poorly tested changes to the compiler. It took gcc 28 years to get from version 1 to version 5, but since then there has been one major release per year, up to 13.x. They rolled out 8 major versions between 2015 and 2023, while the C language itself saw only one minor revision (C17) in that time.


    how would any allocator be implemented

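    It can't, at least not by taking a raw character buffer and wildly pointer-casting from it. malloc and the like are library functions, which may be implemented in non-standard C or in another language entirely. Memory returned by malloc also has no declared type, so its effective type is determined by the type of the lvalue used to store a value into it (C17 6.5 §6); a declared unsigned char array never gets that treatment.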

    As it happens, most standard libraries (glibc etc.) are actually written in C. But if you compile them with a strict C compiler rather than with specialized options like gcc -fno-strict-aliasing, there are no guarantees. Follow the library implementors' build instructions.

    You can, however, implement allocators with type-punning unions:

    typedef union
    {
      Foo foo;
      unsigned char buf[n]; // n: the pool size in bytes, an integer constant expression
    } pun_intended_t;

    _Alignas(Foo) pun_intended_t pun = { .buf = { 0 } }; // or whatever initial bytes you need
    Foo* f = (Foo*)pun.buf; // well-defined as long as aligned
    

    This relies on another exception to the strict aliasing rule.


    unsigned char* is allowed to alias any type, but what about the reverse with the unsigned char buffer?

    Rather, any object may be accessed using a character type but not the other way around. This is because of two special rules:

    C17 6.5 §7 ("the strict aliasing rule"):

    An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
    /--/

    • a character type.

    As well as C17 6.3.2.3 §7:

    When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.