Tags: c, undefined-behavior, unions, strict-aliasing

How can unions be used to bypass strict aliasing violations?


According to this answer the following code invokes undefined behavior:

uint16_t *buf = malloc(16); // 8*sizeof(uint16_t)
buf[1] = *buf = some_value;
((uint32_t *)buf)[1] = *(uint32_t *)buf;
((uint64_t *)buf)[1] = *(uint64_t *)buf;

We may write any type to memory returned by malloc(), but we may not read a previously written value as an incompatible type by casting pointers (with the exception of char).

Could I use this union:

union Data {
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
};

As such:

union Data *buf = malloc(16);
buf->u16[1] = buf->u16[0] = some_value;
buf->u32[1] = buf->u32[0];
buf->u64[1] = buf->u64[0];

Would this avoid undefined behavior from strict aliasing violations? Also, could I cast buf to any of uint16_t *, uint32_t *, uint64_t *, and then dereference it without invoking undefined behavior, since these types are all valid members of union Data? That is, is the following valid?

uint16_t first16bits = *(uint16_t *)buf;
uint32_t first32bits = *(uint32_t *)buf;
uint64_t first64bits = *(uint64_t *)buf;

If not (i.e. the above code making use of union Data is still invalid), when can and cannot unions be used (in pointer casts or otherwise) to produce valid code that does not violate strict aliasing rules?


Solution

  • Yes, it is acceptable to write one union member and read another. Section 6.5p7 of the C standard states:

    An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

    • a type compatible with the effective type of the object,
    • a qualified version of a type compatible with the effective type of the object,
    • a type that is the signed or unsigned type corresponding to the effective type of the object,
    • a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
    • an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
    • a character type
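
As a minimal sketch of the first point, the union Data from the question can be written through one member and read through another, with every access going through the union object itself. The helper name pun_u16_to_u32 is hypothetical and not part of the original answer, and in general the value read back depends on the machine's byte order.

```c
#include <assert.h>
#include <stdint.h>

/* Same layout as the union in the question. */
union Data {
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
};

/* Hypothetical helper: stores two 16-bit values through the u16 member,
   then reads the overlapping 32-bit member through the same union object. */
uint32_t pun_u16_to_u32(uint16_t first, uint16_t second)
{
    union Data d;

    /* The exact-width types have no padding bits, so two uint16_t
       elements cover exactly one uint32_t element. */
    assert(sizeof d.u32[0] == 2 * sizeof d.u16[0]);

    d.u16[0] = first;
    d.u16[1] = second;

    /* Access is through the union member, not through a cast pointer. */
    return d.u32[0];
}
```

Calling pun_u16_to_u32(0xABCD, 0xABCD) gives 0xABCDABCD on both little-endian and big-endian machines, because the two halves are identical; with different halves, the order in which they appear in the result depends on endianness.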

    It is also safe to convert the address of a union to that of any of its members. From section 6.7.2.1p16:

    The size of a union is sufficient to contain the largest of its members. The value of at most one of the members can be stored in a union object at any time. A pointer to a union object, suitably converted, points to each of its members (or if a member is a bit-field, then to the unit in which it resides), and vice versa.
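
The pointer conversion described in that passage can be sketched as follows. This is only an illustration under stated assumptions: the function name union_pointer_matches_members is hypothetical, and the sketch deliberately reads back only the member that was last stored. It does not demonstrate reading a different member through a converted pointer; some compilers are stricter there (GCC's documentation for -fstrict-aliasing, for instance, allows type punning only when the memory is accessed through the union type).

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Same layout as the union in the question. */
union Data {
    uint16_t u16[8];
    uint32_t u32[4];
    uint64_t u64[2];
};

/* Hypothetical helper: returns 1 if a pointer to the union, converted to
   each element type, compares equal to the address of that member's first
   element and the last-stored member reads back correctly; returns -1 if
   the allocation fails. */
int union_pointer_matches_members(void)
{
    union Data *buf = malloc(sizeof *buf);
    if (buf == NULL) {
        return -1;
    }

    /* Convert the union pointer to pointers to each member's element type. */
    uint16_t *p16 = (uint16_t *)buf;
    uint32_t *p32 = (uint32_t *)buf;
    uint64_t *p64 = (uint64_t *)buf;

    int same_address = (p16 == &buf->u16[0])
                    && (p32 == &buf->u32[0])
                    && (p64 == &buf->u64[0]);

    /* Store through the union member, then read the same member back
       through the converted pointer. */
    buf->u64[0] = 42;
    uint64_t value = *p64;

    free(buf);
    return same_address && value == 42;
}
```

The size passed to malloc is sizeof *buf rather than a literal 16, so the allocation stays correct if the union's members change.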