Tags: c, unions, strict-aliasing, type-punning

Aliasing through unions


Section 6.5p7 of the C standard has a bullet about unions and aggregates:

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:

[...]

— an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or

It is not quite clear what this means. Does it require at least one member, or all members, to satisfy the strict aliasing rule? In particular, consider unions:

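#include <inttypes.h>  /* PRIu64 */
#include <stdint.h>    /* uint64_t */
#include <stdio.h>     /* printf */
#include <string.h>    /* memcpy */

union aliased{
    unsigned char uint64_repr[sizeof(uint64_t)];
    uint64_t value;
};

int main(void){
    uint64_t some_random_value = 123;
    union aliased alias;
    /* write the bytes through the character-array member... */
    memcpy(alias.uint64_repr, &some_random_value, sizeof(uint64_t));
    /* ...then read them back through the uint64_t member */
    printf("Value = %" PRIu64 "\n", alias.value);
    return 0;
}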


Is the behavior of the program well-defined? If no, what does the bullet mean?


Solution

  • What it means is that a union is one of the standard-compliant ways to avoid pointer-based type punning and the strict aliasing violation that would otherwise occur if you accessed a stored value through a pointer of a different type.

    Take, for example, unsigned and float, which are generally both 32 bits wide. In certain cases you may need to look at the stored value as either an unsigned or a float. You cannot, for example, do:

        float f = 3.3;
        // unsigned u = *(unsigned *)&f;  /* violation */
    

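    Following 6.5p7, you can use a union between both types and access the same information as either unsigned or float without type-punning a pointer or running afoul of the strict aliasing rule. (The explicit permission to read a union member other than the one last stored is in 6.5.2.3; in C11, footnote 95 says the stored bytes are reinterpreted in the new type and calls this "type punning".) For example: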

    typedef union {
        float f;
        unsigned u;
    } f2u;
    ...    
        float f = 3.3;
        // unsigned u = *(unsigned *)&f;  /* violation */
        f2u fu = { .f = f };
        unsigned u = fu.u;                /* OK - no violation */
    

    So the strict aliasing rule prevents accessing an object with a given effective type through an lvalue of another, incompatible type, unless that lvalue has character type or the access is made through a union that has both types among its members.

    (Note: that section of the standard is anything but an example of clarity; you can read it ten times and still scratch your head. Its intent is to curb the abuse of pointer types while still recognizing that a block of memory of any type must be accessible through a character type, and a union is among the other allowable means of access.)

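    Compilers have gotten much better in recent years at flagging violations of the rule. GCC, for example, can warn about patterns such as *(unsigned *)&f through -Wstrict-aliasing, which -Wall enables and which is only active when -fstrict-aliasing is in effect (the default at -O2 and above).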
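A minimal, self-contained sketch of the f2u approach, assuming unsigned and float have the same size (the static_assert rejects platforms where they do not). The helper names float_bits and float_from_bits are hypothetical, introduced only for illustration; every access goes through the union object itself, which is the form the answer relies on.

```c
#include <assert.h>  /* static_assert (C11) */

/* Assumption: unsigned and float are the same size (commonly 32 bits).
   Compilation fails here on platforms where that does not hold. */
static_assert(sizeof(unsigned) == sizeof(float),
              "this sketch assumes unsigned and float are the same size");

/* Same union as f2u in the answer. */
typedef union {
    float f;
    unsigned u;
} f2u;

/* Hypothetical helper: store a float, read the same bytes back as unsigned.
   Both the store and the load go through the union object fu. */
unsigned float_bits(float f)
{
    f2u fu = { .f = f };  /* .f is the member last stored */
    return fu.u;          /* reading .u reinterprets those bytes */
}

/* Hypothetical helper going the other way: bits in, float out. */
float float_from_bits(unsigned u)
{
    f2u fu = { .u = u };  /* .u is the member last stored */
    return fu.f;          /* reading .f reinterprets those bytes */
}
```

On a platform whose float is IEEE 754 single precision (nearly all current ones), float_bits(1.0f) returns 0x3F800000 and float_from_bits(0xC0000000u) returns -2.0f.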
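The character-type exception can be sketched the same way. The hypothetical helper float_bytes reads a float through an unsigned char pointer, which 6.5p7 always permits, and the hypothetical float_bits_memcpy uses memcpy (as the question's program does) to copy the object representation into an unsigned without any union. This again assumes unsigned and float have the same size.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* size_t */
#include <string.h>  /* memcpy */

/* Assumption: unsigned and float are the same size (commonly 32 bits). */
static_assert(sizeof(unsigned) == sizeof(float),
              "this sketch assumes unsigned and float are the same size");

/* Reading any object through an unsigned char lvalue is always allowed,
   so this helper may copy out the individual bytes of a float. */
void float_bytes(float f, unsigned char out[sizeof(float)])
{
    const unsigned char *p = (const unsigned char *)&f;
    for (size_t i = 0; i < sizeof f; i++)
        out[i] = p[i];
}

/* memcpy copies the object representation byte by byte, so it is another
   compliant way to obtain the bits without a union or a pointer cast. */
unsigned float_bits_memcpy(float f)
{
    unsigned u;
    memcpy(&u, &f, sizeof u);
    return u;
}
```

Neither helper needs the *(unsigned *)&f cast that the answer marks as a violation.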