Imagine this:
uint64_t x = *(uint64_t *)((unsigned char[8]){'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'});
I have read that type puns like that are undefined behavior. Why? I am literally reinterpreting 8 bytes as an 8-byte integer. I don't see how that's different from a union, except that the type pun is undefined behavior and unions are not. I asked a fellow programmer in person and they said that if you're doing it, either you know what you're doing very well, or you're making a mistake. But the community says that this practice should ALWAYS be avoided. Why?
Ultimately the why is "because the language specification says so". You don't get to argue with that. If that's the way the language is, it's the way it is.
If you want to know the motivation for making it that way, it's that the original C language lacked any way of expressing that two lvalues can't alias one another (and the modern language's restrict keyword is still barely understood by most users of the language). Being unable to assume two lvalues can't alias means the compiler can't reorder loads and stores, and must actually perform loads and stores from/to memory for every access to an object, rather than keeping values in registers, unless it knows the object's address has never been taken.
C's type-based aliasing rules somewhat mitigate this situation, by letting the compiler assume lvalues with different types don't alias.
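To make that concrete, here is a minimal sketch of the kind of code where the assumption matters. The function name store_then_reload is hypothetical, made up purely for illustration. Because int and float are incompatible types, the compiler is allowed to assume the two pointers refer to different objects, so it need not reload *i after the store through f.

```c
#include <assert.h>

/* Hypothetical function, for illustration only.
   int and float are incompatible types, so under the type-based
   aliasing rules the compiler may assume i and f point to
   different objects. */
int store_then_reload(int *i, float *f)
{
    /* Precondition: the caller passes two distinct objects. */
    assert((void *)i != (void *)f);

    *i = 1;
    *f = 2.0f;  /* assumed not to modify *i */
    return *i;  /* the compiler may return the constant 1 without reloading */
}
```

If a caller defeated this by passing the same storage through both pointers (for example via a cast), the behavior would be undefined, and an optimizing compiler could legitimately return 1 even though the bytes in memory now hold the float.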
Note also that in your example, there's not only type punning but also potential misalignment. The unsigned char array is only guaranteed to be aligned suitably for unsigned char, not for uint64_t, so accessing a uint64_t at that address could be an alignment error (UB for another reason) independent of any aliasing rules.
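For comparison, the usual way to get the same bytes into an integer without running into either problem is to copy them with memcpy. This is a sketch under the assumption that you want the bytes interpreted in the machine's native byte order; load_u64 is a hypothetical helper name, not something from the question.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy 8 bytes into a uint64_t.
   memcpy accesses the source as bytes, so there is no aliasing
   violation and no alignment requirement on the source pointer.
   The resulting value depends on the platform's byte order. */
uint64_t load_u64(const unsigned char *bytes)
{
    uint64_t x;

    assert(bytes != NULL);
    memcpy(&x, bytes, sizeof x);
    return x;
}
```

Called as load_u64((unsigned char[8]){'a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'}), this produces an integer whose object representation is exactly those 8 bytes. Compilers commonly turn a fixed-size memcpy like this into a single load, though that is an optimization and not something the language guarantees.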