Consider the following C++ union:
struct MyStruct
{
char *p; // sizeof(p) == 8, alignof(p) == 8
uint32_t sz; // sizeof(sz) == 4, alignof(sz) == 4
}; // sizeof(MyStruct) == 16 (with padding),
// alignof(MyStruct) == 8
union U
{
MyStruct s;
char buf[16];
};
My question: can I safely read only the last element of buf[], even when buf[] is not the last-written (active) member of the union? I believe that, given the way unions must work in any compiler implementation, this element (buf[15]) can never be overwritten by any write to the MyStruct member s. Is that correct?
The primary objective is to be as space-efficient as possible while also working reliably.
What I'm doing: This union is the sole data member for a string class (we'll call it MyString). Within that string class the MyStruct struct (actually called AllocInfo) when used contains a pointer to and length of a dynamically allocated block of memory for the string (char* and uint32_t, respectively) when a string larger than 15 characters is stored. buf[] is used for short string optimization (SSO): the string is directly stored in this buffer if it will fit (15 or fewer characters). The last element of this char array is used as a flag. When SSO is used it is set to 0 and also doubles as the terminating null character of the string when the length of the stored string is exactly 15. When dynamically allocated memory is used this flag is set to 1. When I need to read the string I always read this flag first to determine how to access it.
Constraints: I am targeting C++11. Currently I am compiling on AMD64 Linux but I'm attempting to be as platform-agnostic as possible with an eye on compiling for ARM64 Linux at some point soon (the actual size of the buf[] char array is calculated at compile time to be guaranteed to always be at least one byte larger than the struct and also to align to alignof(max_align_t) for the platform). I am linking with no libraries at all. This means I can't use anything in the Standard C or Standard C++ Libraries nor anything in the std:: namespace.
Is there a non-UB way to do what I'm doing while still allowing class instances to be just as space efficient?
Since everything seems to work as is, should I just stop overthinking it?
The last 4 bytes of your struct DO technically share storage with the last 4 bytes of the char array, because the struct is padded and its size is 16 bytes. When you write your struct, the compiler is completely free to overwrite those padding bytes with anything. It's also free to not modify them.
As a naive example, consider that the struct could be read into a 128-bit register or two 64-bit registers, modified and then written back to memory. Those last 4 bytes are "don't cares", and the compiler is under no obligation to preserve their value.
You may wish to take control over this by ensuring all bytes of your struct are defined:
struct MyStruct
{
char *p;
uint32_t sz;
char padding[3]; // unused
char flag; // zero if SSO, non-zero otherwise
};
union U
{
MyStruct s;
char buf[16];
};
Now flag will share the same storage location as buf[15], and (ignoring aliasing issues) you can query either.
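The layout claim above can be checked at compile time rather than taken on trust. The following is a minimal sketch, assuming the offsetof macro and the fixed-width types from <cstddef> and <cstdint> are available (they are compile-time-only facilities, but if even those headers are off limits, a compiler builtin such as __builtin_offsetof on GCC/Clang would be a substitute). Sizing buf from sizeof(MyStruct) instead of a literal 16 is my own variation, not part of the original code.

```cpp
#include <cstddef>  // offsetof
#include <cstdint>  // uint32_t

struct MyStruct
{
    char *p;
    uint32_t sz;
    char padding[3]; // unused, but now explicitly part of the object
    char flag;       // zero if SSO, non-zero otherwise
};

union U
{
    MyStruct s;
    char buf[sizeof(MyStruct)]; // assumption: sized from the struct, not hard-coded
};

// flag must be the very last byte of the struct so that it overlays
// the last element of buf.
static_assert(offsetof(MyStruct, flag) == sizeof(MyStruct) - 1,
              "flag is not the last byte of MyStruct");

// The buffer must not make the union larger than the struct itself.
static_assert(sizeof(U) == sizeof(MyStruct),
              "U is larger than MyStruct");
```

If either assertion fires on a new target (for example, one with a different pointer size or alignment), the padding array needs adjusting before the flag can be trusted.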
If you want to take it a step further, consider avoiding aliasing issues by not using the union at all. Just use the struct, because it's not a strict aliasing violation to read/write the data occupied by MyStruct via a char pointer.
So you could have this kind of setup, which is essentially a 'manual' union:
struct MyString
{
operator char*() { return flag ? data : (char*)this; }
private:
char *data;
uint32_t sz;
char padding[3]; // unused
char flag; // zero if SSO, non-zero otherwise
};
static_assert(sizeof(MyString) == 16, "MyString must be exactly 16 bytes"); // C++11 requires the message argument
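To show how the 'manual' union might be filled in, here is a slightly fuller sketch of the same idea. It is only an illustration under stated assumptions: a 64-bit target (8-byte pointers), and, purely for brevity, strlen/memcpy/memset, assert and new[]/delete[], which you would replace with your own routines and allocator since you link against no libraries. The helper names bytes(), is_sso() and c_str() are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

class MyString
{
public:
    explicit MyString(const char *src)
    {
        const std::size_t len = std::strlen(src);
        assert(len <= UINT32_MAX); // sz is only 32 bits wide
        if (len < sizeof(MyString)) {
            // SSO: copy the characters over the object's own bytes and zero
            // the rest. The final byte (flag) is always zeroed, so it serves
            // as both the SSO marker and the terminator of a 15-char string.
            std::memcpy(bytes(), src, len);
            std::memset(bytes() + len, 0, sizeof(MyString) - len);
        } else {
            // Heap: use the members normally and set the flag.
            data = new char[len + 1];
            std::memcpy(data, src, len + 1);
            sz = static_cast<uint32_t>(len);
            flag = 1;
        }
    }

    ~MyString() { if (flag) delete[] data; }

    // Copying is omitted to keep the sketch short.
    MyString(const MyString &) = delete;
    MyString &operator=(const MyString &) = delete;

    bool is_sso() const { return flag == 0; }
    const char *c_str() const { return flag ? data : bytes(); }

private:
    // Accessing an object's storage through char* is permitted by the
    // aliasing rules.
    char *bytes() { return reinterpret_cast<char *>(this); }
    const char *bytes() const { return reinterpret_cast<const char *>(this); }

    char *data;
    uint32_t sz;
    char padding[3]; // unused
    char flag;       // zero if SSO, non-zero otherwise
};

static_assert(sizeof(MyString) == 16, "MyString must be exactly 16 bytes");
```

Because flag is an ordinary char member, reading it is well defined in both modes: in SSO mode it was last written as a char through the byte pointer, and in heap mode it was written directly.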