c gcc casting strict-aliasing type-punning

Strict aliasing and casting union pointers

I have looked around this site to try to figure out whether my use of casting to different unions violates strict aliasing or is otherwise UB.

I have packets coming in on a serial line and I store/get them like:

union uart_data {
  struct {
    uint8_t start;
    uint8_t addr;
    uint16_t length;
    uint8_t data[];
  };
  uint8_t bytes[BUFFER_SIZE];
};

/* The driver's single pre-allocated packet buffer. */
static union uart_data rx_packet;

void store_byte(uint8_t byte) {
  rx_packet.start = byte;
  /* and so on with the other named fields. */
}

uint8_t *get_buffer(void) {
  return rx_packet.bytes;
}

My understanding is that this is, at least with GCC and its GNU extensions, a valid way to do type punning.

However, I then want to cast the return value from get_buffer() to a more specific type of packet that the uart doesn't need to know the details about.

union specific_pkt {
  struct {
    uint8_t start;
    uint8_t addr;
    uint16_t length;
    uint8_t command;
    uint8_t some_field;
    uint16_t data_length;
    uint8_t data[];
  };
  uint8_t bytes[BUFFER_SIZE];
};

void process(uint8_t *data) {
  union specific_pkt *pkt = (union specific_pkt *)data;
}

I recall having read somewhere that this is valid since I'm casting from a type that exists in the union but I can't find the source.

My rationale for doing it this way is that I can have a UART driver that only needs to know about the lowest-level details. I'm on an MCU, so I only have access to pre-allocated buffers, and this way I don't have to memcpy between buffers, wasting space. In my application code I can then handle the packet in a nicer way than:

uint8_t data[BUFFER_SIZE];

data[START_POS];
data[LEN_POS];
data[DATA_POS];

If this is violating the SA rule or is UB I'd love some alternatives to achieve the same.

I'm using GCC on a target that supports unaligned access and GCC allows type punning through unions.

Solution

The Standard completely fails to specify the circumstances under which a structure or union object may be accessed via a non-character lvalue whose type is not that of the structure or union. This omission would make sense if one recognizes that the purpose of the Standard's rule is purely to indicate when a compiler must recognize that an object is being accessed by a seemingly unrelated lvalue. Under that reading, the rule is not meant to apply in situations where a compiler can see that an lvalue or pointer of one type is used to derive another, which is then used to access storage associated with the first, without any intervening conflicting action on that storage. For example, given:

struct sizedPointer { int length,size; int *dat; };
void storeThing(struct sizedPointer *dest, int n)
{
  if (dest->length < dest->size)
  {
    dest->dat[dest->length] = n;
    dest->length++;
  }
}

such an interpretation would allow a compiler to assume that dest->length will not be written using dest->dat, since its value has been observed after dest->dat was formed, but would require that a compiler recognize that given:

union blob { uint16_t hh[8]; uint64_t oo[2]; } myBlob;

an operation like

sscanf(someString, "%4" SCNx16, &myBlob.hh[1]); /* SCNx16 comes from <inttypes.h> */

might interact with any lvalues that are derived from myBlob after the function returns.
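
A minimal sketch of that situation follows, assuming GCC's documented behavior that reading a union member other than the one last written is supported when the access goes through the union type. The helper name store_then_read is hypothetical and not part of the original example; it stores into hh[1] through a uint16_t pointer handed to sscanf, then reads oo[0] through the union.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

union blob { uint16_t hh[8]; uint64_t oo[2]; };

/* Hypothetical helper: sscanf writes hh[1] through a uint16_t pointer,
   after which oo[0] is read back through the union itself. A compiler
   following the interpretation described above has to treat the two
   accesses as touching the same storage. */
static uint64_t store_then_read(union blob *b, const char *text) {
  int converted = sscanf(text, "%4" SCNx16, &b->hh[1]);
  assert(converted == 1); /* exactly one field must have been parsed */
  (void)converted;        /* silence unused warning when NDEBUG is set */
  return b->oo[0];
}
```

The numeric value returned depends on the target's byte order, so the sketch makes no claim about it beyond it reflecting the bytes that sscanf stored.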

Unfortunately, gcc and clang instead interpret the rule as only mandating recognition in cases where failure to do so would completely gut the language. Because the Standard doesn't mandate that member-type lvalues be usable in any fashion whatsoever, and gcc and clang have explicitly stated that they should not be relied upon to do anything beyond what the Standard requires, support for anything useful should be viewed as being at the whim of the maintainers of clang and gcc.
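
If you would rather not depend on that, one alternative that sidesteps the aliasing question is to keep the buffer as plain uint8_t and decode the fields through small accessor functions, since character-type accesses are always allowed. The following is only a sketch under assumptions that are not in the question: the offsets, the names pkt_header, pkt_parse_header, pkt_payload and read_u16_le, and a little-endian length field on the wire are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed wire layout, mirroring the fields in the question:
   byte 0 = start, byte 1 = addr, bytes 2..3 = length (little-endian). */
enum { PKT_START_POS = 0, PKT_ADDR_POS = 1, PKT_LEN_POS = 2, PKT_DATA_POS = 4 };

struct pkt_header {
  uint8_t start;
  uint8_t addr;
  uint16_t length;
};

/* Assemble a 16-bit little-endian value one byte at a time. This has no
   alignment requirement and does not depend on the host's byte order. */
static uint16_t read_u16_le(const uint8_t *p) {
  return (uint16_t)(p[0] | ((uint16_t)p[1] << 8));
}

/* Copy only the small fixed header out of the driver's buffer;
   the payload itself stays where it is. */
static struct pkt_header pkt_parse_header(const uint8_t *buf, size_t len) {
  struct pkt_header h;
  assert(len >= PKT_DATA_POS); /* caller must supply a full header */
  (void)len;                   /* silence unused warning when NDEBUG is set */
  h.start = buf[PKT_START_POS];
  h.addr = buf[PKT_ADDR_POS];
  h.length = read_u16_le(buf + PKT_LEN_POS);
  return h;
}

/* The payload is addressed in place, so no second buffer is needed. */
static const uint8_t *pkt_payload(const uint8_t *buf) {
  return buf + PKT_DATA_POS;
}
```

Application code would call pkt_parse_header(get_buffer(), BUFFER_SIZE) and then work with named fields; only the few header bytes are copied. On GCC, compiling with -fno-strict-aliasing is another option if the pointer casts are to be kept.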