arrays c language-lawyer undefined-behavior

Accessing a single value in an array of union/struct with an array member


Suppose I have a union containing an array:

union Set {
    uint64_t z[2];
    uint32_t y[4];
    uint16_t x[8];
};

Now suppose I have an array of these objects:

union Set sets[SIZE] = { /* ... */ };

I can safely access any uint16_t in any of the Set objects by indexing sets and then indexing the resulting x member:

uint16_t value = sets[1].x[0];

But is there any way I can safely access the same uint16_t based on its absolute position from the beginning of the entire array of Set objects? Assuming the union has no padding, is the following safe and guaranteed to produce the same value as the above?

uint16_t value = ((uint16_t *)sets)[8];

From what I could gather, this would not be valid if sets were a plain multi-dimensional array. But I am not sure whether those rules apply here, since the pointer I am dereferencing is derived from the main sets array in which the target value resides, so there is no 'out of bounds' access here. Am I wrong, or are there other undefined behaviors here?


What if sets was returned by malloc() instead of a declared array? Would the answer be any different?


Solution

  • The most obvious and safest option is to simply break up the index into the Set index and the x index within the set:

    uint16_t value = sets[i >> 3].x[i & 7];
    

    The ((uint16_t *)sets)[8] method is not safe: casting sets directly to a uint16_t * yields a pointer to the first element of the first union's x array, so indexing past element 7 of that array leads to the out-of-bounds problem.

    However, a comment did suggest an alternative solution. Any object pointer can be safely converted to a char pointer. Apply the offset via the char *, then cast the result to a uint16_t * and dereference. This should be safe for the same reason accessing a struct member via a char * and offsetof() is valid.

    uint16_t value = *(uint16_t *)((char *)sets + i * sizeof(uint16_t));
    

    The only difference between this and the first solution is that this one depends on the union not having any trailing padding, whereas the first one does not.
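
The two approaches above can be put side by side in a minimal sketch. The helper names get_split and get_bytes, the X_PER_SET constant, and the value chosen for SIZE are illustrative assumptions, not part of the original question or answer. The compile-time check makes the "no padding" assumption of the char * variant explicit, so the build fails rather than silently reading the wrong element.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SIZE 4        /* assumed element count; the question leaves SIZE unspecified */
#define X_PER_SET 8   /* number of uint16_t slots in one union Set */

union Set {
    uint64_t z[2];
    uint32_t y[4];
    uint16_t x[8];
};

/* Approach 1: split the flat index into (union index, index within x).
 * Every access stays inside one x array, so padding does not matter. */
static uint16_t get_split(const union Set *sets, size_t i)
{
    assert(i < (size_t)SIZE * X_PER_SET);   /* caller must stay inside the array */
    return sets[i / X_PER_SET].x[i % X_PER_SET];
}

/* Approach 2 relies on the union having no padding, so check that at compile time. */
_Static_assert(sizeof(union Set) == X_PER_SET * sizeof(uint16_t),
               "union Set must have no padding for byte-offset access");

/* Approach 2: compute a byte offset through a char pointer, then convert
 * the result to a uint16_t pointer and dereference it. */
static uint16_t get_bytes(const union Set *sets, size_t i)
{
    assert(i < (size_t)SIZE * X_PER_SET);
    return *(const uint16_t *)((const char *)sets + i * sizeof(uint16_t));
}
```

With a declaration such as union Set sets[SIZE], both get_split(sets, 8) and get_bytes(sets, 8) are intended to read the same element as sets[1].x[0]. When in doubt, the split-index version is the more conservative choice, since it never forms a pointer outside a single x array.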