Search code examples
csimdintrinsicsmemory-alignmentneon

VLD2 structure load of a stricter alignment type


Can a byte pointer ever be safely passed to vld2q_u16? I'm mostly concerned about static analyzer complaints.

uint16x8x2_t load_interleaved_shorts (const uint8_t* const ptr) {
    uint16_t* p16 = (uint16_t*)ptr; // possible undefined behavior ?
    return vld2q_u16(p16);
}

In my instance: The pointer is always aligned to a 16 byte boundary. The compiler doesn't known the alignment of the pointer. The code must be portable and strictly follow the C90 standard.

Assumptions: Replacing vld2q_u16 with vld1q_u8 / vuzpq_u8 would hurt performance. The probability of the compiler optimizing a scalar pattern into a vld2q_u16 is small.

Edit: suppressed some warnings by casting to a void pointer. vld2q_u16((const uint16_t*)(const void*)src)


Solution

  • The code must be portable and strictly follow the C90 standard.

    ... plus everything implied by the presence of ARM NEON intrinsics! (Although that may not help a static analyzer). (Related: Is `reinterpret_cast`ing between hardware SIMD vector pointer and the corresponding type an undefined behavior? discusses that for x86, but your case is a bit different; your pointers are aligned).


    In C, it's safe to cast between pointer types (without dereferencing) as long as you never create a pointer with insufficient alignment for its type. You don't need a compile-time-visible guarantee of alignment, you just need to not ever actually create a uint16_t* that doesn't have alignof(uint16_t) alignment.

    (This makes it unlikely for a static analyzer to complain even if that wasn't the case, unless it could see something like (uint16_t*)(1 + (char*)&something_aligned) where you take an aligned address and offset it by an odd number, which would be guaranteed to produce a misaligned address.)

    And in practice, compilers targeting byte-addressable machines do more or less define the behaviour even for creating misaligned pointers. (For example, Intel intrinsics for unaligned loads depend on creating an unaligned __m128i*.) As long as you don't deref them, which is unsafe even in practice on targets that allow unaligned loads; see my answer on this Q&A for an example and the blog links that cover other examples.

    So you're 100% fine: your code never creates a misaligned uint16_t*, and doesn't directly dereference it.

    If ARM has unaligned-load intrinsics, it would even be safe to form a misaligned uint16_t* and pass it to the function; the existence/design of the intrinsics API implies that it's safe to use it that way.


    Other things that are undefined behaviour but which you aren't doing:

    • It's technically UB to form a pointer that isn't pointing inside an object, or one-past-end, but in practice mainstream implementations allow that as well.

    • It's strict-aliasing UB to dereference a uint16_t* that doesn't point to uint16_t objects. But any dereferencing only happens inside intrinsic "functions", so you don't have to worry about the strict-aliasing rule. (Which may pointer-cast to some special type and deref, or may pass the pointer on to a __builtin_arm_whatever() compiler built-in.)

    I assume that ARM load/store intrinsics are defined similar to memcpy, being able to read/write the bytes of any object. So e.g. you could vld2q_u16 on an array of int, double, or char. Intel intrinsics are defined that way (e.g. GCC/clang use __attribute__((may_alias)).) If not, it wouldn't be safe.

    And BTW, the char*-can-alias-anything rule only works one way. Yes it's safe to point a char* at a uint16_t, but if you have an actual array of char buf[100], those objects are definitely char objects, and it's UB to access them through a uint16_t*. However, if you only have char*, and only one other pointer-type other than char* is used, then you can look at the memory as having whatever the other type is, and every char* access aliasing that.