How to extend a int32x2_t to a int32x4_t with NEON intrinsics on clang/AArch64 when you don't care about the new lanes?

Fellow ARMists,
I'd like to narrow and saturate 2 s32 to 2 s16 with NEON code, and pack them in a GPR. I need to conform to a certain API, so please don't discuss efficiency or design here :)
Here's the snippet:

int32x2_t stuff32 = ...;
int16x4_t stuff16 = vqmovn_s32(vcombine_s32(stuff32, stuff32));
return vget_lane_u32(stuff16, 0)

Which generates

mov    v0.d[1], v0.d[0] 
sqxtn  v0.4h, v0.4s 
fmov   w0, s0 
ret

Does somebody know a way to keep the type system happy, and have the second half of the d register uninitialized ? I'd like to avoid inline assembly.
Thank you !

Solution

I'm not aware of any good solution using general arm_neon.h intrinsics, but at least with Clang, it's possible, using Clang specific builtins, to produe a vector where some elements are set to be undefined, so the codegen doesn't need to fill them with any value in particular.

A setup that uses that would look like this:

$ cat test.c
#include <arm_neon.h>

int32_t narrow_saturate(int32x2_t stuff32) {
  int16x4_t stuff16 = vqmovn_s32(__builtin_shufflevector(stuff32, stuff32, 0, 1, -1, -1));
  return vget_lane_s32(vreinterpret_s32_s16(stuff16), 0);     
}

$ clang -target aarch64-linux-gnu test.c -S -o - -O2
[...]
narrow_saturate:
        sqxtn   v0.4h, v0.4s
        fmov    w0, s0
        ret

https://godbolt.org/z/N_NsSE

See https://clang.llvm.org/docs/LanguageExtensions.html#builtin-shufflevector for documentation on __builtin_shufflevector.

EDIT: It also seems to be possible to achieve the same with Clang by using an uninitialized variable (although that can generate warnings with `-Wuninitialized):

$ cat test.c
#include <arm_neon.h>

int32_t narrow_saturate(int32x2_t stuff32) {
  int32x2_t uninitialized;
  int16x4_t stuff16 = vqmovn_s32(vcombine_s32(stuff32, uninitialized));
  return vget_lane_s32(vreinterpret_s32_s16(stuff16), 0);
}

Clang produces the same as above for this (https://godbolt.org/z/TzHuon), while GCC still includes a mov v0.8b, v0.8b (https://godbolt.org/z/wZTAU9).