Say I have an int16x8_t vector. I want to limit the range of its values to 0-255 and convert it to a uint8x8_t vector. Reading the vector into an array and doing it the traditional non-intrinsic way is way too slow. Is there a faster way?
All you need is a single instruction: vqmovun.s16 (vqmovun_s16 as an intrinsic).
The name stands for Vector Saturating (q) Move Unsigned Narrow.
#include <arm_neon.h>

int16x8_t input;
uint8x8_t result;
/* ... */
result = vqmovun_s16(input);
Any negative element is replaced with 0 and any element greater than 255 is set to 255; the results are then narrowed to unsigned 8-bit elements. All of this happens in a single instruction, which is EXACTLY what you need.
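To show how this might look on a whole buffer, here is a minimal sketch. The names clamp_s16_to_u8, saturate_s16_to_u8 and USE_NEON_PATH are made up for illustration and are not part of any library. It assumes the data sits in plain int16_t/uint8_t arrays, and it falls back to a scalar loop for the tail and on targets without NEON, so the per-lane behaviour can be checked anywhere.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define USE_NEON_PATH 1
#else
#define USE_NEON_PATH 0
#endif

/* Scalar model of what vqmovun.s16 does to one lane: clamp to 0..255. */
uint8_t saturate_s16_to_u8(int16_t v)
{
    if (v < 0)
        return 0;
    if (v > 255)
        return 255;
    return (uint8_t)v;
}

/* Hypothetical helper: clamp n int16_t values to 0..255 and store them
 * as uint8_t. Processes 8 lanes per vqmovun_s16 when NEON is available. */
void clamp_s16_to_u8(const int16_t *src, uint8_t *dst, size_t n)
{
    size_t i = 0;
#if USE_NEON_PATH
    for (; i + 8 <= n; i += 8) {
        int16x8_t in = vld1q_s16(src + i);   /* load 8 x int16 */
        vst1_u8(dst + i, vqmovun_s16(in));   /* saturate, narrow, store */
    }
#endif
    /* Scalar tail (or the whole buffer when NEON is not available). */
    for (; i < n; ++i)
        dst[i] = saturate_s16_to_u8(src[i]);
}
```

Calling clamp_s16_to_u8(src, dst, n) with any n works; only full groups of 8 go through the NEON path.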
There is also vqmovn_s16, which keeps the values signed (-128~127).
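For comparison, a rough sketch of the signed variant on a single group of 8 values. The helper name narrow8_s16_to_s8 is invented here, and the non-NEON branch is only a scalar stand-in so the saturation limits can be verified on any machine.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: saturate 8 int16_t values to -128..127 and
 * store them as int8_t. */
void narrow8_s16_to_s8(const int16_t src[8], int8_t dst[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* vqmovn_s16 returns an int8x8_t with each lane saturated. */
    vst1_s8(dst, vqmovn_s16(vld1q_s16(src)));
#else
    /* Scalar stand-in with the same per-lane result. */
    for (int i = 0; i < 8; ++i) {
        int16_t v = src[i];
        if (v < -128)
            v = -128;
        else if (v > 127)
            v = 127;
        dst[i] = (int8_t)v;
    }
#endif
}
```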
PS: Are you working on YUV to RGB conversion? That's the one time I needed this instruction.