Tags: c, arm, simd, intrinsics, neon

ARM NEON Intrinsics: Limit values of a vector to 0-255


Say I have an int16x8_t vector. I want to limit the range of its values to 0-255 and convert it to an uint8x8_t vector. Reading the vector into an array and doing it the traditional non-intrinsic way is waaaay too slow. Is there a faster way?


Solution

  • All you need is the single instruction vqmovun.s16 (vqmovun_s16 in intrinsics; the AArch64 equivalent is SQXTUN).

    Vector Saturating (q) Move Unsigned Narrow

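    #include <arm_neon.h>

    int16x8_t input;
    uint8x8_t result;
    /* ... */

    /* Negative lanes become 0, lanes above 255 become 255, then narrow to 8 bits. */
    result = vqmovun_s16(input);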
    

    Every negative element becomes 0 and every element greater than 255 becomes 255; the results are then narrowed to unsigned 8-bit lanes. All of this happens in a single instruction, which is exactly what you need.

    There is also vqmovn_s16, which saturates to the signed 8-bit range (-128 to 127) and returns an int8x8_t.

    PS: Are you working on YUV to RGB conversion? That's the one time I needed this instruction.
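The clamping behaviour of vqmovun_s16 and vqmovn_s16 described above can be modelled lane by lane in plain C. This is a minimal sketch for illustration, not ARM reference code: the helper names sat_narrow_u8, sat_narrow_s8 and narrow8 are hypothetical. On a NEON-capable target narrow8 calls the two intrinsics directly; elsewhere it falls back to the scalar model, so both paths should produce the same bytes.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: scalar model of one lane of vqmovun_s16 (clamp to 0..255). */
static uint8_t sat_narrow_u8(int16_t x)
{
    if (x < 0)   return 0;
    if (x > 255) return 255;
    return (uint8_t)x;
}

/* Hypothetical helper: scalar model of one lane of vqmovn_s16 (clamp to -128..127). */
static int8_t sat_narrow_s8(int16_t x)
{
    if (x < -128) return -128;
    if (x > 127)  return 127;
    return (int8_t)x;
}

/* Hypothetical helper: narrow 8 signed 16-bit values both ways.
   Uses the NEON intrinsics when available, the scalar model otherwise. */
static void narrow8(const int16_t in[8], uint8_t out_u[8], int8_t out_s[8])
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    int16x8_t v = vld1q_s16(in);
    vst1_u8(out_u, vqmovun_s16(v));   /* unsigned saturating narrow */
    vst1_s8(out_s, vqmovn_s16(v));    /* signed saturating narrow */
#else
    for (int i = 0; i < 8; i++) {
        out_u[i] = sat_narrow_u8(in[i]);
        out_s[i] = sat_narrow_s8(in[i]);
    }
#endif
}
```

For example, narrowing { -300, -1, 0, 100, 127, 128, 255, 1000 } gives { 0, 0, 0, 100, 127, 128, 255, 255 } through the unsigned path and { -128, -1, 0, 100, 127, 127, 127, 127 } through the signed path.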
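In conversions such as YUV to RGB, the arithmetic is typically done in 16-bit lanes, so intermediate values can go below 0 or above 255 and have to be clamped back to bytes at the end. The sketch below shows that widen, compute, saturate-narrow pattern under a simplifying assumption: the "computation" is just adding a signed offset (a hypothetical brightness adjustment), not a real YUV formula, and add_offset_u8 and clamp_u8 are hypothetical names. A scalar loop handles the leftover pixels, and the whole array when NEON is not available.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: scalar clamp of an int to 0..255. */
static uint8_t clamp_u8(int v)
{
    return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical helper: add a signed offset to n 8-bit pixels, clamping to 0..255. */
void add_offset_u8(uint8_t *pixels, size_t n, int16_t offset)
{
    size_t i = 0;
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    const int16x8_t voff = vdupq_n_s16(offset);
    for (; i + 8 <= n; i += 8) {
        /* Widen 8 bytes to 16 bits; 0..255 also fits in a signed lane. */
        int16x8_t wide = vreinterpretq_s16_u16(vmovl_u8(vld1_u8(pixels + i)));
        /* 16-bit math (saturating add avoids wraparound for large offsets). */
        int16x8_t sum = vqaddq_s16(wide, voff);
        /* Clamp to 0..255 and narrow back to bytes in one instruction. */
        vst1_u8(pixels + i, vqmovun_s16(sum));
    }
#endif
    /* Scalar tail, or the whole array when NEON is unavailable. */
    for (; i < n; i++)
        pixels[i] = clamp_u8(pixels[i] + offset);
}
```

Doing the arithmetic in 16 bits and saturating only once at the end is what makes vqmovun_s16 useful here: the intermediate values never have to fit in a byte.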