Should we account for overflow when using Neon intrinsics such as vadd_s8?

Suppose we have the following C code:

spatial_pred= (cur[mrefs] + cur[prefs])>>1;

When translated to Neon intrinsics, it becomes:

int8x8_t cur_mrefs = vld1_s8(cur+mrefs);
int8x8_t cur_prefs = vld1_s8(cur+prefs);
int8x8_t spatial_pred = vshr_n_s8(vadd_s8(cur_mrefs, cur_prefs), 1);

Do we need to worry about overflow in vadd_s8(cur_mrefs, cur_prefs)? Should we do the addition in 16 bits (e.g. vadd_s16) instead?

Solution

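Yes: vadd_s8 wraps around on overflow, so the carry out of each 8-bit lane is lost. If you don't want to lose that information, first widen int8x8_t to int16x8_t (for example with vmovl_s8) and then do the sum, or use the widening add vaddl_s8, which does both in one step.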
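The widening approach can be sketched roughly as follows. The function name avg8_widening is made up for illustration, and the scalar branch is only an assumed stand-in so the snippet also builds on machines without Neon. Both branches rely on `>>` being an arithmetic shift for negative values, which holds on mainstream compilers but is implementation-defined in C.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: (a[i] + b[i]) >> 1 for 8 signed bytes,
 * computed in 16-bit lanes so the sum cannot overflow. */
void avg8_widening(const int8_t *a, const int8_t *b, int8_t *out)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    /* vaddl_s8 widens both inputs to 16 bits and adds them. */
    int16x8_t sum = vaddl_s8(vld1_s8(a), vld1_s8(b));
    /* vshrn_n_s16 shifts right by 1 and narrows back to 8 bits. */
    vst1_s8(out, vshrn_n_s16(sum, 1));
#else
    /* Scalar model of the same lane-wise computation. */
    for (int i = 0; i < 8; i++) {
        int16_t sum = (int16_t)(a[i] + b[i]);
        out[i] = (int8_t)(sum >> 1);
    }
#endif
}
```

Since the 16-bit sum lies in [-256, 254], the shifted result always fits back into an int8_t.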

If you want the result to saturate instead, use vqadd.

Vector saturating add: vqadd -> Vr[i]:=sat<size>(Va[i]+Vb[i])
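To make the difference between wrapping and saturating concrete, here is a small sketch. The function names add8_wrapping and add8_saturating are illustrative, and the scalar branches are assumed models of the lane-wise behaviour for builds without Neon (they assume the usual two's-complement conversion to int8_t).

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: plain 8-bit add, wraps around on overflow (vadd_s8). */
void add8_wrapping(const int8_t *a, const int8_t *b, int8_t *out)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1_s8(out, vadd_s8(vld1_s8(a), vld1_s8(b)));
#else
    for (int i = 0; i < 8; i++) {
        /* Add modulo 256, then reinterpret the byte as signed. */
        out[i] = (int8_t)(uint8_t)((uint8_t)a[i] + (uint8_t)b[i]);
    }
#endif
}

/* Hypothetical helper: saturating 8-bit add, clamps to [-128, 127] (vqadd_s8). */
void add8_saturating(const int8_t *a, const int8_t *b, int8_t *out)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1_s8(out, vqadd_s8(vld1_s8(a), vld1_s8(b)));
#else
    for (int i = 0; i < 8; i++) {
        int sum = a[i] + b[i];
        if (sum > INT8_MAX) sum = INT8_MAX;
        if (sum < INT8_MIN) sum = INT8_MIN;
        out[i] = (int8_t)sum;
    }
#endif
}
```

For example, 100 + 100 gives -56 with the wrapping add but 127 with the saturating one.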

If you just want to convert the C version, use vhadd (or vrhadd, which rounds). These halve the sum in a single instruction without losing the carry, so there is no need for a separate shift as a second step.

Vector halving add: vhadd -> Vr[i]:=(Va[i]+Vb[i])>>1
Vector rounding halving add: vrhadd -> Vr[i]:=(Va[i]+Vb[i]+1)>>1
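A minimal sketch of the halving adds, assuming signed 8-bit data as in the question. The names avg8_truncating and avg8_rounding are illustrative, and the scalar branches are assumed models for builds without Neon; they rely on `>>` being an arithmetic shift for negative values.

```c
#include <stdint.h>
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#endif

/* Hypothetical helper: (a[i] + b[i]) >> 1 with no intermediate overflow (vhadd_s8). */
void avg8_truncating(const int8_t *a, const int8_t *b, int8_t *out)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1_s8(out, vhadd_s8(vld1_s8(a), vld1_s8(b)));
#else
    for (int i = 0; i < 8; i++) {
        out[i] = (int8_t)((a[i] + b[i]) >> 1);      /* sum computed as int */
    }
#endif
}

/* Hypothetical helper: (a[i] + b[i] + 1) >> 1, the rounding variant (vrhadd_s8). */
void avg8_rounding(const int8_t *a, const int8_t *b, int8_t *out)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    vst1_s8(out, vrhadd_s8(vld1_s8(a), vld1_s8(b)));
#else
    for (int i = 0; i < 8; i++) {
        out[i] = (int8_t)((a[i] + b[i] + 1) >> 1);  /* sum computed as int */
    }
#endif
}
```

avg8_truncating matches the original C expression (cur[mrefs] + cur[prefs]) >> 1 lane by lane, because C also promotes the operands to int before adding.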
