Is there any way to implement the logic below in NEON? I did not find any multiply-accumulate instruction that takes 64-bit inputs and produces a 64-bit output.
int64x2_t result;
int64x2_t num1;
int64x2_t num2;
result += num1 * num2;
Technically, multiplying two 64-bit values can produce a 128-bit result. That's why NEON provides the following int64 + int32*int32 functions, but not one that takes two 64-bit input values.
int64x2_t vmlal_s32 (int64x2_t, int32x2_t, int32x2_t);
int64x2_t vqdmlal_s32 (int64x2_t, int32x2_t, int32x2_t);
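If your multiplicands are known to fit in 32 bits, the widening form is enough. Here is a minimal sketch of that case; the helper name mla_widening_s32 and the array-based interface are my own assumptions for illustration, and the scalar branch is only there so the snippet also builds on non-NEON targets:

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: acc[i] += (int64_t)a[i] * (int64_t)b[i] for i = 0, 1.
 * The inputs are 32-bit, so each product always fits in 64 bits. */
static inline void mla_widening_s32(int64_t acc[2],
                                    const int32_t a[2],
                                    const int32_t b[2])
{
#if defined(__ARM_NEON)
    int64x2_t vacc = vld1q_s64(acc);   /* load the two 64-bit accumulators */
    int32x2_t va   = vld1_s32(a);      /* load the two 32-bit multiplicands */
    int32x2_t vb   = vld1_s32(b);
    vacc = vmlal_s32(vacc, va, vb);    /* widening multiply-accumulate */
    vst1q_s64(acc, vacc);
#else
    /* Scalar fallback with the same result when the sum does not overflow. */
    for (int i = 0; i < 2; ++i)
        acc[i] += (int64_t)a[i] * (int64_t)b[i];
#endif
}
```

Note that vqdmlal_s32 is not a drop-in replacement here: it doubles the product and saturates, so it computes something different from a plain multiply-accumulate.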
If those don't work for you, then you'll need to use scalar 64*64 multiplications followed by vaddq_s64.
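A rough sketch of the scalar-multiply-then-vaddq_s64 approach, assuming you only want the low 64 bits of each product (i.e. ordinary wrapping 64-bit arithmetic). The helper name mla_s64x2 and the array interface are hypothetical, and the non-NEON branch exists only so the snippet compiles elsewhere:

```c
#include <stdint.h>
#if defined(__ARM_NEON)
#include <arm_neon.h>
#endif

/* Hypothetical helper: result[i] += num1[i] * num2[i] for i = 0, 1,
 * keeping only the low 64 bits of each product. */
static inline void mla_s64x2(int64_t result[2],
                             const int64_t num1[2],
                             const int64_t num2[2])
{
    int64_t prod[2];

    /* Scalar 64*64 multiplies. Done in unsigned arithmetic so that
     * wraparound is well defined in C; the low 64 bits are the same
     * as for the signed product. */
    for (int i = 0; i < 2; ++i)
        prod[i] = (int64_t)((uint64_t)num1[i] * (uint64_t)num2[i]);

#if defined(__ARM_NEON)
    /* Vector accumulate of both lanes at once. */
    int64x2_t vres  = vld1q_s64(result);
    int64x2_t vprod = vld1q_s64(prod);
    vst1q_s64(result, vaddq_s64(vres, vprod));
#else
    /* Scalar fallback with the same wrapping behaviour. */
    for (int i = 0; i < 2; ++i)
        result[i] = (int64_t)((uint64_t)result[i] + (uint64_t)prod[i]);
#endif
}
```

If the operands already live in int64x2_t registers, you could extract the lanes with vgetq_lane_s64 instead of going through memory; whether that is faster depends on the core, so it is worth measuring.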
Note: Visual Studio provides the _mul128, _umul128, __mulh, and __umulh intrinsics for handling the full 64*64 = 128 bit scenario. Availability depends on the target architecture, so check the compiler documentation for your ARM target.
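If you need the full 128-bit product without relying on any compiler-specific intrinsic, it can be built from 32-bit halves. This is only a portable sketch of the unsigned case; the name mul_u64_full is made up for illustration and is not one of the Visual Studio intrinsics:

```c
#include <stdint.h>

/* Hypothetical helper: full unsigned 64*64 -> 128-bit product.
 * Returns the low 64 bits and stores the high 64 bits in *hi. */
static inline uint64_t mul_u64_full(uint64_t a, uint64_t b, uint64_t *hi)
{
    /* Split each operand into 32-bit halves. */
    uint64_t a_lo = (uint32_t)a, a_hi = a >> 32;
    uint64_t b_lo = (uint32_t)b, b_hi = b >> 32;

    /* Four 32*32 -> 64-bit partial products. */
    uint64_t p0 = a_lo * b_lo;
    uint64_t p1 = a_lo * b_hi;
    uint64_t p2 = a_hi * b_lo;
    uint64_t p3 = a_hi * b_hi;

    /* Sum of everything that lands in bits 32..63; cannot overflow
     * because each of the three terms is below 2^32. */
    uint64_t mid = (p0 >> 32) + (uint32_t)p1 + (uint32_t)p2;

    *hi = p3 + (p1 >> 32) + (p2 >> 32) + (mid >> 32);
    return (mid << 32) | (uint32_t)p0;
}
```

For the original question this is usually more than you need: if result is an int64x2_t, only the low 64 bits of each product can be accumulated anyway.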