arm simd neon cortex-a

NEON: multiply and accumulate with 64-bit input and output


Is there any way to implement the logic below in NEON? I did not find any multiply-and-accumulate instruction that takes 64-bit inputs and produces a 64-bit output.

int64x2_t result;
int64x2_t num1;
int64x2_t num2;

result += num1 * num2;   /* desired per-lane operation */

Solution

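  • Multiplying two 64-bit values can produce a result of up to 128 bits. That's why NEON provides the following widening functions, which compute int64 + int32*int32 per lane (vqdmlal_s32 additionally doubles the product and saturates), but none that takes two 64-bit input values.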

    int64x2_t vmlal_s32 (int64x2_t, int32x2_t, int32x2_t);
    int64x2_t vqdmlal_s32 (int64x2_t, int32x2_t, int32x2_t);
    

    If those don't work for you, you'll need to do the 64*64 multiplications as scalar operations and then accumulate the products with vaddq_s64.

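    Note: Visual Studio provides the _mul128, _umul128, __mulh, and __umulh intrinsics for handling the full 64*64 = 128-bit scenario. Their availability depends on the target architecture, so check the compiler documentation before relying on them for ARM.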
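
The scalar fallback described above (multiply each lane as a scalar, then accumulate with vaddq_s64) could look roughly like the sketch below. The helper names mul64_wrap and mla_s64x2 are made up for illustration, and the sketch assumes you only want the low 64 bits of each product, i.e. the result wraps on overflow just as vaddq_s64 does. A plain C path is included so the same function also builds on targets without NEON.

```c
#include <stdint.h>

#if defined(__ARM_NEON) || defined(__ARM_NEON__)
#include <arm_neon.h>
#define HAVE_NEON 1
#else
#define HAVE_NEON 0
#endif

/* Low 64 bits of a*b. The multiply is done on unsigned values because
 * signed overflow is undefined behaviour in C; converting back to int64_t
 * relies on the usual two's-complement behaviour of mainstream compilers. */
int64_t mul64_wrap(int64_t a, int64_t b) {
    return (int64_t)((uint64_t)a * (uint64_t)b);
}

/* acc[i] += a[i] * b[i] for two 64-bit lanes (hypothetical helper). */
void mla_s64x2(int64_t acc[2], const int64_t a[2], const int64_t b[2]) {
#if HAVE_NEON
    int64x2_t result = vld1q_s64(acc);
    int64x2_t num1 = vld1q_s64(a);
    int64x2_t num2 = vld1q_s64(b);

    /* NEON has no 64x64 vector multiply, so multiply lane by lane
     * in scalar registers. */
    int64_t p0 = mul64_wrap(vgetq_lane_s64(num1, 0), vgetq_lane_s64(num2, 0));
    int64_t p1 = mul64_wrap(vgetq_lane_s64(num1, 1), vgetq_lane_s64(num2, 1));

    /* Repack the two products into a vector and accumulate. */
    int64x2_t prod = vcombine_s64(vcreate_s64((uint64_t)p0),
                                  vcreate_s64((uint64_t)p1));
    result = vaddq_s64(result, prod);
    vst1q_s64(acc, result);
#else
    /* Portable path with the same wrapping semantics. */
    for (int i = 0; i < 2; ++i) {
        uint64_t sum = (uint64_t)acc[i] + (uint64_t)mul64_wrap(a[i], b[i]);
        acc[i] = (int64_t)sum;
    }
#endif
}
```

Moving values between vector and scalar registers has a cost, so if the operands are known to fit in 32 bits, vmlal_s32 is likely the better choice.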