I am trying to speed up some code with NEON, but I am finding it difficult. I develop with the Android NDK. This is the C++ code I want to optimize:
unsigned int test_add_C(unsigned int *x, unsigned int *y) {
    unsigned int result = 0;
    for (int i = 0; i < 8; i++) {
        result += x[i] * y[i];
    }
    return result;
}
And this is my NEON version:
unsigned int test_add_neon(unsigned *x, unsigned *y) {
    unsigned int result;
    __asm__ __volatile__(
        "vld1.32 {d2-d5}, [%[x]] \n\t"
        "vld1.32 {d6-d9}, [%[y]]! \n\t"
        "vmul.s32 d0, d2, d6 \n\t"
        "vmla.s32 d0, d3, d7 \n\t"
        "vmla.s32 d0, d4, d8 \n\t"
        "vmla.s32 d0, d5, d9 \n\t"
        "vpadd.s32 d0, d0 \n\t"
        "vmov %0, r4, d0 \n\t"
        :"=r"(result)
        :"r"(x)
        :"d0", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "r4"
    );
    return result;
}
But when I compile it, the compiler reports "undefined named operand" for 'x' and 'y'. I don't know how to load the data from the arrays x and y. Can someone help me? Thanks a lot.
The compiler cannot resolve C/C++ variable names written inside an inline assembly string. A symbolic reference such as %[x] only works if an operand with that name is declared in the input or output operand list.
Changing the line
:"r"(x)
to
:[x]"r"(x),[y]"r"(y)
will fix your 'undefined named operand' problem. However, I see a few more potential issues right away.
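For reference, here is a sketch of the whole function with both pointers passed as named operands. It has not been tested on a device and rests on some assumptions: a 32-bit ARM target built with NEON enabled (for example -mfpu=neon, which makes the compiler define __ARM_NEON__). It drops the post-increment, reads lane 0 of d0 with vmov.32 instead of hard-coding r4, and adds a "memory" clobber because the asm reads the arrays through the pointers rather than through memory operands. On any other target the #else branch falls back to the plain loop, so the file still builds.

```cpp
// Sum of x[i] * y[i] for i = 0..7, wrapping modulo 2^32 like test_add_C.
// Assumption: the asm path is only for 32-bit ARM with NEON enabled.
unsigned int test_add_neon(const unsigned int *x, const unsigned int *y) {
    unsigned int result;
#if defined(__arm__) && defined(__ARM_NEON__)
    __asm__ __volatile__(
        "vld1.32   {d2-d5}, [%[x]]   \n\t"  // x[0..7] -> d2..d5, no write-back
        "vld1.32   {d6-d9}, [%[y]]   \n\t"  // y[0..7] -> d6..d9, no write-back
        "vmul.u32  d0, d2, d6        \n\t"  // lanes: x0*y0, x1*y1
        "vmla.u32  d0, d3, d7        \n\t"  // += x2*y2, x3*y3
        "vmla.u32  d0, d4, d8        \n\t"  // += x4*y4, x5*y5
        "vmla.u32  d0, d5, d9        \n\t"  // += x6*y6, x7*y7
        "vpadd.i32 d0, d0, d0        \n\t"  // add the two lanes together
        "vmov.32   %[res], d0[0]     \n\t"  // copy lane 0 to a core register
        : [res] "=r"(result)                // named output operand
        : [x] "r"(x), [y] "r"(y)            // named inputs matching %[x], %[y]
        : "d0", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "memory");
#else
    // Portable fallback, so the function can also be built and checked off-device.
    result = 0;
    for (int i = 0; i < 8; i++) {
        result += x[i] * y[i];
    }
#endif
    return result;
}
```

Called with x = {1, 2, ..., 8} and y = {8, 7, ..., 1}, it should return 120, the same value as test_add_C.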
First, the data type s32 of your multiplication instructions should be u32, since you declare x and y as unsigned int.
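For vmul and vmla the u32 suffix mostly documents intent. The low 32 bits of a product are identical whether the lanes are read as signed or unsigned. Signedness only changes the result for widening forms such as vmull, whose 64-bit products differ. The C++ sketch below checks that property on the host. The two helper names are hypothetical and exist only for this illustration.

```cpp
#include <cstdint>

// Hypothetical helper: low 32 bits of a * b with both operands unsigned.
std::uint32_t mul_low_unsigned(std::uint32_t a, std::uint32_t b) {
    return a * b;  // unsigned arithmetic wraps modulo 2^32
}

// Hypothetical helper: low 32 bits of the same product with both operands
// reinterpreted as signed 32-bit values. The multiply is done in 64 bits,
// which cannot overflow for two int32 values, and is then truncated.
// (The uint32 -> int32 cast wraps on GCC/Clang and is guaranteed from C++20.)
std::uint32_t mul_low_signed(std::uint32_t a, std::uint32_t b) {
    std::int64_t sa = static_cast<std::int32_t>(a);
    std::int64_t sb = static_cast<std::int32_t>(b);
    return static_cast<std::uint32_t>(sa * sb);  // keeps the value modulo 2^32
}
```

For example, both helpers return 0xFFFFFFFD for a = 0xFFFFFFFF and b = 3, because 0xFFFFFFFF is -1 when read as signed and -3 truncates to the same bit pattern.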
Second, you post-increment y but not x in the lines
"vld1.32 {d2-d5}, [%[x]] \n\t"
"vld1.32 {d6-d9}, [%[y]]! \n\t"
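Unless this is on purpose, it is better to be consistent. Note also that the ! write-back changes the register holding y, which is declared as an input-only operand. GCC assumes input-only operands still hold their original values after the asm statement, so either drop the ! or declare the pointer as an input/output operand with "+r".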
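If the write-back is wanted, for example to walk through longer arrays eight elements at a time, a consistent version advances both pointers and declares them with "+r" so the compiler knows the asm modifies them. The sketch below is untested on hardware and makes some assumptions. The function name dot8_advance is hypothetical, and the asm path again assumes 32-bit ARM with NEON enabled. Elsewhere, a plain loop does the same work.

```cpp
// Hypothetical helper: dot product of 8 elements starting at px and py,
// then advances both pointers by 8 elements, as the "!" write-back does.
unsigned int dot8_advance(const unsigned int *&px, const unsigned int *&py) {
    unsigned int result;
#if defined(__arm__) && defined(__ARM_NEON__)
    const unsigned int *x = px;
    const unsigned int *y = py;
    __asm__ __volatile__(
        "vld1.32   {d2-d5}, [%[x]]!  \n\t"  // load x[0..7], then x += 8
        "vld1.32   {d6-d9}, [%[y]]!  \n\t"  // load y[0..7], then y += 8
        "vmul.u32  d0, d2, d6        \n\t"
        "vmla.u32  d0, d3, d7        \n\t"
        "vmla.u32  d0, d4, d8        \n\t"
        "vmla.u32  d0, d5, d9        \n\t"
        "vpadd.i32 d0, d0, d0        \n\t"  // add the two lanes together
        "vmov.32   %[res], d0[0]     \n\t"
        : [res] "=r"(result), [x] "+r"(x), [y] "+r"(y)  // x and y are read and modified
        :
        : "d0", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "memory");
    px = x;
    py = y;
#else
    // Portable fallback with the same observable behavior.
    result = 0;
    for (int i = 0; i < 8; i++) {
        result += px[i] * py[i];
    }
    px += 8;
    py += 8;
#endif
    return result;
}
```

Calling it twice on 16-element arrays processes elements 0..7 and then 8..15, leaving both pointers one past the end.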