
NEON inline assembly: how to load data from two different source pointers?


I tried to improve some code, but I am finding it very difficult. I am developing with the Android NDK. The C++ code I want to improve is:

unsigned int test_add_C(unsigned int *x, unsigned int *y) {
    unsigned int result = 0;
    for (int i = 0; i < 8; i++) {
        result += x[i] * y[i];
    }
    return result;
}
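Because the reference code accumulates in unsigned int (assumed here to be 32 bits, as on Android's ARM ABIs), every product and sum wraps modulo 2^32. One consequence that matters when picking SIMD type suffixes: the low 32 bits of a product are the same whether the operands are read as signed or unsigned. The small check below illustrates this; mul_lo_unsigned and mul_lo_signed are hypothetical helpers, and the signed product is formed in 64 bits because signed overflow is undefined in C.

```c
#include <stdint.h>

/* Hypothetical helper: low 32 bits of a 32x32-bit product, unsigned view.
 * Unsigned arithmetic wraps modulo 2^32, exactly like test_add_C. */
uint32_t mul_lo_unsigned(uint32_t a, uint32_t b)
{
    return a * b;
}

/* Hypothetical helper: the same product with the operands read as signed.
 * Converting a value above INT32_MAX to int32_t is implementation-defined;
 * GCC and Clang wrap it modulo 2^32. The multiply is done in 64 bits so it
 * cannot overflow, then truncated back to the low 32 bits. */
uint32_t mul_lo_signed(uint32_t a, uint32_t b)
{
    int64_t p = (int64_t)(int32_t)a * (int64_t)(int32_t)b;
    return (uint32_t)p;
}
```

For example, 0xFFFFFFFF * 3 is 0xFFFFFFFD in the unsigned view, and -1 * 3 = -3 truncates to the same 0xFFFFFFFD in the signed view.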

And the NEON version:

unsigned int test_add_neon(unsigned *x, unsigned *y) {
    unsigned int result;
    __asm__ __volatile__(
            "vld1.32    {d2-d5}, [%[x]]     \n\t"
            "vld1.32    {d6-d9}, [%[y]]!    \n\t"
            "vmul.s32   d0, d2, d6          \n\t"
            "vmla.s32   d0, d3, d7          \n\t"
            "vmla.s32   d0, d4, d8          \n\t"
            "vmla.s32   d0, d5, d9          \n\t"
            "vpadd.s32  d0, d0              \n\t"
            "vmov       %0, r4, d0          \n\t"
            :"=r"(result)
            :"r"(x)
            :"d0", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "r4"
    );
    return result;
}

But when I compile the code, the compiler reports an undefined named operand for 'x' and 'y'. I don't know how to load the data from arrays x and y. Can someone help me? Thanks a lot.
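The asm in the question is meant to split the eight products across the two 32-bit lanes of d0 and then fold the lanes together with vpadd. The plain-C model below, with a hypothetical helper name, mirrors that data flow step by step so the intended result can be checked against test_add_C; it is only an illustration and does not use NEON.

```c
/* Hypothetical plain-C model of the intended NEON data flow for 8 elements.
 * A d register holds two 32-bit lanes, so the accumulator d0 is acc[2]. */
unsigned int neon_dot8_model(const unsigned int *x, const unsigned int *y)
{
    unsigned int acc[2];

    /* vmul: d0 = d2 * d6, i.e. lanes x[0]*y[0] and x[1]*y[1] */
    acc[0] = x[0] * y[0];
    acc[1] = x[1] * y[1];

    /* three vmla steps: d0 += d3*d7, d4*d8, d5*d9 (elements 2..7) */
    for (int k = 1; k < 4; k++) {
        acc[0] += x[2 * k] * y[2 * k];
        acc[1] += x[2 * k + 1] * y[2 * k + 1];
    }

    /* vpadd: add the two lanes together; lane 0 then holds the total */
    return acc[0] + acc[1];
}
```

Lane 0 ends up with the even-indexed products and lane 1 with the odd-indexed ones, so the final pairwise add gives the same total as the scalar loop.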


Solution

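  • C variable names are not visible inside an inline assembly template. Every value the asm uses must be passed through the input/output operand lists, and a named reference such as %[y] only works if an operand with that name is declared.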

    Changing the line

    :"r"(x)
    

    to

    :[x]"r"(x),[y]"r"(y)
    

    will fix your 'undefined named operand' problem. However, I see a few more potential issues right away.

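    First, the datatype s32 of your multiplication instructions should be u32, since x and y are of unsigned int type. Strictly speaking this only documents intent: a non-widening multiply keeps just the low 32 bits of each product, which are the same for signed and unsigned operands, so .s32, .u32 and the canonical .i32 all produce the same instruction.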

    Second, you post-increment y but not x in the lines

    "vld1.32    {d2-d5}, [%[x]]     \n\t"
    "vld1.32    {d6-d9}, [%[y]]!    \n\t"
    

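    Unless this is on purpose, it is better to be consistent. More importantly, the "!" writes the incremented address back into the register holding y, which is declared as an input-only operand. GCC does not allow inline assembly to modify input-only operands, so either drop the "!" or declare the pointer as a read-write operand with the "+r" constraint.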
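Putting these points together, a possible corrected version is sketched below. It assumes GCC/Clang inline-asm syntax on 32-bit ARM with NEON enabled (for example an armeabi-v7a NDK build) and is untested on hardware. Beyond the fixes above, it drops the "!" writeback on both loads, uses the three-operand form of vpadd, reads lane 0 with vmov.32 instead of clobbering r4, and adds a "memory" clobber because the asm reads the arrays through the pointers without the compiler otherwise knowing. The #else branch is an assumption added only so the sketch also builds on non-ARM hosts.

```c
/* Sketch of a corrected test_add_neon for 32-bit ARM with NEON. */
unsigned int test_add_neon(const unsigned int *x, const unsigned int *y)
{
#if defined(__arm__) && defined(__ARM_NEON)
    unsigned int result;
    __asm__ __volatile__(
        "vld1.32    {d2-d5}, [%[x]]     \n\t"  /* x[0..7] -> d2..d5, no writeback */
        "vld1.32    {d6-d9}, [%[y]]     \n\t"  /* y[0..7] -> d6..d9, no writeback */
        "vmul.i32   d0, d2, d6          \n\t"  /* lanes: x0*y0, x1*y1 */
        "vmla.i32   d0, d3, d7          \n\t"  /* += x2*y2, x3*y3 */
        "vmla.i32   d0, d4, d8          \n\t"  /* += x4*y4, x5*y5 */
        "vmla.i32   d0, d5, d9          \n\t"  /* += x6*y6, x7*y7 */
        "vpadd.i32  d0, d0, d0          \n\t"  /* lane 0 = lane 0 + lane 1 */
        "vmov.32    %[res], d0[0]       \n\t"  /* copy lane 0 to a core register */
        : [res] "=r"(result)
        : [x] "r"(x), [y] "r"(y)
        : "d0", "d2", "d3", "d4", "d5", "d6", "d7", "d8", "d9", "memory");
    return result;
#else
    /* Portable fallback (not part of the original answer) so the sketch
     * also builds on non-ARM hosts; same result as test_add_C. */
    unsigned int result = 0;
    for (int i = 0; i < 8; i++)
        result += x[i] * y[i];
    return result;
#endif
}
```

If you later loop over more than eight elements and do want the post-increment, keep the "!" on both loads and move the pointers to the output list as [x] "+r"(x), [y] "+r"(y), since the asm then modifies them.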