
How to use multi-vector types in ARM64 inline assembly?


With ARM64 compilers that support GCC-style __asm__, how can I use multi-vector NEON types such as uint8x16x4_t as inline assembly operands?

    #include <arm_neon.h>

    uint8x16x4_t Meow()
    {
        uint8x16x4_t result;
        __asm__(
            "meow %0"
        :   "=w"(result));
        return result;
    }

That results in the following assembly output:

    meow v0

Is there a way to get something like this instead?

    meow { v0.16b - v3.16b }

Or, even better, is there a way to refer to the individual registers?


Solution

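  • You have to write the register list out yourself, but GCC's AArch64 operand modifiers T, U and V make that possible: %T0, %U0 and %V0 print the registers one, two and three places after the one %0 names. Suffixes such as .16b can simply be written literally in the template. The following code: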

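    #include <arm_neon.h>

    uint8x16x4_t Meow()
    {
        uint8x16x4_t result;
        /* %0 names the first register of the tuple; %T0, %U0 and %V0
           name the next three. "meow" is a placeholder mnemonic. */
        __asm__(
            "meow { %0.16b, %T0.16b, %U0.16b, %V0.16b }"
        :   "=w"(result));
        return result;
    }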
    

    gives me:

    Meow:
        meow { v4.16b, v5.16b, v6.16b, v7.16b }
        mov     v1.16b, v5.16b
        mov     v2.16b, v6.16b
        mov     v3.16b, v7.16b
        mov     v0.16b, v4.16b
        ret
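The same modifiers should also work when the tuple is an input operand, not just an output. Below is a minimal sketch, assuming GCC targeting AArch64: one ld1 fills a uint8x16x4_t output, and a four-register tbl then consumes it as an input table. The function name lookup64 and the plain-C fallback are hypothetical, added only so the snippet also builds on non-AArch64 hosts; the inline-asm path is restricted to GCC because I haven't checked whether Clang accepts these modifiers.

```c
#include <stdint.h>

#if defined(__aarch64__) && defined(__GNUC__) && !defined(__clang__)
#include <arm_neon.h>

/* Hypothetical helper: look up 16 indices in a 64-byte table held in
   four consecutive NEON registers. */
static void lookup64(uint8_t out[16], const uint8_t table[64],
                     const uint8_t idx[16])
{
    uint8x16x4_t t;
    uint8x16_t r;

    /* Output tuple: %0 prints the first register, %T0/%U0/%V0 the next
       three. The "m" operand tells GCC which 64 bytes the asm reads. */
    __asm__("ld1 { %0.16b, %T0.16b, %U0.16b, %V0.16b }, [%1]"
            : "=w"(t)
            : "r"(table), "m"(*(const uint8_t (*)[64])table));

    /* Input tuple: same modifiers on operand 1. TBL writes 0 for any
       index >= 64 (outside the four-register table). */
    __asm__("tbl %0.16b, { %1.16b, %T1.16b, %U1.16b, %V1.16b }, %2.16b"
            : "=w"(r)
            : "w"(t), "w"(vld1q_u8(idx)));

    vst1q_u8(out, r);
}
#else
/* Hypothetical portable fallback with the same semantics as the
   four-register TBL, so the sketch also compiles elsewhere. */
static void lookup64(uint8_t out[16], const uint8_t table[64],
                     const uint8_t idx[16])
{
    for (int i = 0; i < 16; i++)
        out[i] = idx[i] < 64 ? table[idx[i]] : 0;
}
#endif
```

Called with a 64-byte table and 16 indices, lookup64 writes table[idx[i]] to out[i], or 0 when idx[i] is 64 or more. As with Meow above, the compiler picks the four consecutive registers; the template only has to name them.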