
In ARMv8 (SVE), what is the effect of the assembly instruction "ptrue p0.b vl64"?


I have also come across these instructions:

ptrue p0.s
ptrue p0.d
ptrue p0.b vl64
ptrue p0.b vl32

So, what are their effects and differences?


Solution

    I'm new to SVE, so my answer may be wrong:

    Some background

    (You probably already know this...)

    The width of SVE registers differs from one CPU to another, so you might run into the following problem:

    You write your program for a CPU whose registers hold 3 numbers each. You load the values {10, 20, 30} into one register and {5, 10, 3} into another, then perform an element-wise division. You expect {10/5, 20/10, 30/3} = {2, 2, 10} as the result.

    However, you run the program on another CPU whose registers hold 5 elements. The second register now contains {5, 10, 3, 0, 0} (listed from element 0 upward), so the last two elements would cause a division by zero. (Real SVE registers are always a multiple of 128 bits wide, so real element counts are even. The counts 3 and 5 just keep the example small.)

    To avoid this situation, SVE provides special "predicate registers" (P0-P15). Each one holds a bit mask that tells the CPU which elements of a vector register are active (valid) and which are inactive (invalid). In the example above, the mask would be {valid, valid, valid, invalid, invalid}, and a predicated instruction leaves the inactive elements alone.
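    To make the masking idea concrete, here is a small Python model of an element-wise division with a merging predicate. It uses a five-element register in which only the first three elements were loaded, with element 0 listed first. This is only a sketch. The function name `predicated_div` is hypothetical, and real SVE instructions such as SDIV and FDIV define their own results for a zero divisor. The model sidesteps that question by never evaluating inactive elements, which simply keep the dividend's value.

```python
def predicated_div(dividend, divisor, pred):
    """Element-wise integer division under a predicate (merging form).

    Active elements (pred[i] is True) get dividend[i] // divisor[i];
    inactive elements keep dividend[i] and are never evaluated, so a
    zero divisor in an inactive element causes no error.
    """
    if not (len(dividend) == len(divisor) == len(pred)):
        raise ValueError("all operands must have the same number of elements")
    return [a // b if active else a
            for a, b, active in zip(dividend, divisor, pred)]


# Five-element register; only elements 0..2 were loaded.
dividend = [10, 20, 30, 0, 0]
divisor = [5, 10, 3, 0, 0]
pred = [True, True, True, False, False]
print(predicated_div(dividend, divisor, pred))  # [2, 2, 10, 0, 0]
```

    Without the predicate, evaluating the last two elements would divide by zero. With it, they are skipped.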

    Your actual question

    So, what are their effects and differences?

    ptrue p0.s
    

    This instruction sets P0 so that a later 32-bit (.s) operation predicated on P0 processes every element of the SVE register.

    "32-bit operation" means an operation that interprets the SVE register as a vector of 32-bit values. For example, a 256-bit register holds 8 such values. (SVE register widths are always a multiple of 128 bits.)
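    The number of elements an operation sees is the register width divided by the element size. The following minimal Python sketch assumes the SVE rule that the vector length is a multiple of 128 bits between 128 and 2048. The helper `lane_count` and the `ELEMENT_BITS` table are hypothetical names used only for illustration.

```python
# Element sizes in bits for the SVE size suffixes (.b, .h, .s, .d).
ELEMENT_BITS = {"b": 8, "h": 16, "s": 32, "d": 64}


def lane_count(vl_bits, suffix):
    """Number of elements an SVE register of vl_bits holds for a size suffix."""
    # SVE vector lengths are multiples of 128 bits in the range 128..2048.
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError(f"{vl_bits} is not a valid SVE vector length")
    return vl_bits // ELEMENT_BITS[suffix]


for vl in (128, 256, 512):
    print(vl, {s: lane_count(vl, s) for s in ("b", "s", "d")})
# 128 {'b': 16, 's': 4, 'd': 2}
# 256 {'b': 32, 's': 8, 'd': 4}
# 512 {'b': 64, 's': 16, 'd': 8}
```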

    ptrue p0.d
    

    This instruction sets P0 so that a later 64-bit (.d) operation predicated on P0 processes every element of the SVE register. If you use this P0 to predicate a 32-bit operation instead, only every other 32-bit element is active.
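    The element size matters because a predicate register holds one bit per byte of the vector register. For wider elements, only the lowest bit of each element's group of bits is used. The sketch below models the bits that ptrue writes with its default ALL pattern. Under this model, "ptrue p0.s" sets every fourth bit, "ptrue p0.d" sets every eighth bit, and all other bits are cleared. The function names and the list-of-bits representation (bit 0 first) are hypothetical conveniences, not an SVE API.

```python
# Element sizes in bytes for the SVE size suffixes.
ELEMENT_BYTES = {"b": 1, "h": 2, "s": 4, "d": 8}


def ptrue_all_bits(vl_bits, suffix):
    """Model the predicate bits written by `ptrue p0.<suffix>` (pattern ALL).

    Returns one bit per byte of the vector register, bit 0 first.
    Only the lowest bit of each element is set; the others are 0.
    """
    esize = ELEMENT_BYTES[suffix]
    return [1 if i % esize == 0 else 0 for i in range(vl_bits // 8)]


def active_elements(pred_bits, suffix):
    """Which elements of the given size a predicate activates.

    An element is active if the predicate bit for its lowest byte is 1.
    """
    esize = ELEMENT_BYTES[suffix]
    return [bool(pred_bits[i]) for i in range(0, len(pred_bits), esize)]


print(ptrue_all_bits(128, "s"))  # [1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0]
print(ptrue_all_bits(128, "d"))  # [1, 0, 0, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0]
# A .d predicate used for a .s operation activates every other element:
print(active_elements(ptrue_all_bits(128, "d"), "s"))  # [True, False, True, False]
```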

    ptrue p0.b vl64
    ptrue p0.b vl32
    

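    These instructions set P0 so that a later 8-bit (.b) operation predicated on P0 processes only the lowest 64 (vl64) or 32 (vl32) bytes of the SVE register, that is, elements 0 to 63 or 0 to 31. vl64 and vl32 are pattern operands that request exactly that many active elements. The first two instructions omit the pattern, which then defaults to all.

    On a CPU whose SVE registers are narrower than 512 bits (for vl64) or 256 bits (for vl32), the requested number of elements does not fit. In that case, the instruction sets P0 to "all elements inactive", so a later predicated operation does not touch any element.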
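    The VLn behaviour can be modelled the same way. The following sketch assumes the architectural rule that a VLn pattern yields n active elements when the register has at least n elements, and zero active elements otherwise. It returns one boolean per element. The name `ptrue_vl` is hypothetical. The model also does not check that n is one of the values the assembler accepts (vl1 to vl8, vl16, vl32, vl64, vl128, vl256).

```python
# Element sizes in bits for the SVE size suffixes.
ELEMENT_BITS = {"b": 8, "h": 16, "s": 32, "d": 64}


def ptrue_vl(vl_bits, suffix, n):
    """Model `ptrue p0.<suffix>, vl<n>`: which elements end up active.

    If the register holds at least n elements, the lowest n are active;
    otherwise no element is active (the predicate is all false).
    """
    elements = vl_bits // ELEMENT_BITS[suffix]
    active = n if elements >= n else 0
    return [i < active for i in range(elements)]


for vl in (256, 512):
    p64 = ptrue_vl(vl, "b", 64)
    p32 = ptrue_vl(vl, "b", 32)
    print(vl, "vl64 active:", sum(p64), "vl32 active:", sum(p32))
# 256 vl64 active: 0 vl32 active: 32
# 512 vl64 active: 64 vl32 active: 32
```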