In addition, I read these instructions:
ptrue p0.s
ptrue p0.d
ptrue p0.b vl64
ptrue p0.b vl32
So, what are their effects and differences?
I'm new to SVE so my answer may be wrong:
Some background
(Probably you already know that...)
The width of SVE registers differs from one CPU to another (it is a multiple of 128 bits, between 128 and 2048 bits), so you might run into the following problem:
You write your program for a CPU that allows 3 numbers per register, load the values {10, 20, 30} into one register and {5, 10, 3} into another register, and perform an element-wise division. You expect {10/5, 20/10, 30/3} = {2, 2, 10} as the result.
However, you are running your program on another CPU that allows 5 elements per register, so the second register contains {0, 0, 5, 10, 3}. Dividing all five elements would then divide by zero in the first two elements.
To avoid this situation, SVE uses special "predicate registers" (P0-P15) that contain a bit mask telling the CPU which elements of a register are valid and which are invalid. In the example above, the bit mask would be {invalid, invalid, valid, valid, valid}.
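The effect of such a mask can be modelled in a few lines of Python. This is only a sketch of the idea, not real SVE code: predicated_div is a made-up helper, the 5-lane register is the simplified one from the example above, and I assume that masked-out lanes simply keep the value of the first operand.

```python
def predicated_div(a, b, pred):
    """Model of a predicated element-wise division.

    Lanes whose predicate is False are skipped, so their divisor is never
    looked at; they keep the value from the first operand (an assumption
    of this sketch).
    """
    return [x // y if active else x for x, y, active in zip(a, b, pred)]


# Hypothetical 5-lane registers that hold only three meaningful values.
a = [0, 0, 10, 20, 30]
b = [0, 0, 5, 10, 3]
pred = [False, False, True, True, True]  # {invalid, invalid, valid, valid, valid}

result = predicated_div(a, b, pred)
print(result)  # [0, 0, 2, 2, 10]
```

The two zero divisors are never used because their lanes are masked out.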
Your actual question
So, what are their effects and differences?
ptrue p0.s
This instruction sets the value of the register P0 in a way that a later 32-bit (.s) operation will process all fields in the SVE register.
"32-bit operation" means: An operation that interprets the SVE register as a sequence of 32-bit values, for example a 512-bit SVE register as 16 32-bit values.
ptrue p0.d
This instruction sets the value of the register P0 in a way that a later 64-bit (.d) operation will process all fields in the SVE register.
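To make the difference between .s and .d more concrete, here is a small Python model. It is a sketch under assumptions: ptrue_all and predicate_bits are names I made up, the 256-bit register width is just an example, and the bit layout (one predicate bit per byte of the vector, with only the lowest bit of each element being set) is my understanding of the architecture rather than something I have verified on hardware.

```python
def ptrue_all(vl_bits, elem_bits):
    """Model of 'ptrue pN.<T>': one flag per element, all of them active."""
    return [True] * (vl_bits // elem_bits)


def predicate_bits(lanes, elem_bits):
    """Expand per-element flags to one bit per vector byte (index 0 = lowest).

    Assumption of this sketch: only the lowest bit belonging to each
    element carries the flag, the remaining bits of the element are 0.
    """
    bytes_per_elem = elem_bits // 8
    bits = []
    for active in lanes:
        bits.append(1 if active else 0)
        bits.extend([0] * (bytes_per_elem - 1))
    return bits


VL_BITS = 256  # example register width; real CPUs differ

s_lanes = ptrue_all(VL_BITS, 32)  # ptrue p0.s
d_lanes = ptrue_all(VL_BITS, 64)  # ptrue p0.d

print(len(s_lanes), len(d_lanes))          # 8 4
print(predicate_bits(s_lanes, 32)[:8])     # [1, 0, 0, 0, 1, 0, 0, 0]
print(predicate_bits(d_lanes, 64)[:8])     # [1, 0, 0, 0, 0, 0, 0, 0]
```

So both instructions mean "all elements", but the resulting contents of P0 differ because the element size differs.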
ptrue p0.b vl64
ptrue p0.b vl32
These instructions will set the value of the register P0 in a way that a later 8-bit (.b) operation will process the low 64 (vl64) or 32 (vl32) bytes of the SVE register. The number after "vl" counts elements; because the element size here is .b, one element is one byte.
On a CPU where the SVE registers are less than 512 (vl64) or 256 (vl32) bits wide, the corresponding instruction sets the value of P0 to "all elements are invalid" to ensure that nothing stupid happens.
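The vl64 / vl32 behaviour described above can also be written down as a short Python model. Again this is only a sketch: ptrue_vl is a made-up name, and the register widths passed in are example values, not something the instruction itself knows about.

```python
def ptrue_vl(vl_bits, elem_bits, n):
    """Model of 'ptrue pN.<T>, vl<n>'.

    If the register holds at least n elements, the lowest n elements
    become active. If it holds fewer, no element becomes active.
    """
    lanes = vl_bits // elem_bits
    if n > lanes:
        return [False] * lanes
    return [True] * n + [False] * (lanes - n)


# ptrue p0.b vl32 on a 512-bit CPU: low 32 of 64 bytes are active
wide = ptrue_vl(512, 8, 32)
print(sum(wide), len(wide))      # 32 64

# ptrue p0.b vl32 on a 256-bit CPU: exactly the whole register
exact = ptrue_vl(256, 8, 32)
print(sum(exact), len(exact))    # 32 32

# ptrue p0.b vl64 on a 256-bit CPU: register too small, nothing active
small = ptrue_vl(256, 8, 64)
print(sum(small), len(small))    # 0 32
```

The last case is the "all elements are invalid" situation: a loop written for 64 bytes does nothing on the narrower CPU instead of processing only half of the data.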