I see in the Intel intrinsics guide that you can use vpcmpb without an immediate to get an equality comparison: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#techs=AVX_512&expand=6816,804,804,4867,351,804,4222,914&text=vpcmpb
I tried to write the following assembly instruction: vpcmpb %zmm30, %zmm0, %k1 (g++ syntax), to compare zmm30 and zmm0 for equality and write the result to k1. However, the assembler complains about the wrong number of operands. What is going on here?
There are 3 valid machine opcodes for doing this:
vpcmpeqb k, zmm, zmm (the EVEX form of the MMX/SSE2/AVX2 66 0F 74 opcode for [v]pcmpeqb [xy]mm, [xy]mm. These have never taken an immediate; only the eq and signed gt predicates are available, as different opcodes.)
vpcmpb or vpcmpub with immediate 0 (new instructions that only have EVEX forms: EVEX.512.66.0F3A.W0 3F or 3E).
In asm source, assemblers let you use vpcmpleb k, zmm, zmm for example as a more meaningful way to write vpcmpb k, z, z, 2, as recommended in Table 5-17 in Intel's vol.2 manual, i.e. with the predicate as part of the mnemonic, implying the immediate.
That table includes a line for VPCMPEQ* reg1, reg2, reg3 -> VPCMP* reg1, reg2, reg3, 0, but the shorter no-immediate form takes precedence for vpcmpeqb k, zmm, zmm in actual assemblers.
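To make the predicate-in-the-immediate idea concrete, here is a minimal Python model of what vpcmpb / vpcmpub compute per byte, using the immediate values from Intel's manual (0 = EQ, 1 = LT, 2 = LE, 3 = FALSE, 4 = NEQ, 5 = NLT, 6 = NLE, 7 = TRUE). It is only a sketch: the function names (vpcmp_bytes, pseudo_mnemonic) are made up for illustration, it ignores write-masking, and it works on short Python lists rather than 64-byte registers.

```python
# Predicate selected by the low 3 bits of the immediate of vpcmpb / vpcmpub.
PREDICATES = {
    0: ("eq", lambda a, b: a == b),
    1: ("lt", lambda a, b: a < b),
    2: ("le", lambda a, b: a <= b),
    3: ("false", lambda a, b: False),
    4: ("neq", lambda a, b: a != b),
    5: ("nlt", lambda a, b: a >= b),
    6: ("nle", lambda a, b: a > b),
    7: ("true", lambda a, b: True),
}


def to_signed(byte):
    """Reinterpret an unsigned byte value (0..255) as a signed 8-bit integer."""
    return byte - 256 if byte >= 128 else byte


def vpcmp_bytes(src1, src2, imm, unsigned=False):
    """Illustrative model of vpcmpb/vpcmpub k, src1, src2, imm (no write-mask).

    Returns an int whose bit i is set when predicate(src1[i], src2[i]) holds.
    """
    _, predicate = PREDICATES[imm & 7]
    mask = 0
    for i, (x, y) in enumerate(zip(src1, src2)):
        if not unsigned:
            x, y = to_signed(x), to_signed(y)
        if predicate(x, y):
            mask |= 1 << i
    return mask


def pseudo_mnemonic(imm, unsigned=False):
    """Mnemonic with the predicate spelled out, e.g. vpcmpleb for immediate 2.

    Returns None for FALSE/TRUE, which this sketch does not map to a mnemonic.
    """
    name, _ = PREDICATES[imm & 7]
    if name in ("false", "true"):
        return None
    return f"vpcmp{name}{'u' if unsigned else ''}b"


a = [0x01, 0xFF, 0x80, 0x10]
b = [0x01, 0x01, 0x7F, 0x20]
print(pseudo_mnemonic(0), bin(vpcmp_bytes(a, b, 0)))
print(pseudo_mnemonic(1), bin(vpcmp_bytes(a, b, 1)))
print(pseudo_mnemonic(1, unsigned=True), bin(vpcmp_bytes(a, b, 1, unsigned=True)))
```

Equality gives the same mask for the signed and unsigned variants; only the ordering predicates differ.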
NASM source mixed with objdump -S -drwC -Mintel disassembly. (Same results assembling with gas .intel_syntax noprefix):
vpcmpeqb k1, zmm0, zmm1
0: 62 f1 7d 48 74 c9 vpcmpeqb k1,zmm0,zmm1 # 74 opcode
vpcmpb k1, zmm0, zmm1, 0
6: 62 f3 7d 48 3f c9 00 vpcmpeqb k1,zmm0,zmm1 # 3f opcode
vpcmpequb k1, zmm0, zmm1
d: 62 f3 7d 48 3e c9 00 vpcmpequb k1,zmm0,zmm1 # 3e opcode
vpcmpub k1, zmm0, zmm1, 0
14: 62 f3 7d 48 3e c9 00 vpcmpequb k1,zmm0,zmm1 # 3e opcode
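As a cross-check of the listing above, the three encodings can be rebuilt by hand. The following Python sketch assembles the EVEX prefix, opcode, ModRM byte and optional immediate for the 512-bit register-register forms. It is deliberately limited: the helper names are hypothetical, it only accepts registers 0-7 (so all register-extension bits stay at their "no extension" value), and it assumes no write-mask.

```python
def evex_cmp_bytes(opcode_map, opcode, k_dst, zmm_src1, zmm_src2, imm=None):
    """Encode an unmasked 512-bit byte compare, k_dst = cmp(zmm_src1, zmm_src2).

    Sketch only: registers must be in 0..7, so no extension bits are needed.
    """
    assert 0 <= k_dst < 8 and 0 <= zmm_src1 < 8 and 0 <= zmm_src2 < 8
    mm = {"0F": 0b01, "0F3A": 0b11}[opcode_map]
    # P0: R, X, B, R' are stored inverted, so all-ones means "no extension".
    p0 = 0xF0 | mm
    # P1: W=0, inverted vvvv (first source), a fixed 1 bit, pp=01 (66 prefix).
    p1 = ((~zmm_src1 & 0xF) << 3) | 0b100 | 0b01
    # P2: z=0, L'L=10 (512-bit), b=0, inverted V'=1, aaa=000 (no write-mask).
    p2 = (0b10 << 5) | (1 << 3)
    # ModRM: mod=11 (register), reg=destination mask, rm=second source.
    modrm = 0xC0 | (k_dst << 3) | zmm_src2
    encoded = [0x62, p0, p1, p2, opcode, modrm]
    if imm is not None:
        encoded.append(imm & 0xFF)
    return bytes(encoded)


def vpcmpeqb(k, a, b):
    return evex_cmp_bytes("0F", 0x74, k, a, b)


def vpcmpb(k, a, b, imm):
    return evex_cmp_bytes("0F3A", 0x3F, k, a, b, imm)


def vpcmpub(k, a, b, imm):
    return evex_cmp_bytes("0F3A", 0x3E, k, a, b, imm)


for encoding in (vpcmpeqb(1, 0, 1), vpcmpb(1, 0, 1, 0), vpcmpub(1, 0, 1, 0)):
    print(len(encoding), encoding.hex())
```

The no-immediate 74 form comes out at 6 bytes and the two 0F3A forms at 7, matching the disassembly.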
Interestingly, NASM/GAS will assemble vpcmpb k1, zmm0, zmm1, 0 as written, to the form with the immediate. But objdump will disassemble that back into vpcmpeqb k1,zmm0,zmm1, same as the no-immediate opcode, so this is one of the cases where a disassemble/reassemble round trip would change the machine code. (But not the architectural effect of the instruction, of course.)
NASM / GAS don't optimize vpcmpequb into vpcmpeqb for you, and the 3E form with its immediate is one byte longer, so always avoid the unsigned version when comparing for integer equality.
If you're writing in asm, look at the asm reference manual (HTML extract https://www.felixcloutier.com/x86/vpcmpb:vpcmpub or Intel's original PDFs that it's scraped from), not the intrinsics guide. Especially when you run into any mystery or disagreement between what something says and what tools and/or CPUs seem to be doing!
The intrinsics guide is certainly known to have errors (although they do get fixed as people report them on Intel's forums). Errors are especially likely in the parts that aren't important for correct use of the C/C++ intrinsics.
It's not impossible for Intel's asm manuals to have errors, too, but not anything as major as leaving out an entire machine opcode form of an instruction for an already-released instruction set.
In no way is vpcmpb k, zmm, zmm ever valid without an explicit immediate, in real asm source or as a description of machine code, so yes, this is definitely an error in the intrinsics guide.
The vpcmpeqb %zmm, %zmm, %k asm syntax with reversed operand-list and $immediate is "AT&T syntax". It happens to be the one GAS uses by default for .s / .S files, but you can switch to Intel syntax with .intel_syntax noprefix.
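To show what that means for the instruction in the question, here is a toy Python converter from Intel-syntax operand order to AT&T order. It is a sketch under narrow assumptions: the function name intel_to_att is made up, and it only handles plain register and immediate operands (no memory operands, masks in braces, or size suffixes).

```python
def intel_to_att(line):
    """Rewrite 'mnemonic dst, src1, src2[, imm]' in AT&T operand order.

    Toy converter: operands are reversed, registers get a % prefix and
    immediates get a $ prefix. Memory operands are not supported.
    """
    mnemonic, _, rest = line.strip().partition(" ")
    operands = [op.strip() for op in rest.split(",") if op.strip()]
    converted = []
    for op in reversed(operands):
        if op[0].isdigit() or op[0] == "-":
            converted.append("$" + op)   # immediate
        else:
            converted.append("%" + op)   # register
    return f"{mnemonic} {', '.join(converted)}"


print(intel_to_att("vpcmpeqb k1, zmm0, zmm30"))
print(intel_to_att("vpcmpb k1, zmm0, zmm30, 0"))
```

Either output line is a form the assembler should accept in place of the rejected vpcmpb %zmm30, %zmm0, %k1: the first uses the no-immediate opcode, the second supplies the immediate explicitly as the first AT&T operand.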
It normally doesn't make sense to use inline asm for single instructions - compilers normally do a good enough job with intrinsics, although perhaps not always for AVX-512 mask stuff.