Tags: assembly, x86, machine-code, avx512

Intel AVX-512: how to set the EVEX.z bit


The EVEX.z bit is used in AVX-512 together with the k (opmask) registers to control masking. If the z bit is 0, the instruction uses merge-masking: destination elements whose mask bit is 0 keep their previous value. If the z bit is 1, it uses zero-masking: those elements are set to zero in the output.
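As a rough illustration of the two modes, here is a scalar C model of a masked 64-bit subtract over eight lanes. It is only a sketch of the semantics: the function name `masked_sub_q` and the `zeroing` parameter are made up for this example, and real hardware does all of this in one instruction rather than a loop.

```c
#include <stdint.h>

/* Scalar model of a masked 8 x 64-bit subtract (the per-lane behaviour of
   VPSUBQ on a zmm register). Hypothetical helper, for illustration only.
   k       : 8-bit mask, bit i controls lane i
   zeroing : models EVEX.z (0 = merge-masking, 1 = zero-masking) */
void masked_sub_q(int64_t dst[8], const int64_t a[8], const int64_t b[8],
                  uint8_t k, int zeroing)
{
    for (int i = 0; i < 8; i++) {
        if ((k >> i) & 1) {
            /* Subtract as unsigned so the result wraps instead of hitting
               signed-overflow undefined behaviour. */
            dst[i] = (int64_t)((uint64_t)a[i] - (uint64_t)b[i]);
        } else if (zeroing) {
            dst[i] = 0;   /* zero-masking: the masked-off lane is cleared */
        }
        /* merge-masking: the masked-off lane keeps its previous value */
    }
}
```

With `k = 0x05`, only lanes 0 and 2 receive `a[i] - b[i]`; the other lanes are either left alone (`zeroing == 0`) or set to zero (`zeroing == 1`).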

The syntax looks like this:

VPSUBQ zmm0{k2}{z},zmm1,zmm2

where {z} represents the z bit.

But how do you set or test the EVEX.z bit? I've searched every resource I can find but I haven't found an answer.


Solution

  • As I understand it, what they mean is that VPSUBQ zmm0{k2}{z},zmm1,zmm2 and
    VPSUBQ zmm0{k2},zmm1,zmm2 are two different instructions whose encodings differ in a single bit, called the "z bit". (Specifically, it's part of the EVEX prefix of the instruction. Wikipedia documents all the fields.)

    So you "set the z bit" by specifying {z} in your assembler source, telling the assembler to generate an instruction with the corresponding bit set. This is documented in lots of places, such as Intel's vol. 2 instruction set manual, and to some extent in Intel's intrinsics guide, which has mask (merge-masking) and maskz (zero-masking) versions of most intrinsics.

    It is not a bit of CPU state, like the direction flag, that would persist from one instruction to the next, so it doesn't make sense to "test" it.


    To illustrate, here's what I get by assembling both versions:

    00000000  62F1F5CAFBC2      vpsubq zmm0{k2}{z},zmm1,zmm2
    00000006  62F1F54AFBC2      vpsubq zmm0{k2},zmm1,zmm2
    

    Note that the encodings differ only in the high bit of the fourth byte (0xCA vs. 0x4A). That's your "z bit".
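    If you want to see that bit programmatically, a small C sketch can pull it out of the two encodings listed above. The helper names `evex_z` and `evex_aaa` are invented for this example, and the code assumes the buffer starts exactly at the 0x62 byte with no legacy prefixes in front. The "aaa" field (low three bits of the same byte, selecting the k register) comes from the EVEX layout in Intel's manual rather than from the listing itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The two encodings from the listing above. */
static const uint8_t vpsubq_zero[]  = {0x62, 0xF1, 0xF5, 0xCA, 0xFB, 0xC2}; /* {k2}{z} */
static const uint8_t vpsubq_merge[] = {0x62, 0xF1, 0xF5, 0x4A, 0xFB, 0xC2}; /* {k2}    */

/* EVEX.z: bit 7 of the fourth byte (last byte of the 4-byte EVEX prefix). */
int evex_z(const uint8_t *insn, size_t len)
{
    assert(len >= 4 && insn[0] == 0x62);
    return (insn[3] >> 7) & 1;
}

/* EVEX.aaa: opmask register number, low 3 bits of the same byte. */
int evex_aaa(const uint8_t *insn, size_t len)
{
    assert(len >= 4 && insn[0] == 0x62);
    return insn[3] & 7;
}
```

    Here `evex_z` returns 1 for the first encoding and 0 for the second, and `evex_aaa` returns 2 (k2) for both.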


    Maybe you were thinking that you could "set" or "clear" the z bit at runtime, thus changing the masking effect of subsequent instructions? Since it's part of the encoding of each instruction, not the CPU state, that way of thinking only works if you are JIT-compiling the instructions on the fly or using self-modifying code.

    In "normal" ahead-of-time code, you'll have to write the code in both versions, once with {z} instructions and once without. Use a conditional jump to decide which version to execute.
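
    In C with intrinsics, the two-version approach might look like the sketch below. It assumes GCC or Clang on x86-64 (for `__attribute__((target("avx512f")))`) and a CPU with AVX-512F at run time; `sub_q_masked` and its `zeroing` parameter are names invented for this example. `_mm512_mask_sub_epi64` and `_mm512_maskz_sub_epi64` are the merge-masking and zero-masking intrinsics for this operation.

```c
#include <immintrin.h>
#include <stdint.h>

/* Hypothetical helper: dst = a - b under mask k, with the masking mode
   chosen at run time. Each branch uses a different intrinsic, so each
   branch gets its own instruction encoding. */
__attribute__((target("avx512f")))
void sub_q_masked(int64_t dst[8], const int64_t a[8], const int64_t b[8],
                  uint8_t k, int zeroing)
{
    __m512i va = _mm512_loadu_si512(a);
    __m512i vb = _mm512_loadu_si512(b);
    __m512i r;

    if (zeroing) {
        /* zero-masking form: masked-off lanes become 0 */
        r = _mm512_maskz_sub_epi64((__mmask8)k, va, vb);
    } else {
        /* merge-masking form: masked-off lanes keep the old dst value */
        __m512i old = _mm512_loadu_si512(dst);
        r = _mm512_mask_sub_epi64(old, (__mmask8)k, va, vb);
    }
    _mm512_storeu_si512(dst, r);
}
```

    A compiler would typically emit vpsubq with {z} in one branch and without it in the other, although it is free to pick a different instruction sequence. Callers should check for AVX-512F support (for example with `__builtin_cpu_supports("avx512f")` on GCC/Clang) before calling it.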