assembly x86-64 sse machine-code instruction-encoding

How does the RMI Instruction Operand Encoding of ROUNDSS work?


A few x86 instructions like ROUNDSS require this seemingly obscure instruction operand encoding, on which I can't find any documentation or definition in Intel's Software Developer's Manual.

How are the bits of this encoding used? I put 66 0f 3a 0b c0 0c (roundsd xmm0,xmm0,0xc) into a disassembler and varied the bits to gain a better understanding, but could only access half the XMM registers.

I'm also unclear on the meaning of

128-bit Legacy SSE version: The first source operand and the destination operand are the same.

as e.g. 66 0f 3a 0b c1 0c is disassembled without warning/error to roundsd xmm0,xmm1,0xc.


Solution

  • Legacy Encoding

    The encoding is as follows:

    66 0F 3A 0A /r ib
    

    The opcode is 0A in the 0F 3A opcode plane. A mandatory 66 prefix must be supplied. The opcode is followed by a modr/m byte (/r) encoding the first operand in the reg field and the second operand in the r/m field. The instruction is followed by an 8 bit immediate (ib) encoding the third operand.

    Let's encode for example

    roundss xmm8, [rdx+r9*8+64], 0xc
    

    We have xmm8 and r9 as “upper” registers, so a REX.RX prefix 46 must be supplied to hold the extra bits.

    The modr/m byte is 44 indicating an 8 bit displacement (mod = 01), presence of a SIB byte (r/m = 100), and xmm8 as a reg operand (reg = 000, REX.R set).

    The SIB byte is CA indicating rdx as the base (base = 010), r9 as the index (index = 001, REX.X set) and a scale of 8 (scale = 11).
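The REX, modr/m, and SIB bytes described above can be reproduced with a few bit shifts. The following is a minimal sketch, not an assembler: the helper names `rex`, `modrm`, and `sib` and the register-number constants are made up for this illustration.

```python
def rex(w=0, r=0, x=0, b=0):
    """REX prefix byte: 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


def modrm(mod, reg, rm):
    """modr/m byte: mod (2 bits), reg (3 bits), r/m (3 bits)."""
    return (mod << 6) | ((reg & 7) << 3) | (rm & 7)


def sib(scale_log2, index, base):
    """SIB byte: scale (2 bits, log2 of the factor), index, base."""
    return (scale_log2 << 6) | ((index & 7) << 3) | (base & 7)


# Register numbers; bit 3 of each number goes into the REX prefix.
XMM8, RDX, R9 = 8, 2, 9

rex_byte = rex(r=XMM8 >> 3, x=R9 >> 3, b=RDX >> 3)
modrm_byte = modrm(mod=0b01, reg=XMM8, rm=0b100)  # disp8, SIB follows
sib_byte = sib(scale_log2=3, index=R9, base=RDX)  # scale 8 = 2**3

print(f"{rex_byte:02X} {modrm_byte:02X} {sib_byte:02X}")
```

The low three bits of each register number land in the modr/m or SIB byte, and the fourth bit is what forces the REX prefix.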

    Then follows the displacement 40 (i.e. decimal 64).

    Finally, we have the 8 bit immediate 0c.

    These bytes are then assembled in the order legacy prefixes, REX prefix, opcode plane prefix, opcode, modr/m byte, SIB byte, displacement, immediate. So the entire instruction comes out as

    66 46 0F 3A 0A 44 CA 40 0C
    |  |  |     |  |  |  |  \... immediate
    |  |  |     |  |  |  \...... displacement
    |  |  |     |  |  \......... SIB byte
    |  |  |     |  \............ modr/m byte
    |  |  |     \............... opcode
    |  |  \..................... opcode plane prefix
    |  \........................ REX prefix
    \........................... mandatory prefix
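To double-check the byte order in the diagram, the instruction can be walked from left to right and split into the same parts. This is only a sketch under narrow assumptions: it handles a single optional 66 prefix, an optional REX prefix, the 0F 3A opcode plane, and a trailing 8 bit immediate. The function name `split_legacy` is hypothetical.

```python
def split_legacy(code):
    """Split a legacy-encoded 0F 3A instruction into labelled parts."""
    parts, i = [], 0

    def take(name, n):
        nonlocal i
        parts.append((name, code[i:i + n].hex(" ").upper()))
        i += n

    if code[i] == 0x66:
        take("mandatory prefix", 1)
    if code[i] & 0xF0 == 0x40:          # 40..4F is a REX prefix in 64 bit mode
        take("REX prefix", 1)
    if code[i:i + 2] != b"\x0f\x3a":
        raise ValueError("expected the 0F 3A escape bytes")
    take("opcode plane prefix", 2)
    take("opcode", 1)

    modrm = code[i]
    take("modr/m byte", 1)
    mod, rm = modrm >> 6, modrm & 7

    sib_base = None
    if mod != 3 and rm == 4:            # r/m = 100 announces a SIB byte
        sib_base = code[i] & 7
        take("SIB byte", 1)

    if mod == 1:
        take("displacement", 1)         # disp8
    elif mod == 2 or (mod == 0 and (rm == 5 or sib_base == 5)):
        take("displacement", 4)         # disp32

    take("immediate", 1)
    return parts


legacy_parts = split_legacy(bytes.fromhex("66 46 0F 3A 0A 44 CA 40 0C"))
for name, value in legacy_parts:
    print(f"{value:<6} {name}")
```

Each printed line corresponds to one branch of the diagram, in the same left-to-right order.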
    

  • VEX Encoding

    128-bit Legacy SSE version: The first source operand and the destination operand are the same.

    The VEX-encoded variant of the instruction, vroundss, has an additional source operand. The legacy encoding has no field for this operand, so the legacy version reads it from the destination operand instead. That is what the quoted sentence means.

    If we want to encode this instruction as the VEX-encoded variant

    vroundss xmm8, xmm2, [rdx+r9*8+64], 0xc
    

    instead, we start with a VEX prefix. This prefix subsumes mandatory prefix, REX prefix, and opcode plane prefix into one 3 byte prefix. This prefix has the form:

    11000100 RXBmmmmm WvvvvLpp
    R, X, B: complemented REX prefix bits
    W: REX.W prefix bit (not complemented)
    m: opcode plane (1: `0F`, 2: `0F 38`, 3: `0F 3A`)
    L: vector length (0: 128 bit, 1: 256 bit)
    p: mandatory prefix (0: none, 1: `66`, 2: `F3`, 3: `F2`)
    v: complemented extra source register number
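The field layout above translates into a small packing routine. This is a sketch for illustration, with a hypothetical function name `vex3`. It takes R, X, B as the plain (uncomplemented) REX-style bits and the extra source as a plain register number, and stores the complements as the layout requires.

```python
def vex3(r, x, b, m, w, vvvv, l, pp):
    """Build a 3 byte VEX prefix: 11000100 RXBmmmmm WvvvvLpp.

    r, x, b: REX-style bits (1 = extension bit set); stored complemented.
    vvvv:    extra source register number (0..15); stored complemented.
    """
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (m & 0x1F)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0xF) << 3) | ((l & 1) << 2) | (pp & 3)
    return bytes([0xC4, byte1, byte2])


# vroundss xmm8, xmm2, [rdx+r9*8+64], 0xc:
# xmm8 needs R, r9 as index needs X, plane 0F 3A is m = 3, prefix 66 is pp = 1.
vex_prefix = vex3(r=1, x=1, b=0, m=3, w=0, vvvv=2, l=0, pp=1)
print(vex_prefix.hex(" ").upper())
```

Passing the uncomplemented values keeps the call site readable. The inversion happens in exactly one place.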
    

    A shorter 2 byte VEX prefix

    11000101 RvvvvLpp
    

    can be used when REX.X, REX.B, and REX.W are clear and m = 00001 (the 0F opcode plane). This is not the case here. The encoding is given as

    VEX.LIG.66.0F3A.WIG 0A /r ib VROUNDSS xmm1, xmm2, xmm3/m32, imm8
    

    indicating that the L and W fields are ignored, there is a mandatory 66 prefix and the opcode is 0A in the 0F 3A opcode plane, followed by modr/m operands and a byte immediate. The first and third operands are encoded in the modr/m byte, the second operand is the additional operand encoded by the VEX prefix.

    So we have for our instruction

    R = 0, indicating presence of REX.R
    X = 0, indicating presence of REX.X
    B = 1, indicating absence  of REX.B
    W = 0, indicating absence  of REX.W (ignored)
    L = 0, indicating a 128 bit operand size (ignored)
    m = 00011, indicating the 0F 3A opcode plane
    p = 01, indicating a 66 mandatory prefix
    v = 1101, indicating xmm2 as the first source operand
    

    This gives the VEX prefix C4 23 69. The rest is identical to the legacy encoding, giving a full instruction

    C4 23 69 0A 44 CA 40 0C
    |        |  |  |  |  \... immediate
    |        |  |  |  \...... displacement
    |        |  |  \......... SIB byte
    |        |  \............ modr/m byte
    |        \............... opcode
    \........................ VEX prefix
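As a cross-check, the VEX prefix of the final instruction can be decoded back into its fields. The sketch below uses hypothetical names (`decode_vex3`, `fits_vex2`). It undoes the complement on R, X, B, and vvvv, and then tests whether the shorter 2 byte prefix would have been allowed.

```python
def decode_vex3(prefix):
    """Split a 3 byte VEX prefix (C4 xx xx) into uncomplemented fields."""
    if len(prefix) != 3 or prefix[0] != 0xC4:
        raise ValueError("not a 3 byte VEX prefix")
    b1, b2 = prefix[1], prefix[2]
    return {
        "REX.R": ((b1 >> 7) & 1) ^ 1,       # stored complemented
        "REX.X": ((b1 >> 6) & 1) ^ 1,
        "REX.B": ((b1 >> 5) & 1) ^ 1,
        "m": b1 & 0x1F,                     # opcode plane
        "W": (b2 >> 7) & 1,
        "src_reg": ((b2 >> 3) & 0xF) ^ 0xF,  # vvvv, stored complemented
        "L": (b2 >> 2) & 1,
        "pp": b2 & 3,                       # mandatory prefix selector
    }


def fits_vex2(fields):
    """The 2 byte form has no X, B, W, or m bits; it implies the 0F plane."""
    return (fields["REX.X"] == 0 and fields["REX.B"] == 0
            and fields["W"] == 0 and fields["m"] == 1)


code = bytes.fromhex("C4 23 69 0A 44 CA 40 0C")
fields = decode_vex3(code[:3])
print(fields)
print("2 byte VEX possible:", fits_vex2(fields))
```

For this instruction the 2 byte form is ruled out twice over: REX.X is needed for r9, and the opcode lives in the 0F 3A plane.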