
How does the RMI Instruction Operand Encoding of ROUNDSS work?

A few x86 instructions like ROUNDSS require this seemingly obscure instruction operand encoding, on which I can't find any documentation or definition in Intel's Software Developer's Manual.

How are the bits of this encoding used? I put 66 0f 3a 0b c0 0c (roundsd xmm0, xmm0, 0xc) into a disassembler and varied the bits to gain a better understanding, but could only reach half of the XMM registers.

I'm also unclear on the meaning of

128-bit Legacy SSE version: The first source operand and the destination operand are the same.

as, e.g., 66 0f 3a 0b c1 0c is disassembled without any warning or error to roundsd xmm0, xmm1, 0xc.

Solution

Legacy Encoding

The encoding is as follows:

66 0F 3A 0A /r ib

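The opcode is 0A in the 0F 3A opcode plane, and a mandatory 66 prefix must be supplied. The opcode is followed by a modr/m byte (/r) that encodes the first operand in its reg field and the second operand in its r/m field, and the instruction ends with an 8-bit immediate (ib) encoding the third operand. That is all "RMI" means: the "Instruction Operand Encoding" table on the ROUNDSS page lists operand 1 as ModRM:reg, operand 2 as ModRM:r/m and operand 3 as imm8. The reg and r/m fields are only 3 bits wide, so on their own they can only name xmm0 to xmm7; the fourth register bit for xmm8 to xmm15 comes from a REX prefix, which is why varying the modr/m byte alone reaches only half of the XMM registers.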
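A minimal Python sketch of how a disassembler reads the register form of an RMI instruction may help. The helper names `split_modrm` and `describe_rmi` are made up for illustration, and memory operands (mod other than 11) are deliberately left out. It also shows why varying the modr/m byte alone never gets past xmm7: the fourth register bit is passed in separately, standing in for REX.R and REX.B.

```python
def split_modrm(byte: int) -> tuple[int, int, int]:
    """Split a ModRM byte into its (mod, reg, r/m) fields."""
    mod = (byte >> 6) & 0b11   # bits 7..6: 0b11 means both operands are registers
    reg = (byte >> 3) & 0b111  # bits 5..3: operand 1 (ModRM:reg, the destination)
    rm = byte & 0b111          # bits 2..0: operand 2 (ModRM:r/m, the source)
    return mod, reg, rm


def describe_rmi(modrm: int, imm8: int, rex_r: int = 0, rex_b: int = 0) -> str:
    """Render the operands of a register-form RMI instruction (hypothetical helper)."""
    mod, reg, rm = split_modrm(modrm)
    if mod != 0b11:
        raise ValueError("memory operand: SIB/displacement decoding is not covered here")
    dst = reg | (rex_r << 3)  # REX.R supplies bit 3 of the reg register number
    src = rm | (rex_b << 3)   # REX.B supplies bit 3 of the r/m register number
    return f"xmm{dst}, xmm{src}, {imm8:#x}"


print(describe_rmi(0xC0, 0x0C))                    # xmm0, xmm0, 0xc
print(describe_rmi(0xC1, 0x0C))                    # xmm0, xmm1, 0xc
print(describe_rmi(0xC1, 0x0C, rex_r=1, rex_b=1))  # xmm8, xmm9, 0xc
```

The first two lines reproduce the disassembler output from the question; only the third, which assumes a REX prefix with R and B set, reaches the upper registers.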

Let's encode for example

roundss xmm8, [rdx+r9*8+64], 0xc

We have xmm8 and r9 as “upper” registers (numbers 8 to 15), so a REX prefix with the R and X bits set (REX.RX, byte 46) must be supplied to hold their fourth register bits.
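The REX byte is just the bit pattern 0100 followed by the W, R, X and B bits. A small sketch (the function name `rex` is my own) that derives it from the register numbers of this example:

```python
def rex(w: int = 0, r: int = 0, x: int = 0, b: int = 0) -> int:
    """Build a REX prefix byte: 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b


# Register numbers: xmm8 -> 8, r9 -> 9, rdx -> 2.
XMM8, R9, RDX = 8, 9, 2
# Bit 3 of each register number goes into the matching REX bit:
# R extends modr/m.reg, X extends SIB.index, B extends SIB.base.
prefix = rex(r=XMM8 >> 3, x=R9 >> 3, b=RDX >> 3)
print(f"{prefix:02X}")  # 46
```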

The modr/m byte is 44, indicating an 8-bit displacement (mod = 01), the presence of a SIB byte (r/m = 100), and xmm8 as the reg operand (reg = 000, with REX.R providing the fourth bit).
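The same byte built from its three fields, as a sketch (the helper and constant names are illustrative only). Note that only the low 3 bits of the register number are stored; bit 3 of xmm8 is the REX.R bit set above.

```python
def modrm(mod: int, reg: int, rm: int) -> int:
    """Build a ModRM byte: mm rrr bbb (only the low 3 bits of reg/rm fit)."""
    return (mod << 6) | ((reg & 0b111) << 3) | (rm & 0b111)


MOD_DISP8 = 0b01  # memory operand followed by an 8-bit displacement
RM_SIB = 0b100    # r/m = 100: a SIB byte follows
XMM8 = 8          # bit 3 is not stored here; it goes to REX.R

print(f"{modrm(MOD_DISP8, XMM8, RM_SIB):02X}")  # 44
```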

The SIB byte is CA indicating rdx as the base (base = 010), r9 as the index (index = 001, REX.X set) and a scale of 8 (scale = 11).
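Likewise for the SIB byte, whose scale field stores log2 of the scale factor. A sketch with a made-up helper name:

```python
def sib(scale: int, index: int, base: int) -> int:
    """Build a SIB byte: ss iii bbb, where ss = log2(scale)."""
    ss = {1: 0b00, 2: 0b01, 4: 0b10, 8: 0b11}[scale]
    return (ss << 6) | ((index & 0b111) << 3) | (base & 0b111)


R9, RDX = 9, 2  # bit 3 of r9 goes to REX.X; rdx needs no REX.B
print(f"{sib(8, R9, RDX):02X}")  # CA
```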

Then follows the displacement 40 (i.e. decimal 64).
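The 8-bit displacement is sign-extended by the CPU, so mod = 01 only works for offsets from -128 to 127; larger offsets need the 32-bit form (mod = 10). A sketch of that choice (hypothetical helper; it deliberately ignores mod = 00, which has special cases when rbp or r13 is the base):

```python
def encode_disp(disp: int) -> tuple[int, bytes]:
    """Pick the ModRM mod value and displacement bytes (mod = 00 is not considered)."""
    if -128 <= disp <= 127:
        # disp8: one byte, sign-extended by the CPU
        return 0b01, disp.to_bytes(1, "little", signed=True)
    # disp32: four bytes, little-endian, also sign-extended to 64 bits
    return 0b10, disp.to_bytes(4, "little", signed=True)


mod, raw = encode_disp(64)
print(mod, raw.hex())  # 1 40
```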

Finally, we have the 8-bit immediate 0c.

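These bytes are then emitted in the order: legacy (here mandatory) prefixes, REX prefix, opcode plane prefix, opcode, modr/m byte, SIB byte, displacement, immediate. The REX prefix has to come last among the prefixes, directly before the 0F escape byte; a REX prefix that is followed by another prefix such as 66 is ignored. So the entire instruction comes out as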

66 46 0F 3A 0A 44 CA 40 0C
|  |  |     |  |  |  |  \... immediate
|  |  |     |  |  |  \...... displacement
|  |  |     |  |  \......... SIB byte
|  |  |     |  \............ modr/m byte
|  |  |     \............... opcode
|  |  \..................... opcode plane prefix
|  \........................ REX prefix
\........................... mandatory prefix
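Putting the pieces together, here is a sketch of an encoder for just this addressing form of ROUNDSS (register destination, [base+index*scale+disp8] source). The function name and its restrictions are assumptions of this sketch, not a general-purpose assembler:

```python
def encode_roundss_sib_disp8(dst_xmm: int, base: int, index: int,
                             scale: int, disp8: int, imm8: int) -> bytes:
    """ROUNDSS xmm, [base + index*scale + disp8], imm8 (legacy SSE encoding)."""
    if index == 4:
        raise ValueError("rsp cannot be an index register")
    if not -128 <= disp8 <= 127:
        raise ValueError("displacement does not fit in disp8")
    rex = 0x40 | ((dst_xmm >> 3) << 2) | ((index >> 3) << 1) | (base >> 3)
    modrm = (0b01 << 6) | ((dst_xmm & 7) << 3) | 0b100  # mod=01, r/m=100 (SIB)
    ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    sib = (ss << 6) | ((index & 7) << 3) | (base & 7)

    out = [0x66]                  # mandatory prefix
    if rex != 0x40:
        out.append(rex)           # REX must come right before the opcode bytes
    out += [0x0F, 0x3A, 0x0A]     # opcode plane prefix + opcode
    out += [modrm, sib, disp8 & 0xFF, imm8]
    return bytes(out)


code = encode_roundss_sib_disp8(dst_xmm=8, base=2, index=9, scale=8, disp8=64, imm8=0x0C)
print(code.hex(" ").upper())  # 66 46 0F 3A 0A 44 CA 40 0C
```

Omitting the REX byte when no bit is set is a choice of this sketch; an empty REX (40) would also be accepted by the CPU, just one byte longer.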

VEX Encoding

128-bit Legacy SSE version: The first source operand and the destination operand are the same.

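The VEX-encoded variant of the instruction, vroundss, has an additional source operand: it rounds the low element of its second source and copies the upper bits of the result from its first source. The legacy-encoded version has no room for that extra operand, so it takes those upper bits from the destination operand instead, i.e. the destination also serves as the (implicit) first source. This is why roundsd xmm0, xmm1, 0xc is perfectly valid: xmm1 is the value being rounded, and xmm0 is both the destination and the operand supplying the untouched upper bits.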
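A lane-level model may make the difference clearer. This is a sketch, not Intel's pseudocode: XMM registers are modelled as lists of four floats, `roundss`/`vroundss` are just Python functions, and rounding assumes imm8 = 0x0C with MXCSR in its default round-to-nearest-even mode, which is also what Python's built-in `round()` does for floats. The zeroing of bits above 127 performed by the VEX form is not modelled.

```python
def roundss(dest: list[float], src: list[float]) -> list[float]:
    """Legacy ROUNDSS dest, src: dest doubles as the (implicit) first source."""
    # Lane 0 gets the rounded value; lanes 1..3 keep whatever dest held.
    return [float(round(src[0]))] + dest[1:]


def vroundss(src1: list[float], src2: list[float]) -> list[float]:
    """VROUNDSS dest, src1, src2: upper lanes come from src1, not from dest."""
    return [float(round(src2[0]))] + src1[1:]


xmm0 = [1.0, 2.0, 3.0, 4.0]
xmm1 = [2.5, 9.0, 9.0, 9.0]
xmm2 = [5.0, 6.0, 7.0, 8.0]

print(roundss(xmm0, xmm1))   # [2.0, 2.0, 3.0, 4.0]
print(vroundss(xmm2, xmm1))  # [2.0, 6.0, 7.0, 8.0]
# roundss xmm0, xmm1 behaves like vroundss xmm0, xmm0, xmm1:
print(roundss(xmm0, xmm1) == vroundss(xmm0, xmm1))  # True
```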

If we want to encode this instruction as the VEX-encoded variant

vroundss xmm8, xmm2, [rdx+r9*8+64], 0xc

instead, we start with a VEX prefix. This prefix folds the mandatory prefix, the REX prefix, and the opcode plane prefix into a single 3-byte prefix of the form:

11000100 RXBmmmmm WvvvvLpp
R, X, B: complemented REX prefix bits
W: REX.W prefix bit (not complemented)
m: opcode plane (1: `0F`, 2: `0F 38`, 3: `0F 3A`)
L: vector length (0: 128 bit, 1: 256 bit)
p: mandatory prefix (0: none, 1: `66`, 2: `F3`, 3: `F2`)
v: complemented extra source register number
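The three bytes can be assembled with a few shifts. A sketch (the name `vex3` and the argument convention are my own choice: r, x, b are passed as the plain REX bits and inverted inside, vvvv as the plain register number):

```python
def vex3(r: int, x: int, b: int, m: int, w: int, vvvv: int, l: int, pp: int) -> bytes:
    """Build a 3-byte VEX prefix: C4, RXBmmmmm, WvvvvLpp."""
    # R, X, B and vvvv are stored in complemented (one's complement) form.
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | (m & 0b11111)
    byte2 = ((w & 1) << 7) | ((~vvvv & 0b1111) << 3) | ((l & 1) << 2) | (pp & 0b11)
    return bytes([0xC4, byte1, byte2])


# Fields for the vroundss example above: REX.R and REX.X needed, 0F 3A plane (m = 3),
# first source xmm2, 128-bit, 66 prefix (pp = 1).
print(vex3(r=1, x=1, b=0, m=3, w=0, vvvv=2, l=0, pp=1).hex(" ").upper())  # C4 23 69
```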

A shorter 2 byte VEX prefix

11000101 RvvvvLpp

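can be used when REX.X, REX.B, and REX.W are clear and m = 00001 (the 0F opcode plane), since the 2-byte form has no fields for them. This is not the case here: the opcode lives in the 0F 3A plane, and r9 needs REX.X. The encoding is given as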
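Expressed as code, the choice between the two forms might look like this sketch (hypothetical helper; for WIG instructions the caller is assumed to pass w = 0):

```python
from typing import Optional


def vex2_or_none(r: int, x: int, b: int, m: int, w: int,
                 vvvv: int, l: int, pp: int) -> Optional[bytes]:
    """Return a 2-byte VEX prefix (C5, RvvvvLpp) if the fields allow it, else None."""
    # The 2-byte form has no X, B, W or mmmmm fields: it implies
    # X = B = W = 0 and the 0F opcode plane (m = 00001).
    if x or b or w or m != 0b00001:
        return None
    byte1 = ((~r & 1) << 7) | ((~vvvv & 0b1111) << 3) | ((l & 1) << 2) | (pp & 0b11)
    return bytes([0xC5, byte1])


# vroundss lives in the 0F 3A plane (m = 3), so it always needs the 3-byte form:
print(vex2_or_none(r=1, x=1, b=0, m=3, w=0, vvvv=2, l=0, pp=1))  # None
# A 0F-plane encoding with xmm1 in vvvv and no REX bits fits in 2 bytes:
print(vex2_or_none(r=0, x=0, b=0, m=1, w=0, vvvv=1, l=0, pp=0).hex(" ").upper())  # C5 F0
```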

VEX.LIG.66.0F3A.WIG 0A /r ib VROUNDSS xmm1, xmm2, xmm3/m32, imm8

indicating that the L and W fields are ignored, that there is a mandatory 66 prefix, and that the opcode is 0A in the 0F 3A opcode plane, followed by modr/m operands and a byte immediate. The first and third operands are encoded in the modr/m byte (in the reg and r/m fields respectively); the second operand is the additional operand, encoded in the vvvv field of the VEX prefix.
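To make that operand placement concrete, here is a sketch that encodes only the all-register form vroundss xmm_a, xmm_b, xmm_c, imm8 (my own helper; the memory form with SIB and displacement is left out). Operand 1 goes to modr/m.reg, operand 2 to vvvv, operand 3 to modr/m.r/m.

```python
def encode_vroundss_reg(op1: int, op2: int, op3: int, imm8: int) -> bytes:
    """VEX.LIG.66.0F3A.WIG 0A /r ib with a register third operand."""
    r, b = op1 >> 3, op3 >> 3  # bit 3 of operands 1 and 3 -> VEX.R, VEX.B
    # Byte 1: complemented R, X stored as 1 (no index register), complemented B, m = 0F 3A
    byte1 = ((~r & 1) << 7) | (1 << 6) | ((~b & 1) << 5) | 0b00011
    # Byte 2: W = 0, complemented vvvv = operand 2, L = 0, pp = 01 (66)
    byte2 = ((~op2 & 0b1111) << 3) | 0b01
    modrm = (0b11 << 6) | ((op1 & 7) << 3) | (op3 & 7)  # operand 1 in reg, 3 in r/m
    return bytes([0xC4, byte1, byte2, 0x0A, modrm, imm8])


print(encode_vroundss_reg(0, 1, 2, 0x0C).hex(" ").upper())  # C4 E3 71 0A C2 0C
```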

So we have for our instruction

R = 0, indicating presence of REX.R
X = 0, indicating presence of REX.X
B = 1, indicating absence  of REX.B
W = 0, indicating absence  of REX.W (ignored)
L = 0, indicating a 128 bit operand size (ignored)
m = 00011, indicating the 0F 3A opcode plane
p = 01, indicating a 66 mandatory prefix
v = 1101, indicating xmm2 (register 0010, complemented) as the second operand

Giving the VEX prefix C4 23 69. The rest (opcode, modr/m byte, SIB byte, displacement, immediate) is identical to the legacy encoding, giving a full instruction

C4 23 69 0A 44 CA 40 0C
|        |  |  |  |  \... immediate
|        |  |  |  \...... displacement
|        |  |  \......... SIB byte
|        |  \............ modr/m byte
|        \............... opcode
\........................ VEX prefix
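As a cross-check, a sketch that rebuilds the whole VEX-encoded instruction from the fields listed above (the helper name and its restriction to the [base+index*scale+disp8] form are assumptions of this sketch):

```python
def encode_vroundss_sib_disp8(dst: int, src1: int, base: int, index: int,
                              scale: int, disp8: int, imm8: int) -> bytes:
    """VROUNDSS xmm, xmm, [base + index*scale + disp8], imm8."""
    if index == 4:
        raise ValueError("rsp cannot be an index register")
    if not -128 <= disp8 <= 127:
        raise ValueError("displacement does not fit in disp8")
    r, x, b = dst >> 3, index >> 3, base >> 3
    # Byte 1: complemented R, X, B, then m = 00011 (0F 3A opcode plane)
    byte1 = ((~r & 1) << 7) | ((~x & 1) << 6) | ((~b & 1) << 5) | 0b00011
    # Byte 2: W = 0, complemented vvvv = src1, L = 0, pp = 01 (66)
    byte2 = ((~src1 & 0b1111) << 3) | 0b01
    modrm = (0b01 << 6) | ((dst & 7) << 3) | 0b100  # disp8, SIB follows
    ss = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    sib = (ss << 6) | ((index & 7) << 3) | (base & 7)
    return bytes([0xC4, byte1, byte2, 0x0A, modrm, sib, disp8 & 0xFF, imm8])


code = encode_vroundss_sib_disp8(dst=8, src1=2, base=2, index=9, scale=8, disp8=64, imm8=0x0C)
print(code.hex(" ").upper())  # C4 23 69 0A 44 CA 40 0C
```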