A few x86 instructions like ROUNDSS require this seemingly obscure instruction operand encoding, on which I can't find any documentation or definition in Intel's Software Developer's Manual.
How are the bits of this encoding used? I put 66 0f 3a 0b c0 0c (roundsd xmm0,xmm0,0xc) into a disassembler and varied the bits to gain a better understanding, but could only access half the XMM registers.
I'm also unclear on the meaning of
128-bit Legacy SSE version: The first source operand and the destination operand are the same.
since, e.g., 66 0f 3a 0b c1 0c is disassembled without warning or error to roundsd xmm0,xmm1,0xc.
The encoding is as follows:
66 0F 3A 0A /r ib
The opcode is 0A in the 0F 3A opcode plane. A mandatory 66 prefix must be supplied. The opcode is followed by a modr/m byte (/r) encoding the first operand in the reg field and the second operand in the r/m field. The instruction is followed by an 8 bit immediate (ib) encoding the third operand.
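To make the modr/m layout concrete, here is a minimal Python sketch that splits a modr/m byte into its three fields. The helper name `split_modrm` is made up for illustration. Note that reg and r/m are only 3 bits wide each, so without a REX prefix only xmm0 through xmm7 can be named.

```python
def split_modrm(byte):
    """Split a modr/m byte into its (mod, reg, rm) bit fields."""
    mod = (byte >> 6) & 0b11   # addressing mode; 0b11 means register-direct
    reg = (byte >> 3) & 0b111  # first operand (low 3 bits of the register number)
    rm = byte & 0b111          # second operand (register or memory form)
    return mod, reg, rm

# The two modr/m bytes from the ROUNDSD experiments in the question.
print(split_modrm(0xC0))  # (3, 0, 0) -> xmm0, xmm0
print(split_modrm(0xC1))  # (3, 0, 1) -> xmm0, xmm1
```

So c1 simply selects xmm1 as the second (source) operand while xmm0 stays the first operand.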
Let's encode for example
roundss xmm8, [rdx+r9*8+64], 0xc
We have xmm8 and r9 as “upper” registers, so a REX.RX prefix 46 must be supplied to hold the extra bits.
The modr/m byte is 44, indicating an 8 bit displacement (mod = 01), presence of a SIB byte (r/m = 100), and xmm8 as a reg operand (reg = 000, REX.R set).
The SIB byte is CA, indicating rdx as the base (base = 010), r9 as the index (index = 001, REX.X set) and a scale of 8 (scale = 11).
Then follows the displacement 40 (i.e. decimal 64).
Finally, we have the 8 bit immediate 0c.
These bits are then assembled in the order legacy prefixes, REX prefix, opcode plane prefix, opcode, modr/m byte, SIB byte, displacement, immediate. So the entire instruction comes out as
66 46 0F 3A 0A 44 CA 40 0C
|  |  |     |  |  |  |  \... immediate
|  |  |     |  |  |  \...... displacement
|  |  |     |  |  \......... SIB byte
|  |  |     |  \............ modr/m byte
|  |  |     \............... opcode
|  |  \..................... opcode plane prefix
|  \........................ REX prefix
\........................... mandatory prefix
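The assembly steps above can be reproduced with a few lines of Python. This is only a sketch: the helper names (`rex`, `modrm`, `sib`, `encode_roundss_mem`) are invented here, and it handles just the one addressing form used in the example (base + index*scale + 8 bit displacement), without checking for special cases such as rsp as an index.

```python
def rex(w, r, x, b):
    """Build a REX prefix byte 0100WRXB."""
    return 0x40 | (w << 3) | (r << 2) | (x << 1) | b

def modrm(mod, reg, rm):
    """Build a modr/m byte from mod and the low 3 bits of reg and r/m."""
    return (mod << 6) | ((reg & 7) << 3) | (rm & 7)

def sib(scale_bits, index, base):
    """Build a SIB byte from the scale field and the low 3 bits of index and base."""
    return (scale_bits << 6) | ((index & 7) << 3) | (base & 7)

def encode_roundss_mem(xmm, base, index, scale, disp8, imm8):
    """Hypothetical helper: roundss xmm, [base + index*scale + disp8], imm8."""
    scale_bits = {1: 0, 2: 1, 4: 2, 8: 3}[scale]
    # Bit 3 of each register number goes into the REX prefix.
    r, x, b = xmm >> 3, index >> 3, base >> 3
    out = [0x66]                        # mandatory prefix
    if r or x or b:
        out.append(rex(0, r, x, b))     # REX prefix only when needed
    out += [0x0F, 0x3A, 0x0A]           # opcode plane prefix and opcode
    out.append(modrm(0b01, xmm, 0b100)) # mod = 01 (disp8), r/m = 100 (SIB follows)
    out.append(sib(scale_bits, index, base))
    out += [disp8 & 0xFF, imm8 & 0xFF]
    return bytes(out)

RDX, R9 = 2, 9  # register numbers
code = encode_roundss_mem(8, RDX, R9, 8, 64, 0x0C)
print(code.hex(" ").upper())  # 66 46 0F 3A 0A 44 CA 40 0C
```

The printed bytes match the hand-assembled instruction, which is a quick way to double check each field.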
128-bit Legacy SSE version: The first source operand and the destination operand are the same.
The VEX encoded variant of the instruction, vroundss, has an additional source operand. The legacy encoded version has no separate field for this operand and instead reads it from the destination operand.
If we want to encode this instruction as the VEX-encoded variant
vroundss xmm8, xmm2, [rdx+r9*8+64], 0xc
instead, we start with a VEX prefix. This prefix subsumes mandatory prefix, REX prefix, and opcode plane prefix into one 3 byte prefix. This prefix has the form:
11000100 RXBmmmmm WvvvvLpp
R, X, B: complemented REX prefix bits
W: REX.W prefix bit (not complemented)
m: opcode plane (1: `0F`, 2: `0F 38`, 3: `0F 3A`)
L: vector length (0: 128 bit, 1: 256 bit)
p: mandatory prefix (0: none, 1: `66`, 2: `F3`, 3: `F2`)
v: complemented extra source register number
A shorter 2 byte VEX prefix
11000101 RvvvvLpp
can be used when REX.X, REX.B, and REX.W are clear and m = 00001 (the 0F opcode plane). This is not the case here. The encoding is given as
VEX.LIG.66.0F3A.WIG 0A /r ib VROUNDSS xmm1, xmm2, xmm3/m32, imm8
indicating that the L and W fields are ignored, there is a mandatory 66 prefix and the opcode is 0A in the 0F 3A opcode plane, followed by modr/m operands and a byte immediate. The first and third operands are encoded in the modr/m byte; the second operand is the additional operand encoded by the VEX prefix.
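As a rough illustration of the two prefix layouts, the following Python sketch packs the VEX fields into bytes and picks the 2 byte form when it is allowed. The function name `vex_prefix` and its parameter names are my own; R, X, B and v are passed as they appear in the prefix, i.e. already complemented. The 2 byte form implies the 0F opcode plane (m = 00001).

```python
def vex_prefix(r, x, b, w, m, v, l, p):
    """Pack VEX fields into a 2 or 3 byte prefix.

    r, x, b and v are the complemented values as stored in the prefix.
    """
    low = (v << 3) | (l << 2) | p
    # The 2 byte form has no X, B, W or m fields, so it requires
    # X = B = 1 (REX.X and REX.B clear), W = 0 and the 0F plane (m = 1).
    if x == 1 and b == 1 and w == 0 and m == 1:
        return bytes([0xC5, (r << 7) | low])
    return bytes([0xC4, (r << 7) | (x << 6) | (b << 5) | m, (w << 7) | low])

# Fields for vroundss xmm8, xmm2, [rdx+r9*8+64], 0xc (needs the 3 byte form).
print(vex_prefix(r=0, x=0, b=1, w=0, m=3, v=0b1101, l=0, p=1).hex(" ").upper())  # C4 23 69

# A case with no extended registers in the 0F plane fits the 2 byte form.
print(vex_prefix(r=1, x=1, b=1, w=0, m=1, v=0b1111, l=0, p=1).hex(" ").upper())  # C5 F9
```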
So we have for our instruction
R = 0, indicating presence of REX.R
X = 0, indicating presence of REX.X
B = 1, indicating absence of REX.B
W = 0, indicating absence of REX.W (ignored)
L = 0, indicating a 128 bit operand size (ignored)
m = 00011, indicating the 0F 3A opcode plane
p = 01, indicating a 66 mandatory prefix
v = 1101, indicating xmm2 as the additional source operand (0010 complemented)
Giving the VEX prefix C4 23 69. The rest is identical to the legacy encoding, giving a full instruction
C4 23 69 0A 44 CA 40 0C
|        |  |  |  |  \... immediate
|        |  |  |  \...... displacement
|        |  |  \......... SIB byte
|        |  \............ modr/m byte
|        \............... opcode
\........................ VEX prefix
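To sanity check the result, one can go the other way and take the prefix apart again. The sketch below is a hypothetical helper (`decode_vex3` is not from any library) that splits a 3 byte VEX prefix into the fields described above.

```python
def decode_vex3(prefix):
    """Split a 3 byte VEX prefix (C4 xx xx) into its raw bit fields."""
    first, b1, b2 = prefix
    if first != 0xC4:
        raise ValueError("not a 3 byte VEX prefix")
    return {
        "R": (b1 >> 7) & 1,
        "X": (b1 >> 6) & 1,
        "B": (b1 >> 5) & 1,
        "m": b1 & 0x1F,
        "W": (b2 >> 7) & 1,
        "v": (b2 >> 3) & 0xF,
        "L": (b2 >> 2) & 1,
        "p": b2 & 0b11,
    }

fields = decode_vex3(bytes.fromhex("C42369"))
# R, X, B and v are stored complemented, so invert them to get the usual values.
rex_r = fields["R"] ^ 1
rex_x = fields["X"] ^ 1
rex_b = fields["B"] ^ 1
extra_source = fields["v"] ^ 0xF
print(fields["m"], fields["p"])            # 3 1 -> 0F 3A plane, 66 prefix
print(rex_r, rex_x, rex_b, extra_source)   # 1 1 0 2 -> REX.R, REX.X set; xmm2
```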