Let's take the following assembly instruction:
add %cl,%bl
This gets encoded as: 00 cb, or 00000000 11001011 in binary. Putting the cb into the ModR/M bitfields, it looks like:
  1   1   0   0   1   0   1   1
+---+---+---+---+---+---+---+---+
|  mod  |    reg    |    r/m    |
+---+---+---+---+---+---+---+---+
And, in looking up the register fields here we get:
11 (register addressing mode)
001 (cl register)
011 (bl register)
And, I believe 000000ds is the add instruction, and d=s=0 since they're all registers. Is that a correct understanding of how this instruction is encoded? Additionally, for the 'full encoding' scheme, would the following be accurate (in bytes not bits):
[empty]   0x0           0b11001011    [empty]   [empty]        [empty]
_ _ _ _   _ _           _             _         _ _ _ _        _ _ _ _
Prefix    Instruction   Mod-reg-r/m   Scale     displacement   immediate
Are there any things I'm missing here in my attempt at 'decoding' the instruction?
Yes, looks right.
The general pattern (for "legacy" ALU instructions that date back to 8086) for encoding op r/m, r vs. op r, r/m, and 8-bit vs. 16/32 bit does use the low 2 bits of the opcode byte in a regular pattern, but there's no need to rely on that.
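If you do want to see that regular pattern, here is a minimal Python sketch that splits the opcode byte of one of these legacy ALU instructions into its fields. The function name `alu_opcode_bits` and the dictionary keys are made up for this example. Bit 1 is the direction bit (d) and bit 0 is the operand-size bit, which Intel's older encoding tables call w rather than s. The sketch assumes the opcode is one of the `op r/m, r` or `op r, r/m` forms in the 0x00..0x3B range and rejects everything else.

```python
# Mnemonics selected by bits 5:3 of the opcode byte for the legacy ALU group.
ALU_OPS = ["add", "or", "adc", "sbb", "and", "sub", "xor", "cmp"]

def alu_opcode_bits(opcode):
    """Split a legacy ALU opcode byte into mnemonic, d bit and w bit.

    Hypothetical helper: only handles the op r/m, r and op r, r/m forms
    (top two bits clear, bit 2 clear), e.g. 0x00..0x03 for add.
    """
    if opcode & 0xC4:
        raise ValueError("not an 'op r/m, r' or 'op r, r/m' ALU opcode")
    return {
        "mnemonic": ALU_OPS[(opcode >> 3) & 7],
        "d": (opcode >> 1) & 1,  # 0: r/m is the destination, 1: reg is the destination
        "w": opcode & 1,         # 0: 8-bit operands, 1: 16/32-bit operands
    }

print(alu_opcode_bits(0x00))  # {'mnemonic': 'add', 'd': 0, 'w': 0}
print(alu_opcode_bits(0x02))  # {'mnemonic': 'add', 'd': 1, 'w': 0}
```

So for `00 cb`, d=0 says the r/m field is the destination and w=0 says the operands are 8-bit; neither bit means "all operands are registers", that part comes from mod=11 in the ModRM byte.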
Intel does fully document exactly what's going on for each encoding of each instruction in their vol.2 manual. See the Op/En column and Operand Encoding table for add for example. (See also https://ref.x86asm.net/coder64.htm which also specifies which operand is which for every opcode). These both let you know which opcodes take a ModRM byte and which don't.
These of course use Intel-syntax order. You're making your life more complicated by trying to follow manuals and tutorials while using AT&T syntax which reverses the order of the operand-list vs. Intel and AMD manuals.
e.g. 00 /r is listed as MR operand encoding, which from the table we can see is operand 1 = ModRM:r/m (r, w), so it's read and written, and encoded by the r/m field. operand 2 = ModRM:reg (r), so it's a read-only source encoded by the reg field.
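To make the MR mapping concrete, here is a small Python sketch that decodes the two bytes from the question in Intel-syntax order. The names `split_modrm` and `decode_add_mr_reg` are hypothetical helpers, not part of any real disassembler, and the sketch deliberately handles only opcode 00 with mod=11 (register-direct) and no REX prefix.

```python
# 8-bit register names by 3-bit register number (no REX prefix).
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]

def split_modrm(byte):
    """Return the (mod, reg, rm) fields of a ModRM byte."""
    return (byte >> 6) & 3, (byte >> 3) & 7, byte & 7

def decode_add_mr_reg(code):
    """Decode `00 /r` (add r/m8, r8) with a register r/m operand.

    Sketch only: memory operands (mod != 3) are not handled.
    """
    opcode, modrm = code[0], code[1]
    if opcode != 0x00:
        raise ValueError("this sketch only knows opcode 00")
    mod, reg, rm = split_modrm(modrm)
    if mod != 3:
        raise NotImplementedError("memory operands are not handled here")
    # MR encoding: operand 1 (destination) is r/m, operand 2 (source) is reg.
    return f"add {REG8[rm]}, {REG8[reg]}"

print(decode_add_mr_reg(bytes([0x00, 0xCB])))  # add bl, cl
```

That Intel-syntax `add bl, cl` is the same instruction as the AT&T `add %cl,%bl` from the question.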
Fun fact: 00 00 is add [rax], al, or AT&T add %al, (%rax).
Note that you can ask GAS to pick either encoding (see "x86 XOR opcode differences"):
{load} add %cl,%bl # 02 d9
{store} add %cl,%bl # 00 cb
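To see why those two byte sequences mean the same thing, here is a minimal Python sketch that builds both encodings by hand. `encode_add_r8_r8` and its `form` parameter are invented for this illustration (they only mirror the idea behind the `{load}` / `{store}` prefixes); operands are given in Intel order, destination first, and only 8-bit registers without a REX prefix are covered.

```python
# 3-bit register numbers for the 8-bit registers (no REX prefix).
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}

def encode_add_r8_r8(dst, src, form="store"):
    """Encode Intel-syntax `add dst, src` for two 8-bit registers.

    form="store": 00 /r (add r/m8, r8), destination in r/m, source in reg.
    form="load":  02 /r (add r8, r/m8), destination in reg, source in r/m.
    """
    if form == "store":
        opcode, reg, rm = 0x00, REG8[src], REG8[dst]
    elif form == "load":
        opcode, reg, rm = 0x02, REG8[dst], REG8[src]
    else:
        raise ValueError("form must be 'store' or 'load'")
    modrm = (0b11 << 6) | (reg << 3) | rm  # mod=11: both operands are registers
    return bytes([opcode, modrm])

print(encode_add_r8_r8("bl", "cl", "store").hex())  # 00cb
print(encode_add_r8_r8("bl", "cl", "load").hex())   # 02d9
```

Flipping the direction bit in the opcode and swapping the reg and r/m fields gives a second encoding of the same register-to-register add.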