I am reading about the IA-32 instruction format and found that the ModR/M byte is a single byte that is present only if required. But how do I determine whether it is required? Someone said it is determined by the opcode, but how? I want to know the details. Are there any useful, authoritative documents that explain them?
Intel's vol.2 manual has details on the encoding of operands for each form of each instruction. For example, take just the 8-bit operand-size versions of the well-known add instruction, which has two reg,rm forms, an rm,immediate form, and a no-ModRM 2-byte short form for add al, imm8:
Opcode   | Instruction     | Op/En | 64-bit Mode | Compat/Leg Mode | Description
04 ib    | ADD AL, imm8    | I     | Valid       | Valid           | Add imm8 to AL.
80 /0 ib | ADD r/m8, imm8  | MI    | Valid       | Valid           | Add imm8 to r/m8.
00 /r    | ADD r/m8, r8    | MR    | Valid       | Valid           | Add r8 to r/m8.
02 /r    | ADD r8, r/m8    | RM    | Valid       | Valid           | Add r/m8 to r8.
And below that, the Instruction Operand Encoding table details what those I / MI / MR / RM codes from the Op/En (operand encoding) column above mean:
Op/En | Operand 1        | Operand 2     | Operand 3 | Operand 4
RM    | ModRM:reg (r, w) | ModRM:r/m (r) | NA        | NA
MR    | ModRM:r/m (r, w) | ModRM:reg (r) | NA        | NA
MI    | ModRM:r/m (r, w) | imm8/16/32    | NA        | NA
I     | AL/AX/EAX/RAX    | imm8/16/32    | NA        | NA
Notice that the "I" operand form doesn't mention a ModRM, so there isn't one. But MI does have one. (With the ModRM reg field being filled in with the /0 from the 80 /0 in the opcode table: full explanation with 83 /0 add r/m64, imm8 as an example.)
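As a rough sketch of how a decoder could use this, the four 8-bit ADD opcodes from the table above can be put in a lookup table keyed by the opcode byte. The names `ADD8_FORMS` and `has_modrm` are made up for illustration, and the table covers only these four opcodes, so this is not a real decoder.

```python
# Hypothetical lookup table for the four 8-bit ADD opcodes listed above.
# Each entry: opcode byte -> (Op/En code, has a ModRM byte?, immediate size in bytes)
ADD8_FORMS = {
    0x04: ("I", False, 1),   # ADD AL, imm8    (04 ib)
    0x80: ("MI", True, 1),   # ADD r/m8, imm8  (80 /0 ib); the ModRM reg field /0 selects ADD
    0x00: ("MR", True, 0),   # ADD r/m8, r8    (00 /r)
    0x02: ("RM", True, 0),   # ADD r8, r/m8    (02 /r)
}


def has_modrm(opcode: int) -> bool:
    """Return True if this opcode byte is followed by a ModRM byte."""
    _op_en, modrm, _imm_size = ADD8_FORMS[opcode]
    return modrm


print(has_modrm(0x04))  # False: the "I" form has no ModRM
print(has_modrm(0x80))  # True: the "MI" form has one
```

The point is that the answer is a property of the opcode itself: once the opcode byte is known, a table lookup says whether a ModRM byte follows.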
Notice that RM and MR differ only in whether the r/m operand (that can be memory) is the destination or source.
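To make the MR vs. RM difference concrete, here is a minimal sketch that splits a ModRM byte into its mod, reg and r/m fields and prints the operands for opcodes 00 and 02. The function names `split_modrm` and `describe_add8` are hypothetical, and the sketch handles only register-direct operands (mod = 3) without a REX prefix.

```python
def split_modrm(modrm: int) -> tuple[int, int, int]:
    """Split a ModRM byte into its (mod, reg, rm) fields: 2 + 3 + 3 bits."""
    mod = (modrm >> 6) & 0b11
    reg = (modrm >> 3) & 0b111
    rm = modrm & 0b111
    return mod, reg, rm


# 8-bit register names by register number, when no REX prefix is present.
REG8 = ["al", "cl", "dl", "bl", "ah", "ch", "dh", "bh"]


def describe_add8(opcode: int, modrm: int) -> str:
    """Describe a register-to-register 8-bit ADD (opcodes 00 and 02 only)."""
    mod, reg, rm = split_modrm(modrm)
    if mod != 0b11:
        raise NotImplementedError("memory operands are out of scope for this sketch")
    if opcode == 0x00:    # MR: destination is ModRM:r/m, source is ModRM:reg
        dst, src = REG8[rm], REG8[reg]
    elif opcode == 0x02:  # RM: destination is ModRM:reg, source is ModRM:r/m
        dst, src = REG8[reg], REG8[rm]
    else:
        raise ValueError("only opcodes 00 and 02 are handled here")
    return f"add {dst}, {src}"


print(describe_add8(0x00, 0xD8))  # add al, bl
print(describe_add8(0x02, 0xC3))  # add al, bl
```

Both byte sequences encode the same register-to-register add; only when the r/m operand is memory does the choice of direction matter.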
Most x86 ALU instructions have four reg, r/m opcodes, one for each direction (MR vs. RM) for each of 8-bit and non-8-bit. The non-8-bit form has a size determined by a 66 operand-size prefix to flip between 16-bit and 32-bit, or REX.W for 64-bit, or none for the default operand-size (which is 32-bit except in 16-bit modes). On top of those come the standard immediate form(s).
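The operand-size rule for the non-8-bit opcodes can be sketched as a small function. The name `operand_size` and its parameters are illustrative assumptions, not anything from Intel's manual; it models only the default size, the 66 prefix and REX.W.

```python
def operand_size(mode_bits: int, has_66: bool = False, rex_w: bool = False) -> int:
    """Operand size in bits for a non-8-bit opcode.

    mode_bits: 16, 32 or 64 (the current CPU mode).
    has_66:    True if a 66 operand-size prefix is present.
    rex_w:     True if a REX prefix with W=1 is present (64-bit mode only).
    """
    if mode_bits not in (16, 32, 64):
        raise ValueError("mode_bits must be 16, 32 or 64")
    if rex_w:
        if mode_bits != 64:
            raise ValueError("REX prefixes only exist in 64-bit mode")
        return 64  # REX.W takes precedence over a 66 prefix
    default = 16 if mode_bits == 16 else 32
    if has_66:
        # The 66 prefix flips between the 16-bit and 32-bit sizes.
        return 32 if default == 16 else 16
    return default


print(operand_size(64))               # 32
print(operand_size(64, has_66=True))  # 16
print(operand_size(64, rex_w=True))   # 64
print(operand_size(16, has_66=True))  # 32
```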
This is a lot of opcodes for every mnemonic, which is why 8086 didn't have room for more instructions following the same pattern as the usual ones. (See: Why are there no NAND, NOR and XNOR instructions in X86?)
See also https://wiki.osdev.org/X86-64_Instruction_Encoding which covers things more concisely than Intel's manual. Also note that you can check your understanding by assembling something with an assembler like NASM or GAS and looking at the machine code. Or just look at the disassembly of an existing program, e.g. with objdump -drwC -Mintel /bin/ls | less
Some disassemblers even group bytes together in the machine code for each instruction, keeping a 4-byte immediate together as a group separate from opcode and modrm for example. (Agner Fog's objconv is like this.)