So I am learning how x86 works and have come across people saying that it is byte-addressable, yet can read words, double words, etc. How does the processor decide which method to use and when? E.g. for accessing the next instruction and when a user wants to read/write to memory, which addressing mode is used?
Every memory access has an operand-size specified by the machine-code instruction. (Addressing mode isn't the right term: different addressing modes are different ways of specifying the lowest address of the chunk of memory to be accessed, like [rdi] vs. [rdi + rdx*8] vs. [RIP + rel32].)
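To make the split between "where" and "how many bytes" concrete, here is a toy Python model of byte-addressable memory. It is only a sketch: `memory`, `effective_address`, and `load` are made-up names for illustration, not anything the CPU or an assembler exposes. The addressing mode produces one start address; the operand-size decides how many bytes are read from there.

```python
# Toy model: memory is an array of bytes, and every access names a start
# address plus a size. All names here are illustrative, not a real API.
memory = bytearray(range(0x10, 0x30))  # 32 bytes: 0x10, 0x11, ..., 0x2F


def effective_address(base, index=0, scale=1, disp=0):
    """Start address computed by a [base + index*scale + disp] addressing mode."""
    return base + index * scale + disp


def load(addr, size):
    """Read `size` bytes starting at `addr`, little-endian like x86."""
    return int.from_bytes(memory[addr:addr + size], "little")


# Same address every time, roughly like [rdi + rdx*8] with rdi=0, rdx=1.
addr = effective_address(base=0, index=1, scale=8)

print(hex(load(addr, 1)))  # byte load  -> 0x18
print(hex(load(addr, 2)))  # word load  -> 0x1918
print(hex(load(addr, 4)))  # dword load -> 0x1b1a1918
print(hex(load(addr, 8)))  # qword load -> 0x1f1e1d1c1b1a1918
```

The address is identical in all four loads; only the size argument changes, which is the role the operand-size plays in a real instruction.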
Different operand-sizes are encoded with prefixes (16 vs. 32 vs. 64-bit for integer instructions), with a different opcode for the same mnemonic (8-bit integer), or with bits in the VEX or EVEX prefix for AVX / AVX512 instructions that can use xmm, ymm, or zmm registers.
Decoding also depends on the current mode, which implies the default operand-size: 32 for 32 and 64-bit mode, or 16 for 16-bit mode. A 66 operand-size prefix selects the other size (16 in 32 and 64-bit mode, 32 in 16-bit mode).
In 64-bit mode, the .W (width) bit in the REX prefix sets the operand-size to 64-bit. (And some instructions like push/pop default to 64-bit operand-size with no prefix needed, but most instructions like add/sub/mov still default to 32-bit.)
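The rules above for non-byte integer instructions can be sketched as a small Python function. This is a simplified model under a few assumptions: `operand_size` and its parameters are hypothetical names, `default_64` stands in for "this is one of the push/pop-style instructions", and cases where 66 acts as a mandatory prefix (e.g. many SSE instructions) are ignored.

```python
def operand_size(mode, prefix_66=False, rex_w=False, default_64=False):
    """Effective operand-size in bits for a non-byte integer instruction.

    mode       -- current CPU mode: 16, 32, or 64
    prefix_66  -- True if a 66 operand-size prefix is present
    rex_w      -- True if a REX prefix with W=1 is present (64-bit mode only)
    default_64 -- True for instructions like push/pop that default to
                  64-bit operand-size in 64-bit mode
    """
    if mode not in (16, 32, 64):
        raise ValueError("mode must be 16, 32, or 64")
    if rex_w and mode != 64:
        raise ValueError("REX prefixes only exist in 64-bit mode")

    if rex_w:
        return 64  # REX.W takes precedence over a 66 prefix
    if mode == 16:
        return 32 if prefix_66 else 16  # 66 flips the 16-bit default to 32
    if prefix_66:
        return 16  # 66 flips the 32-bit (or 64-bit) default to 16
    return 64 if (mode == 64 and default_64) else 32


print(operand_size(64))                   # add/sub/mov default: 32
print(operand_size(64, prefix_66=True))   # 16
print(operand_size(64, rex_w=True))       # 64
print(operand_size(64, default_64=True))  # push/pop default: 64
print(operand_size(16, prefix_66=True))   # 32
```

Byte operand-size is deliberately left out because it is selected by a different opcode rather than by mode or prefixes.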
There's also a 0x67 address-size prefix which swaps addressing modes to the other size. (16 vs. 32, or in 64-bit mode 64 -> 32.)
For example, mov [rdi], eax is a dword store, and the machine-code encoding will specify that by using no special prefixes on the opcode for 16/32/64-bit operand-size. (See https://www.felixcloutier.com/x86/mov for the available encodings. But note that Intel's manual doesn't mention 66 operand-size prefixes in each entry: it has 2 identical encodings with different sizes. You have to know which one needs a 66 prefix based on the current mode's default.)
16-bit operand-size like mov [rdi], ax will have the same machine code but with a 66 operand-size prefix.
8-bit operand-size (mov [rdi], al) has its own opcode, no prefixes needed.
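As a rough illustration of the four store sizes, this Python sketch builds the 64-bit-mode machine code for mov [rdi], al / ax / eax / rax by hand. `encode_mov_store_rdi` is a made-up helper that only handles this one addressing mode and register; it is not a general assembler.

```python
def encode_mov_store_rdi(size_bits):
    """64-bit-mode machine code for mov [rdi], al/ax/eax/rax."""
    # ModRM byte: mod=00 (no displacement), reg=000 (AL/AX/EAX/RAX), rm=111 (RDI)
    modrm = 0b00_000_111
    if size_bits == 8:
        return bytes([0x88, modrm])        # separate opcode for byte stores
    if size_bits == 16:
        return bytes([0x66, 0x89, modrm])  # 66 operand-size prefix
    if size_bits == 32:
        return bytes([0x89, modrm])        # default operand-size, no prefix
    if size_bits == 64:
        return bytes([0x48, 0x89, modrm])  # REX prefix with W=1
    raise ValueError("size_bits must be 8, 16, 32, or 64")


for bits, reg in [(8, "al"), (16, "ax"), (32, "eax"), (64, "rax")]:
    print(f"mov [rdi], {reg:<3} -> {encode_mov_store_rdi(bits).hex(' ')}")
# mov [rdi], al  -> 88 07
# mov [rdi], ax  -> 66 89 07
# mov [rdi], eax -> 89 07
# mov [rdi], rax -> 48 89 07
```

Opcode 89 is shared by the 16, 32, and 64-bit forms; only the prefix differs, while the byte form gets opcode 88.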
movzx / movsx are interesting cases: the memory-access size is different from the destination register size. The memory-access size (byte or word) is specified by the opcode. Operand-size prefixes only affect the destination size. The exception is x86-64 63 /r movsxd (dword->qword sign-extension), where a 66 operand-size prefix does shrink the memory-access size down to m16 to match the destination.
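Here is a small Python sketch of that rule for 64-bit mode: given the bytes of a movzx / movsx / movsxd instruction, it reports the memory-access size and the destination size separately. `extend_sizes` is a hypothetical helper that assumes at most one 66 prefix followed by an optional REX prefix, and recognizes only these opcodes.

```python
def extend_sizes(code):
    """Return (memory_bits, destination_bits) for a 64-bit-mode
    movzx / movsx / movsxd encoding."""
    i = 0
    prefix_66 = False
    rex_w = False
    if code[i] == 0x66:            # operand-size prefix
        prefix_66 = True
        i += 1
    if 0x40 <= code[i] <= 0x4F:    # REX prefix; bit 3 is W
        rex_w = bool(code[i] & 0x08)
        i += 1

    # Destination size follows the usual prefix rules.
    dest = 64 if rex_w else 16 if prefix_66 else 32

    if code[i] == 0x0F and code[i + 1] in (0xB6, 0xBE):
        return 8, dest             # movzx / movsx from a byte source
    if code[i] == 0x0F and code[i + 1] in (0xB7, 0xBF):
        return 16, dest            # movzx / movsx from a word source
    if code[i] == 0x63:
        # movsxd: source is 32 bits with REX.W, else it matches the destination
        return (32 if rex_w else dest), dest
    raise ValueError("not a movzx/movsx/movsxd encoding this sketch knows")


print(extend_sizes(bytes.fromhex("0fb607")))    # movzx eax, byte [rdi]   -> (8, 32)
print(extend_sizes(bytes.fromhex("660fb607")))  # movzx ax, byte [rdi]    -> (8, 16)
print(extend_sizes(bytes.fromhex("480fbf07")))  # movsx rax, word [rdi]   -> (16, 64)
print(extend_sizes(bytes.fromhex("486307")))    # movsxd rax, dword [rdi] -> (32, 64)
print(extend_sizes(bytes.fromhex("666307")))    # 63 /r with 66 prefix    -> (16, 16)
```

For the 0F B6 / B7 / BE / BF opcodes the first element never changes with prefixes, while for 63 /r it does, which is the exception described above.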
Similarly for SIMD instructions; the instruction encoding uniquely determines the memory-access size, along with the registers read or written.