Search code examples
assemblyx86cpu-architectureinstruction-setinstruction-encoding

How does x86 handle byte vs word addressing when executing instructions and reading/writing data?


So I am learning how x86 works and have come across people saying that it is byte-addressable, yet can read words, double words, etc. How does the processor decide which method to use and when? E.g. for accessing the next instruction and when a user wants to read/write to memory, which addressing mode is used?


Solution

  • Every memory access has an operand-size specified by the machine-code instruction. (Addressing mode isn't the right term: different addressing modes are different ways of specifying the lowest address of the chunk of memory to be accessed, like [rdi] vs. [rdi + rdx*8] vs. [RIP + rel32])

    Encoding different operand-sizes is done with prefixes (for 16 vs. 32 vs. 64-bit for integer instructions) or a different opcode for the same mnemonic (8-bit integer). Or with bits in the VEX or EVEX prefix for AVX / AVX512 instructions that can use xmm, ymm, or zmm registers.

    Decoding also depends on the current mode implying the default operand-size: 32 for 32 and 64-bit mode, or 16 for 16-bit mode. A 66 opererand-size prefix implies the opposite size.

    In 64-bit mode, the .W (width) bit in the REX prefix sets the operand-size to 64-bit. (And some instructions like push/pop default to 64-bit operand-size with no prefix needed, but most instructions like add/sub/mov still default to 32-bit)

    There's also a 0x67 address-size prefix which swaps addressing modes to the other size. (16 vs. 32 or in 64-bit mode 64 -> 32.)


    For example, mov [rdi], eax is a dword store, and the machine-code encoding will specify that by using no special prefixes on the opcode for 16/32/64-bit operand-size. (see https://www.felixcloutier.com/x86/mov for the available encodings. But note that Intel's manual doesn't mention 66 operand-size prefixes in each entry: it has 2 identical encodings with different sizes. You have to know which one needs a 66 prefix based on the current mode's default.)

    16-bit operand-size like mov [rdi], ax will have the same machine code by with a 66 operand-size prefix.

    8-bit operand-size (mov [rdi], al) has its own opcode, no prefixes needed.

    movzx / movsx are interesting cases: the memory access size is different from the destination register. The memory-access size (byte or word) is specified by the opcode. Operand-size prefixes only affect the destination size. Except x86-64 63 /r movsxd (dword->qword sign-extension) where a 66 operand-size prefix does shrink the memory-access size down to m16 to match the destination.

    Similarly for SIMD instructions; the instruction encoding uniquely determines the memory-access size, along with the registers read or written.