assembly x86-16 disassembly machine-code instruction-encoding

8086 memory to accumulator encoding: why do mov al, [absolute] and mov ah, [absolute] have different sizes?

mov al, [10] ; a0 0a 00
mov ah, [10] ; 8a 26 0a 00

After assembling the above 8086 assembly code with NASM, I noticed that the resulting machine code (shown in the comments above) differs in length: the first instruction is 3 bytes and the second is 4 bytes. Why is there this disparity when the two instructions are so similar?


From the 8086 manual it seems that the first instruction is assembled using the memory to accumulator encoding and the second instruction is encoded using the memory to register encoding.

I'm new to assembly so I'm sure I'm missing something, but why does the second instruction need to be encoded with mem to reg rather than mem to accumulator? Isn't AH still the accumulator register?

Solution

AH is not the 8-bit version of "the accumulator", that's AL.

The special-case encoding without a ModRM byte needs the opcode to imply the register operand. There's only one such opcode for each combo of direction and operand-size (load/store, 8-bit and 16-bit), so there are no bits left to encode AH vs. AL.

The a0 opcode implies the destination register is AL. There is no opcode that implies AH and takes a 16-bit absolute address, so the only option is the general 8-bit operand-size opcode, mov r8, r/m8 (8a /r), which spends an extra ModRM byte to select AH and the [disp16] addressing mode.
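To make the choice between the two encodings concrete, here is a minimal Python sketch of how an assembler could pick between them for a 16-bit-mode load from an absolute address. The function name encode_mov_load8 and the REG8 table are made up for illustration; this is not how NASM is actually implemented.

```python
# Register numbers for 8-bit registers, as used in the ModRM reg field.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}


def encode_mov_load8(reg: str, addr: int) -> bytes:
    """Encode `mov reg8, [addr]` for 16-bit mode, preferring the short form."""
    disp16 = addr.to_bytes(2, "little")
    if reg == "al":
        # A0: MOV AL, moffs8. The opcode itself implies AL, so no ModRM byte.
        return bytes([0xA0]) + disp16
    # 8A /r: MOV r8, r/m8. ModRM with mod=00 and r/m=110 means [disp16].
    modrm = (0b00 << 6) | (REG8[reg] << 3) | 0b110
    return bytes([0x8A, modrm]) + disp16


print(encode_mov_load8("al", 10).hex(" "))  # a0 0a 00
print(encode_mov_load8("ah", 10).hex(" "))  # 8a 26 0a 00
```

The 26 in the second encoding is just the ModRM byte: reg field 100 (AH) between mod 00 and r/m 110.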

The same goes for all the 8-bit op accumulator, immediate forms: test al, imm8 and sub al, imm8 are 2 bytes, because those special encodings exist only for AL.
(See my codegolf x86 machine-code tips answer for some of the history. I think others have written about possible design reasons, too, like efficient mechanical translation of 8080 asm source, since 8080 was an accumulator machine where most ALU instructions could only operate on the accumulator, 8-bit A. Even 8086 has some instructions that only work on the accumulator, including sign-extension, hence spending 8 opcode bytes on xchg reg with accumulator (AX).)
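The same size difference for the immediate forms can be sketched in Python. Only sub and test are covered, and the helper name encode_op_imm8 and its tables are hypothetical, chosen just for this illustration.

```python
# Register numbers for 8-bit registers, as used in the ModRM r/m field.
REG8 = {"al": 0, "cl": 1, "dl": 2, "bl": 3, "ah": 4, "ch": 5, "dh": 6, "bh": 7}

# Short forms: the opcode alone implies AL (2C ib = sub al, imm8; A8 ib = test al, imm8).
SHORT_AL = {"sub": 0x2C, "test": 0xA8}
# General forms: opcode plus a /digit in the ModRM reg field (80 /5 ib, F6 /0 ib).
GENERAL = {"sub": (0x80, 5), "test": (0xF6, 0)}


def encode_op_imm8(op: str, reg: str, imm: int) -> bytes:
    """Encode `op reg8, imm8`, using the AL-only short form when possible."""
    if reg == "al":
        return bytes([SHORT_AL[op], imm & 0xFF])
    opcode, digit = GENERAL[op]
    # mod=11 selects a register operand; r/m holds the register number.
    modrm = (0b11 << 6) | (digit << 3) | REG8[reg]
    return bytes([opcode, modrm, imm & 0xFF])


print(encode_op_imm8("sub", "al", 1).hex(" "))   # 2c 01
print(encode_op_imm8("sub", "ah", 1).hex(" "))   # 80 ec 01
print(encode_op_imm8("test", "al", 1).hex(" "))  # a8 01
print(encode_op_imm8("test", "ah", 1).hex(" "))  # f6 c4 01
```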

Intel's current manual still applies to 16-bit mode, and is searchable text, not just a tilted picture :P https://www.felixcloutier.com/x86/mov is an HTML scrape of the PDF for the MOV entry in vol.2 of the SDM.
Confusingly, moffs8 is 8-bit operand-size with an absolute address; the address-size is the default one for the mode (16-bit here), not 8-bit. So the relevant entry is A0 MOV AL, moffs8.

On 386 and later, a 66 byte is an operand-size prefix and can be used with 16-bit operand-size opcodes to make the operand-size 32-bit. Or the reverse if the default for the mode is 32-bit. And a 67 byte makes the address-size the opposite of the default for the mode, e.g. 32-bit address if you were in 16-bit mode, or 16 if you were in 32-bit mode. Anyway, main point being that you'll see MOV AX, moffs16 and MOV EAX, moffs32 with the same A1 opcode, and depending on what mode (bits 16 vs. bits 32 or bits 64) you're assembling for, the assembler will use a 66h prefix byte or not to get the desired operand-size.
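As a rough Python illustration of the operand-size prefix rule for the A1 opcode, the sketch below assumes only 16-bit and 32-bit modes and never emits a 67 address-size prefix, so the address is always as wide as the mode's default. The function name encode_mov_acc_load is made up for this example.

```python
def encode_mov_acc_load(operand_bits: int, addr: int, mode_bits: int) -> bytes:
    """Encode `mov ax, [addr]` (operand_bits=16) or `mov eax, [addr]` (operand_bits=32).

    mode_bits is the assembler's `bits 16` / `bits 32` setting.
    """
    if operand_bits not in (16, 32) or mode_bits not in (16, 32):
        raise ValueError("this sketch only handles 16- and 32-bit sizes")
    # A 66 prefix flips the operand-size away from the mode's default.
    prefix = b"\x66" if operand_bits != mode_bits else b""
    # No 67 prefix here, so the moffs address uses the mode's default width.
    moffs = addr.to_bytes(mode_bits // 8, "little")
    return prefix + b"\xA1" + moffs


print(encode_mov_acc_load(16, 10, mode_bits=16).hex(" "))  # a1 0a 00
print(encode_mov_acc_load(32, 10, mode_bits=16).hex(" "))  # 66 a1 0a 00
print(encode_mov_acc_load(32, 10, mode_bits=32).hex(" "))  # a1 0a 00 00 00
print(encode_mov_acc_load(16, 10, mode_bits=32).hex(" "))  # 66 a1 0a 00 00 00
```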

Knowing that, Intel's current manual should be readable, and it uses specific register names, not just "accumulator", because it has separate entries with the exact opcode byte, instead of leaving a wildcard w bit in the table. This makes it more bloated to read and less helpful for seeing the patterns (like width and direction normally being the low 2 bits), but does remove any ambiguity over what's meant by a term like "accumulator".
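The width/direction pattern in the low 2 bits can be seen by decoding the two MOV opcode groups relevant here, 88..8B (ModRM forms) and A0..A3 (accumulator/moffs forms), as they exist on 8086. This is an illustrative Python sketch with a made-up function name, not a general decoder.

```python
def describe_mov_opcode(opcode: int) -> str:
    """Describe an 8086 MOV opcode from the 88..8B or A0..A3 group."""
    w = opcode & 1          # bit 0: width, 0 = 8-bit, 1 = 16-bit
    d = (opcode >> 1) & 1   # bit 1: direction
    size = "16" if w else "8"
    if 0x88 <= opcode <= 0x8B:
        # Here d=1 means the reg field is the destination (a load).
        return f"mov r{size}, r/m{size}" if d else f"mov r/m{size}, r{size}"
    if 0xA0 <= opcode <= 0xA3:
        # In this group bit 1 set means a store, the opposite sense.
        acc = "ax" if w else "al"
        return f"mov moffs{size}, {acc}" if d else f"mov {acc}, moffs{size}"
    raise ValueError("opcode not covered by this sketch")


for op in (0x88, 0x89, 0x8A, 0x8B, 0xA0, 0xA1, 0xA2, 0xA3):
    print(f"{op:02x}: {describe_mov_opcode(op)}")
```

Note that all four A0..A3 slots are taken by AL/AX loads and stores, which is why nothing is left over to mean AH.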