Do source and destination operands need to be the same size?


I just had a go at this problem which asks you to explain what is wrong with the line of code:

movl %eax, %rdx

The solution says the destination operand is the wrong size.

Is it only "illegal" if going from a larger size to a smaller size, or is it the case that source and destination operands must be the same size for all instructions (or at least mov class types)?


Solution

  • Yes, operands have to be the same size except for a few special instructions like shl %cl, %eax or movzwl %ax, %edx.

    CPUs execute machine code, not assembly. In machine code, there's the opcode and prefixes (along with the default provided by being in 64-bit mode) to specify the operand-size. Not separate size attributes for each operand; that would be a waste of bits.

    Assembly language is a text format for describing / specifying machine instructions.

    The ISA designers (Intel, then AMD for 64-bit mode) chose to define the semantics of partial registers in terms of narrow operand sizes: writing AL, AH, or AX merges into the full register (defined by Intel for 8086 and 80386, with AMD64 matching that), while writing EAX implicitly zero-extends into RAX (new semantics in AMD64 for the part of the register that was new).
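
To make the merge vs. zero-extend rule concrete, here is a minimal Python sketch that models what happens to a full 64-bit register when a narrower part of it is written. The function name write_partial and its parameters are made up for this illustration; it only models the architectural result, not how any CPU implements it.

```python
MASK64 = (1 << 64) - 1

def write_partial(full, value, width, high8=False):
    """Model the 64-bit register value after writing a `width`-bit part of it.

    width=8 models AL (or AH with high8=True), 16 models AX,
    32 models EAX, and 64 models RAX. Hypothetical helper, for illustration.
    """
    if high8 and width != 8:
        raise ValueError("high8 only makes sense for 8-bit writes (AH)")
    value &= (1 << width) - 1
    if width >= 32:
        # 32-bit writes zero-extend into the full register;
        # 64-bit writes replace every bit.
        return value
    shift = 8 if high8 else 0
    mask = ((1 << width) - 1) << shift
    # 8- and 16-bit writes merge: bits outside the written field are kept.
    return (full & ~mask & MASK64) | (value << shift)

rax = 0x1122334455667788
print(hex(write_partial(rax, 0xAB, 8)))              # AL:  0x11223344556677ab
print(hex(write_partial(rax, 0xAB, 8, high8=True)))  # AH:  0x112233445566ab88
print(hex(write_partial(rax, 0xABCD, 16)))           # AX:  0x112233445566abcd
print(hex(write_partial(rax, 0xDEADBEEF, 32)))       # EAX: 0xdeadbeef
```

Only the 32-bit case discards the old upper bits, which is why mov %eax, %edx already acts as a 32-to-64-bit zero-extension.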

    See "Why do x86-64 instructions on 32-bit registers zero the upper part of the full 64-bit register?" and Intel's instruction-set reference for instructions like mov vs. movsx (and movsxd, which was new with AMD64).

    There's no machine code that corresponds to the asm text movl %eax, %rdx: a single mov has one operand-size, and here the l suffix and %eax say 32-bit while %rdx says 64-bit.
    An assembler correctly tells you that's meaningless.
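
As a rough illustration of the check an assembler is doing, the toy Python sketch below compares the size implied by the AT&T suffix with the size of each register operand. It is an assumption-laden model, not how any real assembler is written: check_mov and both tables are invented for this example, and it only understands plain mov with the few registers listed.

```python
SUFFIX_BITS = {"b": 8, "w": 16, "l": 32, "q": 64}
REG_BITS = {
    "%al": 8, "%ax": 16, "%eax": 32, "%rax": 64,
    "%dl": 8, "%dx": 16, "%edx": 32, "%rdx": 64,
}

def check_mov(line):
    """Return None if every size in a register-to-register mov agrees,
    otherwise a short error string. Toy model for illustration only."""
    mnemonic, operands = line.split(None, 1)
    src, dst = [op.strip() for op in operands.split(",")]
    sizes = {REG_BITS[src], REG_BITS[dst]}
    suffix = mnemonic[3:]          # "" for a bare "mov"
    if suffix:
        sizes.add(SUFFIX_BITS[suffix])
    if len(sizes) != 1:
        return f"operand size mismatch in `{line}`"
    return None

print(check_mov("movl %eax, %rdx"))  # reports a mismatch
print(check_mov("movl %eax, %edx"))  # None: everything is 32-bit
print(check_mov("mov %rax, %rdx"))   # None: size implied by the registers
```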

    That's also why AT&T syntax attaches an operand-size suffix (b/w/l/q) to the mnemonic, not to each operand separately. The suffix is only required when nothing else implies the size, i.e. for instructions with no register operands, just immediate and memory or memory alone, like andl $1, (%rdi) vs. andb $1, (%rdi), or notq (%rdi, %rsi, 8).
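
The suffix matters in the immediate-plus-memory case because the two forms touch a different number of bytes. The Python sketch below models that on a little-endian byte buffer; and_mem is a hypothetical helper standing in for and $imm, (%rdi) with a chosen operand-size, not a real API.

```python
def and_mem(mem, addr, imm, size):
    """Model `and $imm, (addr)` with an operand-size of `size` bytes.

    `mem` is a bytearray standing in for memory (x86 is little-endian).
    """
    mask = (1 << (8 * size)) - 1
    old = int.from_bytes(mem[addr:addr + size], "little")
    new = old & (imm & mask)
    mem[addr:addr + size] = new.to_bytes(size, "little")

buf_b = bytearray([0xFF] * 4)
and_mem(buf_b, 0, 1, 1)    # models andb $1, (%rdi): one byte is modified
buf_l = bytearray([0xFF] * 4)
and_mem(buf_l, 0, 1, 4)    # models andl $1, (%rdi): four bytes are modified

print(buf_b.hex())  # 01ffffff
print(buf_l.hex())  # 01000000
```

With the same immediate and the same address, the byte form leaves the three neighbouring bytes alone while the dword form clears them, so the assembler cannot guess which one you meant.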

    There is an instruction, movslq %eax, %rdx (movsxd in Intel's manuals), to sign-extend from 32 to 64-bit. (There is no movzx for 32 to 64; that's implicit in mov %eax, %edx. See "MOVZX missing 32 bit register to 64 bit register". movzx only takes 8 or 16-bit source operands, like movzbl %al, %edx.)
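
The difference between the two 32-to-64-bit widenings only shows up when bit 31 of the source is set. A minimal Python sketch of the resulting 64-bit values, with function names chosen here just to mirror the instructions they model:

```python
def mov_32(eax):
    """Model mov %eax, %edx: RDX ends up as EAX zero-extended to 64 bits."""
    return eax & 0xFFFFFFFF

def movslq(eax):
    """Model movslq %eax, %rdx: RDX ends up as EAX sign-extended to 64 bits."""
    value = eax & 0xFFFFFFFF
    if value & 0x80000000:            # sign bit of the 32-bit source
        value |= 0xFFFFFFFF00000000   # replicate it into the upper half
    return value

eax = 0xFFFFFFFE                      # -2 as a 32-bit two's complement value
print(hex(mov_32(eax)))               # 0xfffffffe
print(hex(movslq(eax)))               # 0xfffffffffffffffe
print(mov_32(5) == movslq(5))         # True: non-negative values agree
```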

    There are other special instructions with different sized operands, for example shifts like shl %cl, %edx.

    This syntax design works well for loads / stores like mov %eax, (%rdi) or add (%rdi), %esi, where the register operand implies the memory operand size. If mov could have two separate sizes, you'd always need to indicate the memory operand-size, like you do as part of the mnemonic for movzx/movsx. (e.g. AT&T syntax movzbl (%rdi), %eax to zero-extend a byte from memory into RAX implicitly, by explicitly writing EAX.)


    Other designs for a text syntax to describe machine code would be possible. For example, you could invent a syntax where movl %eax, %rdx just makes the zero-extension explicit, and perhaps movl %eax, %edx wouldn't be allowed because there's no way to write a 32-bit register without implicitly zero-extending to 64-bit. You could then define movl (%rdi), %rdx as a 32-bit load (implied by the l suffix) zero-extending into the 64-bit RDX, i.e. what AT&T syntax currently writes as movl (%rdi), %edx.

    I think that hypothetical design would be less intuitive than just saying most instructions require all their operands to be the same width. And in practice it's not how AT&T syntax is designed. Instead, AT&T went with the same conventions as Intel / AMD use in their manuals, just with operand order reversed. I'm not aware of any syntax for any ISA that works the hypothetical way; when writing a narrow register implicitly zero-extends, that's left implicit in the asm text syntax (e.g. in AArch64, and in all the various x86-64 syntaxes: Intel, AT&T, Plan9/Go).

