
MOVZX in Assembly (NASM) - how does it pick a source size when none is specified and the destination is 16-bit?


I am a little confused by how movzx behaves in the following example. (I am assuming that the print_int function used in my code sample works and that the problem lies in my understanding of movzx. Our professor recommended print_int, and we were told that it simply prints whatever is in the register as a decimal number.)

%include "../linux-ex/asm_io.inc"

extern printf

section .text

global main

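main:
    push ebp
    mov ebp, esp

    xor eax, eax        ; clear all 32 bits of eax

    mov ax, [n1]        ; load the 16-bit word at n1 into ax
    call print_int      ; print the register as a decimal number

    leave
    ret

section .data
n1 dw 01234h            ; 16-bit word, 4660 decimal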

This piece of NASM code on a 32-bit architecture prints out 4660 as expected. If I change:

mov ax, [n1]

to

movzx ax, [n1]

I get an output of 52. I know it doesn't make sense to zero-extend into ax, since n1 is 16 bits wide too, but I was surprised that it would yield a different output. It seems that in the second example the upper 8 bits are cut off and set to zero, leaving the hex number 34h, which is 52 decimal. Why is this happening?


Solution

  • When the assembler sees

    movzx ax, [n1]
    

    it looks for an instruction that fits the mnemonic and operands used here, that is, an instruction with mnemonic movzx whose first operand is a 16-bit register and whose second operand is a memory operand of unspecified size. The only variant of movzx that fits the bill is

    movzx r16, r/m8
    

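    and this is indeed what NASM assembles this code to. Since x86 is little-endian, the byte at address n1 is the low byte of the word, 34h, so ax ends up holding 0034h, which is 52 decimal.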

    In contrast to MASM, NASM does not track the type of symbols. While MASM might warn or fail assembly with an error because [n1] has type word but is used as an operand of type byte, NASM does no such thing. Instead, the size of operands must be explicitly specified using a keyword like byte, word, or dword if it is ambiguous (i.e. if there are multiple instructions with the same mnemonic and operands, but at different operand sizes). Here nothing is ambiguous, because only one form of movzx takes a 16-bit destination, so NASM silently treats [n1] as a byte.
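
To make the size rules above concrete, here is a minimal sketch that writes the operand size explicitly on each load. It reuses n1 from the question and assumes a 32-bit build (for example nasm -f elf32); it leaves out asm_io.inc and print_int, so the register values are given in comments rather than printed.

```nasm
; Sketch only: same data as in the question, with explicit operand sizes.
section .data
n1      dw 01234h               ; stored little-endian: bytes 34h, 12h

section .text
global main
main:
        xor   eax, eax           ; clear all of EAX
        mov   ax, [n1]           ; size inferred from ax: AX = 1234h (4660)

        movzx ax, byte [n1]      ; what "movzx ax, [n1]" assembles to: AX = 0034h (52)

        movzx eax, word [n1]     ; zero-extend the whole word: EAX = 00001234h (4660)

        xor   eax, eax           ; return 0
        ret
```

If the goal is to load the 16-bit value and clear the upper half of eax in a single instruction, movzx eax, word [n1] is the form to use. With a 32-bit destination the source can be either a byte or a word, so NASM requires the size keyword there.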