assembly x86 nasm machine-code instruction-encoding

NASM produces unexpected extra operand size prefix

I am having a problem with NASM running on Linux amd64.

Running nasm on a single instruction mov eax, 0x12345678 produces 6 bytes:

00000000 <.data>:
   0:   66 b8 78 56             mov    $0x5678,%ax
   4:   34 12                   xor    $0x12,%al

However, assembling the same instruction with GNU as in AT&T syntax, mov $0x12345678, %eax, produces the expected 5 bytes:

0000000000000000 <.text>:
   0:   b8 78 56 34 12          mov    $0x12345678,%eax

What am I missing with NASM?

Solution

NASM thinks you're assembling 16-bit code, sees an instruction with a 32-bit operand, and puts an operand-size override prefix (0x66) in front of it. Then the disassembler thinks it's disassembling 32-bit code, where the same prefix means "16-bit operand", so it decodes 66 b8 78 56 as mov $0x5678,%ax and treats the leftover 34 12 as a second instruction.

GAS thinks you're assembling 32-bit (or 64-bit) code, where the default operand size is already 32 bits, so it doesn't generate an operand-size override prefix.

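You can fix NASM by figuring out why it thinks you're assembling 16-bit code. 16-bit mode is the default for the "flat binary" output format (-f bin, which is also what you get when no -f option is given). If that's the case, you can fix it by putting a bits 32 directive (or bits 64 for 64-bit code) somewhere before the instruction(s), or by selecting an output format such as elf32 or elf64 that defaults to the mode you want.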
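A minimal sketch of that fix, assuming the source is assembled as a flat binary (the file names example.asm and example.bin are hypothetical):

```nasm
; example.asm (hypothetical file name)
; Assemble with: nasm -f bin example.asm -o example.bin

bits 32                     ; tell NASM the code will run in 32-bit mode

mov eax, 0x12345678         ; encodes as B8 78 56 34 12 (5 bytes, no 66 prefix)
```

When checking the result, the disassembler has to be told the same mode, for example ndisasm -b 32 example.bin; otherwise it may misread the bytes in the same way as in the question.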

Alternatively, maybe you really are trying to assemble 16-bit code and it is the GAS output that is wrong. In that case you probably need a .code16 directive somewhere before the 16-bit code.

Note that the operand-size override prefix works like a toggle: if the default operand size is 16 bits, the prefix makes the instruction 32-bit, and if the default operand size is 32 bits, the same prefix makes the instruction 16-bit.
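The toggle can be seen by assembling the same two instructions under both modes. This is only an illustrative sketch (the file name toggle.asm is hypothetical); the byte sequences in the comments are what NASM emits for each line:

```nasm
; toggle.asm (hypothetical file name)
; Assemble with: nasm -f bin toggle.asm -o toggle.bin

bits 16                     ; default operand size is 16 bits
mov ax,  0x5678             ; B8 78 56             no prefix needed
mov eax, 0x12345678         ; 66 B8 78 56 34 12    prefix selects 32-bit operand

bits 32                     ; default operand size is 32 bits
mov eax, 0x12345678         ; B8 78 56 34 12       no prefix needed
mov ax,  0x5678             ; 66 B8 78 56          prefix selects 16-bit operand
```

The opcode byte B8 is identical in all four cases; only the presence of the 66 prefix and the current mode decide how many immediate bytes follow.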