Unable to compile assembly code with xmmword operand-size using nasm

I was trying to assemble some code with nasm (nasm -o file input.asm), and it reported an error at line 2 of the following snippet:

mov rsi, 0x400200
movdqu xmm0,xmmword [rsi]
nop

I am not sure whether NASM can assemble instructions that use 128-bit registers. Is there another way to assemble code like this with NASM when 128-bit registers are involved?

Solution

You don't need to specify an operand size for the memory operand;
just use movdqu xmm0, [rsi] and let xmm0 imply the 128-bit operand-size.
NASM supports SSE/AVX/AVX-512 instructions.

If you did want to specify an operand-size, NASM's name for 128-bit is oword. That is what ndisasm prints if you assemble that instruction and then disassemble the resulting machine code. oword = oct-word = 8x 2-byte words = 16 bytes.
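As a minimal sketch of both spellings, the question's snippet could be written as follows. The bits 64 directive is an assumption added here: with a plain nasm -o file input.asm, NASM produces flat-binary output, which starts in 16-bit mode unless told otherwise.

```nasm
bits 64                             ; assumption: 64-bit code (not stated in the question)

        mov     rsi, 0x400200
        movdqu  xmm0, [rsi]         ; 128-bit load, size implied by xmm0
        movdqu  xmm1, oword [rsi]   ; same load into xmm1, size stated explicitly
        nop
```

Both movdqu lines should assemble to the same instruction form; the only difference in the machine code is the destination register.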

Note that GNU .intel_syntax noprefix (as used by objdump -drwC -Mintel) will use xmmword ptr, unlike NASM.

If you really want to use xmmword, %define xmmword oword at the top of your file.
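A sketch of that workaround, again assuming 64-bit code via a bits 64 directive that is not in the question:

```nasm
bits 64                             ; assumption: 64-bit code
%define xmmword oword               ; single-line macro: xmmword expands to oword

        mov     rsi, 0x400200
        movdqu  xmm0, xmmword [rsi] ; the preprocessor turns this into oword [rsi]
        nop
```

Note that %define is case-sensitive, so this only covers the lowercase spelling; %idefine is the case-insensitive variant if you also want XMMWORD to work.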

The operand-size is implied by the mnemonic and / or the register operands for nearly all SSE/AVX/AVX-512 instructions, so you rarely need to specify qword vs. oword vs. yword the way you do with movsx eax, byte [rdi] vs. word [rdi]. (One exception is a narrowing conversion such as AVX vcvtpd2ps xmm0, [rdi], where the memory source can be 128 or 256 bits for the same xmm destination.) Often the memory operand is the same size as the register, but there are exceptions with some shuffle / insert / extract instructions. For example:

SSE2 pinsrw xmm0, [rdi], 3 loads a word and merges it into bytes 6 and 7 of xmm0.
SSE2 movq [rdi], xmm0 stores the low qword.
SSE1 movhps [rdi], xmm0 stores the high qword
AVX1 vextractf128 [rdi], ymm0, 1 does a 128-bit store of the high half
AVX2 vpmovzxbw ymm0, [rdi] does packed byte->word zero extension from a 128-bit memory source operand
AVX-512F vpmovdb [rdi]{k1}, zmm2 narrows dword to byte elements (with truncation; other versions do saturation) and does a 128-bit store, with masking at byte granularity. (One of the only ways to do byte-granularity masking without AVX-512BW, other than legacy-SSE maskmovdqu which has cache-evicting NT semantics. So I guess that makes it especially interesting for Xeon Phi KNL.)

You could specify the matching size keyword (word, qword, or oword, depending on the instruction) on any of those to make sure the size of the memory access is what you think it is, i.e. to have NASM check it for you.
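For illustration, here are the examples above with an explicit size on each memory operand. This is a sketch rather than tested output, and it assumes 64-bit mode and a NASM version recent enough to know AVX-512:

```nasm
bits 64                                     ; assumption: 64-bit code

        pinsrw       xmm0, word [rdi], 3    ; 16-bit load into word element 3
        movq         qword [rdi], xmm0      ; 64-bit store of the low half
        movhps       qword [rdi], xmm0      ; 64-bit store of the high half
        vextractf128 oword [rdi], ymm0, 1   ; 128-bit store of the high lane
        vpmovzxbw    ymm0, oword [rdi]      ; 128-bit load, widened to 256 bits
        vpmovdb      oword [rdi]{k1}, zmm2  ; 128-bit masked store of 16 bytes
```

If a keyword doesn't match the instruction's actual access size (for example oword on the movq store), NASM should reject the line at assembly time instead of silently encoding something else.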