ASM x86_64 AVX: xmm and ymm registers differences

What are the differences between xmm and ymm registers? I thought that xmm is for SSE, and ymm is for AVX, but I wrote some code:

vmovups     ymm1, [r9]      
vcvtss2si   rcx, ymm1

and it gives me:

error: invalid combination of opcode and operands

It's about the line:

vcvtss2si   rcx, ymm1

So I wrote:

vcvtss2si   rcx, xmm1

and it works as intended. The first value of the ymm1 vector, converted to an integer, is now in rcx.

What is going on here? Are ymm1 and xmm1 the same register?

Solution

xmm0 is the low half of ymm0, exactly like eax is the low half of rax.

Writing to xmm0 (with a VEX-coded instruction like vaddps xmm, not legacy SSE addps xmm) zeros the upper lane of ymm0, just like writing to eax zeros the upper half of rax to avoid false dependencies. Lack of zeroing the upper bytes for legacy SSE instructions is why there's a penalty for mixing AVX and legacy SSE instructions.
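As a rough illustration of those write behaviors, here is a NASM-syntax sketch. The register choices are arbitrary, and the three lines are independent examples, not a sequence you would want to run back to back:

```nasm
; Illustrative only: each line is a separate example.
vaddps  xmm0, xmm1, xmm2    ; VEX-coded: writes bits 127:0 of ymm0, zeroes bits 255:128
addps   xmm0, xmm1          ; legacy SSE: writes bits 127:0, leaves bits 255:128 of ymm0 unchanged
mov     eax, 1              ; integer analogue: writes bits 31:0 of rax, zeroes bits 63:32
```

The vaddps and mov eax cases both produce a result that doesn't depend on the old upper bits of the full register; the legacy addps case does not.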

Most AVX instructions are available in both 128-bit and 256-bit sizes, e.g. vaddps xmm0, xmm1, xmm2 or vaddps ymm0, ymm1, ymm2. (The 256-bit versions of most integer instructions are only available in AVX2, with AVX providing only the 128-bit version. There are a couple of exceptions in AVX1, like vptest ymm, ymm, and vmovdqu if you count that as an "integer" instruction.)
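To make the AVX vs. AVX2 split concrete, a short NASM-syntax sketch follows. vpaddd is used here only as a representative integer instruction, and rdi is a hypothetical pointer to at least 32 readable bytes:

```nasm
vaddps  xmm0, xmm1, xmm2    ; AVX: 128-bit, 4 packed floats
vaddps  ymm0, ymm1, ymm2    ; AVX: 256-bit, 8 packed floats
vpaddd  xmm0, xmm1, xmm2    ; AVX: 128-bit packed 32-bit integer add
vpaddd  ymm0, ymm1, ymm2    ; AVX2 required for the 256-bit integer form
vptest  ymm0, ymm1          ; AVX: one of the 256-bit exceptions
vmovdqu ymm0, [rdi]         ; AVX: 256-bit unaligned load (rdi is an assumed pointer)
```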

Scalar instructions like vmovd, vcvtss2si, and vcvtsi2ss are only available with XMM registers. Reading just the low element of a YMM register would not be logically different from reading it from the XMM register, but writing the low element (and leaving the other elements unmodified, like the poorly-designed vcvtsi2ss does) would be different for XMM vs. YMM, because the YMM version would leave the upper lane not zeroed.

But scalar with ymm doesn't exist in the machine-code encoding, even for instructions where it would be really useful like vpinsrd / vpextrd (insert / extract a scalar).
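Since the scalar instructions only take XMM operands, one common way to reach an element in the upper lane is to extract that lane into an XMM register first. A minimal sketch, assuming r9 points to at least 8 floats as in the question (xmm2, rdx, and eax are arbitrary scratch choices):

```nasm
vmovups      ymm1, [r9]         ; load 8 floats
vcvtss2si    rcx, xmm1          ; element 0, converted to integer (low half of ymm1)
vextractf128 xmm2, ymm1, 1      ; xmm2 = upper lane, i.e. elements 4..7
vcvtss2si    rdx, xmm2          ; element 4, converted to integer
vpextrd      eax, xmm2, 1       ; raw 32-bit bit pattern of element 5 (no conversion)
```

All of these are AVX1 instructions, so this doesn't need AVX2.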

Note that even though reading an XMM register and taking only the low scalar element is logically the same as YMM, for the actual implementation it would not be the same. Reading a YMM register implies an AVX-256 instruction, which would have to transition the CPU out of the "saved upper" state (for an Intel CPU with SSE/AVX transitions / states).

In any case, vcvtss2si rax, ymm0 is not encodable, and the assembler doesn't magically assemble it as vcvtss2si rax, xmm0. If you're writing in asm, you're supposed to know exactly what you're doing. (Some assemblers will optimize mov rax, 1 to mov eax, 1 for you, so silently accepting ymm as a source register would be harmless in the same way. But accepting ymm as a destination register for vcvtsi2ss would change the meaning, so for consistency it's better that the assembler does neither.)