As I understand it, since AVX, through the 3-byte VEX or EVEX prefix, you can encode up to 32 XMM/YMM/ZMM registers in 64-bit mode. But looking through the Intel manual, beyond the statement that this is possible, I cannot find the part that explains how it actually works. The only extension fields I can see are the inverted REX fields, and nothing else aside from a special place in the EVEX prefix to encode mask registers.
You would need 2 bits somewhere to encode that many registers. Do you have to combine 2 of the inverted REX fields inside the VEX/EVEX prefixes somehow, or how does this process work?
xmm16..31 (and their ymm/zmm equivalents) are new with AVX-512 and only accessible via EVEX prefixes, which have 2 extra bits to extend each of the ModRM fields, plus a separate 5-bit field (V' and vvvv) for the third operand.
REX + legacy-SSE, and VEX for AVX1/2 encodings, can only access xmm/ymm0..15.
Wikipedia's EVEX article has a pretty good table showing where the bits come from. I transcribed part of it:
| Addr mode | Bit 4 | Bit 3 | Bits [2:0] | Register type |
|---|---|---|---|---|
| REG | EVEX.R' | EVEX.R | ModRM.reg | GPR, Vector |
| RM | EVEX.X | EVEX.B | ModRM.r/m | GPR, Vector |
| NDS/NDD | EVEX.V' | EVEX.v3 | EVEX.v2v1v0 | Vector |
| Base | 0 | EVEX.B | SIB.base (or ModRM.r/m) | GPR |
| Index | 0 | EVEX.X | SIB.index | GPR |
If the R/M operand is a vector register instead of a memory addressing mode, it uses both the X (index) and B (base) bits as extra register-number bits. That works because a register operand has no SIB.index field that would otherwise need the X bit to select r8..r15.
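To make the table concrete, here is a minimal Python sketch that pulls the three 5-bit register numbers out of a register-direct EVEX instruction. The function name `decode_evex_regs` is made up for illustration, and the example bytes are hand-assembled, so treat them as an assumption rather than assembler output. It assumes 64-bit mode, a one-byte opcode right after the 4-byte prefix, and ModRM.mod == 3. Note that R, X, B, R', vvvv and V' are all stored inverted in the prefix.

```python
def decode_evex_regs(code: bytes) -> dict:
    """Return the 5-bit register numbers of a register-direct EVEX instruction.

    Layout assumed: 0x62, P0, P1, P2, one opcode byte, ModRM.
    """
    if len(code) < 6 or code[0] != 0x62:
        raise ValueError("expected 0x62, three payload bytes, opcode, ModRM")
    p0, p1, p2, modrm = code[1], code[2], code[3], code[5]
    if modrm >> 6 != 0b11:
        raise ValueError("this sketch only handles register-direct r/m operands")

    def inv(byte: int, bit: int) -> int:
        # Extension bits are stored inverted, so flip them after extracting.
        return ((byte >> bit) & 1) ^ 1

    # P0 = R X B R' ... ; P1 = W vvvv 1 pp ; P2 = z L'L b V' aaa
    r, x, b, r_hi = inv(p0, 7), inv(p0, 6), inv(p0, 5), inv(p0, 4)
    vvvv = ~(p1 >> 3) & 0xF
    v_hi = inv(p2, 3)

    return {
        "reg": (r_hi << 4) | (r << 3) | ((modrm >> 3) & 7),   # R' : R : ModRM.reg
        "rm": (x << 4) | (b << 3) | (modrm & 7),              # X  : B : ModRM.r/m
        "vvvv": (v_hi << 4) | vvvv,                           # V' : vvvv
    }


# Hand-assembled bytes intended to be: vaddps zmm17, zmm18, zmm19
print(decode_evex_regs(bytes.fromhex("62a16c4058cb")))
# {'reg': 17, 'rm': 19, 'vvvv': 18}
```

The `rm` line is the interesting one: with a register operand, X supplies bit 4 and B supplies bit 3, exactly as in the RM row of the table.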
In REX and VEX prefixes, the X bit goes unused when the source operand isn't memory with an indexed addressing mode. (See https://wiki.osdev.org/X86-64_Instruction_Encoding#REX_prefix, but note that in a register-number table earlier on that page showing X.Reg, X is just a placeholder for R or B, not REX.X; a confusing choice on that page.)
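For comparison, a similar sketch for a 3-byte VEX prefix shows why VEX stops at register 15: each operand only gets one extension bit (or a 4-bit vvvv), and X never contributes to a register-direct operand. Again, `decode_vex3_regs` is a hypothetical helper, the bytes are hand-assembled, and the sketch assumes ModRM.mod == 3 with a one-byte opcode after the prefix.

```python
def decode_vex3_regs(code: bytes) -> dict:
    """Return the 4-bit register numbers of a register-direct 3-byte-VEX instruction.

    Layout assumed: 0xC4, byte1 (R X B mmmmm), byte2 (W vvvv L pp), opcode, ModRM.
    """
    if len(code) < 5 or code[0] != 0xC4:
        raise ValueError("expected 0xC4, two payload bytes, opcode, ModRM")
    b1, b2, modrm = code[1], code[2], code[4]
    if modrm >> 6 != 0b11:
        raise ValueError("this sketch only handles register-direct r/m operands")

    # R and B are stored inverted, like in EVEX.
    r = ((b1 >> 7) & 1) ^ 1
    b = ((b1 >> 5) & 1) ^ 1
    # X (bit 6 of byte1) would only extend SIB.index, so it is ignored here.
    vvvv = ~(b2 >> 3) & 0xF

    return {
        "reg": (r << 3) | ((modrm >> 3) & 7),   # R : ModRM.reg
        "rm": (b << 3) | (modrm & 7),           # B : ModRM.r/m
        "vvvv": vvvv,
    }


# Hand-assembled bytes intended to be: vaddps ymm8, ymm9, ymm10
print(decode_vex3_regs(bytes.fromhex("c4413458c2")))
# {'reg': 8, 'rm': 10, 'vvvv': 9}
```

In this sketch, flipping the X bit in byte1 changes nothing in the result, which is the "unused" bit that EVEX repurposes as bit 4 of a register-direct r/m operand.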
See also "x86 BSWAP instruction REX doesn't follow Intel specs?" for another diagram of using an extra register-number bit from a REX prefix.