assembly x86-64 intel machine-code instruction-encoding

Where has Intel documented the encoding of extended registers (R8-R15) in ModR/M byte?

I'm using the latest official "Intel® 64 and IA-32 Architectures Software Developer’s Manual Combined Volumes: 1, 2A, 2B, 2C, 2D, 3A, 3B, 3C, 3D and 4" as a reference to understand the machine level encoding of x86-64 ISA.

The documentation for the ModR/M and SIB bytes provided in Volume 2, Section 2.1.3 gives the exact encodings for referring to the 8-bit, 16-bit and 32-bit registers (Tables 2-1, 2-2 and 2-3).

However, I'm not able to find a similar table that specifies how the REX.X, REX.B and REX.R bits in the REX prefix combine with ModR/M to specify the extended registers. I'm specifically looking for the explicit binary encoding for each of the extended registers. As far as I can tell, the documentation for the REX prefix in the manual only specifies that the reg and r/m fields are extended by 1 bit in the MSB using the corresponding bits in REX, but doesn't actually give the explicit mapping for the bit combinations.

Does the Intel documentation explicitly state these mappings anywhere in the SDM? Or is it just assumed that R8-R15 will follow the obvious/natural mapping strategy, with REX.B/X/R set to 1 and R8 encoded as 000, R9 as 001 ... R15 as 111?

Solution

Yes. As you found, it follows straightforward binary numbering: the REX bit is the leading (most significant) bit and the 3-bit ModRM field supplies the low 3 bits of the register number. 8 = 1000 in binary, so that's the encoding for R8: REX bit 1, field 000.
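A minimal sketch of that numbering in Python. The helper name `reg_number` and the `GPR64` name list are made up for illustration (they are not from Intel's manual), and it only covers the plain case where the field names a register:

```python
# Illustrative only: names are in register-number order 0..15.
GPR64 = ["rax", "rcx", "rdx", "rbx", "rsp", "rbp", "rsi", "rdi"] + [
    f"r{n}" for n in range(8, 16)
]

def reg_number(rex_bit: int, field3: int) -> int:
    """Combine a REX extension bit (R, X or B) with a 3-bit ModRM/SIB field.

    The REX bit becomes bit 3 (the MSB); the field supplies bits 2:0.
    """
    if rex_bit not in (0, 1) or not 0 <= field3 <= 7:
        raise ValueError("rex_bit must be 0 or 1, field3 must fit in 3 bits")
    return (rex_bit << 3) | field3

print(GPR64[reg_number(1, 0b000)])  # r8
print(GPR64[reg_number(1, 0b110)])  # r14
print(GPR64[reg_number(0, 0b110)])  # rsi
```

With the REX bit clear you get the legacy registers 0-7; with it set you get R8-R15 in the same order.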

https://wiki.osdev.org/X86-64_Instruction_Encoding#Encoding explains nicely, and https://wiki.osdev.org/X86-64_Instruction_Encoding#Registers even has a table.

I searched Intel's vol.2 PDF for r14 (which is not "special" for anything, and will probably only show up in tables). There are some tables in vol.2 that mention it, but not one for plain ModRM itself. (The combined PDF is too huge to be pleasant to work with.)

Vol.2 does clearly describe how REX fields combine with ModRM fields to make 4-bit register numbers: for example, Figure 2-4 shows the concatenation of REX.B with ModRM.rm, and of REX.R with ModRM.reg. I didn't check vol.1; I wouldn't be surprised if some statement about register names (used by assemblers) matching binary register numbers could be found there. Names are only meaningful to assemblers, not in machine code, and the machine-code side is clearly documented in vol.2.
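To make the concatenation concrete, here is a rough Python sketch that pulls the two 4-bit register numbers out of a REX prefix and a ModRM byte. The function name is hypothetical, and it deliberately handles only the register-direct form (mod = 11); with a memory operand, r/m = 100 means a SIB byte follows and REX.B extends SIB.base instead.

```python
# Hypothetical helper for illustration. Field layout assumed:
# REX = 0100WRXB, ModRM = mod(2 bits) reg(3 bits) rm(3 bits).
def decode_rex_modrm_regs(rex: int, modrm: int) -> tuple[int, int]:
    """Return (reg, rm) as 4-bit register numbers for a mod=11 ModRM byte."""
    if rex & 0xF0 != 0x40:
        raise ValueError("not a REX prefix")
    if not 0 <= modrm <= 0xFF:
        raise ValueError("ModRM must be a single byte")
    if modrm >> 6 != 0b11:
        # Memory operands (possibly with a SIB byte) are out of scope here.
        raise ValueError("only mod=11 (register-direct) is handled")
    rex_r = (rex >> 2) & 1  # extends ModRM.reg
    rex_b = rex & 1         # extends ModRM.rm
    reg = (rex_r << 3) | ((modrm >> 3) & 0b111)
    rm = (rex_b << 3) | (modrm & 0b111)
    return reg, rm

# 4D 89 F5 is mov r13, r14: REX=0x4D, opcode 0x89 (MOV r/m64, r64), ModRM=0xF5
print(decode_rex_modrm_regs(0x4D, 0xF5))  # (14, 13)
```

The result (14, 13) reads as reg = r14 (the source) and rm = r13 (the destination), which matches how an assembler encodes that instruction.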

However, the info is there in vol.2:

It does have Table 3-1, "Register Codes Associated With +rb, +rw, +rd, +ro", for instructions with no ModRM where the low 3 bits of the opcode byte are the low 3 bits of the register number (like the short encodings of push/pop r64). An excerpt from the byte-register columns:

Reg	REX.B	Reg field
R13B	Yes	5
R14B	Yes	6

And so on with rows for every register, and 3 more sets of columns for word, dword, and qword sizes for R14W, R14D, R14. So if you were in doubt about the fact that the binary numbers map to register names, that table makes it clear. (It would be insane to assume that register numbers work differently here than in other contexts.)

There's also the opcode map for one-byte opcodes, where push rSI/r14 share an entry (0x56). The same goes for xchg-with-(e)ax, mov-immediate to byte-reg, pop, and mov-immediate to word/dword/qword-reg, plus bswap in the 2-byte opcode map. Again, it would be insane for these register numbers to work differently than register numbers in other places.
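As a sketch of how those shared opcode-map entries work for the short push/pop r64 forms (opcode 0x50+rd and 0x58+rd), assuming the usual 0x41 prefix, i.e. a REX prefix with only the B bit set, for registers 8-15. The function name is made up for illustration:

```python
# Illustrative encoder for the no-ModRM "+rd" forms of push/pop r64.
def encode_push_pop(reg_num: int, pop: bool = False) -> bytes:
    """Encode push/pop of 64-bit register number reg_num (0..15)."""
    if not 0 <= reg_num <= 15:
        raise ValueError("register number must be 0..15")
    base = 0x58 if pop else 0x50
    opcode = base + (reg_num & 0b111)            # low 3 bits go in the opcode byte
    prefix = b"\x41" if reg_num >= 8 else b""    # REX.B supplies the 4th bit
    return prefix + bytes([opcode])

print(encode_push_pop(6).hex())             # 56    (push rsi)
print(encode_push_pop(14).hex())            # 4156  (push r14)
print(encode_push_pop(8, pop=True).hex())   # 4158  (pop r8)
```

push rsi and push r14 use the same opcode byte and differ only in the REX.B prefix, which is exactly why they share one cell in the opcode map.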

There is also a full table, Table 2-8, "VEX.vvvv to register name mapping", with xmm/ymm0..15 and RAX/EAX .. R15/R15D. (VEX.vvvv can encode integer registers for BMI instructions like andn, and yes, they are only documented for dword or qword operand size; a 66 prefix can't override them to word operand size.)
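One thing to watch for when reading that table: VEX.vvvv is stored in inverted (ones' complement) form, so 1111b selects register 0 and 0000b selects register 15. A tiny Python sketch, with hypothetical helper names, of converting in both directions:

```python
# Illustrative helpers; VEX.vvvv holds the bitwise complement of the
# 4-bit register number.
def vvvv_to_reg(vvvv: int) -> int:
    """Convert the raw 4-bit VEX.vvvv field to a register number."""
    if not 0 <= vvvv <= 0xF:
        raise ValueError("vvvv is a 4-bit field")
    return (~vvvv) & 0xF

def reg_to_vvvv(reg_num: int) -> int:
    """Convert a register number 0..15 to the raw VEX.vvvv field value."""
    if not 0 <= reg_num <= 15:
        raise ValueError("register number must be 0..15")
    return (~reg_num) & 0xF

print(vvvv_to_reg(0b1111))       # 0   (xmm0 / rax)
print(vvvv_to_reg(0b0001))       # 14  (xmm14 / r14)
print(bin(reg_to_vvvv(15)))      # 0b0
```

After undoing the inversion, the number-to-name mapping is the same one used everywhere else.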

Table 2-13, "32-Bit VSIB Addressing Forms of the SIB Byte", is also relevant, showing columns like ESI/R14D. (In 64-bit mode, you normally wouldn't use a 67 address-size prefix with vpgatherdd or whatever, but you can. There isn't a separate table for 64-bit address-size.) The table doesn't explicitly mention how VEX.B selects between the two registers for a given value of bits 2:0, but that should be obvious from other cases.