x86 cpu-architecture cpu-registers

Core physical register usage for XMM registers


On x86, what type of internal physical registers does a CPU use for the XMM registers? Would those be integer or vector physical registers?

I think vector registers are used because XMM registers are 128-bit registers. Any confirmation is appreciated.


Solution

  • XMM registers are vector registers. They're renamed onto the FP/SIMD physical register file, not the (general-purpose) integer one, regardless of whether you're using SIMD-integer or SIMD-FP instructions.
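
    As a rough illustration of that rule (the register file is chosen by the register operand, not by whether the operation is integer or FP), here is a toy Python classifier. It is a sketch under simplifying assumptions: it only recognizes 64/32-bit general-purpose names and xmm/ymm/zmm names, and the helpers `register_file` and `register_files_used` are hypothetical names made up for this example, not any real tool's API.

```python
import re

# Hypothetical toy classifier: which physical register file (PRF) an x86
# architectural register name is renamed onto. Simplifying assumption: only
# 64/32-bit general-purpose names and xmm/ymm/zmm names are recognized.
GP_NAMES = (
    {"rax", "rbx", "rcx", "rdx", "rsi", "rdi", "rbp", "rsp"}
    | {"eax", "ebx", "ecx", "edx", "esi", "edi", "ebp", "esp"}
    | {f"r{i}" for i in range(8, 16)}
    | {f"r{i}d" for i in range(8, 16)}
)
VECTOR_NAME = re.compile(r"[xyz]mm([0-9]|[12][0-9]|3[01])")


def register_file(reg):
    """Return "integer" or "vector" for a register name."""
    name = reg.strip().lower()
    if VECTOR_NAME.fullmatch(name):
        return "vector"
    if name in GP_NAMES:
        return "integer"
    raise ValueError(f"register not covered by this sketch: {reg!r}")


def register_files_used(instruction):
    """Register files touched by the register operands of "MNEMONIC op1, op2"."""
    _mnemonic, _, operands = instruction.partition(" ")
    return {register_file(op) for op in operands.split(",")}


# The mnemonic is ignored on purpose: SIMD-FP (ADDPD) and SIMD-integer (PADDD)
# both name XMM operands, so both map to the vector PRF.
for insn in ("ADDPD xmm1, xmm2", "PADDD xmm1, xmm2", "ADD eax, ecx", "MOVD eax, xmm0"):
    print(insn, "->", sorted(register_files_used(insn)))
```

    MOVD eax, xmm0 is the kind of instruction that moves data between the two register files, so it reports both.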

    https://blog.stuffedcow.net/2013/05/measuring-rob-capacity/ shows how to approximately measure the capacities of the physical register files for integer vs. SIMD, since either can be a tighter limit than the ReOrder Buffer (ROB) size on how much cache-miss latency out-of-order execution can hide.
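
    The idea behind that kind of measurement can be sketched with a toy model: two independent cache-miss loads are separated by N filler instructions, and the two misses only overlap if everything from the first load through the second fits in the out-of-order window at once. The knee in time per iteration vs. N then exposes whichever resource runs out first: NOP-like fillers use only ROB entries, while fillers that write GP or XMM registers also use entries in that register file. Everything here is a simplifying assumption, not the blog's exact methodology: the free-entry counts are made-up numbers, the loads are assumed to write general-purpose registers, and miss_latency=300 is arbitrary. Only the ROB size of 224 is Skylake's real value.

```python
# Toy model (hypothetical parameters) of a ROB / PRF capacity microbenchmark:
# load-miss, N fillers, load-miss. The second miss can start while the first
# is still outstanding only if all N + 2 instructions fit in the window.

def fits_in_window(n_filler, filler_kind, rob_size, free_int, free_vec):
    rob_needed = n_filler + 2  # every instruction takes a ROB entry
    # Assumption: both loads write GP registers; "int" fillers do too.
    int_needed = 2 + (n_filler if filler_kind == "int" else 0)
    vec_needed = n_filler if filler_kind == "vec" else 0
    return rob_needed <= rob_size and int_needed <= free_int and vec_needed <= free_vec


def cycles_per_iteration(n_filler, filler_kind, miss_latency=300, **window):
    if fits_in_window(n_filler, filler_kind, **window):
        return miss_latency  # both misses are in flight together
    # The second load can't enter the window until the first miss retires.
    return 2 * miss_latency


def knee(filler_kind, max_n=1000, **window):
    """Largest filler count at which the two misses still overlap."""
    return max(n for n in range(max_n) if fits_in_window(n, filler_kind, **window))


window = dict(rob_size=224, free_int=150, free_vec=140)  # free_* are made up
for kind in ("nop", "int", "vec"):
    print(kind, knee(kind, **window))
```

    With these made-up numbers the knee lands at 222 fillers for NOPs (ROB-limited), 148 for integer fillers and 140 for vector fillers (PRF-limited), which is the kind of difference the real experiment looks for.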

    Intel since Sandybridge, and AMD since even longer ago, have renamed registers onto physical register files (PRFs), with separate ones for general-purpose integer vs. SIMD/FP.

    https://www.realworldtech.com/sandy-bridge/5/ shows that Sandybridge's SIMD PRF has 144 entries, vs. 160 entries in the general-purpose integer PRF. (By contrast, the P6 family, Nehalem and earlier, didn't use a separate PRF but kept register values directly in the ROB.) Skylake has 180 entries in the integer PRF vs. 168 in the SIMD PRF: https://en.wikichip.org/wiki/intel/microarchitectures/skylake_(client)#Scheduler
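
    To show why the two PRF sizes are independent limits, here is a minimal rename-stage sketch with one free list per register file, using the Sandybridge entry counts above. It is a simplified model, not how any real core is built: it assumes 16 architectural registers per file, each permanently holding one committed physical entry, ignores YMM upper halves, flags and every other detail, and the class and function names are hypothetical.

```python
from collections import deque


class Renamer:
    """Hypothetical rename stage: a register alias table (RAT) plus one free
    list of physical entries per register file."""

    def __init__(self, int_entries, vec_entries, n_arch=16):
        self.n_arch = n_arch
        self.free = {"int": deque(range(int_entries)), "vec": deque(range(vec_entries))}
        self.rat = {}
        # Simplifying assumption: each architectural register starts out mapped
        # to its own physical entry, which holds the committed value.
        for rf in ("int", "vec"):
            for arch in range(n_arch):
                self.rat[(rf, arch)] = self.free[rf].popleft()

    def rename_dest(self, rf, arch):
        """Allocate a new entry for an instruction writing (rf, arch).
        Returns (new, old), or None if that file has no free entries (stall)."""
        if not self.free[rf]:
            return None
        old = self.rat[(rf, arch)]
        new = self.free[rf].popleft()
        self.rat[(rf, arch)] = new
        return new, old  # `old` can be freed once this instruction retires

    def retire(self, rf, old):
        self.free[rf].append(old)


def writes_until_stall(renamer, rf):
    """Rename back-to-back writes to one file (nothing retires) until it runs dry."""
    count = 0
    while renamer.rename_dest(rf, count % renamer.n_arch) is not None:
        count += 1
    return count


r = Renamer(int_entries=160, vec_entries=144)  # Sandybridge sizes from the text
print(writes_until_stall(r, "vec"))  # in-flight vector writes before rename stalls
print(len(r.free["int"]))            # the integer free list is untouched
```

    Under these assumptions a run of in-flight XMM/YMM writes stalls after 128 renames while all 144 free integer entries remain available, and vice versa.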

    Skylake splits things further, with yet another register file for renaming the 80-bit x87/MMX registers and the AVX-512 mask registers (k0..k7), instead of using the 512-bit entries of the vector register file for them. https://travisdowns.github.io/blog/2020/05/26/kreg2.html



    For more about x86 CPU internals, see Agner Fog's microarchitecture guide at https://agner.org/optimize/ and the other links in https://stackoverflow.com/tags/x86/info

    Also, for good measure, Modern Microprocessors: A 90-Minute Guide! is a good read, covering a lot of general material about design considerations in modern CPUs.


    For example, ADDPD XMM1, XMM2. To restate the question: will this instruction be scheduled on vector units or on regular integer units?

    The uop for that instruction will run on a SIMD-FP execution unit, after the CPU reads its inputs from the appropriate register file or forwards one or both from a previous instruction.

    On Intel CPUs, several execution ports have both SIMD and integer execution units, so ADDPD can compete with add eax, ecx for throughput. See https://www.realworldtech.com/haswell-cpu/4/ for Haswell vs. Sandybridge execution unit distribution. (Alder Lake added yet another execution port with only integer units. See https://uops.info/ and Agner Fog's guides.)

    On AMD CPUs, there is a separate group of SIMD/FP execution ports, independent from the integer execution ports. For example, see the Zen 2 block diagram (https://en.wikichip.org/wiki/amd/microarchitectures/zen_2#Block_Diagram). So if a bunch of instructions are waiting for inputs that finally become ready, a Zen core can begin executing 4 integer and 4 FP/SIMD uops in the same cycle, plus some loads and stores. (The front end is "only" 5 instructions or 6 uops wide, so it can't sustain that.)
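
    The throughput difference between shared and split ports can be sketched with a port-pressure lower bound: for every subset of ports, the uops that can only run on that subset must share it, and the front end caps total uops per cycle. The port sets below are simplified assumptions (ADDPD on p0/p1 and ADD on p0156 for Skylake, roughly as uops.info lists them; ADDPD on two FP-add pipes and ADD on four ALUs for Zen 2), the loop body of 2 ADDPD + 4 ADD per iteration is hypothetical, and latency and dependency chains are ignored entirely.

```python
from itertools import combinations


def min_cycles_per_iter(uop_counts, port_sets, frontend_width=None):
    """Lower bound on cycles per loop iteration from port pressure alone.

    For each subset S of ports, uops whose allowed ports all lie in S must
    execute on S, so cycles >= (number of such uops) / |S|. Optionally also
    bound by total uops / front-end width.
    """
    ports = sorted(set().union(*port_sets.values()))
    bound = 0.0
    if frontend_width is not None:
        bound = sum(uop_counts.values()) / frontend_width
    for k in range(1, len(ports) + 1):
        for subset in combinations(ports, k):
            s = set(subset)
            confined = sum(n for op, n in uop_counts.items() if port_sets[op] <= s)
            bound = max(bound, confined / k)
    return bound


loop = {"addpd": 2, "add": 4}  # hypothetical loop body: 2x ADDPD, 4x ADD

# Simplified port sets (assumptions, see the lead-in).
skylake = {"addpd": {"p0", "p1"}, "add": {"p0", "p1", "p5", "p6"}}
zen2 = {"addpd": {"fp2", "fp3"}, "add": {"alu0", "alu1", "alu2", "alu3"}}

print("Skylake, ports only:   ", min_cycles_per_iter(loop, skylake))
print("Zen 2, ports only:     ", min_cycles_per_iter(loop, zen2))
print("Skylake, 4-wide rename:", min_cycles_per_iter(loop, skylake, frontend_width=4))
print("Zen 2, 6 uops/cycle:   ", min_cycles_per_iter(loop, zen2, frontend_width=6))
```

    In the Skylake model the ADDPD uops take slots on ports that ADD could otherwise use, so all six uops share four ports (1.5 cycles per iteration); in the Zen 2 model the two groups don't compete and the bound is 1.0. Real loops are often limited by something else entirely, such as a latency-bound dependency chain.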