assembly operating-system kernel x86-64 thread-local-storage

Where should I use the "swapgs" instruction?

Hi, I'm learning kernel development and have some questions about swapgs.

According to AMD's documentation, it swaps the gs.base hidden register and KernelGSBase MSR.

Furthermore, an address of the form "gs:XXXX" is calculated as "gs.base + base + (scale*index) + displacement".

Now my first question is:

gs.base is the hidden part of segment register
displacement is the "XXXX" part of "gs:XXXX"
index may be the selector index in gs

Then where should I store the "base" and "scale"?

Furthermore, where should I use it? My current project puts the kernel in the upper half of the virtual address space, and the compiler does not usually emit "gs:XXXX" as an addressing reference.

So where, in particular, should I use the swapgs instruction?

Solution

There is no hidden "base and scale", only a hidden gs.base that you use with a normal addressing mode. (The GS register itself also holds a selector value, which would act as an index into the GDT if you actually did mov gs, eax instead of modifying just the GS base via an MSR or via wrgsbase. But that's not related to the index part of the offset in a full gs:[base + index*scale] addressing mode.)
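To make the address calculation concrete, here is a small user-space model in C. It is only a sketch of the arithmetic, not kernel code: the function name gs_linear_address and its parameters are made up for illustration. The base register, index register, scale and displacement all come from the instruction's ordinary addressing mode; the only hidden input is gs.base.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative model (hypothetical name): the linear address the CPU forms
 * for an operand like gs:[base + index*scale + disp]. */
uint64_t gs_linear_address(uint64_t gs_base, uint64_t base,
                           uint64_t index, unsigned scale, int32_t disp)
{
    /* The scale is encoded in the instruction and can only be 1, 2, 4 or 8. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);

    /* Offset from the ordinary addressing mode; disp is sign-extended. */
    uint64_t offset = base + index * scale + (uint64_t)(int64_t)disp;

    /* The hidden segment base is simply added to that offset. */
    return gs_base + offset;
}
```

For a plain gs:0x10 access, base and index are absent (zero), so the result is just gs.base + 0x10.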

You use swapgs in the kernel's syscall entry point handler, then use GS segment overrides on some loads and stores like you would for thread-local storage, so the previously-hidden gs.base is used with the [base + idx*scale] addressing mode you use in each load or store instruction. e.g. something like mov [gs:0x10], rsp to save the user-space stack pointer and mov rsp, [gs:0x18] to load the kernel stack pointer.
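As a rough illustration of that sequence, the following C code models the relevant CPU state in user space. Everything here is an assumption made for the sketch: struct percpu, its field layout (chosen so the offsets match the 0x10 and 0x18 used above), struct cpu_model and the function names are hypothetical, and a real kernel does this in assembly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-core data area that the kernel's GS base points at. */
struct percpu {
    uint64_t self;            /* offset 0x00 */
    uint64_t cpu_number;      /* offset 0x08 */
    uint64_t saved_user_rsp;  /* offset 0x10, the gs:0x10 slot */
    uint64_t kernel_rsp;      /* offset 0x18, the gs:0x18 slot */
};

_Static_assert(offsetof(struct percpu, saved_user_rsp) == 0x10, "gs:0x10 slot");
_Static_assert(offsetof(struct percpu, kernel_rsp) == 0x18, "gs:0x18 slot");

/* The slice of CPU state that matters at the syscall entry point. */
struct cpu_model {
    uint64_t gs_base;         /* hidden GS.base */
    uint64_t kernel_gs_base;  /* KernelGSBase MSR */
    uint64_t rsp;
};

/* swapgs: exchange GS.base with the KernelGSBase MSR. */
void model_swapgs(struct cpu_model *cpu)
{
    uint64_t tmp = cpu->gs_base;
    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/* Models: swapgs ; mov [gs:0x10], rsp ; mov rsp, [gs:0x18] */
void model_syscall_entry(struct cpu_model *cpu)
{
    model_swapgs(cpu);
    struct percpu *p = (struct percpu *)(uintptr_t)cpu->gs_base;
    p->saved_user_rsp = cpu->rsp;  /* save the user-space stack pointer */
    cpu->rsp = p->kernel_rsp;      /* switch to the kernel stack */
}
```

After model_syscall_entry, the user's GS base sits in kernel_gs_base, ready to be swapped back on the way out.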

swapgs exists because syscall doesn't change RSP to point at the kernel stack (and doesn't save the user-space RSP anywhere). So you need some kind of thread-local (or actually core-local) storage so each core can get the right kernel stack pointer for the task running on that core. The hidden GS base is storage for that hidden pointer, and a way to use it without destroying the values of any architectural registers.

You couldn't just use a regular global variable (absolute address) because that can only have one value that all cores would read. You also don't have any spare registers (they all contain precious user-space state that you'll need to restore later), and you don't have a kernel stack to push them on. And you can't use the user-space RSP; running push in kernel mode with the user-space RSP would let user-space crash the kernel by having RSP point somewhere invalid before running syscall.
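The per-core point can be sketched in C as well. This is a user-space model under assumed names (NCPUS, struct core_area, kernel_gs_base_msr, and the stack addresses are all hypothetical): each core has its own KernelGSBase MSR, so the same code path finds a different kernel stack depending on which core runs it.

```c
#include <assert.h>
#include <stdint.h>

#define NCPUS 2  /* hypothetical core count */

/* Hypothetical per-core area; a real one would hold more fields. */
struct core_area {
    uint64_t kernel_rsp;
};

struct core_area areas[NCPUS];
uint64_t kernel_gs_base_msr[NCPUS];  /* models one KernelGSBase MSR per core */

/* Boot-time setup: point each core's MSR at that core's own area. */
void init_cores(void)
{
    for (unsigned i = 0; i < NCPUS; i++) {
        /* Made-up stack tops, 16 KiB apart. */
        areas[i].kernel_rsp = 0xffff800000100000ULL + (uint64_t)(i + 1) * 0x4000;
        kernel_gs_base_msr[i] = (uint64_t)(uintptr_t)&areas[i];
    }
}

/* What the entry code would load as RSP after swapgs on a given core. */
uint64_t kernel_rsp_on_core(unsigned cpu)
{
    assert(cpu < NCPUS);
    const struct core_area *a =
        (const struct core_area *)(uintptr_t)kernel_gs_base_msr[cpu];
    return a->kernel_rsp;
}
```

A single global kernel_rsp variable would give both cores the same value, which is exactly the problem described above.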

When x86-64 was originally being designed (back in 2000, years before the first silicon), this mailing list message explained the intended purpose of swapgs. It was revised a day later after OS devs noticed a problem with how AMD had specced it, but the original email contains a simple example that still applies:

Example usage
At a kernel entry point the OS can use SwapGS to obtain a pointer to kernel data structures and simultaneously save the user's GS base. Upon exit it can use SwapGS to restore the user's GS base:

  SystemCallEntryPoint:
    SwapGS                        ; set up kernel pointer, save user's GS base    
    mov gs:[SavedUserRSP], rsp    ; save user's stack pointer
    mov rsp, gs:[KernelStackPtr]  ; set up kernel stack
    push rax                      ; now that we have a stack, save user's GPRs    
    mov rax, gs:[CPUnumber]       ; get CPU number     < or whatever >
    .                             ; perform system service
    .
    SwapGS                        ; restore user's GS, save kernel pointer

You might also want to look at how the Linux kernel uses it in its syscall entry point, preferably in older kernels before Spectre / Meltdown mitigation complicated everything. e.g. Linux 4.12's entry_64.S has its ENTRY(entry_SYSCALL_64) start with swapgs, very much like AMD's example.

(See also What happens if you use the 32-bit int 0x80 Linux ABI in 64-bit code? for some explanation of what happens in other Linux kernel entry points, from int 0x80).

Some of the comments in the Linux kernel source point out that it can be inconvenient to make sure swapgs runs exactly once along every path of execution out of the kernel. If there were two opcodes, one for "swap to user gs" and one for "swap to kernel gs", it would be easier to make sure you don't accidentally swap an extra time. That error would leave the next kernel entry looking in the wrong place. (And give user-space the wrong gs, but in GNU/Linux fs is used for thread-local storage.)
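The "exactly once" requirement is easy to see in a toy model. The C sketch below is a user-space simulation with hypothetical names and made-up addresses: it runs one kernel entry, a configurable number of swapgs instructions on the way out, and then checks whether the next entry ends up with the kernel's GS base.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct gs_state {
    uint64_t gs_base;         /* hidden GS.base */
    uint64_t kernel_gs_base;  /* KernelGSBase MSR */
};

/* swapgs: exchange GS.base with the KernelGSBase MSR. */
void swapgs_model(struct gs_state *s)
{
    uint64_t tmp = s->gs_base;
    s->gs_base = s->kernel_gs_base;
    s->kernel_gs_base = tmp;
}

/* Returns true if the next kernel entry sees the kernel's GS base,
 * given how many times swapgs ran on the previous exit path. */
bool next_entry_ok(unsigned exit_swaps)
{
    const uint64_t user_base   = 0x00007f0000001000ULL;  /* made-up value */
    const uint64_t kernel_area = 0xffff800000100000ULL;  /* made-up value */
    struct gs_state s = { user_base, kernel_area };      /* in user mode */

    swapgs_model(&s);                  /* first kernel entry */
    assert(s.gs_base == kernel_area);

    for (unsigned i = 0; i < exit_swaps; i++)
        swapgs_model(&s);              /* swapgs on the way out */

    swapgs_model(&s);                  /* next kernel entry */
    return s.gs_base == kernel_area;
}
```

In this model next_entry_ok(1) holds, while both a missing swap (0) and an extra swap (2) leave the next entry using the user's GS base as if it were the kernel pointer.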