Hi, I'm learning kernel development and have some questions about swapgs.
According to AMD's documentation, it swaps the hidden gs.base register with the KernelGSBase MSR.
Furthermore, an address written as "gs:XXXX" is calculated as "gs.base + base + (scale*index) + displacement".
Now my first question is:
Where should I store the "base" and "scale"?
Furthermore, where should I use it? My current project puts the kernel in the upper half of the virtual address space, and the compiler does not normally emit "gs:XXXX" address references.
So where, in particular, should I use the swapgs instruction?
There is no hidden "base and scale", only a hidden gs.base that you use with a normal addressing mode. (And a hidden value for the GS register itself. That's a selector value, which would act as an index into the GDT if you actually did mov gs, eax instead of modifying just the GS base via an MSR, or via wrgsbase. But that's not related to the index part of the offset in a full gs:[base + index*scale] addressing mode.)
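To make the address arithmetic concrete, here is a minimal C sketch of how the pieces combine. The function name linear_address and its parameters are invented for illustration; base and index stand for ordinary general-purpose registers named in the instruction, and only gs_base comes from hidden state.

```c
#include <assert.h>
#include <stdint.h>

/* Models the address formed by an access such as
 *     mov rax, [gs: base + index*scale + disp]
 * base and index are normal registers chosen by the instruction;
 * scale (1, 2, 4 or 8) and disp are encoded in the instruction itself.
 * Only gs_base is hidden state. All names here are hypothetical. */
static uint64_t linear_address(uint64_t gs_base, uint64_t base,
                               uint64_t index, unsigned scale, int32_t disp)
{
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    /* The displacement is sign-extended to 64 bits before the add. */
    return gs_base + base + index * scale + (uint64_t)(int64_t)disp;
}
```

With base and index both zero this reduces to gs_base + disp, which is the form used by accesses like gs:0x10.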
You use swapgs in the kernel's syscall entry point handler, then use GS segment overrides on some loads and stores like you would for thread-local storage, so the previously-hidden gs.base is added to the [base + idx*scale] address computed by each load or store instruction. For example, something like mov [gs:0x10], rsp to save the user-space stack pointer and mov rsp, [gs:0x18] to load the kernel stack pointer.
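One way to picture where displacements such as 0x10 and 0x18 come from: the kernel's GS base points at a per-core block of data, and each gs: access touches a fixed field of it. The following is a minimal user-space sketch in C, assuming a hypothetical layout. struct percpu, its field names, and switch_to_kernel_stack are invented for illustration and are not taken from any real kernel.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical per-core block that the kernel's GS base would point at.
 * Field order is an assumption chosen to match the 0x10 / 0x18
 * displacements used in the text. */
struct percpu {
    uint64_t self;            /* 0x00: address of this block */
    uint64_t cpu_number;      /* 0x08 */
    uint64_t saved_user_rsp;  /* 0x10: written by  mov [gs:0x10], rsp */
    uint64_t kernel_rsp;      /* 0x18: read by     mov rsp, [gs:0x18] */
};

/* Compile-time checks that the C layout matches the assembly offsets. */
_Static_assert(offsetof(struct percpu, saved_user_rsp) == 0x10, "offset");
_Static_assert(offsetof(struct percpu, kernel_rsp) == 0x18, "offset");

/* Models a 64-bit store/load at gs_base + disp. */
static void gs_store64(void *gs_base, size_t disp, uint64_t value)
{
    memcpy((unsigned char *)gs_base + disp, &value, sizeof value);
}

static uint64_t gs_load64(const void *gs_base, size_t disp)
{
    uint64_t value;
    memcpy(&value, (const unsigned char *)gs_base + disp, sizeof value);
    return value;
}

/* Models the two instructions after swapgs: save the user RSP into the
 * per-core block, then return the kernel RSP stored for this core. */
static uint64_t switch_to_kernel_stack(struct percpu *gs_base,
                                       uint64_t user_rsp)
{
    gs_store64(gs_base, 0x10, user_rsp);
    return gs_load64(gs_base, 0x18);
}
```

Because each core has its own block and its own GS base, the same two instructions pick up a different kernel stack on each core without needing a free register.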
swapgs exists because syscall doesn't change RSP to point at the kernel stack (and doesn't save the user-space RSP anywhere). So you need some kind of thread-local (or actually core-local) storage so each core can get the right kernel stack pointer for the task running on that core. The hidden GS base is storage for that hidden pointer, and a way to use it without destroying the values of any architectural registers.
You couldn't just use a regular global variable (absolute address) because that can only have one value that all cores would read. You also don't have any spare registers (they all contain precious user-space state that you'll need to restore later), and you don't have a kernel stack to push them on. And you can't use the user-space RSP; running push in kernel mode with the user-space RSP would let user-space crash the kernel by having RSP point somewhere invalid before running syscall.
When x86-64 was originally being designed (back in 2000, years before the first silicon), this mailing list message explained the intended purpose of swapgs. It was revised a day later after OS devs noticed a problem with how AMD had specced it, but the original email contains a simple example that still applies:
Example usage
At a kernel entry point the OS can use SwapGS to obtain a pointer to kernel data structures and simultaneously save the user's GS base. Upon exit it can use SwapGS to restore the user's GS base:

    SystemCallEntryPoint:
      SwapGS                        ; set up kernel pointer, save user's GS base
      mov gs:[SavedUserRSP], rsp    ; save user's stack pointer
      mov rsp, gs:[KernelStackPtr]  ; set up kernel stack
      push rax                      ; now that we have a stack, save user's GPRs
      mov rax, gs:[CPUnumber]       ; get CPU number < or whatever >
      .                             ; perform system service
      .
      SwapGS                        ; restore user's GS, save kernel pointer
You might also want to look at how the Linux kernel uses it in its syscall entry point, preferably in older kernels before Spectre / Meltdown mitigation complicated everything. e.g. Linux 4.12's entry_64.S has its ENTRY(entry_SYSCALL_64) start with swapgs, very much like AMD's example.
(See also Why does Windows64 use a different calling convention from all other OSes on x86-64? for some explanation of what happens in other Linux kernel entry points, from int 0x80).
Some of the comments in the Linux kernel source point out that it can be inconvenient to make sure swapgs runs exactly once along every path of execution out of the kernel. If there were two opcodes, one for "swap to user gs" and one for "swap to kernel gs", it would be easier to make sure you don't accidentally swap an extra time. That error would leave the next kernel entry looking in the wrong place. (And give user-space the wrong gs, but in GNU/Linux fs is used for thread-local storage.)
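The pairing problem can be shown with a toy model of the two values that swapgs exchanges. This is a user-space sketch only. struct cpu_model, the swapgs function, and kernel_round_trip are hypothetical names, and the extra_swaps parameter exists purely to simulate a buggy exit path.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model of the per-core state that SWAPGS exchanges. */
struct cpu_model {
    uint64_t gs_base;         /* hidden GS base used by gs: accesses */
    uint64_t kernel_gs_base;  /* the KernelGSBase MSR */
};

/* SWAPGS is a plain exchange: it has no notion of "to kernel" or
 * "to user", so running it twice undoes it. */
static void swapgs(struct cpu_model *cpu)
{
    uint64_t tmp = cpu->gs_base;
    cpu->gs_base = cpu->kernel_gs_base;
    cpu->kernel_gs_base = tmp;
}

/* Models one trip through the kernel and returns the GS base the kernel
 * code saw right after entry. extra_swaps > 0 simulates an exit path
 * that runs swapgs more often than it should. */
static uint64_t kernel_round_trip(struct cpu_model *cpu, int extra_swaps)
{
    swapgs(cpu);                         /* entry */
    uint64_t seen_in_kernel = cpu->gs_base;
    swapgs(cpu);                         /* correct exit */
    for (int i = 0; i < extra_swaps; i++)
        swapgs(cpu);                     /* accidental extra swap */
    assert(extra_swaps >= 0);
    return seen_in_kernel;
}
```

In this model a correct round trip leaves the user's GS base untouched, while one extra swap hands the kernel's pointer to user-space and makes the next entry see the user's value where it expects its own per-core pointer.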