Tags: assembly, x86-64, calling-convention, abi, memory-segmentation

Is %gs caller- or callee-save for the System V AMD64 ABI?


Does the System V AMD64 ABI say anything about the segment register %gs? Is it considered caller-save or callee-save? Or is it reserved? Or is nothing yet defined for %gs?


Solution

  • IIRC, the calling convention / ABI doesn't have much to say about it at all. The FS base is used for thread-local storage on x86-64 System V, but anything you want to do with the GS base (and/or the GS segment-selector value) is presumably up to you: it's for you to define how your program uses it.
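
    To illustrate the FS half of that, here is a minimal sketch, assuming x86-64 Linux with GCC or Clang. The helper names (fs_base_from_kernel, fs_self_pointer) are made up for this example. It relies on the x86-64 TLS convention that the word at %fs:0 holds the thread pointer itself, so both functions should return the same address.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <asm/prctl.h>   /* ARCH_GET_FS */

/* Ask the kernel for this thread's FS base. Returns 0 if the call fails. */
static uintptr_t fs_base_from_kernel(void)
{
    unsigned long base = 0;
    if (syscall(SYS_arch_prctl, ARCH_GET_FS, &base) != 0)
        return 0;
    return (uintptr_t)base;
}

/* Load the 8 bytes at %fs:0. In the x86-64 TLS layout this slot
   stores the thread pointer, i.e. the FS base address itself. */
static uintptr_t fs_self_pointer(void)
{
    uintptr_t tp;
    __asm__ volatile("movq %%fs:0, %0" : "=r"(tp));
    return tp;
}
```

    Nothing equivalent is set up for GS by the toolchain, which is why its meaning is left to your program.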

    You could define that as part of a calling convention, or more normally treat GS as set once early in a thread's lifetime and left untouched after that. That's like how MXCSR and the x87 rounding mode / precision control are handled: functions leave them untouched, except for maybe local changes that are restored before making any further calls. (That is still something you can describe as a calling convention / ABI, but as Nate commented, it's neither call-preserved nor call-clobbered.)
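
    The "local change, restored before any further call" pattern for MXCSR can be sketched like this, assuming GCC or Clang on x86-64 and the SSE intrinsics from <xmmintrin.h>. The function name div_with_rounding is hypothetical.

```c
#include <xmmintrin.h>   /* _mm_getcsr, _mm_setcsr, _MM_ROUND_* */

/* Divide under a temporarily changed SSE rounding mode. MXCSR is put back
   before returning, so callers and callees never observe the change.
   `mode` is one of _MM_ROUND_NEAREST, _MM_ROUND_DOWN, _MM_ROUND_UP,
   _MM_ROUND_TOWARD_ZERO. */
static float div_with_rounding(float num, float den, unsigned int mode)
{
    unsigned int saved = _mm_getcsr();
    _mm_setcsr((saved & ~(unsigned int)_MM_ROUND_MASK) | mode);

    /* volatile keeps the divide at run time, between the two MXCSR writes */
    volatile float a = num, b = den;
    volatile float q = a / b;

    _mm_setcsr(saved);   /* restore before making any further calls */
    return q;
}
```

    A GS value managed the same way would be set once and only ever changed inside a function that puts it back before calling anything else.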

    I think the Linux kernel will save/restore user-space's selector value, and maybe also the segment base separately. (The base doesn't have to come from a GDT or LDT entry selected by a non-0 selector value; it can instead be set via a system call that writes the MSR, arch_prctl with ARCH_SET_GS on Linux, or via wrgsbase if the kernel enables it for user-space use on HW that supports it.) If so, then it would be viable for user-space to use GS for something, like an alternate TLS.
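
    As a sketch of the "alternate TLS" idea, assuming x86-64 Linux with GCC or Clang: the struct gs_block and the gs_* helpers below are invented for illustration, and the base is set with the arch_prctl system call rather than wrgsbase.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <asm/prctl.h>   /* ARCH_SET_GS */

/* Hypothetical per-thread block that this program chooses to reach via GS. */
struct gs_block {
    uint64_t id;        /* at GS base + 0 */
    uint64_t scratch;   /* at GS base + 8 */
};

/* Point this thread's GS base at `blk`. Returns 0 on success, -1 on error. */
static int gs_set_base(struct gs_block *blk)
{
    return (int)syscall(SYS_arch_prctl, ARCH_SET_GS, (unsigned long)blk);
}

/* Load the 64-bit value at GS base + off. */
static uint64_t gs_load64(uintptr_t off)
{
    uint64_t v;
    __asm__ volatile("movq %%gs:(%1), %0" : "=r"(v) : "r"(off) : "memory");
    return v;
}

/* Store a 64-bit value at GS base + off. */
static void gs_store64(uintptr_t off, uint64_t v)
{
    __asm__ volatile("movq %1, %%gs:(%0)" : : "r"(off), "r"(v) : "memory");
}
```

    Each thread would call gs_set_base once on its own block early in its lifetime. Some sandboxes may refuse ARCH_SET_GS, so check the return value.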

    In practice, you can safely assume that calls to compiler-generated code (or even hand-written asm) in libraries won't change it. So system calls and context switches are all you need to worry about, and I'd recommend testing those.
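
    One way to do that testing, as a rough sketch for x86-64 Linux with GCC or Clang: snapshot the GS selector and base, make some ordinary library calls and system calls (including a short sleep, which will usually involve a context switch), and compare. The names gs_snapshot and gs_survives_calls are hypothetical, and a passing run is evidence, not proof.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/syscall.h>
#include <unistd.h>
#include <asm/prctl.h>   /* ARCH_GET_GS */

struct gs_state {
    unsigned int selector;   /* value of the GS segment register */
    unsigned long base;      /* GS base as reported by the kernel */
    int base_known;          /* 0 if ARCH_GET_GS was refused */
};

static struct gs_state gs_snapshot(void)
{
    struct gs_state s;
    memset(&s, 0, sizeof s);
    __asm__ volatile("mov %%gs, %0" : "=r"(s.selector));
    s.selector &= 0xFFFFu;   /* only the low 16 bits are the selector */
    s.base_known = (syscall(SYS_arch_prctl, ARCH_GET_GS, &s.base) == 0);
    return s;
}

/* Returns 1 if GS looked the same before and after some ordinary calls. */
static int gs_survives_calls(void)
{
    struct gs_state before = gs_snapshot();

    char buf[32];
    snprintf(buf, sizeof buf, "%d", 12345);   /* library code */
    (void)strlen(buf);
    (void)getpid();                           /* system call */
    usleep(1000);                             /* sleep: likely context switch */

    struct gs_state after = gs_snapshot();
    if (before.selector != after.selector)
        return 0;
    if (before.base_known != after.base_known)
        return 0;
    if (before.base_known && before.base != after.base)
        return 0;
    return 1;
}
```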

    Note that mov to GS with a value that isn't 0 or a valid selector for an LDT or GDT entry will fault, so there's a very limited set of values you can use.

    It's also quite slow to write GS (see "Is a mov to a segmentation register slower than a mov to a general purpose register?"), although reading it is fairly efficient (except on P4). Perhaps even more efficient than a thread-local-storage memory location, although GS can't be a source operand for instructions other than mov (or push), so you have to copy it to a general-purpose register or memory before doing anything else with it.


    Fun fact: i386 SysV uses GS for TLS; I think x86-64 changed to FS so the kernel-GS (which is special because of swapgs) could just be for finding the task's kernel stack after swapgs, not also having to be the kernel's TLS base for per-core variables.