Search code examples
linux-kernelx86low-levelcontext-switch

saving general purpose registers in switch_to() in linux 2.6


I saw the code of switch_to in the article "Evolution of the x86 context switch in Linux" in the link https://www.maizure.org/projects/evolution_x86_context_switch_linux/

Most versions of switch_to only save/restore ESP/RSP and/or EBP/RBP, not other call-preserved registers in the inline asm. But the Linux 2.2.0 version does save them in this function, because it uses software context switching instead of relying on hardware TSS stuff. Later Linux versions still do software context switching, but don't have these push / pop instructions.

Are the registers are saved in other function (maybe in the schedule() function)? Or is there no need to save these registers in the kernel context?

(I know that those registers of the user context are saved in the kernel stack when the system enters kernel mode).


Solution

  • Linux versions before 2.2.0 use hardware task switching, where the TSS saves/restores registers for you. That's what the "ljmp %0\n\t" is doing. (ljmp is AT&T syntax for a far jmp, presumably to a task gate). I'm not really familiar with hardware TSS stuff because it's not very relevant; it's still used in modern kernels for getting RSP pointing to the kernel stack for interrupt handlers, but not for context switching between tasks.

    Hardware task switching is slow, so later kernels avoid it. Linux 2.2 does save/restore the call-preserved registers manually, with push/pop before/after swapping stacks. EAX, EDX, and ECX are declared as dummy outputs ("=a" (eax), "=d" (edx), "=c" (ecx)) so the compiler knows that the old values of those registers are no longer available.

    This is a sensible choice because switch_to is probably used inside a non-inline function. The caller will make a function call that eventually returns (after running another task for a while) with the call-preserved registers restored, and the call-clobbered registers clobbered, just like a regular function call. (So compiler code-gen for the function that uses the switch_to macro doesn't need to emit save/restore code outside of the inline asm). If you think about writing a whole context switch function in asm (not inline asm), you'd get this clobbering of volatile registers for free because callers expect that.

    So how do later kernels avoid saving/restoring those registers in inline asm?

    Linux 2.4 uses "=b" (last) as an output operand, so the compiler has to save/restore EBX in a function that uses this asm. The asm still saves/restores ESI, EDI, and EBP (as well as ESP). The text of the article notes this:

    The 2.4 kernel context switch brings a few minor changes: EBX is no longer pushed/popped, but it is now included in the output of the inline assembly. We have a new input argument.

    I don't see where they tell the compiler about EAX, ECX, and EDX not surviving, so that's odd. It might be a bug that they get away with by making the function noinline or something?

    Linux 2.6 on i386 uses more output operands that get the compiler to handle the save/restore.

    But Linux 2.6 for x86-64 introduces the trick that hands off the save/restore to the compiler easily: #define __EXTRA_CLOBBER ,"rcx","rbx","rdx","r8","r9","r10", "r11","r12","r13","r14","r15"

    Notice the clobbers declaration: : "memory", "cc" __EXTRA_CLOBBER

    This tells the compiler that the inline asm destroys all those registers, so the compiler will emit instructions to save/restore these registers at the start/end of whatever function switch_to ultimately inlines into.

    Telling the compiler that all the registers are destroyed after a context switch solves the same problem as manually saving/restoring them with inline asm. The compiler will still make a function that obeys the calling convention.

    The context-switch swaps to the new task's stack, so the compiler-generated save/restore code is always running with the appropriate stack pointer. Notice that the explicit push/pop instructions inside the inline asm int Linux 2.2 and 2.4 are before / after everything else.