Code generation for local stack variables in release vs. debug mode

As part of my OS in Rust, I have the following system call entry point:

#[no_mangle]
#[naked]
#[inline(never)]
unsafe extern "C" fn syscall_handler() {

    // switch to the kernel stack dedicated for syscall handling, and save the user task's details
    asm!("swapgs; \
          mov gs:[0x8],  rsp; \
          mov gs:[0x10], rcx; \
          mov gs:[0x18], r11; \
          mov rsp, gs:[0x0];"
          : : : "memory" : "intel", "volatile");


    let (rax, rdi, rsi, rdx, r10, r8, r9): (u64, u64, u64, u64, u64, u64, u64); 
    asm!("" : "={rax}"(rax), "={rdi}"(rdi), "={rsi}"(rsi), "={rdx}"(rdx), "={r10}"(r10), "={r8}"(r8), "={r9}"(r9)  : : "memory" : "intel", "volatile");
    // do stuff with rax, rdi, rsi...

This works fine in debug mode, and in release mode with debug info enabled, because the generated assembly stores the local stack variables (rdi, rsi, etc.) at negative offsets from the base pointer rbp. For example, here's the generated code:

<syscall_handler>:
swapgs 
mov    %rsp,%gs:0x8
mov    %rcx,%gs:0x10
mov    %r11,%gs:0x18
mov    %gs:0x0,%rsp
mov    %rax,-0x1f0(%rbp)
mov    %rdi,-0x1e8(%rbp)
mov    %rsi,-0x1e0(%rbp)
mov    %rdx,-0x1d8(%rbp)
mov    %r10,-0x1d0(%rbp)
mov    %r8,-0x1c8(%rbp)
mov    %r9,-0x1c0(%rbp)
movb   $0x4,-0x1b1(%rbp)

That code works fine, because my syscall handler runs with a stack pointer that points to the top of the current kernel stack (as usual), meaning that it's okay to use negative offsets from the stack pointer / base pointer (base pointer rbp is set before this based on the stack pointer value).

When I build in release mode without debug info, it generates code that uses positive offsets from the stack pointer itself (rsp, not the base pointer) as locations for the local stack variables. This is really weird and causes a problem because the memory above the current stack pointer rsp is out of bounds.

Here's the code generated in pure release mode without debug info:

<syscall_handler>:
swapgs 
mov    %rsp,%gs:0x8
mov    %rcx,%gs:0x10
mov    %r11,%gs:0x18
mov    %gs:0x0,%rsp
mov    %rax,0x1c0(%rsp)
mov    %rdi,0x1c8(%rsp)
mov    %rsi,0x1d0(%rsp)
mov    %rdx,0x1d8(%rsp)
mov    %r10,0x1e0(%rsp)
mov    %r8,0x1e8(%rsp)
mov    %r9,0x1f0(%rsp)

Why does the compiler generate code that uses positive offsets from the stack pointer? That strikes me as very strange.

Is there any way to avoid that or change the code generation somehow?

Solution

The stack grows down, so the space a function has reserved for itself sits at positive offsets from RSP. That's the part that's safe from async modification by interrupts, i.e. "reserved".

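Negative offsets from RSP would be the red zone, which you can't have on the kernel stack: an interrupt that arrives while you're on that stack can push its frame right below RSP and overwrite whatever was there.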
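To make the "positive offsets are reserved, negative offsets are not" rule concrete, here is a toy model in plain Rust. It is only a sketch: StackModel, reserve and is_safe are hypothetical names invented for illustration, addresses are plain integers, and the model assumes the function starts on an otherwise empty stack (as a syscall entry point does right after loading RSP from gs:[0x0]).

```rust
/// Toy model of a downward-growing stack (hypothetical, for illustration only).
/// Addresses are plain integers; nothing here touches real memory.
pub struct StackModel {
    /// One past the highest valid address of the stack region.
    pub top: i64,
    /// Current stack pointer.
    pub rsp: i64,
    /// Bytes below RSP that are guaranteed not to be clobbered asynchronously:
    /// 128 in x86-64 System V user-space, 0 on a kernel stack.
    pub red_zone: i64,
}

impl StackModel {
    /// Models `sub rsp, bytes`: moves RSP down, reserving `bytes` of space.
    pub fn reserve(&mut self, bytes: i64) {
        self.rsp -= bytes;
    }

    /// Returns true if a `size`-byte access at `rsp + offset` stays inside
    /// the stack region and is not below the red zone.
    pub fn is_safe(&self, offset: i64, size: i64) -> bool {
        let start = self.rsp + offset;
        let end = start + size;
        start >= self.rsp - self.red_zone && end <= self.top
    }
}
```

With top and rsp both at 0x10000 and red_zone set to 0 (the situation in the question), is_safe(0x1c0, 8) is false because the access lands past the top of the stack. After reserve(0x200) the same access becomes safe, while is_safe(-8, 8) stays false because there is no red zone.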

Use sub rsp, 0x200 or whatever to reserve enough space for your naked function's local variables (the stores shown above go as high as 0x1f0(%rsp), so 0x100 would not cover them). Or better, write the whole entry point in asm instead of messing around with compiler-generated code for this.

Better still, just use push yourself; it's more compact (code-size) and just as efficient. push works very well for saving registers on the stack, and Linux's system call entry points use it. For example, in Linux 4.12 (before the Spectre / Meltdown mitigation/workaround patches made the entry points more complicated), the entry point into an x86-64 kernel from syscall in 64-bit user-space uses push to save all the registers.
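One practical consequence of the push approach is that the saved registers form a contiguous frame that Rust code can read through a pointer to RSP. The sketch below simulates that in ordinary Rust so the layout can be checked. SyscallFrame and simulate_pushes are hypothetical names, and the push order (r9, r8, r10, rdx, rsi, rdi, rax) is just an assumption chosen so that rax ends up at offset 0. This is not Linux's actual layout.

```rust
/// Hypothetical frame produced by pushing r9, r8, r10, rdx, rsi, rdi, rax
/// in that order. The stack grows down, so the register pushed last (rax)
/// ends up at the lowest address, i.e. at offset 0 from the final RSP.
#[repr(C)]
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct SyscallFrame {
    pub rax: u64,
    pub rdi: u64,
    pub rsi: u64,
    pub rdx: u64,
    pub r10: u64,
    pub r8: u64,
    pub r9: u64,
}

/// Simulates seven `push` instructions on a downward-growing stack and then
/// reads the frame back starting at the final RSP. Slots are 8 bytes wide,
/// so `rsp` is a slot index rather than a byte address.
pub fn simulate_pushes(values_in_push_order: [u64; 7]) -> SyscallFrame {
    let mut stack = [0u64; 7];
    let mut rsp = stack.len(); // empty stack: RSP is one slot past the top
    for &v in values_in_push_order.iter() {
        rsp -= 1; // push first decrements RSP...
        stack[rsp] = v; // ...then stores the value at [RSP]
    }
    // Reading upward from RSP yields the fields in declaration order.
    SyscallFrame {
        rax: stack[rsp],
        rdi: stack[rsp + 1],
        rsi: stack[rsp + 2],
        rdx: stack[rsp + 3],
        r10: stack[rsp + 4],
        r8: stack[rsp + 5],
        r9: stack[rsp + 6],
    }
}
```

In a real entry point, the asm stub would do the pushes and then pass RSP as the first argument to a normal (non-naked) Rust function taking a pointer to such a struct, so the compiler never has to guess where the locals live.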

The reason it's weird / confusing is that you asked for a naked function (thus no function prologue to reserve stack space), but then you used local variables inside it anyway. In a normal (non-naked) function, the compiler would sub rsp, 0x... on its own to reserve enough space for the locals before accessing them.

I think in some C / C++ compilers that support naked functions, that's not supported; only inline asm as the entire function body is allowed. But IDK what Rust says is officially supported as far as weird tangos between inline asm and the compiler. Like I said, you wouldn't have these problems if you wrote the entry point in pure asm.

Your debug-mode version appears broken; you're storing relative to RBP, but RBP hasn't been set yet. You asked for a naked function, so you'd need to mov rbp, rsp yourself (after loading RSP from gs:0), and then sub rsp, 0x20 or whatever to reserve enough space in the stack frame for those negative offsets.

I think your debug-mode version is storing relative to user-space's RBP, which will break horribly if user-space makes a system call with RBP pointing near anything that shouldn't be clobbered, and even more horribly if RBP holds a non-pointer value.

(If you had just been using mov rbp, rsp or something, as per your comment that you left this part out, then you were using space below RSP, which isn't safe without a red-zone.)