linux-kernel x86 stack-overflow stack-memory

How does the linux kernel avoid the stack overwriting the text (instructions)?


I was curious about how the kernel prevents the stack from growing too big, and I found this Q/A:

Q: how does the linux kernel enforce stack size limits?

A: The kernel can control this thanks to virtual memory. Virtual memory (also known as memory mapping) is basically a list of virtual memory areas (base + size), each with a target physical memory area, that the kernel can manipulate and that is unique to each program. When a program tries to access an address that is not on this list, an exception happens. This exception causes a switch into kernel mode, where the kernel can look up the fault. If the memory is supposed to become valid, it is put into place before the program continues (for instance, swapped-out pages or mmap'ed data not yet read from disk); otherwise a segfault (SIGSEGV) is generated.

In order to decide the stack size limit, the kernel simply manipulates the virtual memory map. - Stian Skjelstad

But I didn't quite find this answer satisfactory. "When a program tries to access an address that is not on this list, an exception happens." - But wouldn't the text section (instructions) of the program be part of the virtual memory map?


Solution

  • I'm asking about how the kernel enforces the stack size of user programs.

    There's a growth limit for the main stack, set with ulimit -s, that stops the stack from getting anywhere near .text. (The guard pages below that limit make sure there's a segfault if the stack does overflow past it.) See "How is Stack memory allocated when using 'push' or 'sub' x86 instructions?". For thread stacks (anything other than the main thread), stack memory is just a normal mmap allocation with no growth; the only lazy allocation is of the physical pages that back the virtual ones.
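To see the growth limit that ulimit -s reports from inside a process, something like the following should work on Linux. It is only a small sketch using Python's standard resource module; the helper names format_limit and main_stack_limits are made up for illustration.

```python
import resource


def format_limit(value):
    """Render an rlimit value in KiB, or 'unlimited' (hypothetical helper)."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value // 1024} KiB"


def main_stack_limits():
    """Return the (soft, hard) RLIMIT_STACK values of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return format_limit(soft), format_limit(hard)


if __name__ == "__main__":
    soft, hard = main_stack_limits()
    print("soft stack limit:", soft)
    print("hard stack limit:", hard)
```

The soft value is the one that bounds main-stack growth; a process can raise it up to the hard value with setrlimit.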

    Also, .text is a read+exec mapping of the executable, so there's no way to modify it without calling mprotect first. (It's a private mapping, so doing so would only affect the pages in memory, not the actual file. This is how text relocations work: the dynamic linker applies runtime fixups for absolute addresses.)
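You can check the permissions yourself by looking at /proc/self/maps. Here is a rough sketch of a parser for one line of that file; the two sample lines in the comments are invented for illustration (real addresses and paths will differ), and parse_maps_line / is_writable are hypothetical helper names.

```python
def parse_maps_line(line):
    """Split one /proc/<pid>/maps line into start, end, perms and path."""
    fields = line.split(None, 5)
    start_s, end_s = fields[0].split("-")
    path = fields[5].strip() if len(fields) > 5 else ""
    return {
        "start": int(start_s, 16),
        "end": int(end_s, 16),
        "perms": fields[1],  # e.g. "r-xp": read, no write, exec, private
        "path": path,
    }


def is_writable(mapping):
    """True if a plain store into this mapping would be allowed."""
    return "w" in mapping["perms"]


# Invented sample lines, in the format the kernel prints:
SAMPLE_TEXT = "55d0c8a2f000-55d0c8a31000 r-xp 00001000 08:01 1234 /usr/bin/cat"
SAMPLE_STACK = "7ffc1b2e0000-7ffc1b301000 rw-p 00000000 00:00 0 [stack]"

if __name__ == "__main__":
    # Linux only: list this process's executable mappings and the stack.
    with open("/proc/self/maps") as maps:
        for raw in maps:
            m = parse_maps_line(raw)
            if "x" in m["perms"] or m["path"] == "[stack]":
                print(m["perms"], hex(m["start"]), hex(m["end"]), m["path"])
```

Executable mappings normally show up as r-xp (no w), while [stack] is rw-p, which is why a stray store into .text faults even if the address is mapped.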

    The actual mechanism for limiting growth is simply not extending the mapping and not allocating a new page when the process triggers a hardware page fault below the existing stack area and growing that far would exceed the limit. The page fault is then an invalid one, instead of a soft (aka minor) fault as in the normal stack-growth case, so a SIGSEGV is delivered.
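As a toy model of that decision (not the kernel's actual code, which also deals with guard gaps, neighbouring mappings and more), the logic could be sketched like this. The function name, its parameters and the returned strings are all invented for illustration, and a 4 KiB page size is assumed.

```python
PAGE_SIZE = 4096  # assumed page size


def handle_stack_fault(fault_addr, stack_start, stack_end, stack_limit):
    """Decide what a simplified kernel would do for a fault near the stack.

    stack_start is the lowest mapped stack address, stack_end the (exclusive)
    top, stack_limit the growth limit in bytes. Returns (action, new_start).
    """
    if fault_addr >= stack_end:
        # Not below the stack at all: nothing to grow in this model.
        return ("SIGSEGV", stack_start)
    if fault_addr >= stack_start:
        # Inside the existing mapping: an ordinary fault, no growth needed.
        return ("already mapped", stack_start)
    # Below the mapping: round down to a page boundary and check the limit.
    new_start = fault_addr - (fault_addr % PAGE_SIZE)
    if stack_end - new_start > stack_limit:
        # Growing this far would exceed the limit: the fault is invalid.
        return ("SIGSEGV", stack_start)
    # Normal stack growth: extend the mapping downward.
    return ("grow", new_start)


if __name__ == "__main__":
    top = 0x7FFFFFFFF000
    start = top - 4 * PAGE_SIZE
    limit = 8 * 1024 * 1024
    print(handle_stack_fault(start - 8, start, top, limit))
    print(handle_stack_fault(top - limit - 8, start, top, limit))
```

The point is only that "enforcing the limit" means refusing to extend the mapping; there is no separate check running while the program executes.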


    If a program used alloca or a C99 VLA with an unchecked size, malicious input could make it jump over any guard pages and into some other read/write mapping such as .data or stuff that's dynamically allocated.

    To harden buggy code against that so it segfaults instead of actually allowing a stack clash attack, there are compiler options (such as -fstack-clash-protection in GCC and Clang) that make it touch every intervening page as the stack grows, so it's certain to set off the "tripwire" in the form of an unmapped guard page below the stack-growth limit. See "Linux process stack overrun by local variables (stack guarding)".
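To illustrate why probing matters, here is a small simulation of which pages get touched by one big allocation, with and without probing. It is a model under simplifying assumptions (4 KiB pages, a single guard page, made-up addresses), not what any compiler literally emits, and the function names are hypothetical.

```python
PAGE_SIZE = 4096  # assumed page size


def page_of(addr):
    """Round an address down to the start of its page."""
    return addr & ~(PAGE_SIZE - 1)


def touched_pages(sp, alloc_size, probe):
    """Pages touched when alloc_size bytes are allocated below sp.

    Without probing, only the page at the new stack pointer is touched.
    With probing, every page between the old and new stack pointer is too.
    """
    new_sp = sp - alloc_size
    if not probe:
        return [page_of(new_sp)]
    pages = []
    addr = sp - PAGE_SIZE
    while addr > new_sp:
        pages.append(page_of(addr))
        addr -= PAGE_SIZE
    pages.append(page_of(new_sp))
    return pages


def hits_guard(pages, guard_start, guard_end):
    """True if any touched page lies in the unmapped guard region."""
    return any(guard_start <= p < guard_end for p in pages)


if __name__ == "__main__":
    sp = 0x100000                             # made-up stack pointer
    guard_start, guard_end = 0xF7000, 0xF8000  # made-up guard page
    big = 0x10000                             # 64 KiB unchecked allocation
    print("no probes:", hits_guard(touched_pages(sp, big, False), guard_start, guard_end))
    print("probes:   ", hits_guard(touched_pages(sp, big, True), guard_start, guard_end))
```

In this model the unprobed allocation lands below the guard page without ever touching it, while the probed one is guaranteed to step on it and fault.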

    If you set ulimit -s unlimited, you could maybe grow the stack into some other mapping, if Linux truly does allow unlimited growth in that case without reserving a guard page as you approach another mapping.