I have the following code, which goes into infinite recursion and triggers a segfault when it exhausts the stack limit allocated to it. I am trying to catch this segmentation fault and exit gracefully. However, I was not able to catch it with any of the signal numbers.
(A customer is facing this issue and wants a solution for such a use case. Increasing the stack size with something like "limit stacksize 128M" makes his test pass. However, he is asking for a graceful exit rather than a segfault. The following code simply reproduces the issue; it is not what the actual algorithm does.)
Any help is appreciated. If something is incorrect in the way I am trying to catch the signal please let me know that too. To compile: g++ test.cc -std=c++0x
#include <iostream>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <string>
#include <string.h>

int recurse_and_crash (int val)
{
    // Print rough call stack depth at intervals.
    if ((val %1000) == 0)
    {
        std::cout << "\nval: " << val;
    }
    return val + recurse_and_crash (val+1);
}

void signal_handler(int signal, siginfo_t * si, void * arg)
{
    std::cout << "Caught segfault\n";
    exit(0);
}

int main(int argc, char ** argv)
{
    int signal = 11; // SIGSEGV
    if (argc == 2)
    {
        signal = std::stoi(std::string(argv[1]));
    }
    struct sigaction sa;
    memset(&sa, 0, sizeof(struct sigaction));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = signal_handler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(signal, &sa, NULL);
    recurse_and_crash (1);
}
This is a surprisingly complex problem to solve. I will not give a complete working solution at this point, but rather focus on a few "nifty" issues that you already have or, as you continue coding for this, will encounter.
First, why does the fault recurse instead of reaching your handler?
The reason is that while signal handlers are "execution context transfers", by default they do not have their own stack. That means if you receive a signal as a consequence of an overflowed stack, delivering it requires space on that same, already exhausted stack for the context passed to the handler, and that simply raises the same signal again.
To make sure signal handlers run on their own separate, preallocated stack, use sigaltstack() and the SA_ONSTACK flag for sigaction().
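As a minimal sketch of that setup, assuming Linux/glibc and C++11: the names kAltStackSize, g_alt_stack, g_ran_on_alt_stack and install_segv_handler_on_alt_stack() are made up for this illustration, and the 64 KiB size is an arbitrary choice. The handler only records whether it really runs on the alternate stack; it does not yet decide whether the fault was a stack overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <cstring>
#include <signal.h>
#include <unistd.h>

// Hypothetical names; only sigaltstack(), sigaction() and SA_ONSTACK
// come from the text above.
static constexpr std::size_t kAltStackSize = 64 * 1024;  // arbitrary size
static void* g_alt_stack = nullptr;
static volatile sig_atomic_t g_ran_on_alt_stack = 0;

static void segv_handler(int, siginfo_t* si, void*) {
    // A local variable lives in the handler's own frame, so its address
    // tells us which stack the handler is running on.
    char probe = 0;
    auto p = reinterpret_cast<std::uintptr_t>(&probe);
    auto lo = reinterpret_cast<std::uintptr_t>(g_alt_stack);
    g_ran_on_alt_stack = (p >= lo && p < lo + kAltStackSize) ? 1 : 0;

    // Only async-signal-safe calls here: write() and _exit(), not std::cout.
    static const char msg[] = "SIGSEGV handler running\n";
    ssize_t n = write(STDERR_FILENO, msg, sizeof msg - 1);
    (void)n;

    // si_code > 0 marks a kernel-generated fault. Returning from it would
    // re-run the faulting instruction, so leave the process instead.
    if (si->si_code > 0) _exit(EXIT_FAILURE);
    // Otherwise the signal was sent with raise()/kill(); returning is fine.
}

bool install_segv_handler_on_alt_stack() {
    g_alt_stack = std::malloc(kAltStackSize);
    if (g_alt_stack == nullptr) return false;

    // Register the preallocated memory as this thread's signal stack.
    stack_t ss;
    std::memset(&ss, 0, sizeof ss);
    ss.ss_sp = g_alt_stack;
    ss.ss_size = kAltStackSize;
    if (sigaltstack(&ss, nullptr) != 0) return false;

    // SA_ONSTACK asks for delivery on that stack, not the overflowed one.
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = segv_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    return sigaction(SIGSEGV, &sa, nullptr) == 0;
}
```

In the test program from the question, this would be called once at the top of main(), before recurse_and_crash(1). The alternate stack is a per-thread setting, so a multi-threaded program would need one per thread.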
Second, depending on "how badly" the stack overruns (your test program may not trigger this, but a real-world program may), the memory access that actually causes the overflow may end up raising signals other than SIGSEGV.
Your example "unspecifically" catches whatever signal number you pass it, but in practice that may be both insufficient and confusing: you sending your app a SIGUSR1, or the shell/terminal sending it a SIGTTOU on being backgrounded, are absolutely not indicative of a stack overflow.
This means there's another issue: which signals are to be expected when making an "out of stack" memory access as a consequence of a stack overflow? And how can you know that a specific signal you got was due to a stack access?
The answer to that is, again, more complex than it appears at first sight:
- Often you get SIGSEGV.
- Sometimes you get SIGBUS instead.
- Depending on the faulting instruction, it can be either SIGSEGV or SIGBUS. (For example, on x86, certain instructions raise #GP while others raise #PF for the same memory address read/write, and the Linux kernel possibly translates one to SIGBUS and the other to SIGSEGV.)
- If a single access lands far beyond the end of the stack (e.g. char local_to_blow_stack[1ULL << 40]; memset(&local_to_blow_stack, 0, 1);) and it just so happens that something else valid is at "whatever your stack is minus a terabyte", that access will in fact just work. Without the compiler creating "assist" code for you to identify such accesses, it's actually possible that you've blown the stack and still make a number of successful, non-signaling memory accesses before eventually hitting a memory region that triggers a signal.
So "just catching signals", or even "catching all signals that may possibly occur as a consequence of a stack overflow", is insufficient. Within the signal handler, you need to decode the memory access location, and possibly the operation / CPU instruction, to verify that the attempted memory access actually was a "stack access out of bounds". It is possible for a thread to retrieve its own stack boundaries: https://man7.org/linux/man-pages/man3/pthread_getattr_np.3.html can be used for this, at least on Linux (_np implies "non-portable"; this isn't guaranteed to be available on all systems, and others may have different interfaces to retrieve this information). However, finding the memory location that was accessed again depends on the signal and the accessing instruction. Often (but not always) it's in the si_addr field of the siginfo_t.
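Retrieving the stack boundaries could look roughly like this, assuming Linux with glibc. StackBounds and current_thread_stack_bounds() are hypothetical names for this sketch; pthread_getattr_np(), pthread_attr_getstack() and pthread_attr_destroy() are the real calls.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <pthread.h>

// Hypothetical helper type for this sketch.
struct StackBounds {
    std::uintptr_t low;   // lowest address of the stack region
    std::uintptr_t high;  // one past the highest address
    bool valid;           // false if the lookup failed
};

// Ask the threading library for the calling thread's stack region.
StackBounds current_thread_stack_bounds() {
    StackBounds b{0, 0, false};

    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) return b;

    // pthread_attr_getstack() reports the lowest address and the size.
    void* base = nullptr;
    std::size_t size = 0;
    if (pthread_attr_getstack(&attr, &base, &size) == 0) {
        b.low = reinterpret_cast<std::uintptr_t>(base);
        b.high = b.low + size;
        b.valid = true;
    }

    // The attribute object filled in by pthread_getattr_np() must be freed.
    pthread_attr_destroy(&attr);
    return b;
}
```

I would call this during normal execution and cache the result, since pthread_getattr_np() is not documented as async-signal-safe and calling it from inside a handler is a risk of its own.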
From what I remember, exactly which signals fill in si_addr under exactly what circumstances, and whether the address in there is e.g. the instruction issuing the memory access or the memory location of the attempted access, is somewhat system- and hardware-dependent (Linux may behave differently from Windows or MacOSX, and ARM differently from x86).
So you would also need to validate that the si_addr in this siginfo_t is somewhere near the signaled thread's stack, but possibly also validate that the instruction that caused it was actually a memory access, or that si_addr can be "traced back" to the instruction that faulted. That (finding the faulting instruction's address / the program counter) requires decoding the other argument to the signal handler, the ucontext_t, and there you're deep deep deep [ recurse infinity here ] in HW / OS specifics.
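To give an idea of how platform-specific this gets, here is a sketch that pulls the program counter out of the ucontext_t. It assumes Linux with glibc and covers only x86-64 and AArch64; on anything else it returns 0. The names g_last_pc, pc_from_context(), record_pc_handler() and install_pc_recorder() are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <signal.h>
#include <ucontext.h>

// Hypothetical global: last program counter seen by the handler.
static volatile std::uintptr_t g_last_pc = 0;

// Extract the saved program counter from the signal context.
// The field layout differs per OS and CPU; 0 means "not covered here".
static std::uintptr_t pc_from_context(void* ctx) {
    auto* uc = static_cast<ucontext_t*>(ctx);
#if defined(__linux__) && defined(__x86_64__)
    // REG_RIP is a glibc extension (needs _GNU_SOURCE, which g++ defines).
    return static_cast<std::uintptr_t>(uc->uc_mcontext.gregs[REG_RIP]);
#elif defined(__linux__) && defined(__aarch64__)
    return static_cast<std::uintptr_t>(uc->uc_mcontext.pc);
#else
    (void)uc;
    return 0;
#endif
}

// The third handler argument is the ucontext_t mentioned above.
static void record_pc_handler(int, siginfo_t*, void* ctx) {
    g_last_pc = pc_from_context(ctx);
}

// SA_SIGINFO is required to get the three-argument handler form.
bool install_pc_recorder(int signo) {
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = record_pc_handler;
    sa.sa_flags = SA_SIGINFO;
    return sigaction(signo, &sa, nullptr) == 0;
}
```

Getting the address is the easy part. Deciding whether the instruction at that address is a memory access means decoding machine code, which this sketch does not attempt.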
At this point I'd like to terminate. A "simple" but not perfect solution just needs an alternate signal stack and a handler that retrieves the current stack boundaries via pthread_getattr_np() to compare the si_addr against. If your life or that of others depends on the correct answer, remember the above, though.
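A rough sketch of that "simple" variant, assuming Linux/glibc and a stack that grows downwards. All names (g_stack_low, kSlack, near_stack_limit(), fault_handler(), install_stack_overflow_handler()) and the 64 KiB values are my own choices for illustration. One deviation from the description above: the stack boundary is looked up once at install time and cached, not inside the handler, because pthread_getattr_np() is not documented as async-signal-safe.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <pthread.h>
#include <signal.h>
#include <unistd.h>

// Hypothetical globals; the 64 KiB values are arbitrary.
static std::uintptr_t g_stack_low = 0;               // lowest stack address
static constexpr std::uintptr_t kSlack = 64 * 1024;  // tolerance around it

// True if addr is within `slack` bytes of the low end of the stack,
// which is where a downward-growing stack runs out.
bool near_stack_limit(std::uintptr_t addr, std::uintptr_t low,
                      std::uintptr_t slack) {
    std::uintptr_t floor = (low > slack) ? low - slack : 0;
    return addr >= floor && addr < low + slack;
}

static void say(const char* msg) {  // write() is async-signal-safe
    ssize_t n = write(STDERR_FILENO, msg, std::strlen(msg));
    (void)n;
}

static void fault_handler(int, siginfo_t* si, void*) {
    auto addr = reinterpret_cast<std::uintptr_t>(si->si_addr);
    if (near_stack_limit(addr, g_stack_low, kSlack)) {
        say("Probable stack overflow, exiting\n");
        _exit(2);
    }
    say("Fault away from the stack limit, exiting\n");
    _exit(3);
}

bool install_stack_overflow_handler() {
    // Cache the calling thread's lowest stack address.
    pthread_attr_t attr;
    if (pthread_getattr_np(pthread_self(), &attr) != 0) return false;
    void* base = nullptr;
    std::size_t size = 0;
    int rc = pthread_attr_getstack(&attr, &base, &size);
    pthread_attr_destroy(&attr);
    if (rc != 0) return false;
    g_stack_low = reinterpret_cast<std::uintptr_t>(base);

    // Give the handler its own stack.
    static char alt_stack[64 * 1024];
    stack_t ss;
    std::memset(&ss, 0, sizeof ss);
    ss.ss_sp = alt_stack;
    ss.ss_size = sizeof alt_stack;
    if (sigaltstack(&ss, nullptr) != 0) return false;

    // Catch both signals named above, on the alternate stack.
    struct sigaction sa;
    std::memset(&sa, 0, sizeof sa);
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = fault_handler;
    sa.sa_flags = SA_SIGINFO | SA_ONSTACK;
    return sigaction(SIGSEGV, &sa, nullptr) == 0 &&
           sigaction(SIGBUS, &sa, nullptr) == 0;
}
```

This only covers the thread that calls install_stack_overflow_handler(), and it inherits every caveat above: si_addr may not be filled in, and a large enough jump past the stack limit will not land within the slack window at all.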