c++ segmentation-fault stack-trace strlen backtrace

Calling backtrace_symbols_fd() sometimes hangs in strlen, called from snprintf


I'm trying to catch faults with a signal handler and print stack trace information to a log file (or the console) for crash reports and for debugging my application on non-development machines. My problem is that I often don't get a full backtrace: in many cases the handler appears to hang and never finishes or exits. Only sometimes does it exit successfully.

Here is my code:

#include <signal.h>
#include <stdlib.h>
#include <stdio.h>
#include <execinfo.h>

typedef struct { char name[10]; int id; char description[40]; } signal_def;

signal_def signal_data[] =
{
    { "SIGHUP", SIGHUP, "Hangup (POSIX)" },
    { "SIGINT", SIGINT, "Interrupt (ANSI)" },
    { "SIGQUIT", SIGQUIT, "Quit (POSIX)" },
    { "SIGILL", SIGILL, "Illegal instruction (ANSI)" },
    { "SIGTRAP", SIGTRAP, "Trace trap (POSIX)" },
    { "SIGABRT", SIGABRT, "Abort (ANSI)" },
    { "SIGIOT", SIGIOT, "IOT trap (4.2 BSD)" },
    { "SIGBUS", SIGBUS, "BUS error (4.2 BSD)" },
    { "SIGFPE", SIGFPE, "Floating-point exception (ANSI)" },
    { "SIGKILL", SIGKILL, "Kill, unblockable (POSIX)" },
    { "SIGUSR1", SIGUSR1, "User-defined signal 1 (POSIX)" },
    { "SIGSEGV", SIGSEGV, "Segmentation violation (ANSI)" },
    { "SIGUSR2", SIGUSR2, "User-defined signal 2 (POSIX)" },
    { "SIGPIPE", SIGPIPE, "Broken pipe (POSIX)" },
    { "SIGALRM", SIGALRM, "Alarm clock (POSIX)" },
    { "SIGTERM", SIGTERM, "Termination (ANSI)" },
    //{ "SIGSTKFLT", SIGSTKFLT, "Stack fault" },
    { "SIGCHLD", SIGCHLD, "Child status has changed (POSIX)" },
    //{ "SIGCLD", SIGCLD, "Same as SIGCHLD (System V)" },
    { "SIGCONT", SIGCONT, "Continue (POSIX)" },
    { "SIGSTOP", SIGSTOP, "Stop, unblockable (POSIX)" },
    { "SIGTSTP", SIGTSTP, "Keyboard stop (POSIX)" },
    { "SIGTTIN", SIGTTIN, "Background read from tty (POSIX)" },
    { "SIGTTOU", SIGTTOU, "Background write to tty (POSIX)" },
    { "SIGURG", SIGURG, "Urgent condition on socket (4.2 BSD)" },
    { "SIGXCPU", SIGXCPU, "CPU limit exceeded (4.2 BSD)" },
    { "SIGXFSZ", SIGXFSZ, "File size limit exceeded (4.2 BSD)" },
    { "SIGVTALRM", SIGVTALRM, "Virtual alarm clock (4.2 BSD)" },
    { "SIGPROF", SIGPROF, "Profiling alarm clock (4.2 BSD)" },
    { "SIGWINCH", SIGWINCH, "Window size change (4.3 BSD, Sun)" },
    { "SIGIO", SIGIO, "I/O now possible (4.2 BSD)" },
    //{ "SIGPOLL", SIGPOLL, "Pollable event occurred (System V)" },
    //{ "SIGPWR", SIGPWR, "Power failure restart (System V)" },
    { "SIGSYS", SIGSYS, "Bad system call" },
};

void bt_sighandler(int sig, siginfo_t *info, void *secret) {
   signal_def *sigd = NULL;
   for (int i = 0; i < sizeof(signal_data) / sizeof(signal_def); ++i) {
      if (sig == signal_data[i].id) {
         sigd = &signal_data[i];
         break;
      }
   }
   //ucontext_t* uc = (ucontext_t*) secret;
   //void *pnt = (void*) uc->uc_mcontext.gregs[REG_RIP] ;

   void *trace[16];
   int trace_size = backtrace(trace, 16);
   /* overwrite sigaction with caller's address */
   //trace[1] = pnt;

   if (sigd) {
      fprintf(stderr, "SigHandler(0x%02X)[%d]:%s[%s]", sig, trace_size,
         sigd->name, sigd->description);
   } else {
      fprintf(stderr, "SigHandler(0x%02X)[%d]", sig, trace_size);
   }

   backtrace_symbols_fd(trace, trace_size, fileno(stderr));

   exit(1);
}

int main(int argc, char* argv[]) {
  struct sigaction sa;

  sa.sa_sigaction = bt_sighandler;
  sigemptyset(&sa.sa_mask);
  sa.sa_flags = SA_SIGINFO;  // needed because the handler is installed via sa_sigaction

  sigaction(SIGINT, &sa, NULL);
  sigaction(SIGSEGV, &sa, NULL);
  sigaction(SIGBUS, &sa, NULL);
  sigaction(SIGILL, &sa, NULL);
  sigaction(SIGFPE, &sa, NULL);
  sigaction(SIGUSR1, &sa, NULL);
  sigaction(SIGUSR2, &sa, NULL);

  signal(SIGPIPE, SIG_IGN);

  //Produce a fault

  return 0;
}

You'll notice in my sample code that the section responsible for overwriting the sigaction with the caller's address has been commented out. This is because I'm uncertain how to get it to compile for Mac.

Here is a sample console output: http://www.minesclubtennis.com/images/stackoverflow/fatalconsoleoutputhang.png

You'll notice that it printed only the first 3 frames and then hung without exiting, even though 9 frames were found and should have been printed.

So I did a "Sample Process" from the Activity Monitor app and found that the thread executing the backtrace_symbols_fd function was stuck in strlen. Screenshot of the sample process output: http://www.minesclubtennis.com/images/stackoverflow/sampleprocessoutputhang.png

Why is it hanging? Is this a bug in my own code or a bug within Apple's backtrace? I've been told that there are limited things one can do with a signal handler but I don't see anything on the sigaction man page that would indicate what I'm doing wrong.


Solution

  • You need to read the sigaction man page more closely! Anything not on its list of async-signal-safe functions is verboten in a signal handler. backtrace_symbols_fd() is not on that list, so you can't use it in a signal handler.

    If you want to see exactly why, go to Apple's open source site and download the Libc code. Your capture illustrates where the problem is. If you look at "stdio/vprintf-fbsd.c" you'll see that __vfprintf() has this comment:

    /*
     * Non-MT-safe version
     */

    A lot of printf-style functions end up here (snprintf is how we got here). If your app crashes in a printf-style function and the signal handler tries to re-enter it, then the unexpected behavior you're seeing is... expected.

    Even if your app doesn't crash in a printf-style function, you could see this behavior if some other thread happens to be in a printf-style function when the crash occurs.