
Backtrace info differs between macOS and Linux


This is the test code in C++ (adapted from a Stack Overflow post that I can no longer find):

#include <signal.h>
#include <stdio.h>
#include <stdlib.h>

#include <execinfo.h>
#include <unistd.h>

void handler(int sig) {
  void *array[10];
  size_t size;

  // get void*'s for all entries on the stack
  size = backtrace(array, 10);

  // print out all the frames to stderr
  fprintf(stderr, "Error: signal %d:\n", sig);
  backtrace_symbols_fd(array, size, STDERR_FILENO);
  exit(1);
}

void baz() {
  int *foo = (int*)-1; // make a bad pointer
  printf("%d\n", *foo);       // causes segfault
}

void bar() { baz(); }
void foo() { bar(); }


int main(int argc, char **argv) {
  signal(SIGSEGV, handler);   // install our handler
  foo(); // this will call foo, bar, and baz.  baz segfaults.
}

These are the compile and link steps (on macOS, g++ is actually clang++):

g++ backtrace_example.cc -c -O0
g++ -rdynamic backtrace_example.o -o bt_example

This is the backtrace info printed on macOS:

$ ./bt_example
Error: signal 11:
0   bt_example                          0x000000010dd39dbf handler + 31
1   libsystem_platform.dylib            0x00007fff5e0b7b3d _sigtramp + 29
2   ???                                 0x0000000117f6a7c7 0x0 + 4697008071
3   bt_example                          0x000000010dd39eb9 _Z3barv + 9
4   bt_example                          0x000000010dd39ec9 _Z3foov + 9
5   bt_example                          0x000000010dd39efe main + 46
6   libdyld.dylib                       0x00007fff5dece085 start + 1
7   ???                                 0x0000000000000001 0x0 + 1

This is the backtrace info printed on Linux:

$ ./bt_example
Error: signal 11:
./bt_example(handler+0x2b)[0x400982]
/lib/x86_64-linux-gnu/libc.so.6(+0x354b0)[0x7f2aefc534b0]
./bt_example(_Z3bazv+0x14)[0x4009db]
./bt_example(_Z3barv+0x9)[0x4009fa]
./bt_example(_Z3foov+0x9)[0x400a06]
./bt_example(main+0x23)[0x400a2c]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xf0)[0x7f2aefc3e830]
./bt_example(_start+0x29)[0x4008a9]

Apart from the difference in format, there is one key difference: the macOS backtrace does not mention baz() at all, even though baz() clearly caused the segfault.

Why? Is this a flaw in macOS's implementation of backtrace_symbols_fd(), or is it intended (and if so, how does it work)?

Reference: GNU documentation on glibc's backtraces. macOS is a different OS with its own C library, but it has provided the same backtrace API since 10.5.


Solution

  • First, none of the functions you call in your handler are valid to call from a signal handler (they are not async-signal-safe). Therefore, you're completely in undefined behavior territory.

    Second, this is less a flaw in backtrace than a difference between the signal-handling mechanisms of the two OSes. macOS temporarily alters the stack while it calls the signal handler. It saves enough context information to restore the stack if the signal handler returns (and the handler even has the opportunity to modify that context). You can see the implementation here.