
GDB not showing symbols of stripped core files even if a non-stripped version is given


I built the program with the classic configure, make, make install. Some months later, the program crashed. I still have the build directory where both the source and the non-stripped executable reside. From there, I call gdb like so:

530-north:courier$ gdb -q --core /tmp/core_epoch\=1667475742_pid\=23653_file\=\!usr\!local\!libexec\!courier\!courierd courierd
Reading symbols from courierd...
[New LWP 23653]
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
Core was generated by `/usr/local/libexec/courier/courierd'.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x0000561e841e5afd in msgq::completed(drvinfo&, unsigned long) ()
(gdb) info args
No symbol table info available.

With bt I can see a long sequence of calls between two functions:

#0  0x0000561e841e5afd in msgq::completed(drvinfo&, unsigned long) ()
#1  0x0000561e841e609a in msgq::startdelivery(drvinfo*, delinfo*) ()
#2  0x0000561e841e5bd8 in msgq::completed(drvinfo&, unsigned long) ()
#3  0x0000561e841e609a in msgq::startdelivery(drvinfo*, delinfo*) ()
#4  0x0000561e841e5bd8 in msgq::completed(drvinfo&, unsigned long) ()
...
#204 0x0000561e841e5a17 in msgq::completed(drvinfo&, unsigned long) ()
#205 0x0000561e841e609a in msgq::startdelivery(drvinfo*, delinfo*) ()
#206 0x0000561e841e5a17 in msgq::completed(drvinfo&, unsigned long) ()
#207 0x0000561e841e70fe in courierbmain() ()
#208 0x0000561e841dd030 in main ()

Each pair of calls advances the stack by 0x110 bytes, for a total of ~27 KB, which is much less than the 132 KB of stack that the running processes have allocated, so it's not a stack overflow. The SIGSEGV could come from a null pointer or something similar. Why doesn't gdb point at it? This is GNU gdb (Debian 10.1-1.7) 10.1.90.20210103-git, BTW.

If I omit the last argument to gdb, bt doesn't show the function names. Did I screw up the compilation? In config.log I see I had 'CFLAGS= -march=nocona -O2 -g' 'LDFLAGS= -march=nocona -O2' 'CXXFLAGS= -march=nocona -O2 -std=c++11'. The source file is C++. Perhaps I missed some -gs? Yet, some symbols are there...


Solution

  • Why doesn't gdb point at it?

    Because the code that crashed was compiled without debug info.

    You'll have to debug this crash at the assembly level. Start with disassemble $pc and info registers.
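    As a starting point, those commands could be scripted in batch mode roughly as follows. This is only a sketch: the function name inspect_crash is made up for illustration, and the paths in the usage comment are the ones from the question.

```shell
#!/bin/sh
# inspect_crash EXE CORE: print the faulting instruction, the register
# contents and a disassembly of the crashing function from a core dump.
# The function name is illustrative, not part of gdb.
inspect_crash() {
    exe=$1
    core=$2
    if [ ! -r "$exe" ] || [ ! -r "$core" ]; then
        echo "usage: inspect_crash EXE CORE" >&2
        return 2
    fi
    # Single quotes keep the shell from expanding $pc; gdb evaluates it.
    gdb -q --batch \
        -ex 'x/i $pc' \
        -ex 'info registers' \
        -ex 'disassemble $pc' \
        "$exe" "$core"
}

# Usage, from the build directory:
#   inspect_crash ./courierd '/tmp/core_epoch=1667475742_pid=23653_file=!usr!local!libexec!courier!courierd'
```

    In the disassemble output, gdb marks the current instruction with =>. Comparing the memory operand of that instruction with the register values usually shows which pointer was bad.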

  • The source file is C++. Perhaps I missed some -gs?

    Yes: your CXXFLAGS don't have -g.
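    One way to fix that is to reconfigure with -g added to CXXFLAGS and rebuild. The sketch below reuses the flags from your config.log; the variable and function names are made up, and any other configure options from the original build would have to be added. It assumes the sources and the compiler are unchanged since that build. Under that assumption GCC normally generates the same machine code with and without -g, so the rebuilt binary may well line up with the old core, but that is not guaranteed.

```shell
#!/bin/sh
# Flags taken from config.log, with -g added for the C++ compiler.
DEBUG_CFLAGS='-march=nocona -O2 -g'
DEBUG_CXXFLAGS='-march=nocona -O2 -g -std=c++11'
DEBUG_LDFLAGS='-march=nocona -O2'

# rebuild_with_debug_info: run from the top of the build directory.
# It deliberately stops before "make install", so the installed
# binary is left untouched.
rebuild_with_debug_info() {
    ./configure CFLAGS="$DEBUG_CFLAGS" CXXFLAGS="$DEBUG_CXXFLAGS" \
                LDFLAGS="$DEBUG_LDFLAGS" &&
    make clean &&
    make
}
```

    After the rebuild, pass the new non-stripped courierd to gdb together with the old core and check whether info args now works in frame #0.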

  • Yet, some symbols are there...

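    On UNIX systems (unlike Windows), the linker keeps a symbol table with function names in the executable by default, even without -g. What -g adds is the debug info: file/line numbers and descriptions of arguments and local variables. There is no contradiction here.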

    Update:

  • However, if I don't pass the non-stripped file as argument, the function names are not displayed.

    Yes: strip removes both the symbols and the debug info. The core file itself carries neither; GDB takes them from the executable you give it.

    You can observe this by using a trivial test:

    // t.cc
    #include <cstdlib>
    
    struct S {
      void fn() { abort(); }
    };
    
    int main()
    {
      S().fn();
    }
    

    First let's see how it works when the binary is built correctly for debugging:

    g++ -g t.cc -o a.out && strip ./a.out -o a.out.stripped &&
    ./a.out.stripped; gdb -q --batch -ex where ./a.out core
    Aborted (core dumped)
    ...
    warning: core file may not match specified executable file.
    [New LWP 476070]
    Core was generated by `./a.out.stripped'.
    Program terminated with signal SIGABRT, Aborted.
    #0  __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
    44      ./nptl/pthread_kill.c: No such file or directory.
    #0  __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
    #1  0x00007f12444895df in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at ./nptl/pthread_kill.c:89
    #2  __GI___pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ./nptl/pthread_kill.c:89
    #3  0x00007f12445f5e70 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #4  0x00007f1244428469 in __GI_abort () at ./stdlib/abort.c:79
    #5  0x000055de28a24165 in S::fn (this=0x7ffcd0d1d80f) at t.cc:4
    #6  0x000055de28a2414d in main () at t.cc:9
    

    Note the presence of file/line info and function names. If we give GDB the stripped binary (a.out.stripped) instead, neither is present for frames #5 and #6:

    Core was generated by `./a.out.stripped'.
    Program terminated with signal SIGABRT, Aborted.
    #0  __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
    44      ./nptl/pthread_kill.c: No such file or directory.
    #0  __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
    #1  0x00007f12444895df in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at ./nptl/pthread_kill.c:89
    #2  __GI___pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ./nptl/pthread_kill.c:89
    #3  0x00007f12445f5e70 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #4  0x00007f1244428469 in __GI_abort () at ./stdlib/abort.c:79
    #5  0x000055de28a24165 in ?? ()
    #6  0x000055de28a2414d in ?? ()
    #7  0x00007f124442920a in __libc_start_call_main (main=main@entry=0x55de28a24139, argc=argc@entry=1, argv=argv@entry=0x7ffcd0d1d928) at ../sysdeps/nptl/libc_start_call_main.h:58
    #8  0x00007f12444292bc in __libc_start_main_impl (main=0x55de28a24139, argc=1, argv=0x7ffcd0d1d928, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7ffcd0d1d918) at ../csu/libc-start.c:389
    #9  0x000055de28a24071 in ?? ()
    

    Now let's repeat with a binary built without -g (which is what you have):

    g++ t.cc -o b.out && strip ./b.out -o b.out.stripped &&
    ./b.out.stripped; gdb -q --batch -ex where ./b.out core
    Aborted (core dumped)
    ...
    warning: core file may not match specified executable file.
    [New LWP 478614]
    Core was generated by `./b.out.stripped'.
    Program terminated with signal SIGABRT, Aborted.
    #0  __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
    44      ./nptl/pthread_kill.c: No such file or directory.
    #0  __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
    #1  0x00007f21a0a895df in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at ./nptl/pthread_kill.c:89
    #2  __GI___pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ./nptl/pthread_kill.c:89
    #3  0x00007f21a0bf5e70 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #4  0x00007f21a0a28469 in __GI_abort () at ./stdlib/abort.c:79
    #5  0x000056049b052165 in S::fn() ()
    #6  0x000056049b05214d in main ()
    

    Notice the presence of function names (S::fn(), main) but the lack of file/line/argument info. This matches your observed result.

    If you try again with b.out.stripped, you'll get the same result as in the earlier run with a.out.stripped:

    Core was generated by `./b.out.stripped'.
    Program terminated with signal SIGABRT, Aborted.
    #0  __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
    44      ./nptl/pthread_kill.c: No such file or directory.
    #0  __pthread_kill_implementation (threadid=<optimized out>, signo=6, no_tid=no_tid@entry=0) at ./nptl/pthread_kill.c:44
    #1  0x00007f21a0a895df in __pthread_kill_internal (signo=<optimized out>, threadid=<optimized out>) at ./nptl/pthread_kill.c:89
    #2  __GI___pthread_kill (threadid=<optimized out>, signo=<optimized out>) at ./nptl/pthread_kill.c:89
    #3  0x00007f21a0bf5e70 in ?? () from /lib/x86_64-linux-gnu/libc.so.6
    #4  0x00007f21a0a28469 in __GI_abort () at ./stdlib/abort.c:79
    #5  0x000056049b052165 in ?? ()
    #6  0x000056049b05214d in ?? ()
    #7  0x00007f21a0a2920a in __libc_start_call_main (main=main@entry=0x56049b052139, argc=argc@entry=1, argv=argv@entry=0x7fff3554bc78) at ../sysdeps/nptl/libc_start_call_main.h:58
    #8  0x00007f21a0a292bc in __libc_start_main_impl (main=0x56049b052139, argc=1, argv=0x7fff3554bc78, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fff3554bc68) at ../csu/libc-start.c:389
    #9  0x000056049b052071 in ?? ()
    

  • In addition, readelf --debug-dump=info courierd shows lots of Version 4 stuff.

    Yes, if you run readelf --debug-dump b.out, you may see a lot of DWARF4 stuff coming from crt0.o, crtbegin.o, etc. (depending on how your GCC and GLIBC were built).

    If you have .c files linked in, these will also have DWARF4 debug info, since your CFLAGS do include -g.

    But none of the DWARF4 stuff will come from the file where msgq::completed is defined.
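    To check this on your own binary, something like the following could be used. It is a sketch built on readelf from binutils: the function names are made up, and the awk filter assumes the usual readelf layout, where each DW_TAG_compile_unit entry is followed by a DW_AT_name line that ends with the source file name.

```shell
#!/bin/sh
# has_section FILE NAME: succeed if the ELF file has a section called NAME.
has_section() {
    readelf -S -W "$1" 2>/dev/null | grep -q -F -- " $2 "
}

# list_debug_units FILE: print the source file name of every compilation
# unit that contributed DWARF debug info to FILE.
list_debug_units() {
    readelf --debug-dump=info "$1" 2>/dev/null |
        awk '/DW_TAG_compile_unit/ { in_cu = 1; next }
             in_cu && /DW_AT_name/ { print $NF; in_cu = 0 }'
}

# Usage, from the build directory:
#   has_section courierd .symtab     && echo "function names available"
#   has_section courierd .debug_info && echo "some debug info present"
#   list_debug_units courierd   # is the file defining msgq::completed listed?
```

    If the C++ source file that defines msgq::completed is missing from the list, that file was compiled without -g, whatever else readelf prints.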