gdb debugging OpenJDK java on Alpine Linux fails with "Thread received signal ?, Unknown signal"


I'm having a hard time debugging OpenJDK java on Alpine Linux with gdb. Has anyone succeeded in doing so?

When I try to debug java in gdb (for example, gdb java followed by r -version), it fails immediately with:

Thread 1 "java" received signal ?, Unknown signal.
__cp_end () at src/thread/x86_64/syscall_cp.s:29

I searched extensively but couldn't find any reference to, or solution for, debugging OpenJDK on Alpine.

Other threads about the same gdb error on other platforms (macOS Sierra, MinGW) suggest that "received signal ?, Unknown signal" can have various causes, including a gdb bug, an uncaught exception, a stack overflow, or other application bugs.

Outside gdb, java works without any problems, and gdb works fine for debugging a simple C++ program. I'm running Alpine V3.8.

Things I've tried:

  • Different gdb versions (8.0.1-r6, 8.0.1-r3, 7.12.1-r1).
  • Different OpenJDK versions (1.8.0_171, 1.7.0_181).
  • Running from different shells (/bin/ash, /bin/bash), with and without sudo.
  • Disabling stopping on signals in .gdbinit: handle SIGSEGV nostop noprint pass, and the same for SIGPIPE, SIGHUP, SIGFPE and SIG34.
  • Adding set startup-with-shell off to .gdbinit.

Thanks for any help!

Edit:

Here's the full stack trace at the point where the unknown signal is raised, which causes JVMInit to fail:

(gdb) r -version
Starting program: /usr/lib/jvm/java-1.8-openjdk/bin/java -version
process 16214 is executing new program: /usr/lib/jvm/java-1.8-openjdk/bin/java
[New LWP 16219]

Thread 1 "java" received signal ?, Unknown signal.
__cp_end () at src/thread/x86_64/syscall_cp.s:29
29  src/thread/x86_64/syscall_cp.s: No such file or directory.
(gdb) info threads
  Id   Target Id         Frame 
* 1    LWP 16214 "java"  __cp_end () at src/thread/x86_64/syscall_cp.s:29
  2    LWP 16219 "java"  __synccall (func=func@entry=0x7ffff7da2662 <do_setrlimit>, ctx=ctx@entry=0x7ffff7ff4720)
    at src/thread/synccall.c:143
(gdb) where
#0  __cp_end () at src/thread/x86_64/syscall_cp.s:29
#1  0x00007ffff7dbed2d in __syscall_cp_c (nr=202, u=<optimized out>, v=<optimized out>, w=<optimized out>, x=<optimized out>, 
    y=<optimized out>, z=0) at src/thread/pthread_cancel.c:35
#2  0x00007ffff7dbe350 in __timedwait_cp (addr=addr@entry=0x7ffff7ff4b20, val=16219, clk=clk@entry=0, at=at@entry=0x0, priv=priv@entry=0)
    at src/thread/__timedwait.c:31
#3  0x00007ffff7dbfdc4 in __pthread_timedjoin_np (t=0x7ffff7ff4ae8, res=res@entry=0x7fffffffa348, at=at@entry=0x0)
    at src/thread/pthread_join.c:16
#4  0x00007ffff7dbfe02 in __pthread_join (t=<optimized out>, res=res@entry=0x7fffffffa348) at src/thread/pthread_join.c:27
#5  0x00007ffff7b6695e in ContinueInNewThread0 (continuation=continuation@entry=0x7ffff7b61a60 <JavaMain>, stack_size=1048576, 
    args=args@entry=0x7fffffffa3e0)
    at /home/buildozer/aports/community/openjdk8/src/icedtea-3.8.0/openjdk/jdk/src/solaris/bin/java_md_solinux.c:1046
#6  0x00007ffff7b634a4 in ContinueInNewThread (ifn=ifn@entry=0x7fffffffa4f0, threadStackSize=<optimized out>, argc=1, 
    argv=<optimized out>, mode=mode@entry=841574793, what=what@entry=0x0, ret=0)
    at /home/buildozer/aports/community/openjdk8/src/icedtea-3.8.0/openjdk/jdk/src/share/bin/java.c:2024
#7  0x00007ffff7b66a08 in JVMInit (ifn=ifn@entry=0x7fffffffa4f0, threadStackSize=<optimized out>, argc=<optimized out>, 
    argv=<optimized out>, mode=841574793, mode@entry=0, what=what@entry=0x0, ret=<optimized out>)
    at /home/buildozer/aports/community/openjdk8/src/icedtea-3.8.0/openjdk/jdk/src/solaris/bin/java_md_solinux.c:1093
#8  0x00007ffff7b63e30 in JLI_Launch (argc=<optimized out>, argv=<optimized out>, jargc=<optimized out>, jargv=<optimized out>, 
    appclassc=1, appclassv=0x0, fullversion=0x555555554843 "1.8.0_171-b11", dotversion=0x55555555483f "1.8", pname=0x55555555483a "java", 
    lname=0x555555554832 "openjdk", javaargs=0 '\000', cpwildcard=1 '\001', javaw=0 '\000', ergo=0)
    at /home/buildozer/aports/community/openjdk8/src/icedtea-3.8.0/openjdk/jdk/src/share/bin/java.c:304
#9  0x0000555555554691 in main (argc=<optimized out>, argv=<optimized out>)
    at /home/buildozer/aports/community/openjdk8/src/icedtea-3.8.0/openjdk/jdk/src/share/bin/main.c:125
(gdb) 

JVMInit attempts to create the JavaMain native thread by calling ContinueInNewThread, which calls ContinueInNewThread0(JavaMain, threadStackSize, (void*)&args), and that is where it blows up.


Solution

  • TL;DR: The issue is gdb's lack of support for musl's internal signals, reported in this gdb ticket.

    A quick-and-dirty patched GDB is available here:
    https://github.com/shaharv/alpine-gdb-builds/releases/tag/v0.1

    Patch commit: shaharv/binutils-gdb@0ca9c66.

    With the patch, the signal is correctly identified as SIGSYNCCALL.
    It can then be masked using handle SIGSYNCCALL nostop noprint pass.


    Thankfully, I was able to come up with a workaround!
    The gdb failure when debugging Alpine OpenJDK java can be worked around as follows:

    • Start gdb
    • break os::init_2
    • Run java with the desired command line arguments
    • When the breakpoint is hit, set MaxFDLimit=0
    • Continue, and debug normally.

    I've tested the workaround with OpenJDK 8 and 11 early access, so it is likely to work with OpenJDK 9 and 10 as well.

    Unfortunately, the scope of this workaround is very limited:

    • It only works if the JDK has debug symbols, either from a local debug build of OpenJDK or from the openjdk8-dbg debug symbols package.
    • It is only suitable for command line gdb, and won't work with GDB frontends like CLion and Eclipse CDT.

    Summary:

    The failure occurs when the setrlimit function is called while java runs under gdb. musl's setrlimit implementation signals the process's threads with SIGSYNCCALL, which gdb does not support, resulting in the Unknown signal error. To avoid the error, the relevant initialization code run by JavaMain is disabled by turning off the MaxFDLimit global variable.

    Full Explanation:

    During JVM initialization, a JavaMain native thread is created, which in turn creates the VM. VM creation includes OS-specific initialization, during which setrlimit is called. Here's the relevant part of the stack trace:

    #0  __synccall (func=func@entry=0x7ffff7da2662 <do_setrlimit>, ctx=ctx@entry=0x7ffff7ff4720) at src/thread/synccall.c:48
    #1  0x00007ffff7da26a1 in setrlimit (resource=resource@entry=7, rlim=rlim@entry=0x7ffff7ff4750) at src/misc/setrlimit.c:42
    #2  0x00007ffff73bd1fe in os::init_2 ()
        at /home/buildozer/aports/community/openjdk8/src/icedtea-3.8.0/openjdk/hotspot/src/os/linux/vm/os_linux.cpp:5096
    #3  0x00007ffff746177d in Threads::create_vm (args=0x7ffff7ff4a20, canTryAgain=canTryAgain@entry=0x7ffff7ff4987)
        at /home/buildozer/aports/community/openjdk8/src/icedtea-3.8.0/openjdk/hotspot/src/share/vm/runtime/thread.cpp:3361
    #4  0x00007ffff729cd48 in JNI_CreateJavaVM (vm=0x7ffff7ff4a10, penv=0x7ffff7ff4a18, args=<optimized out>)
        at /home/buildozer/aports/community/openjdk8/src/icedtea-3.8.0/openjdk/hotspot/src/share/vm/prims/jni.cpp:5221
    #5  0x00007ffff7b61b0b in InitializeJVM (ifn=<synthetic pointer>, penv=0x7ffff7ff4a18, pvm=0x7ffff7ff4a10)
        at /home/buildozer/aports/community/openjdk8/src/icedtea-3.8.0/openjdk/jdk/src/share/bin/java.c:1231
    #6  JavaMain (_args=<optimized out>)
    

    The culprit is the setrlimit call. musl's setrlimit implementation is AS-Safe, meaning it's safe to call from asynchronous signal handlers. The synchronization is handled by calling __synccall (setrlimit.c):

    int setrlimit(int resource, const struct rlimit *rlim)
    {
        struct ctx c = { .res = resource, .rlim = rlim, .err = -1 };
        __synccall(do_setrlimit, &c);
        if (c.err) {
            if (c.err>0) errno = c.err;
            return -1;
        }
        return 0;
    }
    

    __synccall (synccall.c) blocks all signals, then iterates over all threads of the process and sends each one the SIGSYNCCALL signal; do_setrlimit is executed only after all threads have acknowledged the signal:

    r = -__syscall(SYS_tgkill, pid, tid, SIGSYNCCALL);
    

    However, SIGSYNCCALL is internal to musl and not handled by gdb. gdb handles each signal type it supports explicitly, and SIGSYNCCALL is not among them (see gdb's signals.c). Therefore, when the signal is raised, gdb stops the program with the Unknown signal error.

    Workaround:

    The workaround is to disable OpenJDK's call to setrlimit on the fly. The relevant code is in the os::init_2 function (os_linux.cpp):

      if (MaxFDLimit) {
        // set the number of file descriptors to max. print out error
        // if getrlimit/setrlimit fails but continue regardless.
        struct rlimit nbr_files;
        int status = getrlimit(RLIMIT_NOFILE, &nbr_files);
        if (status != 0) {
          if (PrintMiscellaneous && (Verbose || WizardMode))
            perror("os::init_2 getrlimit failed");
        } else {
          nbr_files.rlim_cur = nbr_files.rlim_max;
          status = setrlimit(RLIMIT_NOFILE, &nbr_files);
          if (status != 0) {
            if (PrintMiscellaneous && (Verbose || WizardMode))
              perror("os::init_2 setrlimit failed");
          }
        }
      }
    

    By setting MaxFDLimit to 0, the code above is skipped, and VM initialization can continue normally. There's a command-line option for toggling this variable, -XX:-MaxFDLimit, but it is available on Solaris only, so we have no choice but to turn off the variable manually inside gdb.

    MaxFDLimit exists for historical reasons: it was meant to raise the default file descriptor limit on ancient systems that had a very low default (256), as described in JDK-8010126. Alpine V3.8 has a default limit of 1024, so it should be safe to disable this code; if needed, the limit can be raised with ulimit rather than by the JVM itself.