Tags: multithreading, pthreads, signals, deadlock, signal-handling

Deadlock when a multi-threaded process exits from a signal handler


There are two threads in a process. When the main thread receives SIGSEGV, its signal handler sends an internal signal to the auxiliary thread using pthread_kill, and the handler for that internal signal traps the auxiliary thread in a sleep loop. The idea is that the main thread can then do its mandatory cleanup and dump a stack trace to a file as if the process were single-threaded, since the auxiliary thread is asleep.

However, I once hit a case where the process did not exit while the main thread was exiting; it appears to be deadlocked between the two threads.

Please help me understand why this happens and which part of the code is causing the deadlock.

Thanks in advance!

Auxiliary Thread stack:

Thread 2 (Thread 0x7fc565b5b700 (LWP 13831)):
#0  0x00007fc5668e81fd in nanosleep () from /lib64/libc.so.6
#1  0x00007fc566915214 in usleep () from /lib64/libc.so.6
#2  0x00000000009699a2 in SignalHandFun() at ...........
#3  <signal handler called>
#4  0x00007fc56691820a in mmap64 () from /lib64/libc.so.6
#5  0x00007fc5668a5bfc in _IO_file_doallocate_internal () from /lib64/libc.so.6
#6  0x00007fc5668b386c in _IO_doallocbuf_internal () from /lib64/libc.so.6
#7  0x00007fc5668b215b in _IO_new_file_underflow () from /lib64/libc.so.6
#8  0x00007fc5668b38ae in _IO_default_uflow_internal () from /lib64/libc.so.6
#9  0x00007fc566894bad in _IO_vfscanf_internal () from /lib64/libc.so.6
#10 0x00007fc5668a2cd8 in fscanf () from /lib64/libc.so.6
..... 
......
.....
#15 0x00007fc567259806 in start_thread () from /lib64/libpthread.so.0
#16 0x00007fc56691b64d in clone () from /lib64/libc.so.6
#17 0x0000000000000000 in ?? ()

Main Thread stack:

Thread 1 (Thread 0x7fc5679c0720 (LWP 13795)):
#0  0x00007fc56692878e in __lll_lock_wait_private () from /lib64/libc.so.6
#1  0x00007fc5668b504b in _L_lock_1309 () from /lib64/libc.so.6
#2  0x00007fc5668b3d9a in _IO_flush_all_lockp () from /lib64/libc.so.6
#3  0x00007fc5668b4181 in _IO_cleanup () from /lib64/libc.so.6
#4  0x00007fc566872630 in __run_exit_handlers () from /lib64/libc.so.6
#5  0x00007fc5668726b5 in exit () from /lib64/libc.so.6
#6  0x00000000009698e3 in SignalHandFun() at ....
#7  <signal handler called>
#8  0x000000b1000000b0 in ?? ()
#9  0x0000000000000000 in ?? ()

Solution

  • I assume that you send a signal to another thread because you want to do some work that cannot be done with async-signal-safe functions.

    The problem is that if your signal handler is called on a thread that has any locks acquired (such as in your case, the internal libio list lock), then any thread that attempts to acquire the same lock will block indefinitely. Your auxiliary thread was interrupted inside fscanf and never returns from its handler, so the lock will never become available again, and no thread waiting on it will make progress. In your case, the exit function needs to acquire the libio list lock because it has to go through the list of all open file streams and flush them, while a thread opening a new file acquires the lock while it puts the new file on the list.

    While this is an implementation detail and could conceivably be addressed inside glibc at some (far) point in the future (the small improvements we have made relatively recently will not help in your case), the only way around it is to call _exit after the cleanup you need to do, before glibc's final process exit procedure runs. In your case, it may be possible to do so from an atexit handler that you register as early as possible, but this depends on your application.

    Regarding crash handlers, we published some advice here:

    The article focuses on fork, but the deadlock issues are pretty much the same in your case.