Is fork (supposed to be) safe from signal handlers in a threaded program?

I'm really uncertain about the requirements POSIX places on the safety of fork in the presence of threads and signals. fork is listed as one of the async-signal-safe functions, but if there is a possibility that library code has registered pthread_atfork handlers which are not async-signal-safe, does this negate the safety of fork? Does the answer depend on whether the thread in which the signal handler is running could be in the middle of using a resource that the atfork handlers need? Or said a different way, if the atfork handlers make use of synchronization resources (mutexes, etc.) but fork is being called from a signal handler which executed in a thread that never accesses these resources, is the program conforming?

Building on this question, if "thread-safe" forking is implemented internally in the system library using the idioms suggested by pthread_atfork (obtain all locks in the prefork handler and release all locks in both the parent and child postfork handlers), then is fork ever safe to use from signal handlers in a threaded program? Isn't it possible that the thread handling the signal could be in the middle of a call to malloc or fopen/fclose and holding a global lock, resulting in deadlock during fork?

Finally, even if fork is safe in signal handlers, is it safe to fork in a signal handler and then return from the signal handler, or does a call to fork in a signal handler always necessitate a subsequent call to _exit or one of the exec family of functions before the signal handler returns?

Solution

Trying my best to answer all the sub-questions; I apologise that some of this is vaguer than it ideally should be:

If there is a possibility that library code has registered pthread_atfork handlers which are not async-signal-safe, does this negate the safety of fork?

Yes. The fork documentation explicitly mentions this:

   When the application calls fork() from a signal handler and any of the
   fork handlers registered by pthread_atfork() calls a function that is
   not asynch-signal-safe, the behavior is undefined.

Of course, this means you can't actually use pthread_atfork() for its intended purpose of making multi-threaded libraries transparent to processes that believe they are single-threaded, because none of the pthread synchronisation functions are async-signal-safe; this is noted as a defect in the spec, see http://www.opengroup.org/austin/aardvark/latest/xshbug3.txt (search for "L16723").

Does the answer depend on whether the thread in which the signal handler is running could be in the middle of using a resource that the atfork handlers need? Or said a different way, if the atfork handlers make use of synchronization resources (mutexes, etc.) but fork is being called from a signal handler which executed in a thread that never accesses these resources, is the program conforming?

Strictly speaking the answer is no, because the according to the spec, functions are either async-signal-safe or they're not; there's no concept of "safe under certain circumstances". In practice you might well get away with it, but you would be vulnerable to a clunky-but-correct implementation that didn't partition its resources in the way you were expecting.

Building on this question, if "thread-safe" forking is implemented internally in the system library using the idioms suggested by pthread_atfork (obtain all locks in the prefork handler and release all locks in both the parent and child postfork handlers), then is fork ever safe to use from signal handlers in a threaded program? Isn't it possible that the thread handling the signal could be in the middle of a call to malloc or fopen/fclose and holding a global lock, resulting in deadlock during fork?

If it were implemented in that way, then you're right, fork() from a signal handler would never be safe, because attempting to obtain a lock might deadlock if the calling thread already held it. But this implies that an implementation using such a method would not be conforming.

Looking at glibc as one example, it doesn't do that - rather, it takes two approaches: firstly, the locks that it does obtain are recursive (so if the current thread already has them, their lock count will simply be increased); further, in the child process, it simply unilaterally overwrites all the locks - see this extract from nptl/sysdeps/unix/sysv/linux/fork.c:

  /* Reset the file list.  These are recursive mutexes.  */
  fresetlockfiles ();

  /* Reset locks in the I/O code.  */
  _IO_list_resetlock ();

  /* Reset the lock the dynamic loader uses to protect its data.  */
  __rtld_lock_initialize (GL(dl_load_lock));

where each of the resetlock and lock_initialize functions ultimately call glibc's internal equivalent of pthread_mutex_init(), effectively resetting the mutex regardless of any waiters.

I think the theory is that, by obtaining the (recursive) lock it's guaranteed that no other threads will be touching the data structures (at least in a way that might cause a crash), and then resetting the individual locks ensures the resources aren't permanently blocked. (Resetting the current thread's lock is safe since there are now no other threads to contend for the data structure, and indeed won't be until whatever function is using the lock has returned).

I'm not 100% convinced that this covers all eventualities (not least because if/when the signal handler returns, the function that's just had its lock stolen will try to unlock it, and the internal recursive unlock function doesn't protect against unlocking too many times!) - but it seems that a workable scheme could be built on top of async-signal-safe recursive locks.

Finally, even if fork is safe in signal handlers, is it safe to fork in a signal handler and then return from the signal handler, or does a call to fork in a signal handler always necessitate a subsequent call to _exit or one of the exec family of functions before the signal handler returns?

I assume you're talking about the child process? (If fork() being async-signal-safe means anything then it should be possible to return in the parent!)

Not having found anything in the spec that states otherwise (though I may have missed it) I believe it should be safe - at least, 'safe' in the sense that returning from the signal handler in the child doesn't imply undefined behaviour in and of itself, though the fact that a multi-threaded process has just forked may imply that an exec*() or _exit() is probably the safest course of action.