Tags: linux, unix, signals, child-process

Does kill(SIGSTOP) take effect by the time kill() returns?


Suppose I have a parent process and a child process (started with e.g. fork() or clone()) running on Linux. Further suppose that there is some shared memory that both the parent and the child can modify.

Within the context of the parent process, I would like to stop the child process and know that it has actually stopped, and moreover that any shared memory writes made by the child are visible to the parent (including whatever synchronization or cache flushes that may require in a multi-processor system).

This answer, which speaks of using kill(SIGSTOP) to stop a child process, contains an interesting tidbit:

When the first kill() call succeeds, you can safely assume that the child has stopped.

Is this statement actually true, and if so, can anyone expound on it, or point me to some more detailed documentation (e.g. a Linux manpage)? Otherwise, is there another mechanism that I can use to ensure that the child process is completely stopped and is not going to be doing any more writes to the shared memory?

I'm imagining something along the lines of:

  1. the parent sends a different signal (e.g. SIGUSR1), which the child can handle
  2. the child handles the SIGUSR1 and does something like a pthread_cond_wait() in the signal handler to safely "stop" (though still running from the kernel perspective) -- this is not fully fleshed out in my mind yet, just an idea

I'd like to avoid reinventing the wheel if there's already an established solution to this problem. Note that the child process needs to be stopped preemptively; adding some kind of active polling to the child process is not an option in this case.

If it only existed on Linux, pthread_suspend() would be perfect ...


Solution

  • The POSIX documentation on Signal Concepts strongly suggests, but does not explicitly say, that the targeted process will be STOPped by the time kill() returns:

    A signal is said to be "generated" for (or sent to) a process or thread when the event that causes the signal first occurs... Examples of such events include ... invocations of the kill() and sigqueue() functions.

    The documentation is at pains to distinguish signal generation from delivery (when the signal's action takes effect) or acceptance. Unfortunately, it describes the effects of a stop signal sometimes as happening upon generation and sometimes upon delivery. Given that something must happen upon generation per se, I'd agree that the target process should be STOPped by the time your call returns, though the text stops short of guaranteeing it.

    However, at the cost of another syscall, you can be sure. Since your design has a parent/child relationship, you can call waitpid() with the WUNTRACED flag to be notified that your child process has, indeed, STOPped.

    Edit

    See the other answer from that other guy [sic] for reasons why you might not want to do this.
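
For reference, the waitpid()/WUNTRACED confirmation described above might look roughly like the following. This is only a minimal sketch: the helper names stop_child_and_confirm() and demo_stop_child() are made up for illustration, the child here simply idles in pause(), and the code says nothing about when the child's shared-memory writes become visible to the parent.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: send SIGSTOP to `child`, then block until the
 * kernel reports a state change for it.
 * Returns 0 if the child is reported as stopped, -1 on any error or if
 * the child terminated instead of stopping. */
int stop_child_and_confirm(pid_t child)
{
    int status;
    pid_t r;

    if (kill(child, SIGSTOP) == -1)
        return -1;

    /* WUNTRACED makes waitpid() also report children that have stopped,
     * not only children that have terminated. Retry if interrupted. */
    do {
        r = waitpid(child, &status, WUNTRACED);
    } while (r == -1 && errno == EINTR);

    if (r != child)
        return -1;

    return WIFSTOPPED(status) ? 0 : -1;
}

/* Hypothetical demo: fork a child that idles, stop it, confirm the stop,
 * then kill and reap it. Returns the result of the stop attempt. */
int demo_stop_child(void)
{
    pid_t child = fork();
    if (child == -1)
        return -1;

    if (child == 0) {
        for (;;)
            pause();            /* child: sleep until a signal arrives */
    }

    int rc = stop_child_and_confirm(child);

    kill(child, SIGKILL);       /* SIGKILL also terminates a stopped process */
    waitpid(child, NULL, 0);    /* reap the child so no zombie is left */
    return rc;
}
```

In a real program you would call stop_child_and_confirm() with the pid returned by your own fork(), and later send SIGCONT to resume the child. Note that the waitpid() call consumes the stop notification, so any other code in the parent that waits on the same child will not see it.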