Search code examples
linuxdockercgroups

What happens to other processes when a Docker container's PID1 exits?


Consider the following, which runs sleep 60 in the background and then exits:

$ cat run.sh 
sleep 60&
ps
echo Goodbye!!!
$ docker run --rm -v $(pwd)/run.sh:/run.sh ubuntu:16.04 bash /run.sh
  PID TTY          TIME CMD
    1 ?        00:00:00 bash
    5 ?        00:00:00 sleep
    6 ?        00:00:00 ps
Goodbye!!!

This will start a Docker container, with bash as PID1. It then fork/execs a sleep process, and then bash exits. When the Docker container dies, the sleep process somehow dies too.

My question is: what is the mechanism by which the sleep process is killed? I tried trapping SIGTERM in a child process, and that appears to not get tripped. My presumption is that something (either Docker or the Linux kernel) is sending SIGKILL when shutting down the cgroup the container is using, but I've found no documentation anywhere clarifying this.

EDIT The closest I've come to an explanation is the following quote from baseimage-docker:

If your init process is your app, then it'll probably only shut down itself, not all the other processes in the container. The kernel will then forcefully kill those other processes, not giving them a chance to gracefully shut down, potentially resulting in file corruption, stale temporary files, etc. You really want to shut down all your processes gracefully.

So at least according to this, the implication is that when the container exits, the kernel will sending a SIGKILL to all remaining processes. But I'd still like clarity on how it decides to do that (i.e., is it a feature of cgroups?), and ideally a more authoritative source would be nice.


Solution

  • OK, I seem to have come up with some more solid evidence that this is, in fact, the Linux kernel doing the terminating. In the clone(2) man page, there's this useful section:

    CLONE_NEWPID (since Linux 2.6.24)

    The first process created in a new namespace (i.e., the process created using the CLONE_NEWPID flag) has the PID 1, and is the "init" process for the namespace. Children that are orphaned within the namespace will be reparented to this process rather than init(8). Unlike the traditional init process, the "init" process of a PID namespace can terminate, and if it does, all of the processes in the namespace are terminated.

    Unfortunately this is still vague on how exactly the processes in the namespace are terminated, but perhaps that's because, unlike a normal process exit, no entry is left in the process table. Whatever the case is, it seems clear that:

    • The kernel itself is killing the other processes
    • They are not killed in a way that allows them any chance to do cleanup, making it (almost?) identical to a SIGKILL