linux docker containers

environment independence example in Docker


I am reading Docker in Action, and I have a doubt. I was reading about containers created without their own PID namespace. I am not able to understand the following paragraph:

Because containers all have their own PID namespace, they both cannot gain meaningful insight from examining it, and can take more static dependencies on it. Suppose a container runs two processes: a server and a local process monitor. That monitor could take a hard dependency on the server’s expected PID and use that to monitor and control the server. This is an example of environment independence

I am not sure, but I think "static dependencies" means hard-coded dependencies. But can we expect processes to get the same PID every time they are started? Also, why is it called environment independence? I think it should be called environment dependence.

This is a theoretical concept related to containerization. I don't know what to try.


Solution

  • Generally in Unix, process IDs are assigned sequentially, starting from 1. Since each container has its own PID namespace, the main container process is process 1 (with the additional responsibilities that entails, such as reaping orphaned processes), and any other processes that get created will have "small" PIDs after that. If you're not using docker exec to create additional processes in the container, then the only processes that exist in the PID namespace are the main process and anything it starts.

    Let's consider this slightly contrived example:

    #include <stdio.h>
    #include <sys/types.h>
    #include <sys/wait.h>
    #include <unistd.h>
    
    int main(void) {
      pid_t child;
    
      printf("My pid is %d\n", (int)getpid());
      /* Flush before forking; otherwise, when stdout is a pipe or file, the
         child inherits the buffered line and prints it a second time. */
      fflush(stdout);
      child = fork();
      if (child < 0) {
        perror("fork");
        return 1; /* error on fork */
      } else if (child == 0) {
        return 0; /* we are the child */
      }
      /* else we are the parent */
      printf("Child pid is %d\n", (int)child);
      waitpid(child, NULL, 0); /* reap the child so it does not linger as a zombie */
      return 0;
    }
    

    If you compile this into an image and run it, it will always print out

    My pid is 1
    Child pid is 2
    

    unless you run it with docker run --init, in which case the pids will be 2 and 3.

    This is what the passage you quoted is suggesting: in principle, it's possible to know the PIDs before you fork(2) or otherwise create the subprocesses. It does require knowing whether or not you're running in a container, whether or not there's an additional init process, and that nothing has used docker exec to start another process while you're setting up. But "usually" the process IDs will have consistent values.

    In practice, these sorts of hard-coded magic values can be pretty fragile, and I'd avoid depending on a specific process having a specific process ID. Since you'll need to do things like error checking anyway, it doesn't really cost you anything to save the child PID. It will also be much easier to develop your code if you can take a normal process that runs in any Unix-like environment and package it into a container, rather than building something that only runs in Docker.

    I have documented a debugging procedure for some things I work on professionally that involves knowing that the "real" process in the container, after startup scripts and wrappers and whatnot, has a process ID of 8 or 9. So if you need to use docker exec to debug something inside the container, and it doesn't have ps, you'd give that PID to a diagnostic tool. That's something I'd write up in a wiki page but not necessarily write into code.
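
The advice above about saving the child PID instead of hard-coding it could look roughly like the following. This is only a minimal sketch: the helper names start_server, server_is_alive, and stop_server are hypothetical, and the "server" is a stand-in child that just waits for a signal where a real monitor would exec() the actual server binary.

```c
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: start the "server" as a child process and return the
 * PID that fork() reports, rather than assuming it will be 2 (or 3). */
pid_t start_server(void) {
  fflush(stdout); /* avoid duplicating buffered output in the child */
  pid_t pid = fork();
  if (pid == 0) {
    /* Child: placeholder for a real server (assumption for illustration). */
    signal(SIGTERM, SIG_DFL); /* make sure SIGTERM terminates us */
    for (;;) {
      pause(); /* sleep until a signal arrives */
    }
  }
  return pid; /* parent: the child's PID, or -1 if fork() failed */
}

/* Hypothetical helper: returns 1 if a process with this PID still exists.
 * Signal 0 performs only the existence/permission check. Note that a zombie
 * that has not been reaped yet still counts as existing. */
int server_is_alive(pid_t pid) {
  return kill(pid, 0) == 0;
}

/* Hypothetical helper: ask the server to terminate, then reap it.
 * Returns 0 on success and -1 on error. */
int stop_server(pid_t pid) {
  if (kill(pid, SIGTERM) < 0) {
    perror("kill");
    return -1;
  }
  if (waitpid(pid, NULL, 0) < 0) {
    perror("waitpid");
    return -1;
  }
  return 0;
}
```

A monitor would call start_server() once, keep the returned pid_t, and pass it to server_is_alive() and stop_server(). The same code then behaves identically whether the server ends up as PID 2 in a container, PID 3 under docker run --init, or some arbitrary PID on an ordinary host.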