Race condition with setpgid


While working on a program for my OS class, I discovered what appears to be a race condition involving setpgid.

Compile each program below separately. After running ./test 3 (or any number greater than 2), ps jx shows that all of the infy processes have been placed in the same process group. Running ./test 2, however, prints an error saying that setpgid failed while trying to move the last process. Uncommenting the line marked "uncomment me to fix" makes ./test 2 work as expected.

Can anyone offer an explanation or solution?

// test.c
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>
#include <stdio.h>
#include <stdlib.h>

char* args[] = {
  "./infy",
  NULL
};

int main(int argc, char* argv[])
{
  if (argc != 2)
  {
    fprintf(stderr, "Usage: %s [num]\n", argv[0]);
    return 1;
  }
  int num = strtol(argv[1], NULL, 10);
  if (num < 2)
  {
    fprintf(stderr, "Invalid number of processes\n");
    return 1;
  }

  pid_t pid = fork();
  if (pid > 0)
  {
    int s;
    waitpid(pid, &s, 0);
    fprintf(stderr, "Children done\n");
  }
  else
  {
    pid_t pgid = -1;
    int i;
    for (i = 1; i < num; i++)
    {
      pid_t pid2 = fork();
      if (pid2 > 0)
      {
        if (pgid == -1)
        {
          pgid = pid2;
        }
      }
      else
      {
        if (setpgid(0, pgid == -1 ? 0 : pgid) != 0)
        {
          perror("setpgid failed in non-last process");
        }
        execve(args[0], args, NULL);
        perror("exec failed");
        exit(1);
      }
    }

    // uncomment me to fix
    //fprintf(stderr, "pgid %d\n", pgid);
    if (setpgid(0, pgid) != 0)
    {
      perror("setpgid failed in last process");
    }
    execve(args[0], args, NULL);
    perror("exec failed");
    exit(1);
  }
} 

Where "infy" is a separate program:

// infy.c
#include <unistd.h>

int main()
{
  while (1)
  {
    sleep(1);
  }
} 

Solution

  • I figured it out, finally. When setpgid failed, errno was set to EPERM. One of the causes the man page lists for EPERM is:

    The value of the pgid argument is valid but does not match the process ID of the process indicated by the pid argument and there is no process with a process group ID that matches the value of the pgid argument in the same session as the calling process.

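    The race is whether the first child can create its process group (by calling setpgid(0, 0)) before the parent tries to join that group. If the child wins the race, all is well. If the parent wins, the process group it is trying to join doesn't exist yet, and setpgid fails with EPERM. With ./test 3, or with the extra fprintf, the parent is delayed just long enough that the child usually wins.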

    The solution is for the parent process to set the child's group id immediately after the first fork, by calling setpgid(pid2, pid2) in the if (pgid == -1) block.

    Also relevant, from the man page:

    To provide tighter security, setpgid() only allows the calling process to join a process group already in use inside its session or create a new process group whose process group ID was equal to its process ID.