Search code examples
clinuxforkopenmplibgomp

Unable to set the OpenMP threads affinity in a forked process


I am trying to run two processes on separate CPUs using openMP. In this case each CPU has 6 cores with hyper-threading (so 12 hardware threads). They need to do some synchronization which seems some what easier if they know each other's PID. So I am starting a process sigC from sigS using a fork() and execve() called with a different value for the GOMP_CPU_AFFINITY environment variable. After the fork()/execve() call, sigS has the correct affinity still but sigC prints

libgomp: no cpus left for affinity setting

and all threads are on the same core.

The code of sigS:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <omp.h>
#include <sched.h>

int main( void )
{
   omp_set_num_threads(12); //12 hardware threads per CPU
   //this loop runs as expected
   #pragma omp parallel for
   for( int i = 0; i<12; i++ ) {
      #pragma omp critical 
      {
         printf("TEST PRE-FORK: I am thread %2d running on core %d\n",
                omp_get_thread_num(), sched_getcpu());
      }
   }

   pid_t childpid = fork();

   if( childpid < 0 ) {
      perror("Fork failed");
   } else {
      if( childpid == 0 ) { //<------ attempt to set affinity for child
         //change the affinity for the other process so it runs
         //on the other cpu
         char ompEnv[] = "GOMP_CPU_AFFINITY=6-11 18-23"; 
         char * const args[]    = { "./sigC", (char*)0 };
         char * const envArgs[] = { ompEnv,   (char*)0 };
         execve(args[0], args, envArgs);
         perror("Returned from execve");
         exit(1);
      } else {
         omp_set_num_threads(12);
         printf("PARENT: my pid     = %d\n", getpid());
         printf("PARENT: child pid  = %d\n", childpid);
         sleep(5); //sleep for a bit so child process prints first

         //This loop gives the same thread core/pairings as above
         //this is expected
         #pragma omp parallel for
         for( int i = 0; i < 12; i++ ) {
            #pragma omp critical
            {
               printf("PARENT: I'm thread %2d, on core %d.\n",
                      omp_get_thread_num(), sched_getcpu());
            }
         }
      }
   }
   return 0;
}

The code of sigC just has a omp parallel for loop in it but for completeness:

#define _GNU_SOURCE
#include <stdio.h>
#include <unistd.h>
#include <errno.h>
#include <omp.h>
#include <sched.h>

int main( void )
{
   omp_set_num_threads(12);
   printf("CHILD: my pid     = %d\n", getpid());
   printf("CHILD: parent pid = %d\n", getppid());
   //I expect this loop to have the core pairings as I specified in execve
   //i.e thread 0 -> core 6, 1 -> 7, ... 6 -> 18, 7 -> 19 ... 11 -> 23
   #pragma omp parallel for
   for( int i = 0; i < 12; i++ ) {
      #pragma omp critical
      {
         printf("CHILD: I'm thread %2d, on core %d.\n",
                omp_get_thread_num(), sched_getcpu());
      }
   }
   return 0;
}

The output:

$ env GOMP_CPU_AFFINITY="0-5 12-17" ./sigS

This part is as expected

TEST PRE-FORK: I'm thread  0, on core 0.
TEST PRE-FORK: I'm thread 11, on core 17.
TEST PRE-FORK: I'm thread  5, on core 5.
TEST PRE-FORK: I'm thread  6, on core 12.
TEST PRE-FORK: I'm thread  3, on core 3.
TEST PRE-FORK: I'm thread  1, on core 1.
TEST PRE-FORK: I'm thread  8, on core 14.
TEST PRE-FORK: I'm thread 10, on core 16.
TEST PRE-FORK: I'm thread  7, on core 13.
TEST PRE-FORK: I'm thread  2, on core 2.
TEST PRE-FORK: I'm thread  4, on core 4.
TEST PRE-FORK: I'm thread  9, on core 15.
PARENT: my pid     = 11009
PARENT: child pid  = 11021

This is the problem - all threads in the child run on core 0

libgomp: no CPUs left for affinity setting
CHILD: my pid     = 11021
CHILD: parent pid = 11009
CHILD: I'm thread  1, on core 0.
CHILD: I'm thread  0, on core 0.
CHILD: I'm thread  4, on core 0.
CHILD: I'm thread  5, on core 0.
CHILD: I'm thread  6, on core 0.
CHILD: I'm thread  7, on core 0.
CHILD: I'm thread  8, on core 0.
CHILD: I'm thread  9, on core 0.
CHILD: I'm thread 10, on core 0.
CHILD: I'm thread 11, on core 0.
CHILD: I'm thread  3, on core 0.

(I omitted the parent thread printing as it is the same as the pre-fork)

Any ideas on how I can fix this or if it is the right approach?


Solution

  • The fork()-ed child process inherits its parent affinity mask. libgomp intersects this affinity mask with the set from GOMP_CPU_AFFINITY and ends up with an empty set as both sets are complementary. This behaviour is not documented, but a look at the source code of libgomp confirms that this is indeed the case.

    The solution is to reset the affinity mask of the child process before it makes the execve() call:

    if (childpid == 0) { //<------ attempt to set affinity for child
       cpu_set_t *mask;
       size_t size;
       int nrcpus = 256; // 256 CPUs should be more than enough
    
       // Reset the CPU affinity mask
       mask = CPU_ALLOC(nrcpus);
       size = CPU_ALLOC_SIZE(nrcpus);
       for (int i = 0; i < nrcpus; i++)
          CPU_SET_S(i, size, mask);
       if (sched_setaffinity(0, size, mask) == -1) { handle error }
       CPU_FREE(mask);
    
       //change the affinity for the other process so it runs
       //on the other cpu
       char ompEnv[] ="GOMP_CPU_AFFINITY=6-11 18-23"; 
       char * const args[]    = {"./sigC", (char*)0};
       char * const envArgs[] = {ompEnv,   (char*)0};
       execve(args[0], args, envArgs);
       perror("Returned from execve");
       exit(1);
    } else {