c++ multithreading openmp

threadprivate with initialization of function-scope static variables


I am testing how the threadprivate directive works in OpenMP. I set up the following simple test case:

#include <iostream>
#include <omp.h>

void func() {
    static int some_id = omp_get_thread_num();
#pragma omp threadprivate(some_id)
#pragma omp critical
    {
        std::cout << "id:" << some_id << std::endl;
        std::cout << "in thread " << omp_get_thread_num() << std::endl;
    }
}

int main() {
    omp_set_num_threads(4);
#pragma omp parallel
    func();
}

I compiled this code with GCC 7.2, ran it, and got:

id:1
in thread 1
id:0
in thread 0
id:0
in thread 3
id:0
in thread 2

I know that the C++ standard (since C++11) guarantees that a function-scope static variable is initialized by exactly one thread, and that all other threads block until the initialization is complete. Given that, I expected the local copy of some_id in every thread to hold the same value (the id of the thread that performed the initialization). Why is some_id in thread 1 different from the one in the other threads? Clearly my mental model of threadprivate is flawed. Can you explain what is going on here?


Solution

  • I think this is a bug in the implementation. The OpenMP API specification requires that each thread initialize its own copy of the some_id variable:

    Each copy of a block-scope threadprivate variable that has a dynamic initializer is initialized the first time its thread encounters its definition; if its thread does not encounter its definition, its initialization is unspecified.

    (See https://www.openmp.org/spec-html/5.1/openmpsu114.html#x149-1630002.21.2)

    GCC 9.3.0 on my machine shows the same problem; in this run every thread prints id:0:

    $ g++ --version
    g++ (Ubuntu 9.3.0-17ubuntu1~20.04) 9.3.0
    Copyright (C) 2019 Free Software Foundation, Inc.
    This is free software; see the source for copying conditions.  There is NO
    warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
    
    $ g++ -O0 -g -fopenmp -o tp tp.cc
    $ ./tp
    id:0
    in thread 0
    id:0
    in thread 3
    id:0
    in thread 1
    id:0
    in thread 2
    

    Clang 12, on the other hand, does the correct thing:

    $ clang --version
    clang version 12.0.0 (/net/home/micha/projects/llvm/llvm-project/clang d28af7c654d8db0b68c175db5ce212d74fb5e9bc)
    Target: x86_64-unknown-linux-gnu
    Thread model: posix
    InstalledDir: /net/software/x86_64/clang/12.0.0/bin
    $ clang++ -g -O0 -fopenmp -o tp ./tp.cc
    $ ./tp
    id:2
    in thread 2
    id:0
    in thread 0
    id:3
    in thread 3
    id:1
    in thread 1