Suppose we have a variable var = 100. The clause private(var) creates n additional variables, assigning one to each of the n threads:
Before parallelism, var's value and address are 100, 0x7ffd683992bc
private parallel region:
var's value and address in thread 3 are 3, 0x7feb115ffde8
var's value and address in thread 0 are 0, 0x7ffd68399258
var's value and address in thread 1 are 1, 0x7feb129ffde8
var's value and address in thread 2 are 2, 0x7feb11fffde8
After private parallelism, var's value and address are 100, 0x7ffd683992bc
This works exactly as intended. However, when var is declared threadprivate instead, we notice:
Before parallelism, var's value and address are 100 and 0x7f685615c7bc
threadprivate parallel region:
var's value and address in thread 0 are 0 and 0x7f685615c7bc
var's value and address in thread 3 are 6 and 0x7f6854c006bc
var's value and address in thread 1 are 2 and 0x7f68560006bc
var's value and address in thread 2 are 4 and 0x7f68556006bc
After first tp parallelism, var's value and address are 0 and 0x7f685615c7bc
As you can see, thread 0's copy shares the same address as the original variable, and changes made to it inside the parallel region remain visible after the region ends.
All of private(), firstprivate(), and lastprivate() spawn n additional variables, while threadprivate() spawns only n-1 more. What is the reason for this behavior, and why is it intended to work this way?
threadprivate is used on static or global variables to give one copy per thread with global extent (so the variable continues to exist for the life of the thread), whereas the {first|last|}private clauses declare that a variable should be allocated locally in each thread on entry to the parallel region and then destroyed on exit from the parallel region.
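To make the private half of that contrast concrete, here is a minimal sketch (not the asker's program, which isn't shown). The function name private_demo and the serial fallback for omp_get_thread_num are my own assumptions, added so the file still builds when OpenMP is switched off.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

/* Hypothetical helper: runs one region with private(var) and
 * returns the value of the original var afterwards. */
int private_demo(void)
{
    int var = 100;
    printf("before: %d at %p\n", var, (void *)&var);

    #pragma omp parallel private(var)
    {
        /* Each thread gets its own uninitialised var, local to the region. */
        var = omp_get_thread_num();
        printf("thread %d: %d at %p\n",
               omp_get_thread_num(), var, (void *)&var);
    }

    /* With OpenMP enabled the original var is a different object and is
     * untouched. Without OpenMP the pragma is ignored, so the block above
     * has overwritten var itself. */
    printf("after: %d at %p\n", var, (void *)&var);
    return var;
}
```

Built with something like gcc -fopenmp and called from main, every thread, including thread 0, should print an address different from the one printed before the region, since all n copies are created on entry.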
OpenMP specifies that thread zero inside a parallel region is the thread that executed the parallel directive (OpenMP Standard paragraph 9: "When any thread encounters a parallel construct, the thread creates a team of itself and zero or more additional threads and becomes the primary thread of the new team."). Thread zero is therefore not a new thread, so it continues to use the existing, already allocated threadprivate variable that belongs to that pre-existing thread.
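You can check this directly by comparing addresses. A rough sketch, assuming a file-scope var like the one in the question; the helper name primary_reuses_copy and the serial fallback are mine, not from the question:

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

static int var = 100;
#pragma omp threadprivate(var)

/* Hypothetical helper: returns 1 if thread 0 inside the region sees the
 * same var object as the thread that encountered the parallel directive. */
int primary_reuses_copy(void)
{
    int *outside = &var;  /* copy owned by the encountering thread */
    int *inside = NULL;   /* shared; written by thread 0 only */

    #pragma omp parallel shared(inside)
    {
        if (omp_get_thread_num() == 0)
            inside = &var;  /* thread 0's copy inside the region */
        printf("thread %d: var at %p\n",
               omp_get_thread_num(), (void *)&var);
    }

    /* The implicit barrier at the end of the region orders the write above
     * before this read. */
    return inside == outside;
}
```

The other threads print different addresses because they are new threads and need their own copies, which is where the n-1 in the question comes from.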
The behaviour that you are seeing is therefore exactly what you should expect. A threadprivate variable exists for the whole of the thread's lifetime, and a new instance will not be created for the pre-existing thread zero inside a new parallel region.
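The lifetime point can be sketched the same way. The names counter and value_after_region are hypothetical, and the serial fallback is again an assumption for builds without OpenMP. What the non-primary threads see in the second region depends on the conditions the specification places on persistence (for example, an unchanged number of threads), so treat that part as illustrative only.

```c
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#else
/* Assumed serial fallback so the sketch builds without OpenMP. */
static int omp_get_thread_num(void) { return 0; }
#endif

static int counter = 100;
#pragma omp threadprivate(counter)

/* Hypothetical helper: writes to each thread's copy in one region and
 * returns what the encountering thread sees once that region has ended. */
int value_after_region(void)
{
    #pragma omp parallel
    {
        counter = 10 + omp_get_thread_num();  /* thread 0 stores 10 */
    }

    /* Same thread, same copy: thread 0's store is still here. */
    int after_first = counter;

    #pragma omp parallel
    {
        /* Thread 0 keeps its value; other threads keep theirs only when
         * the specification's persistence conditions hold. */
        printf("thread %d sees %d\n", omp_get_thread_num(), counter);
    }
    return after_first;
}
```

This matches the "After first tp parallelism" line in the question, where the value left behind is the one thread 0 wrote.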