
OpenMP parallel "for" with "static" schedule


I'm confused about, or perhaps misunderstanding, how a parallel for loop behaves with a static schedule and the default chunk size.

Take the example in the picture below. I did expect the master thread to take an extra iteration, but I expected that extra iteration to be index 8, not 2!

My understanding was that the static schedule with the default chunk size applies round robin to chunks of (#iterations / #threads) iterations, with two cases:

  1. If #iterations is divisible by #threads, e.g. N=8 and #threads=4, each thread takes an equal number of iterations in round-robin fashion (the straightforward case).

  2. If #iterations is not divisible by #threads, it rounds #iterations / #threads to the nearest integer and proceeds as above:

     N=9 (9/4 rounds to 2, as if N were 8): chunks of 2 2 2 2, plus 1 leftover iteration

     N=11 (11/4 rounds to 3, as if N were 12): chunks of 3 3 3, plus a last chunk of 2

     (the threads are numbered 0 1 2 3)



Solution

  • When you use static scheduling and the number of threads does not evenly divide the number of iterations, the OpenMP implementation still has to ensure that every iteration is computed by some thread.

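    From a load-balancing perspective, the implementation will try to give each thread roughly the same number of iterations and to avoid having one thread receive all the iterations left over from the division. So, in your example with N=11 and four threads, the remainder is 3, and the first three threads (0..2) each get one extra iteration instead of the last thread getting all 3 extra iterations. With N=9 the remainder is 1, so only thread 0 gets an extra iteration. Without a chunk size, each thread receives a single contiguous block of iterations rather than round-robin chunks, so in the split you observed thread 0's block is 0, 1, 2 and its extra iteration is index 2, not 8.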