How is synchronization between two taskloop constructs done? Specifically, in the following pseudocode, if more threads are available than the first loop has tasks, I believe the free threads spin at the implicit barrier at the end of the single construct. Are these free threads allowed to start executing the second loop concurrently, which would make it unsafe to parallelize things this way (because the second loop depends on the array A)?
!$omp parallel
!$omp single
!$omp taskloop num_tasks(10)
DO i=1, 10
A(i) = foo()
END DO
!$omp end taskloop
!do other stuff
!$omp taskloop
DO j=1, 10
B(j) = A(j)
END DO
!$omp end taskloop
!$omp end single
!$omp end parallel
I haven't been able to find a clear answer from the API specification: https://www.openmp.org/spec-html/5.0/openmpsu47.html#x71-2080002.10.2
The taskloop construct by default has an implicit taskgroup around it. With that in mind, what happens in your code is that the single construct picks any one thread out of the available threads of the parallel team (I'll call that the producer thread). The n-1 other threads are then sent straight to the barrier of the single construct and wait there for work (the tasks) to arrive.
Because of the taskgroup, the producer thread kicks off the creation of the loop tasks, but then waits at the end of the taskloop construct for all the created tasks to finish:
!$omp parallel
!$omp single
!$omp taskloop num_tasks(10)
DO i=1, 10
A(i) = foo()
END DO
!$omp end taskloop ! producer waits here for all loop tasks to finish
!do other stuff
!$omp taskloop
DO j=1, 10
B(j) = A(j)
END DO
!$omp end taskloop ! producer waits here for all loop tasks to finish
!$omp end single
!$omp end parallel
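So, if you have less parallelism (fewer tasks created by the first taskloop) than the n-1 worker threads waiting in the barrier, some of these threads will idle. They cannot start on the second taskloop early, because the producer thread does not create its tasks until all tasks of the first taskloop have finished, so the dependency on A is respected.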
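To check this behavior yourself, here is a minimal self-contained sketch of the same pattern. It assumes a stand-in foo(idx) that returns idx*idx (the original foo() takes no arguments and is not shown), and it assumes a compiler with OpenMP enabled, for example gfortran with -fopenmp. The program stops with an error if any element of B was copied before the matching element of A was written.

```fortran
program taskloop_sync
  implicit none
  integer, parameter :: n = 10
  integer :: a(n), b(n), i, j, k

  a = 0
  b = -1

  !$omp parallel
  !$omp single
  !$omp taskloop num_tasks(n)
  do i = 1, n
     a(i) = foo(i)
  end do
  !$omp end taskloop   ! implicit taskgroup: producer waits for all loop tasks

  ! The tasks of this loop are only created after the wait above.
  !$omp taskloop
  do j = 1, n
     b(j) = a(j)
  end do
  !$omp end taskloop
  !$omp end single
  !$omp end parallel

  ! Every b(k) must hold the value produced by the first loop.
  if (any(b /= [(foo(k), k = 1, n)])) error stop "b does not match foo"
  print *, "ok"

contains

  ! Hypothetical stand-in for the foo() of the question.
  pure integer function foo(idx)
    integer, intent(in) :: idx
    foo = idx * idx
  end function foo

end program taskloop_sync
```

Running it with more threads than tasks (for example OMP_NUM_THREADS=16) is the interesting case: the surplus threads sit in the barrier, but the check should still pass.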
If you want more overlap and the "other stuff" is independent of the first taskloop, then you can do this:
!$omp parallel
!$omp single
!$omp taskgroup
!$omp taskloop num_tasks(10) nogroup
DO i=1, 10
A(i) = foo()
END DO
!$omp end taskloop ! producer will not wait for the loop tasks to complete
!do other stuff
!$omp end taskgroup ! wait for the loop tasks (and their descendant tasks)
!$omp taskloop
DO j=1, 10
B(j) = A(j)
END DO
!$omp end taskloop
!$omp end single
!$omp end parallel
Alas, the OpenMP API as of version 5.1 does not support task dependences for the taskloop construct, so you cannot easily describe the dependences between the loop iterations of the first taskloop and those of the second taskloop. The OpenMP language committee is working on this right now, but I do not expect it to land in version 5.2 of the OpenMP API, but rather in version 6.0.
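Until then, one possible workaround (my own sketch, not something taskloop gives you) is to create the tasks by hand with the task construct and its depend clause, which does accept array elements. The version below assumes the same hypothetical foo(idx) = idx*idx stand-in and creates one task per element purely for illustration; in real code you would likely create one task per chunk of iterations to keep the tasking overhead down.

```fortran
program manual_task_deps
  implicit none
  integer, parameter :: n = 10
  integer :: a(n), b(n), i, j, k

  a = 0
  b = -1

  !$omp parallel
  !$omp single
  ! Producer tasks: each one announces that it writes a(i).
  do i = 1, n
     !$omp task depend(out: a(i)) firstprivate(i) shared(a)
     a(i) = foo(i)
     !$omp end task
  end do

  ! Independent "other stuff" could run here on the producer thread.

  ! Consumer tasks: each one may start as soon as its a(j) is written,
  ! without waiting for the other producer tasks.
  do j = 1, n
     !$omp task depend(in: a(j)) firstprivate(j) shared(a, b)
     b(j) = a(j)
     !$omp end task
  end do
  !$omp end single     ! the implicit barrier completes all tasks
  !$omp end parallel

  if (any(b /= [(foo(k), k = 1, n)])) error stop "b does not match foo"
  print *, "ok"

contains

  ! Hypothetical stand-in for the foo() of the question.
  pure integer function foo(idx)
    integer, intent(in) :: idx
    foo = idx * idx
  end function foo

end program manual_task_deps
```

The trade-off is that you lose the automatic chunking of taskloop (num_tasks, grainsize) and have to partition the iteration space yourself.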
PS (EDIT): Since the second taskloop sits right before the end of the single construct, and thus right before a barrier, you can easily add nogroup there as well to avoid that extra bit of waiting for the producer thread.
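Put together, the overlap variant with nogroup on both loops might look like the following sketch. The foo(idx) function and the scalar other (standing in for the "other stuff") are placeholders I made up for the example; the rest follows the structure shown above.

```fortran
program taskloop_nogroup
  implicit none
  integer, parameter :: n = 10
  integer :: a(n), b(n), i, j, k, other

  a = 0
  b = -1
  other = 0

  !$omp parallel
  !$omp single
  !$omp taskgroup
  !$omp taskloop num_tasks(n) nogroup
  do i = 1, n
     a(i) = foo(i)
  end do
  !$omp end taskloop   ! producer does not wait here
  other = 42           ! placeholder for independent "other stuff"
  !$omp end taskgroup  ! wait for the tasks of the first loop

  !$omp taskloop nogroup
  do j = 1, n
     b(j) = a(j)
  end do
  !$omp end taskloop   ! no wait here either
  !$omp end single     ! the implicit barrier completes the remaining tasks
  !$omp end parallel

  if (any(b /= [(foo(k), k = 1, n)])) error stop "b does not match foo"
  if (other /= 42) error stop "other stuff did not run"
  print *, "ok"

contains

  ! Hypothetical stand-in for the foo() of the question.
  pure integer function foo(idx)
    integer, intent(in) :: idx
    foo = idx * idx
  end function foo

end program taskloop_nogroup
```

Note that nogroup on the second loop is only safe here because nothing between the loop and the barrier reads B; if the single construct carried a nowait clause, or more code followed inside it, you would need the taskgroup (or a taskwait) back.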