I'm trying to use OpenMP to execute, in parallel, a number of tasks that I stored in an array called tasklist_GRAD.
To do that, I implemented the following code:
subroutine master_worker_execution(self,var,tasklist_GRAD,first_task,last_task)
type(tcb),dimension(20),intent(inout)::tasklist_GRAD !< the master array of tasks
integer::i_task !< the task counter
type(tcb)::self !< self
integer,intent(in)::first_task,last_task
type(variables),intent(inout)::var !< the variables
!OpenMP variables
integer::num_thread !< the rank of the thread
integer::nthreads !< the number of threads
integer:: OMP_GET_THREAD_NUM !< function to get the rank of the thread
integer::OMP_GET_NUM_THREADS !< function to get the number of threads
!=======================================================================================================================================================
!$OMP PARALLEL PRIVATE(num_thread,nthreads,i_task) &
!$OMP SHARED(tasklist_GRAD,self,var)
num_thread=OMP_GET_THREAD_NUM() !< the rank of the thread
nthreads=OMP_GET_NUM_THREADS() !< the number of threads
!$OMP SINGLE
do i_task=first_task,last_task
tasklist_GRAD(i_task)%state=STATE_RUNNING
end do
!$OMP TASK UNTIED
do i_task=first_task,last_task
!$OMP TASK FIRSTPRIVATE(i_task) SHARED(tasklist_GRAD,self,var)
call tasklist_GRAD(i_task)%f_ptr(self,var)
!$OMP END TASK
!$OMP TASKWAIT !< comment this to compare between the first and the second code
end do
!$OMP END TASK
do i_task=first_task,last_task
tasklist_GRAD(i_task)%state=STATE_INACTIVE
end do
!$OMP END SINGLE
!$OMP END PARALLEL
end subroutine master_worker_execution
end module master_worker
While implementing the first code, I discovered !$OMP TASKLOOP and then wrote the following code:
!=======================================================================================================================================================
!$OMP PARALLEL PRIVATE(num_thread,nthreads,i_task) &
!$OMP SHARED(tasklist_GRAD,self,var)
num_thread=OMP_GET_THREAD_NUM() !< the rank of the thread
nthreads=OMP_GET_NUM_THREADS() !< the number of threads
!$OMP SINGLE
!$OMP TASKLOOP PRIVATE(i_task) SHARED(tasklist_GRAD,self,var) NUM_TASKS(last_task-first_task+1)
do i_task=first_task,last_task
call tasklist_GRAD(i_task)%f_ptr(self,var)
end do
!$OMP END TASKLOOP
!$OMP END SINGLE
!$OMP END PARALLEL
I have 3 questions (if you think I shouldn't ask the second and third ones here, I can post them as 2 new questions).
1. What is the difference between the first and the second code? To me, they are the same thing (I think adding grainsize(1) is useless here because I specified the number of tasks).
2. What happens if I remove the !$OMP TASKWAIT in the first code? What is the difference between the code with and without the TASKWAIT construct?
3. Is it necessary to have a very large number of tasks in order to use tasking constructs properly?
The main difference between the two approaches is where the task synchronization points are and which tasks are synchronized. When your program encounters a task construct, the task is generated but not necessarily executed immediately (in tasking terminology, its execution may be deferred). Tasks are only guaranteed to be completed at program exit and at one of the following three constructs:
barrier (explicit or implicit): there is an implied barrier, for example, at the end of a single construct and at the end of a parallel region. At a barrier, all explicit tasks generated by the team must be executed to completion. (Note that to write efficient code it is worth knowing where the implied barriers are, so that you can use the nowait clause to skip them if the logic of your program allows it.)
taskwait: the construct specifies a wait on the completion of the child tasks of the current task. It does not wait for their descendants.
taskgroup: the region specifies a wait on the completion of the child tasks created in the taskgroup set, and their descendants. This guarantees that all descendant tasks are completed as well.
As you can see, taskwait and taskgroup can be quite different in this respect if there are descendant tasks or if there are tasks which were created before the taskgroup construct.
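The descendant-task distinction can be sketched with a small standalone program. This is only an illustration under assumptions: the program name, the counter `done`, and the choice of 4 child tasks are hypothetical and not taken from the question's code.

```fortran
program taskgroup_vs_taskwait
  implicit none
  integer :: done   ! number of grandchild tasks that have finished (hypothetical counter)
  integer :: i

  done = 0
  !$omp parallel shared(done) private(i)
  !$omp single
  !$omp taskgroup
  do i = 1, 4
    ! Child task of the task that opened the taskgroup.
    !$omp task shared(done)
      ! Grandchild task: a descendant, not a child, of the outer task.
      !$omp task shared(done)
        !$omp atomic update
        done = done + 1
      !$omp end task
    !$omp end task
  end do
  !$omp end taskgroup
  ! Here the 4 children AND their grandchildren are guaranteed complete.
  ! Replacing the taskgroup by a "!$omp taskwait" at this point would only
  ! guarantee the 4 children; the grandchildren could still be pending.
  print *, 'grandchildren finished:', done
  !$omp end single
  !$omp end parallel
end program taskgroup_vs_taskwait
```

Compiled with OpenMP enabled (for example `gfortran -fopenmp`), the taskgroup version should always report 4 finished grandchildren; with a bare taskwait the reported count would not be guaranteed.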
The taskloop construct was added in OpenMP 4.5 and combines the ease of use of the parallel loop with the flexibility of tasking. The specification says the following about taskloop:
By default, the taskloop construct executes as if it was enclosed in a taskgroup construct with no statements or directives outside of the taskloop construct. Thus, the taskloop construct creates an implicit taskgroup region. If the nogroup clause is present, no implicit taskgroup region is created.
It means that your second code:
!$OMP TASKLOOP PRIVATE(i_task) SHARED(tasklist_GRAD,self,var) NUM_TASKS(last_task-first_task+1)
do i_task=first_task,last_task
call tasklist_GRAD(i_task)%f_ptr(self,var)
end do
!$OMP END TASKLOOP
behaves as if it were enclosed in a taskgroup region, so there is an implicit wait at the end of the taskloop (i.e. a wait on the completion of the tasks it generates and of their descendant tasks).
On the other hand, your first code
do i_task=first_task,last_task
!$OMP TASK FIRSTPRIVATE(i_task) SHARED(tasklist_GRAD,self,var)
call tasklist_GRAD(i_task)%f_ptr(self,var)
!$OMP END TASK
!$OMP TASKWAIT !< comment this to compare between the first and the second code
end do
is quite different: each time a task is created, !$OMP TASKWAIT waits for its completion (but not for any descendant tasks), and the next task is generated only after the child task has completed. In practice, this means that if there are no descendant tasks (i.e. no other task is created in call tasklist_GRAD(i_task)%f_ptr(self,var)), your program runs serially, not concurrently. So the !$OMP TASKWAIT should be placed after the end do:
do i_task=first_task,last_task
!$OMP TASK FIRSTPRIVATE(i_task) SHARED(tasklist_GRAD,self,var)
call tasklist_GRAD(i_task)%f_ptr(self,var)
!$OMP END TASK
end do
!$OMP TASKWAIT !< wait once, after all the tasks have been generated
In this case, all the tasks are created first, and then !$OMP TASKWAIT waits for their completion. If there are no descendant tasks, it does the same thing as your second code using taskloop.
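To make the equivalence concrete, here is a minimal self-contained sketch that runs both variants side by side. The arrays `a` and `b`, the size `n`, and the function `work` are hypothetical stand-ins for the procedure pointers stored in tasklist_GRAD; they are assumptions for illustration only.

```fortran
program tasks_then_taskwait
  implicit none
  integer, parameter :: n = 8          ! hypothetical number of tasks
  integer :: a(n), b(n)
  integer :: i

  a = 0
  b = 0
  !$omp parallel shared(a, b) private(i)
  !$omp single
  ! Variant 1: generate all the tasks first, then wait once.
  do i = 1, n
    !$omp task firstprivate(i) shared(a)
    a(i) = work(i)
    !$omp end task
  end do
  !$omp taskwait

  ! Variant 2: taskloop, which waits through its implicit taskgroup.
  !$omp taskloop shared(b) num_tasks(n)
  do i = 1, n
    b(i) = work(i)
  end do
  !$omp end taskloop
  !$omp end single
  !$omp end parallel

  ! Neither variant creates descendant tasks, so both arrays must match.
  if (any(a /= b)) error stop 'results differ'
  print *, 'sum =', sum(a)

contains

  ! Hypothetical stand-in for tasklist_GRAD(i_task)%f_ptr.
  pure integer function work(k)
    integer, intent(in) :: k
    work = k * k
  end function work

end program tasks_then_taskwait
```

Because `work` creates no further tasks, the single taskwait after the loop and the implicit taskgroup of the taskloop give the same guarantee here.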
Note that !$OMP TASK UNTIED and the corresponding !$OMP END TASK should be deleted from your first code; they are not necessary.
To answer your second question: if you delete !$OMP TASKWAIT from your first code, the task synchronization point will be !$OMP END SINGLE, so tasklist_GRAD(i_task)%state=STATE_INACTIVE may be executed before the tasks have finished. My guess is that this is not your intention.
To answer your third question: tasking works properly regardless of the number of tasks created. The only question is efficiency: too few tasks can cause load imbalance, and too many tasks can cause overhead, but this is system- and implementation-dependent.
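When the loop has many cheap iterations, one way to trade overhead against load balance is the grainsize clause of taskloop. The sketch below is only an assumption-laden illustration: the array `x`, the size `n = 1000`, and the value `grainsize(100)` are arbitrary choices, and a suitable value would have to be measured on the target system.

```fortran
program taskloop_grainsize
  implicit none
  integer, parameter :: n = 1000       ! hypothetical iteration count
  real :: x(n)
  integer :: i

  x = 0.0
  !$omp parallel shared(x)
  !$omp single
  ! grainsize(100): each generated task receives at least 100 and fewer
  ! than 200 iterations, so far fewer than n tasks are created.
  !$omp taskloop grainsize(100) shared(x)
  do i = 1, n
    x(i) = sqrt(real(i))
  end do
  !$omp end taskloop
  !$omp end single
  !$omp end parallel

  print *, 'x(n) =', x(n)
end program taskloop_grainsize
```

In the question's second code, num_tasks(last_task-first_task+1) asks for one task per iteration, which is reasonable when each call is expensive; grainsize and num_tasks cannot be combined on the same taskloop.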