arrays multithreading fortran openmp hpc

What is the difference between using OMP TASK with a loop outside and OMP TASKLOOP with a loop inside?

I'm trying to use OpenMP to execute, in parallel, a number of tasks stored in an array called tasklist_GRAD. Here is the code I implemented:

subroutine master_worker_execution(self,var,tasklist_GRAD,first_task,last_task)

type(tcb),dimension(20),intent(inout)::tasklist_GRAD !< the master array of tasks 
integer::i_task !< the task counter 
type(tcb)::self !< self
integer,intent(in)::first_task,last_task 
type(variables),intent(inout)::var !< the variables
!OpenMP variables
integer::num_thread !< the rank of the thread
integer::nthreads !< the number of threads
integer:: OMP_GET_THREAD_NUM !< function to get the rank of the thread
integer::OMP_GET_NUM_THREADS !< function to get the number of threads    

!=======================================================================================================================================================
!$OMP PARALLEL PRIVATE(num_thread,nthreads,i_task) &
!$OMP SHARED(tasklist_GRAD,self,var)
num_thread=OMP_GET_THREAD_NUM() !< the rank of the thread
nthreads=OMP_GET_NUM_THREADS() !< the number of threads
!$OMP SINGLE
do i_task=first_task,last_task
   tasklist_GRAD(i_task)%state=STATE_RUNNING
end do

!$OMP TASK UNTIED
do i_task=first_task,last_task
   !$OMP TASK FIRSTPRIVATE(i_task) SHARED(tasklist_GRAD,self,var)
   call tasklist_GRAD(i_task)%f_ptr(self,var) 
   !$OMP END TASK
   !$OMP TASKWAIT  !< comment this to compare between the first and the second code
end do
!$OMP END TASK 

do i_task=first_task,last_task
   tasklist_GRAD(i_task)%state=STATE_INACTIVE 
end do
!$OMP END SINGLE 
!$OMP END PARALLEL

  end subroutine master_worker_execution
end module master_worker

While implementing the first version, I discovered !$OMP TASKLOOP and wrote the following code:

!=======================================================================================================================================================
    !$OMP PARALLEL PRIVATE(num_thread,nthreads,i_task) &
    !$OMP SHARED(tasklist_GRAD,self,var)
    num_thread=OMP_GET_THREAD_NUM() !< the rank of the thread
    nthreads=OMP_GET_NUM_THREADS() !< the number of threads
    !$OMP SINGLE
    !$OMP TASKLOOP PRIVATE(i_task) SHARED(tasklist_GRAD,self,var) NUM_TASKS(last_task-first_task+1)
    do i_task=first_task,last_task
       call tasklist_GRAD(i_task)%f_ptr(self,var) 
    end do
    !$OMP END TASKLOOP
    !$OMP END SINGLE
    !$OMP END PARALLEL

I have three questions (if you think I should not ask the second and third ones here, I can post them as two new questions).

1. What is the difference between the first and the second code? To me they look the same (I think adding GRAINSIZE(1) is unnecessary here because I specified the number of tasks).
2. What happens if I remove the !$OMP TASKWAIT in the first code? What is the difference between the code with and without the TASKWAIT construct?
3. Is a very large number of tasks necessary in order to use the tasking constructs properly?

Solution

The main difference between the two approaches is where the task synchronization points are and which tasks they synchronize. When your program encounters a task construct, the task is generated, but not necessarily executed immediately (in tasking terminology, its execution may be deferred). Tasks are only guaranteed to be completed at program exit and at one of the following three constructs:

barrier (either implicit or explicit): e.g. there is an implied barrier at the end of the single construct and at the end of the parallel region. At a barrier, all explicit tasks generated by the team must be executed to completion. (To write efficient code it is worth knowing where the implied barriers are, so that you can use the nowait clause to skip them if the logic of your program allows it.)
taskwait: specifies a wait on the completion of the child tasks of the current task (not including their descendants).
taskgroup: the end of a taskgroup region specifies a wait on the completion of the child tasks generated inside the taskgroup region and all of their descendant tasks.

As you can see, taskwait and taskgroup behave differently when there are descendant tasks (taskgroup waits for them, taskwait does not) or when some child tasks were created before the taskgroup construct (taskwait waits for them, the end of the taskgroup does not).
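A minimal sketch of the descendant-task difference, assuming an OpenMP 4.0+ compiler (e.g. gfortran with -fopenmp; without the flag the directives are ignored and the code simply runs serially). The module and function names are hypothetical. Each child task only creates a grandchild that does the work; because !$OMP END TASKGROUP also waits for descendants, every flag is already set when the flags are counted inside the SINGLE region.

```fortran
module taskgroup_demo
  implicit none
contains
  ! Each child task only creates a grandchild that does the "work".
  ! Returns how many grandchildren had finished at END TASKGROUP.
  function grandchildren_done_after_taskgroup(n) result(ndone)
    integer, intent(in) :: n
    integer :: ndone
    logical :: done(n)
    integer :: i, cnt

    done = .false.
    cnt = -1
    !$omp parallel default(none) shared(done, n, cnt) private(i)
    !$omp single
    !$omp taskgroup
    do i = 1, n
      !$omp task default(none) firstprivate(i) shared(done)
        !$omp task default(none) firstprivate(i) shared(done)
          done(i) = .true.   ! work done by the grandchild task
        !$omp end task
      !$omp end task
    end do
    !$omp end taskgroup
    ! Children AND grandchildren are complete here. A TASKWAIT in
    ! this place would only guarantee that the children are complete.
    cnt = count(done)
    !$omp end single
    !$omp end parallel
    ndone = cnt
  end function grandchildren_done_after_taskgroup
end module taskgroup_demo
```

Replacing the TASKGROUP pair with a TASKWAIT after the loop would make the count nondeterministic, since grandchildren could still be pending.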

The taskloop construct was added in OpenMP 4.5 and combines the ease of use of the parallel loop with the flexibility of tasking. The specification says the following about taskloop:

By default, the taskloop construct executes as if it was enclosed in a taskgroup construct with no statements or directives outside of the taskloop construct. Thus, the taskloop construct creates an implicit taskgroup region. If the nogroup clause is present, no implicit taskgroup region is created.

This means that your second code:

!$OMP TASKLOOP PRIVATE(i_task) SHARED(tasklist_GRAD,self,var) NUM_TASKS(last_task-first_task+1)
do i_task=first_task,last_task
   call tasklist_GRAD(i_task)%f_ptr(self,var)
end do
!$OMP END TASKLOOP

behaves as if it were enclosed in !$OMP TASKGROUP ... !$OMP END TASKGROUP: at the end of the taskloop, the encountering task waits for the completion of all tasks generated by the taskloop and all of their descendant tasks.
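To illustrate the implicit taskgroup, here is a small sketch (hypothetical module and function names, OpenMP 4.5+ assumed for taskloop) in which every taskloop iteration creates one more task. Since !$OMP END TASKLOOP also waits for descendant tasks, the count taken right after it in the SINGLE region should always equal n.

```fortran
module taskloop_group_demo
  implicit none
contains
  ! Each TASKLOOP iteration creates one extra (descendant) task.
  ! Returns how many descendants had finished at END TASKLOOP.
  function descendants_done_after_taskloop(n) result(ndone)
    integer, intent(in) :: n
    integer :: ndone
    logical :: done(n)
    integer :: i, cnt

    done = .false.
    cnt = -1
    !$omp parallel default(none) shared(done, n, cnt) private(i)
    !$omp single
    ! One iteration per task, as with NUM_TASKS in the question.
    !$omp taskloop num_tasks(n)
    do i = 1, n
      !$omp task default(none) firstprivate(i) shared(done)
        done(i) = .true.   ! work done by a descendant task
      !$omp end task
    end do
    !$omp end taskloop
    ! Implicit TASKGROUP: the taskloop tasks and the tasks they
    ! created are all complete here.
    cnt = count(done)
    !$omp end single
    !$omp end parallel
    ndone = cnt
  end function descendants_done_after_taskloop
end module taskloop_group_demo
```

Adding the NOGROUP clause to the taskloop would remove this guarantee.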

On the other hand, your first code

do i_task=first_task,last_task
   !$OMP TASK FIRSTPRIVATE(i_task) SHARED(tasklist_GRAD,self,var)
   call tasklist_GRAD(i_task)%f_ptr(self,var)
   !$OMP END TASK
   !$OMP TASKWAIT  !< comment this to compare between the first and the second code
end do

is quite different: after each task is created, !$OMP TASKWAIT waits for that child task to complete (but not for any tasks it may create), so the next task is created only after the previous one has finished. In practice, if there are no descendant tasks (i.e. no other task is created inside call tasklist_GRAD(i_task)%f_ptr(self,var)), your program runs serially, not concurrently. So the !$OMP TASKWAIT should be placed after the end do:
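The serialization can be observed with a sketch like the following (module and subroutine names are hypothetical, and the real work is replaced by recording a finish position with an atomic capture). Because the TASKWAIT sits inside the loop, task i always finishes before task i+1 exists, so the recorded finish order is 1, 2, ..., n no matter how many threads are in the team.

```fortran
module taskwait_in_loop_demo
  implicit none
contains
  ! Mimics the first code: one TASK per item, TASKWAIT inside the loop.
  ! finish_order(i) receives the position at which task i finished.
  subroutine run_with_inner_taskwait(finish_order, n)
    integer, intent(in) :: n
    integer, intent(out) :: finish_order(n)
    integer :: i, counter, pos

    counter = 0
    !$omp parallel default(none) shared(finish_order, n, counter) private(i, pos)
    !$omp single
    do i = 1, n
      !$omp task default(none) firstprivate(i) private(pos) shared(finish_order, counter)
        ! Atomically take the next finish position.
        !$omp atomic capture
        counter = counter + 1
        pos = counter
        !$omp end atomic
        finish_order(i) = pos
      !$omp end task
      !$omp taskwait   ! task i must finish before task i+1 is created
    end do
    !$omp end single
    !$omp end parallel
  end subroutine run_with_inner_taskwait
end module taskwait_in_loop_demo
```

Moving the TASKWAIT after the end do lets the tasks overlap, and the finish order then depends on scheduling.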

do i_task=first_task,last_task
   !$OMP TASK FIRSTPRIVATE(i_task) SHARED(tasklist_GRAD,self,var)
   call tasklist_GRAD(i_task)%f_ptr(self,var)
   !$OMP END TASK
end do
!$OMP TASKWAIT  !< wait for all tasks created in the loop

In this case all the tasks are created first, and then !$OMP TASKWAIT waits for their completion. If there are no descendant tasks, this does the same thing as your second code using taskloop.

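Note that the enclosing !$OMP TASK UNTIED and the corresponding !$OMP END TASK in your first code are not needed and should be removed. A TASKWAIT inside that outer task only makes the outer task wait, so the thread executing the SINGLE region can continue past !$OMP END TASK and reset the states while the work is still running.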

Answering your second question: if you delete !$OMP TASKWAIT in your first code, the next task synchronization point is the implicit barrier at !$OMP END SINGLE, so the loop setting tasklist_GRAD(i_task)%state=STATE_INACTIVE may execute before the tasks have finished. My guess is that this is not your intention.
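Putting these suggestions together, here is a sketch of the corrected first code. The tcb and variables types, compute_grad and the STATE_* values are hypothetical, simplified stand-ins (their real definitions are not shown in the question), and f_ptr takes (var, i) rather than (self, var) to keep the example short. The outer TASK UNTIED is gone, and a single TASKWAIT after the loop guarantees that the states are reset only after all tasks are done.

```fortran
module master_worker_sketch
  implicit none
  ! Hypothetical state values.
  integer, parameter :: STATE_INACTIVE = 0, STATE_RUNNING = 1

  ! Hypothetical stand-in for the question's "variables" type.
  type :: variables
    real :: grad(20) = 0.0
  end type variables

  abstract interface
    subroutine task_iface(var, i)
      import :: variables
      type(variables), intent(inout) :: var
      integer, intent(in) :: i
    end subroutine task_iface
  end interface

  ! Hypothetical stand-in for "tcb": a state plus a procedure pointer.
  type :: tcb
    integer :: state = STATE_INACTIVE
    procedure(task_iface), pointer, nopass :: f_ptr => null()
  end type tcb

contains

  ! Example work item: writes only its own slot of var%grad.
  subroutine compute_grad(var, i)
    type(variables), intent(inout) :: var
    integer, intent(in) :: i
    var%grad(i) = real(i)
  end subroutine compute_grad

  subroutine master_worker_execution(var, tasklist, first_task, last_task)
    type(variables), intent(inout) :: var
    type(tcb), intent(inout) :: tasklist(:)
    integer, intent(in) :: first_task, last_task
    integer :: i_task

    !$omp parallel default(none) shared(tasklist, var, first_task, last_task) private(i_task)
    !$omp single
    do i_task = first_task, last_task
      tasklist(i_task)%state = STATE_RUNNING
    end do

    ! The SINGLE thread creates all tasks without waiting in between...
    do i_task = first_task, last_task
      !$omp task default(none) firstprivate(i_task) shared(tasklist, var)
        call tasklist(i_task)%f_ptr(var, i_task)
      !$omp end task
    end do
    !$omp taskwait   ! ...then waits once for all of them

    ! Safe: every task created above has completed.
    do i_task = first_task, last_task
      tasklist(i_task)%state = STATE_INACTIVE
    end do
    !$omp end single
    !$omp end parallel
  end subroutine master_worker_execution
end module master_worker_sketch
```

Usage: point each tasklist(i)%f_ptr at a procedure with the task_iface interface (here compute_grad) and call master_worker_execution(var, tasklist, first, last).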

Answering your third question: tasking works correctly regardless of the number of tasks created. The only question is efficiency: too few tasks can cause load imbalance, while too many tasks can cause overhead. Where the balance lies is system- and implementation-dependent.
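If one task per work item turns out to be too fine-grained, taskloop lets you trade the number of tasks against their size with the GRAINSIZE clause (or NUM_TASKS, as in the question). A minimal sketch, with a hypothetical module and subroutine name and placeholder work; a good chunk value has to be measured on your own system.

```fortran
module grainsize_demo
  implicit none
contains
  ! Fills vals(i) = 2*i using TASKLOOP with a GRAINSIZE clause, so
  ! each task runs a chunk of iterations instead of exactly one.
  subroutine fill_with_grainsize(vals, chunk)
    integer, intent(out) :: vals(:)
    integer, intent(in) :: chunk   ! must be positive
    integer :: i, n

    n = size(vals)
    !$omp parallel default(none) shared(vals, n, chunk) private(i)
    !$omp single
    ! Each task gets at least min(chunk, n) iterations and fewer than
    ! 2*chunk, so no more than about n/chunk tasks are created.
    !$omp taskloop grainsize(chunk)
    do i = 1, n
      vals(i) = 2 * i   ! placeholder for a cheap per-item computation
    end do
    !$omp end taskloop
    !$omp end single
    !$omp end parallel
  end subroutine fill_with_grainsize
end module grainsize_demo
```

A larger chunk means fewer, bigger tasks (less overhead, more risk of imbalance); a smaller one means the opposite.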