How does #pragma omp taskwait work with nested tasks?

I'm currently learning about threads and how you can parallelize code with them. Our current topic is #pragma omp task. My current understanding is the following:

Parallel code is executed
It stumbles upon #pragma omp task
It throws the code inside the scope into the task pool
It executes the code after the scope
A different thread gets the task assigned
That thread executes the code inside the task's scope

Imagine you have the following code:

#pragma omp parallel
{
  #pragma omp task // Task A
  {...
    #pragma omp task // Task B
    {...
      #pragma omp task // Task C
      {...
      }
      #pragma omp taskwait
    }
    #pragma omp taskwait
  }
}

What does the thread that is working on, say, Task B do when it reaches #pragma omp taskwait? Is the thread forced to wait for Task C, or can it execute another task while waiting for Task C?

I think it has to wait. Is my understanding of tasks even correct?

Solution

First of all, #pragma omp parallel is a fork-join directive. This means every thread executes the content of the region, so tasks A, B and C are each created N times, where N is the number of threads. You need a #pragma omp single or #pragma omp master directive to create only one task of each type.
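As a minimal sketch of that difference, the two functions below count how many tasks get created with and without a single construct. The function names and the counters are hypothetical and only serve the illustration; the counters are incremented inside the tasks, which is safe to read afterwards because all tasks have finished at the implicit barrier that ends the parallel region.

```c
/* Hypothetical helper: every thread of the team creates its own task,
   so the counter ends up equal to the team size. */
static int tasks_without_single(int *team_size)
{
    int created = 0, threads = 0;
    #pragma omp parallel shared(created, threads)
    {
        #pragma omp atomic
        threads++;                      /* count the threads in the team */

        #pragma omp task shared(created)
        {
            #pragma omp atomic
            created++;                  /* one increment per task */
        }
    }   /* implicit barrier: all tasks have finished here */
    *team_size = threads;
    return created;
}

/* Hypothetical helper: only one thread enters the single block,
   so exactly one task is created, whatever the team size is. */
static int tasks_with_single(void)
{
    int created = 0;
    #pragma omp parallel shared(created)
    {
        #pragma omp single
        {
            #pragma omp task shared(created)
            {
                #pragma omp atomic
                created++;
            }
        }
    }
    return created;
}
```

Built with an OpenMP flag such as -fopenmp (GCC/Clang), tasks_without_single should return the team size and tasks_with_single should return 1. Without the flag the pragmas are ignored, the code runs serially, and both return 1.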

The scheduling of tasks is mostly left unspecified by the OpenMP standard. Basically, the standard specifies when tasks are created, when they can be scheduled (task scheduling points) and how the scheduling is constrained by properties such as dependencies, but not when or how tasks are actually executed. That is the job of the runtime. In fact, mainstream runtimes use different scheduling strategies. For example, the IOMP runtime (Clang/ICC) uses a work-stealing approach, while GOMP (GCC) tends to use a centralized scheduler (one big queue).
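One way to observe this on your own machine is to record which thread ends up running each task. The sketch below is only an illustration: record_task_threads, NUM_TASKS and the serial fallbacks are names I made up, and the resulting mapping is expected to differ between runs, runtimes and thread counts.

```c
#ifdef _OPENMP
#include <omp.h>
#else
/* Serial fallbacks (assumption: lets the sketch build without -fopenmp). */
static int omp_get_thread_num(void)  { return 0; }
static int omp_get_num_threads(void) { return 1; }
#endif

#define NUM_TASKS 8

/* Creates NUM_TASKS tasks from a single thread and stores, for each task,
   the id of the thread that executed it. Returns the team size. */
static int record_task_threads(int ran_on[NUM_TASKS])
{
    int team_size = 1;
    #pragma omp parallel shared(team_size, ran_on)
    #pragma omp single
    {
        team_size = omp_get_num_threads();
        for (int i = 0; i < NUM_TASKS; i++) {
            /* The runtime decides which thread of the team runs this. */
            #pragma omp task firstprivate(i) shared(ran_on)
            ran_on[i] = omp_get_thread_num();
        }
    }   /* implicit barrier: every task has run by now */
    return team_size;
}
```

Printing ran_on after the call shows the task-to-thread mapping the runtime chose. The only thing you can rely on is that every id lies between 0 and the team size minus one.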

A #pragma omp taskwait only applies to the child tasks of the task that encounters it, not to all of their descendants. This means the first taskwait only waits for task C, and the second only waits for task B. While a thread waits at a taskwait, it can execute other tasks, because a taskwait directive is a task scheduling point. IOMP does that, for example. This does not mean that other tasks are guaranteed to be executed.
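The guarantee can be checked with a small sketch that reuses the A, B, C nesting from the question, wrapped in a single construct so that each task exists only once. The function name and the flag variables are hypothetical. Note that the taskwait in A only covers B directly; C is known to be finished at that point only because B itself waited for it.

```c
/* Runs the A -> B -> C nesting once. Returns 1 if each taskwait saw its
   direct child task as finished, 0 otherwise. */
static int nested_taskwait_ok(void)
{
    int c_done = 0, b_done = 0, ok_in_b = 0, ok_in_a = 0;

    #pragma omp parallel shared(c_done, b_done, ok_in_b, ok_in_a)
    #pragma omp single
    {
        #pragma omp task shared(c_done, b_done, ok_in_b, ok_in_a) /* Task A */
        {
            #pragma omp task shared(c_done, b_done, ok_in_b)      /* Task B */
            {
                #pragma omp task shared(c_done)                   /* Task C */
                {
                    c_done = 1;
                }
                #pragma omp taskwait  /* waits for C, the only child of B */
                ok_in_b = (c_done == 1);
                b_done = 1;
            }
            #pragma omp taskwait      /* waits for B, the only child of A */
            ok_in_a = (b_done == 1);
        }
    }   /* implicit barrier: A has finished as well */
    return ok_in_b && ok_in_a;
}
```

The function should return 1 regardless of which threads the runtime picks for A, B and C, since it only depends on what taskwait guarantees.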

You should not rely on the other tasks being executed or completed in the meantime. Programmers often make this assumption when tasks contain blocking primitives. This is a bad idea, since it typically ties up a thread for no reason for an unpredictable amount of time (the OpenMP runtime cannot adapt its scheduling, because it does not have this information), and also because it can cause deadlocks (depending on task dependencies and on the implementation of the runtime). Note also that a taskyield does not enforce task progress either (doing nothing is a perfectly valid implementation of taskyield).

To sum up, it works as follows. Each thread creates a task A, which can be executed by another thread (unlikely here). When a task A is executed, it creates a task B, which can again be executed by another thread, and likewise B creates C. When the first taskwait is reached by a thread, the enclosing tasks A and B have been started, and C has been created and may already have been executed. The same thread can start tasks A, B and C, since a task can be suspended at a task scheduling point (possibly using a continuation). Once the taskwait completes, the task C created in the current task B is guaranteed to have finished (but the state of the other tasks is unspecified).