OpenMP Target Task reduction


I'm using OpenMP target offloading to offload some nested loops to the GPU. I'm using the nowait clause to run it asynchronously, which makes it a task. With the same input values, the result differs from the non-offloaded one (e.g. CPU: sum=0.99, offloading: sum=0.5). When I remove the nowait clause, it works just fine. So I think the issue is that it becomes an OpenMP task, and I'm struggling to get it right.

#pragma omp target teams distribute parallel for reduction( +: sum) collapse(2) nowait depend(in: a, b) depend(out: sum)
    for (int i = 1; i <= n; i++)
    {
        for (int j = 1; j <= n; j++)
        {   
            double c = 0;
            
            for (int k = 0; k < n; k++)
            {
                c += /* some computation */
            }
            
            sum += fabs(c); 
        }
    }

Solution

  • The OpenMP 5.2 specification states:

    The target construct generates a target task. The generated task region encloses the target region. If a depend clause is present, it is associated with the target task. [...]. If the nowait clause is present, execution of the target task may be deferred. If the nowait clause is not present, the target task is an included task.

    This means that, with nowait, your code runs in a task whose execution may be deferred. In the worst case it runs at the end of the parallel region, but it always completes before any task that depends on it starts and before any taskwait directive waiting for the target task returns (the same holds for constructs with similar behaviour, such as taskgroup). Consequently, you must not modify the working arrays (or release them) during this time span, and you must not read sum on the host before that point. If you do, the behaviour is undefined.

    Pay particular attention to the synchronization points and task dependencies in your code (we cannot check them from the incomplete code provided).
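
To make the synchronization point concrete, here is a minimal sketch of a deferred target region followed by an explicit taskwait before the host reads the result. It is not the asker's code: the function name abs_product_sum, the row-major matrices a and b, the matrix product used as the inner computation, and the explicit map clauses are all assumptions made for illustration.

```c
#include <math.h>

/* Hypothetical helper (not from the question): returns the sum of
 * |(A*B)[i][j]| for two row-major n x n matrices a and b. */
double abs_product_sum(const double *a, const double *b, int n)
{
    double sum = 0.0;

    /* nowait lets the runtime defer the target task, so the host
     * thread may continue past this loop before it has run. */
    #pragma omp target teams distribute parallel for collapse(2) \
        reduction(+: sum) \
        map(to: a[0:n*n], b[0:n*n]) map(tofrom: sum) nowait
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < n; j++) {
            double c = 0.0;
            for (int k = 0; k < n; k++)
                c += a[i * n + k] * b[k * n + j];  /* assumed computation */
            sum += fabs(c);
        }
    }

    /* Independent host work could go here, as long as it does not
     * touch a, b or sum. */

    /* Wait for the deferred target task; only after this point is it
     * safe to read sum or to modify/free a and b. */
    #pragma omp taskwait

    return sum;
}
```

Calling abs_product_sum with a = {1, 2, 3, 4}, b set to the 2x2 identity and n = 2 returns 10. Reading sum before the taskwait (or before a dependent task runs) is the kind of mistake that would produce a wrong value like the one described in the question. When compiled without OpenMP support, GCC and Clang ignore the pragmas and the loop simply runs serially on the host.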