Tags: c, parallel-processing, openmp, gpgpu

Difference between target and target data? How are teams/threads configured without a teams directive?


I have two questions about the new OpenMP 4.0.

First, I don't understand the difference between target and target data. According to the specification, target data creates a new device data environment. So what exactly is a data environment? And can OpenMP's target data be likened to the OpenACC data directive?
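To make the first question concrete: would the two sketches below move data in the same way? The arrays a and b, the size N, and the loop bodies are placeholders I made up; the second version is how I would expect a target data region to let several target regions reuse the mapped arrays, similar to an OpenACC data region.

void with_target(float *a, float *b, int N)
{
   /* A bare target creates a device data environment and offloads execution;
      the mapped arrays live only for this one region. */
   #pragma omp target map(to: a[0:N]) map(from: b[0:N])
   #pragma omp parallel for
   for (int i = 0; i < N; i++)
      b[i] = 2.0f * a[i];
}

void with_target_data(float *a, float *b, int N)
{
   /* target data only creates the device data environment; the two target
      regions inside it should be able to reuse the mapped arrays. */
   #pragma omp target data map(to: a[0:N]) map(from: b[0:N])
   {
      #pragma omp target
      #pragma omp parallel for
      for (int i = 0; i < N; i++)
         b[i] = 2.0f * a[i];

      #pragma omp target
      #pragma omp parallel for
      for (int i = 0; i < N; i++)
         b[i] = b[i] + 1.0f;
   }
}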

The second question is as follows:

extern void init(float*, float*, int);
extern void output(float*, int);
void vec_mult(int N)
{
   int i;
   float p[N], v1[N], v2[N];
   init(v1, v2, N);
   /* Offload the loop to the device: copy v1 and v2 in, copy p back out. */
   #pragma omp target map(to: v1, v2) map(from: p)
   #pragma omp parallel for
   for (i=0; i<N; i++)
      p[i] = v1[i] * v2[i];
   output(p, N);
}

In this example there is no teams directive. So how should an OpenMP compiler configure the device kernel? In CUDA terms, would the invocation look like "kernel_func<<<1,1>>>"?
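For comparison, this is the variant I would write with an explicit teams directive (my own sketch, not taken from the specification examples); I would expect a CUDA-targeting compiler to be able to map each team to a block here:

extern void init(float*, float*, int);
extern void output(float*, int);
void vec_mult_teams(int N)
{
   int i;
   float p[N], v1[N], v2[N];
   init(v1, v2, N);
   /* Each team could become a CUDA block and each thread a CUDA thread;
      num_teams/thread_limit clauses could further constrain the launch. */
   #pragma omp target map(to: v1, v2) map(from: p)
   #pragma omp teams distribute parallel for
   for (i=0; i<N; i++)
      p[i] = v1[i] * v2[i];
   output(p, N);
}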


Solution

  • I found an answer to my second question.

    If a parallel for appears inside a target region without a teams directive, the compiler should generate a kernel with a single team, i.e. one block. The loop iterations are then distributed across the threads of that block, so the kernel should be launched with many threads (although it would also work with a single thread). There are several ways to implement this combination of directives:

    1. create the kernel with a statically chosen number of threads (see the CUDA sketch below);
    2. configure the kernel with a single thread, kernel_invocation<<<1,1>>>(parameter1, parameter2, ...), which works but makes little sense;
    3. best solution :) use an analyzer to decide the number of threads, the block dimensions, and so on.

    You can find more approaches in this paper: http://rosecompiler.org/ROSE_ResearchPapers/Liao-OpenMP-Accelerator-Model-2013.pdf
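    As an illustration of option 1, a compiler could lower the target parallel for from the question into something like the CUDA kernel below. This is only a hand-written sketch with a made-up thread count (THREADS); it is not the output of any particular compiler.

    /* Hand-written sketch: one block (no teams directive), thread count chosen statically. */
    #define THREADS 128

    __global__ void vec_mult_kernel(float *p, float *v1, float *v2, int N)
    {
       /* A single block: the loop iterations are divided among its threads. */
       for (int i = threadIdx.x; i < N; i += blockDim.x)
          p[i] = v1[i] * v2[i];
    }

    /* Host side: launch with exactly one block and many threads, e.g.
       vec_mult_kernel<<<1, THREADS>>>(d_p, d_v1, d_v2, N); */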