
How can I use an OpenCL kernel to build an accumulator?


    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id%5==0)
       {
           Number[0]++;
       }
       if (id%10==0)
       {
           Number[1]++;
       }
    }

As you can see, this is a very simple OpenCL test kernel. What I want is to count how many numbers in a range are divisible by 5 and how many are divisible by 10.

Here is the problem: the work items are not fully independent, because they all update the same Number[0] and Number[1]. When I read Number[0] or Number[1] afterwards, the counts are wrong.

Is there anything like a shared "global variable" in C++ that all work items can update correctly?

Thanks!


Solution

  • You need to use atomic operations.

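    // The host must set Number[0] and Number[1] to 0 before launching the kernel.
    // atomic_inc on __global int is built in from OpenCL 1.1; OpenCL 1.0 needs the
    // cl_khr_global_int32_base_atomics extension and its atom_inc function instead.
    __kernel void cl_test(__global int* Number)
    {
       int id = get_global_id(0);
       if (id % 5 == 0)
       {
           // Load, add and store happen as one indivisible operation,
           // so increments from other work items are not lost.
           atomic_inc(&Number[0]);
       }
       if (id % 10 == 0)
       {
           atomic_inc(&Number[1]);
       }
    }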
    

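    A plain Number[0]++ is a separate load, add and store, so two work items can read the same old value and one of the increments is lost; atomic_inc makes that read-modify-write indivisible. Use atomics as sparingly as possible, though: they tend to be rather slow precisely because conflicting updates to the same address have to be serialized to stay correct across work items.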