__kernel void cl_test(__global int* Number)
{
int id = get_global_id(0);
if (id%5==0)
{
Number[0]++;
}
if (id%10==0)
{
Number[1]++;
}
}
As you can see, this is a very simple OpenCL kernel test code, what I want is to collect the number divisible by 5 and 10 in a range.
So here is the problem: since every work item's calculation is not pure parallel, the Number[0] or [1] in different items are related. I can't get the correct result by reading the Number[0] or Number[1].
Is there any solution like the "global variable" in C++?
Thanks!
You need to use atomic operations.
__kernel void cl_test(__global int* Number)
{
int id = get_global_id(0);
if (id%5==0)
{
atomic_inc(Number);
}
if (id%10==0)
{
atomic_inc(&Number[1]);
}
}
You should avoid using those as much as possible as atomic operations tend to be rather slow precisely because they make sure that it works correctly across threads.