I'm doing the following in an OpenCL kernel (simplified example):
__kernel void step(const uint count, __global int *map, __global float *sum)
{
    const uint i = get_global_id(0);
    if(i < count) {
        sum[map[i]] += 12.34;
    }
}
Here, sum is some quantity I want to calculate (previously set to zero in another kernel) and map is a mapping from integers i to integers j, such that multiple i's can map to the same j.
(map could be in constant memory rather than global, but it seems the amount of constant memory on my GPU is incredibly limited.)
Will this work? Is a "+=" implemented in an atomic way, or is there a chance of concurrent operations overwriting each other?
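It will not work. A "+=" on global memory is a separate read, add, and write, so when several work-items map to the same j, one work-item can overwrite another's update. When work-items update memory that other work-items also write, you need to use atomic operations explicitly. In this case, atomic_add. Note that the built-in atomic_add only accepts pointers to int or uint, not float or double, so sum has to be stored as an integer, for example in fixed point (12.34 in units of 0.01 is 1234).
Something like:
__kernel void step(const uint count, __global int *map, volatile __global int *sum)
{
    const uint i = get_global_id(0);
    if(i < count) {
        /* sum is fixed point in units of 0.01, so 1234 stands for 12.34 */
        atomic_add(&sum[map[i]], 1234);
    }
}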
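If sum has to stay a float as in the question, a common workaround is a compare-and-swap loop on the float's bit pattern, since the built-in atomic_add only covers 32-bit integers. The following is a minimal sketch, assuming OpenCL 1.1 or later (where atomic_cmpxchg on global memory is core). The name atomic_add_float is made up for this example and is not part of OpenCL. The #ifndef __OPENCL_VERSION__ block is only a host-side shim, assuming GCC or Clang, so the helper can be compiled and tried with an ordinary C compiler; the device compiler skips it.

```c
/* Host-only shim (an assumption for testing): lets the helper below compile
   as plain C with GCC/Clang. An OpenCL compiler defines __OPENCL_VERSION__
   and skips this block entirely. */
#ifndef __OPENCL_VERSION__
#include <string.h>
typedef unsigned int uint;
#define __global
static uint as_uint(float f) { uint u; memcpy(&u, &f, sizeof u); return u; }
static float as_float(uint u) { float f; memcpy(&f, &u, sizeof f); return f; }
/* Same contract as OpenCL's atomic_cmpxchg: returns the old value at *p. */
#define atomic_cmpxchg(p, cmp, val) __sync_val_compare_and_swap((p), (cmp), (val))
#endif

/* Hypothetical helper: atomically performs *addr += val for a float in
   global memory by retrying a 32-bit compare-and-swap on its bit pattern. */
void atomic_add_float(volatile __global float *addr, float val)
{
    volatile __global uint *bits = (volatile __global uint *)addr;
    uint expected;
    uint actual = as_uint(*addr);
    do {
        expected = actual;
        uint desired = as_uint(as_float(expected) + val);
        /* Stores desired only if *bits still equals expected. */
        actual = atomic_cmpxchg(bits, expected, desired);
    } while (actual != expected); /* another work-item got in first: retry */
}

#ifdef __OPENCL_VERSION__
__kernel void step(const uint count, __global const int *map,
                   volatile __global float *sum)
{
    const uint i = get_global_id(0);
    if (i < count) {
        atomic_add_float(&sum[map[i]], 12.34f);
    }
}
#endif
```

The loop compares bit patterns rather than float values, so it also terminates if the stored value is NaN. Under heavy contention (many i mapping to the same j) the retries serialize, so it may be worth measuring this against the fixed-point atomic_add version. The order of the additions is not deterministic, so the float result can differ slightly from run to run.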