This is a seemingly basic problem I haven't been able to right with a fair amount of trial and error. I have a kernel which makes use of two global r/w buffers and one local - it takes input from the first buffer, does a pseudo-sort on it using the second buffer for interim storage, and ultimately copies it back to the first in a certain order. (Stripped) code is as follows:
struct PACKET_POINTER {
int packetIndex;
int currentCell;
};
#define RPC_DIV_BUCKET 100
__kernel void PseudoSort(__global struct PACKET_POINTER * in,__global struct PACKET_POINTER * out, __local struct PACKET_POINTER * aux) {
int i = get_local_id(0);
int wg = get_local_size(0);
int gid = get_global_id(0);
int offset = get_group_id(0) * wg;
aux[i] = in[i+offset];
barrier(CLK_LOCAL_MEM_FENCE);
//-----
//Irrelevant code block here
//-----
out[(gid%1024)*RPC_DIV_BUCKET + (gid/1024)] = aux[i];
}
Retrieving the contents of the "out" buffer in the parent C program happens without issue. However, when I add the following lines to the kernel:
barrier(CLK_GLOBAL_MEM_FENCE);
in[gid] = out[gid];
and attempt to read the "in" buffer, it turns up mostly garbage values on first execution, but will have the expected data if the .exe is run a second time without modification. I have a clFinish(commands) call between the kernel call and buffer read, so it should be running to completion before any read attempts. Something obvious I'm missing here? Appreciate the help in advance - will post a solution if I happen upon it before then.
CLK_GLOBAL_MEM_FENCE only syncs within a workgroup. There is no way to place a barrier that would sync across all workgroups (e.g it only syncs across those threads which have identical group_id).
You have a race condition there. As an example when global_id is 1 a write goes into out[100]. Then that particular thread reads from out[1] and writes to in[1]. However out[1] is written only at global_id 1024. Which is almost certainly in a different workgroup. So you will read garbage as the first workgroup is going to finish before the out[1] is ever going to get written.