Suppose I define a shared variable in a cuda kernel as follows:
__shared__ int var;
Now, let's say at some point in my kernel I'd like to assign some value, say 100
to var
. Saying
var = 100;
results in all threads in the block executing this assignment.
How can I have the assignment take place only once? Is the following my only option?
if( threadIdx.x == 0)
var = 100;
Your only option is actually this:
if( threadIdx.x == 0)
var = 100;
__syncthreads();
If you omit a synchronisation barrier, there is no guarantee that all threads in the block will read the value of var
after the assignment statement is executed.