
CUDA - selective memory store


In my kernel, if a condition is met, I update an item of the output buffer:

if (condition(input[i])) //?
    output[i] = 1;

Otherwise the output stays the same, keeping its value of 0.

The density of updates is quite unpredictable and depends on the input. Which output locations will be updated is also not known in advance (though in some cases I may be able to force them).

My question is, is it better to write all items, to achieve coalescing, or do a selective write?

output[i] = condition(input[i]); //? 

Could you please explain the reasoning behind your answer?


Solution

  • Coalescing is still achieved when some threads in a warp do not participate in the load or store, as long as the threads that do participate satisfy the coalescing requirements. A conditional write should therefore have no effect on memory throughput.

    However, a conditional write may cost additional instructions because it involves a branch (this would probably explain, for example, the difference in performance that Eugene measured in his answer).
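
To make the comparison concrete, here is a minimal sketch of the two variants from the question side by side. The predicate `condition()` below is a hypothetical stand-in (the question does not show the real one), and the kernel names, the helper `outputs_match()`, and the block size of 256 are likewise assumptions chosen for illustration. The sketch only checks that both kernels produce the same result; it does not measure which one is faster.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical predicate standing in for the question's condition().
__device__ __forceinline__ bool condition(int x) { return (x & 3) == 0; }

// Selective store: only threads whose predicate holds write to output.
// Assumes output was zeroed before the launch.
__global__ void selective_store(const int* input, int* output, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n && condition(input[i]))
        output[i] = 1;
}

// Unconditional store: every in-range thread writes 0 or 1.
__global__ void full_store(const int* input, int* output, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        output[i] = condition(input[i]);
}

// Runs both kernels on the same random input and reports whether the
// outputs agree. Returns false on any CUDA error.
bool outputs_match(int n) {
    std::vector<int> h_in(n), h_a(n), h_b(n);
    for (int i = 0; i < n; ++i) h_in[i] = rand();

    size_t bytes = static_cast<size_t>(n) * sizeof(int);
    int *d_in = nullptr, *d_a = nullptr, *d_b = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_a, bytes) == cudaSuccess &&
              cudaMalloc(&d_b, bytes) == cudaSuccess;
    if (ok) {
        cudaMemcpy(d_in, h_in.data(), bytes, cudaMemcpyHostToDevice);
        cudaMemset(d_a, 0, bytes);     // selective_store relies on zeros
        cudaMemset(d_b, 0xFF, bytes);  // full_store overwrites everything

        int threads = 256;             // assumed block size
        int blocks = (n + threads - 1) / threads;
        selective_store<<<blocks, threads>>>(d_in, d_a, n);
        full_store<<<blocks, threads>>>(d_in, d_b, n);
        ok = cudaDeviceSynchronize() == cudaSuccess;

        cudaMemcpy(h_a.data(), d_a, bytes, cudaMemcpyDeviceToHost);
        cudaMemcpy(h_b.data(), d_b, bytes, cudaMemcpyDeviceToHost);
    }
    cudaFree(d_in);
    cudaFree(d_a);
    cudaFree(d_b);
    return ok && h_a == h_b;
}

int main() {
    bool same = outputs_match(1 << 20);
    printf("outputs %s\n", same ? "match" : "differ or a CUDA error occurred");
    return same ? 0 : 1;
}
```

In both kernels, the threads of a warp that do write touch consecutive `int` addresses, so the stores can coalesce either way. Whether the extra branch or predication in `selective_store` matters in practice is best checked by timing or profiling both kernels on the target GPU with representative input.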