
Compare value between 2 threads in CUDA


Suppose every thread in a block holds an integer variable a whose value may differ across threads. How can I check whether the first thread in the block has the same value of a as the last thread in the block? I only need to compare these 2 threads because the value of a is non-decreasing in the thread index.


Solution

  • When the block contains more than one warp, the two threads have to exchange the value through shared or global memory, and shared memory is the faster of the two. Static shared memory can be used because the size is known at compile time:

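    // .. inside the kernel; needs #include <cooperative_groups.h>
    // and namespace cg = cooperative_groups;
    cg::thread_block block = cg::this_thread_block();
    __shared__ int s[2];
    __shared__ bool result;
    uint32_t tid = threadIdx.x;
    
    if(tid == 0)
        s[0] = a; // value held by the first thread
    if(tid == blockDim.x - 1)
        s[1] = a; // value held by the last thread
    block.sync(); // synchronize the block instead of the whole grid
    
    if(tid == 0)
        result = s[0] == s[1];
    block.sync(); // after this barrier every thread in the block can read result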
    

    If the block holds only a single warp, or if you need to compare a between the first and last thread of a warp, there is a better solution using cooperative groups, which needs no shared memory:

    #include <cooperative_groups.h>
    namespace cg = cooperative_groups;
    
    // .. inside the kernel
    auto active = cg::coalesced_threads();
    // everybody gets a copy of a from the last active thread of the warp
    int theOtherA = active.shfl(a, active.size() - 1);
    bool result = a == theOtherA; // only thread 0 has the correct result
    result = active.shfl(result, 0); // get a copy from the register of thread 0
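
To see how the two fragments above could sit inside complete kernels, here is a minimal sketch. The kernel names compareBlockEnds and compareWarpEnds, the host helper runCompare, and the in/out array layout (one value of a per thread, one 0/1 flag per block or per warp) are all assumptions made for illustration and are not part of the original answer. The warp variant also assumes 1D blocks whose size is a multiple of 32 and no divergence before the shuffle, so that the last active thread really is the last thread of the warp. It stores the comparison as an int rather than a bool before shuffling it. Error checking is left out to keep it short.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Block-wide variant: the first and last thread exchange a through shared memory.
// Assumption: 1D block, in[] holds one value of a per thread.
__global__ void compareBlockEnds(const int* in, int* out)
{
    cg::thread_block block = cg::this_thread_block();
    __shared__ int s[2];
    __shared__ bool result;
    unsigned tid = threadIdx.x;
    int a = in[blockIdx.x * blockDim.x + tid];

    if (tid == 0) s[0] = a;
    if (tid == blockDim.x - 1) s[1] = a;
    block.sync();                       // s[0] and s[1] are now visible to the block

    if (tid == 0) result = (s[0] == s[1]);
    block.sync();                       // result is now visible to the block

    if (tid == 0) out[blockIdx.x] = result ? 1 : 0;  // one flag per block
}

// Warp-level variant: each warp compares its first and last active thread.
// Assumption: blockDim.x is a multiple of 32 and all 32 lanes are active here.
__global__ void compareWarpEnds(const int* in, int* out)
{
    int gid = blockIdx.x * blockDim.x + threadIdx.x;
    int a = in[gid];

    auto active = cg::coalesced_threads();
    int theOtherA = active.shfl(a, active.size() - 1); // a of the last active thread
    int same = (a == theOtherA);                       // only meaningful on rank 0
    same = active.shfl(same, 0);                       // broadcast rank 0's answer

    if (active.thread_rank() == 0) out[gid / 32] = same;  // one flag per warp
}

// Hypothetical host helper: launches one of the kernels and returns the flags.
// h_in must contain blocks * threads values.
std::vector<int> runCompare(const std::vector<int>& h_in, int blocks, int threads,
                            bool perWarp)
{
    int groups = perWarp ? blocks * threads / 32 : blocks;
    int* d_in = nullptr;
    int* d_out = nullptr;
    cudaMalloc(&d_in, h_in.size() * sizeof(int));
    cudaMalloc(&d_out, groups * sizeof(int));
    cudaMemcpy(d_in, h_in.data(), h_in.size() * sizeof(int), cudaMemcpyHostToDevice);

    if (perWarp) compareWarpEnds<<<blocks, threads>>>(d_in, d_out);
    else         compareBlockEnds<<<blocks, threads>>>(d_in, d_out);

    std::vector<int> h_out(groups);
    // This blocking copy on the default stream also waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, groups * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

With these assumed helpers, runCompare(values, 1, 64, false) would return a single flag for one 64-thread block, while passing true as the last argument would return one flag per warp. The shared-memory kernel costs two block-wide barriers, whereas the shuffle kernel stays entirely in registers, which is why the warp-level form is preferable whenever the comparison does not have to cross warps.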