Suppose every thread in a block has an integer variable a whose value may differ across threads. How can I check whether the first thread in the block has the same value of a as the last thread in the block? I only need to compare these two threads because the value of a is non-decreasing with the thread index.
If the block contains more than one warp, you have to go through shared or global memory, and shared memory will be faster. Static shared memory can be used because the size is known at compile time:
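// assumes #include <cooperative_groups.h> and namespace cg = cooperative_groups;
// ... inside the kernel, where each thread already holds its own int a ...
cg::thread_block block = cg::this_thread_block();
__shared__ int s[2];
__shared__ bool result;
uint32_t tid = threadIdx.x;
if (tid == 0)
    s[0] = a;                 // value of a from the first thread
if (tid == blockDim.x - 1)
    s[1] = a;                 // value of a from the last thread
block.sync();                 // synchronize the block, not the whole grid
if (tid == 0)
    result = (s[0] == s[1]);
block.sync();                 // after this, every thread in the block can read result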
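For context, here is a minimal sketch of how the fragment above might sit inside a complete kernel. The kernel name compareFirstLastBlock, the host helper firstEqualsLastInBlock, and the in/out buffers are hypothetical names introduced only for illustration. It assumes a one-dimensional block of at most 1024 threads, and error checking is omitted for brevity.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <cstdint>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical kernel: each thread loads its own a from in[], then the block
// compares the values held by its first and last thread via shared memory.
__global__ void compareFirstLastBlock(const int* in, int* out)
{
    cg::thread_block block = cg::this_thread_block();
    __shared__ int s[2];
    __shared__ bool result;

    uint32_t tid = threadIdx.x;
    int a = in[blockIdx.x * blockDim.x + tid];

    if (tid == 0)
        s[0] = a;                 // first thread publishes its value
    if (tid == blockDim.x - 1)
        s[1] = a;                 // last thread publishes its value
    block.sync();                 // both writes are now visible to the block

    if (tid == 0)
        result = (s[0] == s[1]);
    block.sync();                 // result is now visible to every thread

    // Thread 0 reports the per-block result (1 = equal, 0 = different).
    if (tid == 0)
        out[blockIdx.x] = result ? 1 : 0;
}

// Hypothetical host helper: launches a single block with one thread per element.
// Assumes 1 <= values.size() <= 1024; no CUDA error checking is done here.
bool firstEqualsLastInBlock(const std::vector<int>& values)
{
    int n = static_cast<int>(values.size());
    int* dIn = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, values.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    compareFirstLastBlock<<<1, n>>>(dIn, dOut);

    int h = 0;
    cudaMemcpy(&h, dOut, sizeof(int), cudaMemcpyDeviceToHost); // also waits for the kernel
    cudaFree(dIn);
    cudaFree(dOut);
    return h != 0;
}
```

Calling firstEqualsLastInBlock with a vector of 64 identical values exercises the multi-warp case, which is the one that needs shared memory.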
If the block has only one warp, or if you need to compare a between the first and last thread of a warp, there is a better solution using cooperative groups:
#include <cooperative_groups.h>
namespace cg = cooperative_groups;
// ... inside the kernel, where each thread already holds its own int a ...
auto active = cg::coalesced_threads();
int theOtherA = active.shfl(a, active.size() - 1); // every thread gets a copy of a from the last thread of the warp
bool result = a == theOtherA;                      // only thread 0 has the correct result at this point
result = active.shfl(result, 0);                   // broadcast the result from the register of thread 0
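The warp-level version can be sketched as a complete kernel in the same way. The names compareFirstLastWarp and firstEqualsLastInWarp are hypothetical, the launch assumes a single block of at most 32 threads, and the flag is kept as an int rather than a bool, which is a conservative choice for the shuffled type and not something the approach requires. Error checking is again omitted.

```cuda
#include <cooperative_groups.h>
#include <cuda_runtime.h>
#include <vector>

namespace cg = cooperative_groups;

// Hypothetical kernel: compares a between the first and last active thread
// of a warp using register shuffles only, with no shared memory.
__global__ void compareFirstLastWarp(const int* in, int* out)
{
    int a = in[threadIdx.x];

    cg::coalesced_group active = cg::coalesced_threads();
    unsigned int last = active.size() - 1;   // rank of the last active thread

    int theOtherA = active.shfl(a, last);    // everyone receives the last thread's a
    int equal = (a == theOtherA);            // meaningful only on rank 0 so far
    equal = active.shfl(equal, 0);           // broadcast rank 0's answer to the group

    if (active.thread_rank() == 0)
        out[0] = equal;
}

// Hypothetical host helper: launches one block with one thread per element.
// Assumes 1 <= values.size() <= 32 so that the block is a single warp.
bool firstEqualsLastInWarp(const std::vector<int>& values)
{
    int n = static_cast<int>(values.size());
    int* dIn = nullptr;
    int* dOut = nullptr;
    cudaMalloc(&dIn, n * sizeof(int));
    cudaMalloc(&dOut, sizeof(int));
    cudaMemcpy(dIn, values.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    compareFirstLastWarp<<<1, n>>>(dIn, dOut);

    int h = 0;
    cudaMemcpy(&h, dOut, sizeof(int), cudaMemcpyDeviceToHost); // also waits for the kernel
    cudaFree(dIn);
    cudaFree(dOut);
    return h != 0;
}
```

Because coalesced_threads() covers only the threads that are active at the call, "first" and "last" here mean ranks 0 and size() - 1 within that group, which coincide with lanes 0 and 31 when the whole warp is converged.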