Search code examples
cudagpgpuintrinsics

Insight into the first argument mask in __shfl__sync()


Here is the test code for broadcasting variable:

#include <stdio.h>
#include <cuda_runtime.h>

__global__ void broadcast(){
    int lane_id = threadIdx.x & 0x1f;
    int value = 31 - lane_id;
    //let all lanes within the warp be broadcasted the value 
    //whose laneID is 2 less than that of current lane
    int broadcasted_value = __shfl_up_sync(0xffffffff, value, 2)
    value = n;
    printf("thread %d final value = %d\n", threadIdx.x, value);
}

int main() {
    broadcast<<<1,32>>>();
    cudaDeviceSynchronize();
    return 0;
}

In effect, this question is the same as that of this page. Results of shuffling didn't vary at all, whatever I modified the mask(e.g. 0x00000000, 0x00000001, etc.). So, how to properly understand the effects of mask?


Solution

  • The mask parameter forces warp reconvergence, for warp lanes identified with a 1 bit, prior to performing the requested shuffle operation (assuming such reconvergence is possible, i.e. not prevented by conditional coding. If prevented by conditional coding, your code is illegal, and exploring undefined behavior - UB).

    For warp lanes that are already converged and active, it has no effect. It does not prevent lanes from participating in the shuffle operation if the mask parameter is set to zero. It also does not force inactive lanes to participate (inactive lanes would be lanes that are excluded by conditional coding).

    Since your code has no conditional behavior, there is no reason to believe there would be any lack of convergence, and therefore no change in behavior regardless of your mask parameter.

    That does not mean it is correct to specify a mask of 0. Your code is illegal if you expect lanes to participate but have not set their corresponding bit to 1 in the mask, and you would potentially be exploring UB in the event of warp divergence.

    For other descriptions of the mask, there are a number of answers here already.

    1. 2. 3. 4. 5.

    There's a chance any follow-up questions you may have are already answered in one of those.