Bank conflicts in 2.x devices


What is a bank conflict on devices of compute capability 2.x? As I understand the CUDA C Programming Guide, on 2.x devices, if two threads access the same 32-bit word in the same shared memory bank, it does not cause a bank conflict. Instead, the word is broadcast. When two threads write to the same 32-bit word in the same shared memory bank, only one thread succeeds.

Since on-chip memory is 64 KB (48 KB for shared memory and 16 KB for L1, or vice versa) and is organized in 32 banks, I am assuming that each bank holds 2 KB. So I think that a bank conflict arises if two threads access two different 32-bit words in the same shared memory bank. Is this correct?


Solution

  • Your description is correct. There are many access patterns that can generate bank conflicts, but here's a simple and common example: strided access.

    __shared__ int smem[512];

    // Assumes at most 64 threads per block, so tid * 8 stays inside smem.
    int tid = threadIdx.x;

    int x = smem[tid * 2]; // 2-way bank conflicts
    int y = smem[tid * 4]; // 4-way bank conflicts
    int z = smem[tid * 8]; // 8-way bank conflicts
    // etc.

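    For an array of 32-bit words, bank ID = index % 32. If you look at the pattern of addresses in the x, y, and z accesses, you can see that within each warp of 32 threads only some banks are touched, and each of those is hit by several threads reading different words: for x, 2 threads access each of 16 banks; for y, 4 threads access each of 8 banks; and for z, 8 threads access each of 4 banks.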
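
To check the arithmetic for other strides, the bank mapping can be modeled on the host. The following is a minimal sketch, not a measurement of real hardware: it assumes 32 banks of 32-bit words with successive words in successive banks (the layout described above), and the names `conflict_degree`, `kNumBanks`, and `kWarpSize` are made up for this illustration. It is plain C++ and should also compile as host code under nvcc.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Assumed layout: 32 banks, each one 32-bit word wide, with successive
// 32-bit words assigned to successive banks.
constexpr int kNumBanks = 32;
constexpr int kWarpSize = 32;

// Models one warp in which thread tid reads the 32-bit word at index
// tid * stride. Returns the largest number of *distinct* words requested
// from any single bank: 1 means conflict-free, n means an n-way conflict.
// Threads reading the same word count once, since that word is broadcast.
int conflict_degree(int stride) {
    assert(stride >= 0);
    std::vector<std::set<int>> words_per_bank(kNumBanks);
    for (int tid = 0; tid < kWarpSize; ++tid) {
        int index = tid * stride;
        words_per_bank[index % kNumBanks].insert(index);  // bank ID = index % 32
    }
    std::size_t worst = 0;
    for (const auto& words : words_per_bank) {
        worst = std::max(worst, words.size());
    }
    return static_cast<int>(worst);
}
```

Under this model, `conflict_degree(2)`, `conflict_degree(4)`, and `conflict_degree(8)` give 2, 4, and 8, matching the x, y, and z accesses, while any odd stride such as 3 gives 1 because the 32 indices then fall in 32 different banks.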