CUDA memory bank conflict?

I have written a simple program that launches a single block containing a single thread. The kernel creates one 48 KB __shared__ memory array, filling the entire shared memory of the streaming multiprocessor. The code sets and unsets individual bits in the shared memory. With the first 32 bits, the code works fine. However, when I start flipping the remaining bits, nothing happens and the bits stay unchanged.

Any ideas what is going on? I am new to CUDA programming. Is there any reason to believe that this has something to do with memory bank conflicts?

Solution

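To answer your question: no, this has nothing to do with shared memory bank conflicts. Bank conflicts affect only performance, not correctness, so you would get the same result with or without them. A bank conflict also requires several threads of a warp to access the same bank at once, which cannot happen when only one thread is running.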

You should add error checking to your host code and check the status returned by every CUDA API call. I suspect an error is being reported somewhere. In general, you should post your code with your question if you want a more accurate answer.
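Since the original code was not posted, the following is only a minimal sketch of what such error checking might look like. The `CUDA_CHECK` macro, the `flip_bit` kernel, and the choice of bit 1000 are all hypothetical stand-ins, not the asker's code. The kernel indexes the shared array as word = bit / 32 and mask = 1u << (bit % 32), which is one common way to address bits past the first 32.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Hypothetical helper macro (not from the original post): print a message
// and exit if a CUDA runtime call does not return cudaSuccess.
#define CUDA_CHECK(call)                                              \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "CUDA error at %s:%d: %s\n",              \
                    __FILE__, __LINE__, cudaGetErrorString(err_));    \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// 12288 words * 4 bytes = 48 KB of statically allocated shared memory.
constexpr int kWords = 12288;

// Assumed stand-in for the asker's kernel: a single thread sets one bit in
// a 48 KB shared array, reads it back, then clears it again.
__global__ void flip_bit(unsigned int bit, int *was_set, int *was_cleared)
{
    __shared__ unsigned int bits[kWords];

    // Only one thread runs, so no __syncthreads() is needed here.
    for (int i = 0; i < kWords; ++i) {
        bits[i] = 0u;
    }

    unsigned int word = bit / 32u;          // which 32-bit word holds the bit
    unsigned int mask = 1u << (bit % 32u);  // position of the bit in that word

    bits[word] |= mask;                     // set the bit
    *was_set = (bits[word] & mask) != 0u;

    bits[word] &= ~mask;                    // clear the bit
    *was_cleared = (bits[word] & mask) == 0u;
}

int main()
{
    int *d_flags = nullptr;
    CUDA_CHECK(cudaMalloc(&d_flags, 2 * sizeof(int)));

    // One block, one thread, as described in the question.
    flip_bit<<<1, 1>>>(1000u, d_flags, d_flags + 1);
    CUDA_CHECK(cudaGetLastError());       // errors from the launch itself
    CUDA_CHECK(cudaDeviceSynchronize());  // errors raised while the kernel ran

    int h_flags[2] = {0, 0};
    CUDA_CHECK(cudaMemcpy(h_flags, d_flags, sizeof(h_flags),
                          cudaMemcpyDeviceToHost));
    printf("set: %d, cleared: %d\n", h_flags[0], h_flags[1]);

    CUDA_CHECK(cudaFree(d_flags));
    return 0;
}
```

A kernel launch returns no status of its own, which is why the sketch calls `cudaGetLastError()` right after the launch and then checks the result of `cudaDeviceSynchronize()`. If the launch is rejected or the kernel faults, the macro reports the file, line, and error string instead of letting the program continue with unchanged data.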