c++ cuda profiler nvcc

coalesced reads/writes in CUDA


Is there a way to check whether my kernel reads from and writes to global memory in a coalesced way? I've been trying different approaches to make sure my kernel accesses memory efficiently, to get better performance.

Thanks


Solution

  • Use a profiler such as nvprof

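    The gld_efficiency and gst_efficiency metrics directly measure how well global loads and stores are coalesced. Each reports the ratio of requested to required global memory throughput as a percentage, so values near 100% indicate well-coalesced accesses. For example, on Linux: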

    nvprof --metrics gld_efficiency,gst_efficiency ./my_app
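
To build some intuition for what those percentages mean, here is a rough host-side C++ sketch. It is not how the profiler computes its metrics; it is a simplified model under stated assumptions: a warp of 32 threads, thread `tid` loading one element at index `tid * stride` from an aligned base address, and memory moved in 32-byte sectors. The function name `modeled_load_efficiency` and its parameters are hypothetical, made up for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <set>

// Simplified model (an assumption, not the profiler's actual computation):
// one warp of 32 threads, thread `tid` loads one element of `elem_bytes`
// bytes at element index tid * stride, starting from an aligned base address.
// Memory is assumed to be transferred in whole 32-byte sectors.
// Returns requested bytes / transferred bytes, as a percentage.
double modeled_load_efficiency(std::size_t stride, std::size_t elem_bytes = 4) {
    constexpr std::size_t kWarpSize = 32;
    constexpr std::size_t kSectorBytes = 32;

    // Restrict to element sizes that never straddle a sector boundary.
    assert(stride >= 1);
    assert(elem_bytes >= 1 && elem_bytes <= kSectorBytes);
    assert(kSectorBytes % elem_bytes == 0);

    // Collect the distinct sectors touched by the warp.
    std::set<std::size_t> sectors;
    for (std::size_t tid = 0; tid < kWarpSize; ++tid) {
        const std::size_t byte_addr = tid * stride * elem_bytes;
        sectors.insert(byte_addr / kSectorBytes);
    }

    const double requested = static_cast<double>(kWarpSize * elem_bytes);
    const double transferred = static_cast<double>(sectors.size() * kSectorBytes);
    return 100.0 * requested / transferred;
}
```

Under this model, a stride of 1 with 4-byte elements touches 4 sectors for 128 requested bytes (100%), a stride of 2 touches 8 sectors (50%), and a stride of 8 or more puts every thread in its own sector (12.5%). Real hardware numbers also depend on caching and the GPU architecture, so treat this only as a way to reason about the access pattern before confirming with the profiler.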