I'm trying to profile my code using the NVIDIA profiler, but I'm getting strange gaps in the timeline as shown below:
Note: the operations on both edges of each gap are cudaMemcpyAsync calls (host-to-device).
I'm running Ubuntu 14.04 with the latest version of CUDA (8.0.61) and the latest NVIDIA display driver.
The Intel integrated graphics card drives the display, not the NVIDIA one, so the NVIDIA card is only running my code and nothing else.
I've also enabled CPU profiling to investigate these gaps, but nothing is shown.
Also, no debugging options are enabled (neither -G nor -g); this is a release build.
My laptop's specs:
Is there any way to trace what's happening in these empty time slots?
Thanks,
I'm afraid there is no automatic method, but you can add custom traces to your code to find out what's happening.
To do that, you can use NVTX.
Follow the links for tutorials and documentation.
These gaps in the profile are probably due to data loading or memory allocation/initialisation done by the host between your kernel executions.
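As a rough sketch of the NVTX approach mentioned above, you could wrap the host-side work that runs between two copies in a named range, so that it shows up in the profiler timeline. The `ScopedRange` wrapper, the `prepare_batch` function and the `USE_NVTX` macro are made-up names for illustration only; the only real NVTX calls used are `nvtxRangePushA` and `nvtxRangePop` from `nvToolsExt.h`. Without `USE_NVTX` the calls compile to no-op stubs, so the snippet builds on a machine without the CUDA toolkit.

```cpp
#include <cassert>
#include <cstddef>
#include <numeric>
#include <vector>

// USE_NVTX is a hypothetical opt-in macro for this sketch.
// With it defined, the real NVTX header from the CUDA toolkit is used
// (link with -lnvToolsExt). Without it, no-op stubs are used instead.
#ifdef USE_NVTX
#include <nvToolsExt.h>
#else
static inline int nvtxRangePushA(const char*) { return 0; }
static inline int nvtxRangePop() { return 0; }
#endif

// RAII helper (illustrative, not part of NVTX): opens a named range on
// construction and closes it when the enclosing scope ends.
struct ScopedRange {
    explicit ScopedRange(const char* name) { nvtxRangePushA(name); }
    ~ScopedRange() { nvtxRangePop(); }
    ScopedRange(const ScopedRange&) = delete;
    ScopedRange& operator=(const ScopedRange&) = delete;
};

// Hypothetical host-side step standing in for whatever your code does
// between two cudaMemcpyAsync calls (loading, allocating, initialising).
std::vector<float> prepare_batch(std::size_t n) {
    ScopedRange range("prepare_batch");  // appears as a named range when profiled
    assert(n > 0 && "batch must be non-empty");
    std::vector<float> batch(n);
    std::iota(batch.begin(), batch.end(), 0.0f);  // fills 0, 1, 2, ...
    return batch;
}
```

If you put one such range around each suspect host-side section and build with `-DUSE_NVTX -lnvToolsExt`, the profiler should display the ranges alongside the memcpy and kernel rows, which makes it easier to see which host code lines up with a gap.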