I am unable to understand the following statement in the NSIGHT User Guide:
Non-Overlapping Input/Output Buffers
a kernel can malloc and free a buffer in the same launch,
but it cannot call an unmatched malloc or an unmatched free.
Can someone explain it a little more?
This simply means that a malloc and its matching free shouldn't be split across different kernel launches.
If you malloc during a kernel launch, you must free it during the same launch, not several launches later.
This is only required if you enable the NSIGHT profiler option Non-Overlapping Input/Output Buffers, as it allows the profiler to perform some optimizations. If you do malloc or free across kernel launches (which is perfectly fine as far as CUDA is concerned), then simply disable that option.
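
To make the distinction concrete, here is a minimal sketch of both patterns using device-side malloc and free. The kernel names (matchedKernel, allocOnly, freeOnly) and the buffer size are made up for illustration and do not come from the guide. The first kernel is the pattern the option permits; the other two are valid CUDA, but together they form the unmatched malloc and unmatched free that the guide refers to.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Matched: the scratch buffer is allocated and freed within the same launch.
__global__ void matchedKernel(int *out, int n)
{
    int *scratch = static_cast<int *>(malloc(n * sizeof(int)));
    if (scratch == nullptr) {
        return;  // device heap exhausted, nothing to free
    }
    int sum = 0;
    for (int i = 0; i < n; ++i) {
        scratch[i] = i;
        sum += scratch[i];
    }
    *out = sum;
    free(scratch);  // released before the kernel ends
}

// Unmatched malloc: the pointer is stored and outlives this launch.
__global__ void allocOnly(int **slot, int n)
{
    *slot = static_cast<int *>(malloc(n * sizeof(int)));
}

// Unmatched free: releases a buffer that an earlier launch allocated.
__global__ void freeOnly(int **slot)
{
    free(*slot);  // free(nullptr) is a no-op if the earlier malloc failed
    *slot = nullptr;
}

int main()
{
    int *out = nullptr;
    int **slot = nullptr;
    cudaMalloc(&out, sizeof(int));
    cudaMalloc(&slot, sizeof(int *));
    cudaMemset(out, 0, sizeof(int));
    cudaMemset(slot, 0, sizeof(int *));

    // Fine with the option enabled: malloc and free happen in one launch.
    matchedKernel<<<1, 1>>>(out, 16);

    // Legal CUDA, but malloc and free are split across two launches,
    // so the option should be disabled when profiling this pair.
    allocOnly<<<1, 1>>>(slot, 16);
    freeOnly<<<1, 1>>>(slot);

    cudaError_t err = cudaDeviceSynchronize();
    int sum = 0;
    cudaMemcpy(&sum, out, sizeof(int), cudaMemcpyDeviceToHost);
    printf("status: %s, sum = %d\n", cudaGetErrorString(err), sum);

    cudaFree(out);
    cudaFree(slot);
    return 0;
}
```

Memory obtained from device-side malloc stays valid for the lifetime of the CUDA context, which is why the allocOnly/freeOnly pair works at all; the restriction applies only while the profiler option is switched on.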