
CUDA Nsight: what does "call an unmatched malloc or an unmatched free" mean?


I am unable to understand the following statement in the Nsight User Guide:

Non-Overlapping Input/Output Buffers

a kernel can malloc and free a buffer in the same launch, 
but it cannot call an unmatched malloc or an unmatched free. 

Can someone explain it a little more?


Solution

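  • This simply means that a kernel's malloc and free calls must be paired within a single launch. If a kernel allocates a buffer with malloc during a launch, that same launch must free it, not a launch several launches later. Likewise, a kernel must not free a buffer that an earlier launch allocated.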

    This is only required if you enable the Nsight profiler option Non-Overlapping Input/Output Buffers, because the restriction allows the profiler to perform some optimizations. Allocating in one launch and freeing in another is perfectly fine as far as CUDA is concerned, so if your code does that, simply disable the option.
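To make the distinction concrete, here is a minimal sketch, assuming the guide is talking about device-side malloc and free called from inside a kernel. All names (matched_kernel, alloc_only_kernel, free_only_kernel, g_buf, run_matched) are made up for illustration. The first kernel pairs its malloc and free within one launch; the other two split the pair across launches, which CUDA allows but which, per the answer above, is what the option rules out.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Matched: every thread allocates and frees its scratch buffer in the same launch.
__global__ void matched_kernel(int *out, int n)
{
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= n) return;

    int *scratch = static_cast<int *>(malloc(4 * sizeof(int)));  // device-heap allocation
    if (scratch == nullptr) { out[tid] = -1; return; }           // heap exhausted

    for (int i = 0; i < 4; ++i) scratch[i] = tid + i;
    out[tid] = scratch[0] + scratch[1] + scratch[2] + scratch[3];

    free(scratch);  // released before the launch ends
}

// Hypothetical pointer that outlives a single launch.
__device__ int *g_buf;

// Unmatched malloc: the buffer is still allocated when this launch ends.
__global__ void alloc_only_kernel()
{
    g_buf = static_cast<int *>(malloc(16 * sizeof(int)));
}

// Unmatched free: releases a buffer that an earlier launch allocated.
__global__ void free_only_kernel()
{
    free(g_buf);
    g_buf = nullptr;
}

// Host helper: runs the matched kernel and checks each thread's result.
bool run_matched(int n)
{
    int *d_out = nullptr;
    if (cudaMalloc(&d_out, n * sizeof(int)) != cudaSuccess) return false;

    int threads = 64;
    int blocks = (n + threads - 1) / threads;
    matched_kernel<<<blocks, threads>>>(d_out, n);

    std::vector<int> h_out(n);
    cudaError_t err = cudaMemcpy(h_out.data(), d_out, n * sizeof(int),
                                 cudaMemcpyDeviceToHost);
    cudaFree(d_out);
    if (err != cudaSuccess) return false;

    for (int i = 0; i < n; ++i)
        if (h_out[i] != 4 * i + 6) return false;  // tid + (tid+1) + (tid+2) + (tid+3)
    return true;
}

int main()
{
    std::printf("matched launch %s\n", run_matched(128) ? "succeeded" : "failed");

    // Legal CUDA, but the malloc and the free sit in different launches.
    alloc_only_kernel<<<1, 1>>>();
    free_only_kernel<<<1, 1>>>();
    cudaError_t err = cudaDeviceSynchronize();
    std::printf("unmatched pair: %s\n", cudaGetErrorString(err));
    return 0;
}
```

With the option enabled, only code shaped like matched_kernel fits the restriction; if your code needs something like the alloc_only_kernel / free_only_kernel pair, leave the option disabled.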