Tags: visual-studio, visual-studio-2013, cuda, nsight

Ignored breakpoints when using Nsight's "Start CUDA debugging"


Breakpoints in .cu files in Visual Studio 2013 work fine when using the "Local Windows Debugger", but when using Nsight's "Start CUDA debugging" the breakpoints are ignored. How is this possible? Nsight's site states: "Use the familiar Visual Studio Locals, Watches, Memory and Breakpoints windows", so I assumed the normal breakpoints could be used?

Edit:

  • Enable CUDA Memory Checker: On/Off makes no difference
  • Generate GPU Debug Information: No/Yes (-G0) makes no difference
  • Start CUDA/Graphics debugging: breakpoints ignored

Solution

    • "Start CUDA debugging" debugs device (kernel) code, i.e. stuff compiled with nvcc -> bunch of preprocessing -> cudafe++ -> cicc toolchain path.
    • "Local Windows Debugger" debugs host code, i.e. stuff compiled with either nvcc -> bunch of preprocessing -> cl, or just cl.
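The two toolchain paths need different debug-info flags. A typical debug-build invocation might look like this (the flags are real nvcc/cl options; the file names are placeholders):

```shell
# -G  : generate device (kernel) debug info for "Start CUDA debugging";
#       it also disables most device-side optimizations.
# -g  : generate host debug info for "Local Windows Debugger".
# /Zi /Od are forwarded to cl: write a .pdb, disable host optimizations.
nvcc -G -g -Xcompiler "/Zi /Od" -c kernel.cu -o kernel.obj
```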

    It does not matter in which file (.cpp, .cu or .h) your code lives. The only thing that matters is whether your code is annotated as __device__ or __global__ or not.
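A minimal sketch of that split (the kernel and variable names are my own invention), marking which debugger stops where:

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Device code: a breakpoint inside this function is only hit under
// Nsight's "Start CUDA debugging", and only if built with -G.
__global__ void addOne(int *x)
{
    *x += 1;                // <- kernel breakpoint lands here
}

int main()
{
    // Host code: "Local Windows Debugger" stops on any of these lines.
    int host = 41, *dev = nullptr;
    cudaMalloc(&dev, sizeof(int));
    cudaMemcpy(dev, &host, sizeof(int), cudaMemcpyHostToDevice);

    addOne<<<1, 1>>>(dev);  // launching the kernel is still host code

    cudaMemcpy(&host, dev, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    printf("%d\n", host);
    return 0;
}
```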

    As of CUDA 7.5 RC (Aug 2015), on Windows you can only debug one of those at a time. On Linux and OS X you can debug both at the same time with cuda-gdb.

    See also: NVIDIA CUDA Compiler Driver NVCC

    Other things that could lead to frustration during debugging on Windows:

    • You are setting up properties for one configuration/platform pair, but running another one
    • Something went wrong with the .pdb files for the host and device modules. Check the nvcc, cl, nvlink and link options. For example, host and device debug info could be written to the same file, overwriting each other.
    • Aggressive optimizations: inlining, optimizing out locals, etc. Release code is almost impossible for a human to debug, and the debugger can be fooled as well.
    • Presence of undefined behavior and/or memory access violations. These can easily crash the debugger, leading to unexpected results, such as breakpoints not being hit.
    • You forgot to check errors for one of the CUDA API or kernel calls, there was an error, and now the CUDA context is dead and kernels will not run anymore. But you don't know this yet. Your host code continues to run and you expect kernel breakpoints to be hit, but that never happens, because the kernel is simply never launched.
    • All bugs described above could be in a library. Don't expect libraries to be bug-free.
    • Compilers, debuggers and drivers have bugs too. But you should always assume the problem is in your own code first, and if nothing helps, investigate and file a bug report with the vendor.
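The "dead context" pitfall above is why every API call and kernel launch should be checked. A common sketch (the CHECK macro name is my own; the runtime functions are real):

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a readable message instead of silently continuing with a
// dead context. CHECK is a hypothetical helper name, not a CUDA API.
#define CHECK(call)                                                \
    do {                                                           \
        cudaError_t err = (call);                                  \
        if (err != cudaSuccess) {                                  \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,     \
                    cudaGetErrorString(err));                      \
            exit(EXIT_FAILURE);                                    \
        }                                                          \
    } while (0)

__global__ void kernel() { }

int main()
{
    kernel<<<1, 1>>>();
    CHECK(cudaGetLastError());      // catches launch-time errors
    CHECK(cudaDeviceSynchronize()); // catches errors during execution
    return 0;
}
```

With this in place, a failed launch stops the program immediately instead of leaving you staring at kernel breakpoints that are never hit.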