debugging, cuda, cuda-gdb

How can I debug code 700 "illegal memory access" aka `CUDA_EXCEPTION_14, Warp Illegal Address`?


My code is showing

CUDA error calling "cudaStreamSynchronize(0)", code is 700 an illegal memory access was encountered on 489
 0# my_func(signed char const*, unsigned char const*, int*, int*, int, int) in libthing.so

How can I debug this?


Solution

  • While using cuda-gdb to debug kernels is generally a good skill to develop (see Richard's answer), there is an easier, less interactive way to debug this kind of memory access error. NVIDIA has shipped a memory checking tool with the CUDA Toolkit for a long time: first as cuda-memcheck (no longer available as of CUDA 12) and nowadays as part of the compute-sanitizer tool (available since CUDA 11).

    Both tools are used by passing them the program to check as an argument. The program then runs without any further user input, which makes the tools suitable for, e.g., automated CI jobs that guard against correctness regressions. Several checkers are available; memcheck is the default and needs no additional command-line arguments.

    $ compute-sanitizer /path/to/executable
    

    If you compile a debug build (-G for device code) or just compile with -lineinfo, these tools can tell you not only which thread in which kernel performs an illegal access, and of what kind and size, but also the line of code where that access is made.

    The Compute Sanitizer documentation also gives an example of what the output looks like:

    ========= Invalid __global__ write of size 4 bytes
    =========     at unaligned_kernel():0x160 in memcheck_demo.cu:6
    =========     by thread (0,0,0) in block (0,0,0)
    =========     Address 0x7f6510c00001 is misaligned
    

    So memcheck_demo.cu:6 points to line 6 of the demo source code.
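
    To try this workflow end to end, a small reproducer can help. The following is a minimal sketch, not the asker's code: the file name oob_demo.cu, the kernel fill, and the sizes are all made up for illustration. The kernel has an off-by-one bounds check, so one thread writes a single element past the end of the allocation. Such a small overrun may well go unreported when the program runs on its own, which is why it makes a reasonable test case for memcheck.

    ```cuda
    // oob_demo.cu (hypothetical file name, for illustration only)
    // Build with line info:  nvcc -lineinfo -o oob_demo oob_demo.cu
    // Run under memcheck:    compute-sanitizer ./oob_demo
    #include <cstdio>
    #include <cuda_runtime.h>

    // Hypothetical kernel: writes each element's index into data[0..n-1].
    __global__ void fill(int *data, int n)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        // Bug on purpose: `<=` lets the thread with i == n write one
        // element past the end of the allocation. It should be `i < n`.
        if (i <= n) {
            data[i] = i;
        }
    }

    int main()
    {
        const int n = 1000;
        int *d_data = nullptr;
        cudaMalloc(&d_data, n * sizeof(int));

        // 4 blocks of 256 threads = 1024 threads, so a thread with i == n exists.
        fill<<<(n + 255) / 256, 256>>>(d_data, n);

        // Kernel errors surface at the next synchronizing call.
        cudaError_t err = cudaDeviceSynchronize();
        printf("cudaDeviceSynchronize: %s\n", cudaGetErrorString(err));

        cudaFree(d_data);
        return 0;
    }
    ```

    Because the build uses -lineinfo, the report for the invalid write should name the source line of the data[i] = i store rather than only an instruction offset.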

    Further demonstrations can be found in the NVIDIA/compute-sanitizer-samples repository. Both tools can perform a host of other checks, e.g. checking for shared memory race conditions by passing --tool racecheck.
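
    As a rough illustration of the kind of bug racecheck is aimed at, here is a minimal sketch with a hypothetical file name (race_demo.cu) and kernel (shift_left). Each thread writes one slot of a shared array and then reads its neighbor's slot without a barrier in between, so the read can happen before the neighbor's write.

    ```cuda
    // race_demo.cu (hypothetical file name, for illustration only)
    // Build with line info:  nvcc -lineinfo -o race_demo race_demo.cu
    // Run under racecheck:   compute-sanitizer --tool racecheck ./race_demo
    #include <cstdio>
    #include <cuda_runtime.h>

    #define TILE 64  // two warps per block, so the hazard crosses warps

    // Hypothetical kernel: every thread publishes a value to shared memory,
    // then reads the value published by the next thread.
    __global__ void shift_left(int *out)
    {
        __shared__ int tile[TILE];

        tile[threadIdx.x] = threadIdx.x;

        // Bug on purpose: a __syncthreads() is missing here, so the read
        // below may run before the neighboring thread has written its slot.
        out[threadIdx.x] = tile[(threadIdx.x + 1) % TILE];
    }

    int main()
    {
        int *d_out = nullptr;
        cudaMalloc(&d_out, TILE * sizeof(int));

        shift_left<<<1, TILE>>>(d_out);

        cudaError_t err = cudaDeviceSynchronize();
        printf("cudaDeviceSynchronize: %s\n", cudaGetErrorString(err));

        cudaFree(d_out);
        return 0;
    }
    ```

    Adding __syncthreads() between the write and the read removes the hazard, which gives a simple before/after comparison when running the tool.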