Search code examples
c++memory-leakscudavalgrindthrust

How to use thrust and valgrind together to detect memory leaks?


Is there a way to use the CUDA thrust library with the Valgrind memory leak checker?

The reason I ask is because this simple program:

#include <thrust/device_vector.h>

int main(){
    thrust::device_vector<int> D(5);
    assert( D.size() == 5 );
}

compiled with:

$ /usr/local/cuda-11.1/bin/nvcc device_vector.cu -o device_vector.cu.x

Makes Valgrind believe that there are multiple possible memory leaks.

I know they must be false positives and that valgrind is not made to detect GPU memory leaks but I wonder if there is a flag or a standard way to make both tools work together (e.g. to detect CPU memory leaks).

If there is a standard set of Valgrind exceptions around I will gladly use them, but I wanted to ask before playing wack-a-mole.

$ valgrind ./device_vector.cu.x 
==765561== Memcheck, a memory error detector
==765561== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==765561== Using Valgrind-3.15.0 and LibVEX; rerun with -h for copyright info
==765561== Command: ./device_vector.cu.x
==765561== 
==765561== Warning: noted but unhandled ioctl 0x30000001 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x27 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x25 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x37 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x17 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: set address range perms: large range [0x200000000, 0x300200000) (noaccess)
==765561== Warning: set address range perms: large range [0x681f000, 0x2681e000) (noaccess)
==765561== Warning: noted but unhandled ioctl 0x19 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: set address range perms: large range [0x10006000000, 0x10106000000) (noaccess)
==765561== Warning: noted but unhandled ioctl 0x49 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x21 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x1b with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== Warning: noted but unhandled ioctl 0x44 with no size/direction hints.
==765561==    This could cause spurious value errors to appear.
==765561==    See README_MISSING_SYSCALL_OR_IOCTL for guidance on writing a proper wrapper.
==765561== 
==765561== HEAP SUMMARY:
==765561==     in use at exit: 6,678,624 bytes in 8,647 blocks
==765561==   total heap usage: 11,448 allocs, 2,801 frees, 40,718,174 bytes allocated
==765561== 
==765561== LEAK SUMMARY:
==765561==    definitely lost: 0 bytes in 0 blocks
==765561==    indirectly lost: 0 bytes in 0 blocks
==765561==      possibly lost: 22,216 bytes in 187 blocks
==765561==    still reachable: 6,656,408 bytes in 8,460 blocks
==765561==         suppressed: 0 bytes in 0 blocks
==765561== Rerun with --leak-check=full to see details of leaked memory
==765561== 
==765561== For lists of detected and suppressed errors, rerun with: -s
==765561== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

The mentioned readme README_MISSING_SYSCALL_OR_IOCTL was not very helpful to me.


NOTE ADDED: CUDA comes with a memchecker called cuda-memcheck which doesn't report memory leaks in the program above, however it doesn't seem to be a replacement for valgrind, since it doesn't detect the actual memory leak in a simple cpu program:

#include <thrust/device_vector.h>

int main(){
//  thrust::device_vector<int> D(5);
//  assert( D.size() == 5 );
    
//  cudaDeviceSynchronize();
    std::allocator<int> alloc;
    int* p = alloc.allocate(10);
    p[0] = 2;
    return p[0];
}

Solution

  • Currently I am using this suppression file .valgrind-supressions at the root of my project:

    {
       <suppression_for_thrust_allocations>
       Memcheck:Leak
       match-leak-kinds: possible
       fun:*alloc
       ...
       obj:*libcuda.so.*
       ...
       obj:*libcuda.so.*
       fun:__cudart*
       ...
       fun:__cudart*
       fun:cudaMalloc
       fun:_ZN6thrust6system4cuda6detail20cuda_memory_resourceIXadL_Z10cudaMallocEEXadL_Z8cudaFreeEENS_8cuda_cub7pointerIvEEE11do_allocateEmm
       ...
    }
    

    (the three dots are actual code)

    By removing the _ZN6thrust line maybe it could be more general, but I don't want to generalize suppressions prematurely.

    It is important to note that this is not checking for leaks in the GPU, for that cuda-memcheck is needed.


    UPDATE: I broadened the suppression to 1) include cases generated from cudaMallocManaged as well 2) Caused by CUDA runtime without the participation of thrust allocators (as @RobertCrovella mentioned).

    {
       <suppression_for_cudaMalloc_and_cudaMallocManaged_allocations>
       Memcheck:Leak
       match-leak-kinds: possible
       fun:*alloc
       ...
       obj:*libcuda.so.*
       ...
       obj:*libcuda.so.*
       fun:__cudart*
       ...
       fun:__cudart*
       fun:cudaMalloc*
       ...
    }
    

    In CMakeLists.txt I am using these options to actually use the suppression file listed above

      ...
    set(MEMORYCHECK_COMMAND_OPTIONS "-q --tool=memcheck --leak-check=yes --num-callers=52 --trace-children=yes --leak-check=full --track-origins=yes --gen-suppressions=all") # must go before `include(CTest)`
    set(MEMORYCHECK_SUPPRESSIONS_FILE "${PROJECT_SOURCE_DIR}/.valgrind-suppressions") # must go before `include(CTest)`
    
    include(CTest)
      ...
    

    (the three dots here stand for the rest of the file)

    Improvements to this suppression pattern are welcome.


    For reference a full automatically generated suppression by valgrind looks like this:

    {
       <insert_a_suppression_name_here>
       Memcheck:Leak
       match-leak-kinds: possible
       fun:calloc
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       obj:/usr/lib/x86_64-linux-gnu/libcuda.so.470.63.01
       fun:__cudart764
       fun:__cudart763
       fun:__cudart768
       fun:__cudart941
       fun:__cudart607
       fun:cudaMalloc
       fun:_ZN6thrust6system4cuda6detail20cuda_memory_resourceIXadL_Z10cudaMallocEEXadL_Z8cudaFreeEENS_8cuda_cub7pointerIvEEE11do_allocateEmm
       fun:_ZN6thrust26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL_Z10cudaMallocEEXadL_Z8cudaFreeEENS_8cuda_cub7pointerIvEEEEE11do_allocateEmm
       fun:_ZN6thrust2mr9allocatorIiNS_26device_ptr_memory_resourceINS_6system4cuda6detail20cuda_memory_resourceIXadL_Z10cudaMallocEEXadL_Z8cudaFreeEENS_8cuda_cub7pointerIvEEEEEEE8allocateEm
       fun:_ZZN6thrust6detail16allocator_traitsINS_16device_allocatorIiEEE8allocateERS3_mEN19workaround_warnings8allocateES5_m
       fun:_ZN6thrust6detail16allocator_traitsINS_16device_allocatorIiEEE8allocateERS3_m
       fun:_ZN6thrust6detail18contiguous_storageIiNS_16device_allocatorIiEEE8allocateEm
       fun:_ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE17allocate_and_copyINS0_15normal_iteratorIPKiEEEEvmT_SA_RNS0_18contiguous_storageIiS3_EE
       fun:_ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE10range_initINS0_15normal_iteratorIPKiEEEEvT_SA_NS_27random_access_traversal_tagE
       fun:_ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEE10range_initINS0_15normal_iteratorIPKiEEEEvT_SA_
       fun:_ZN6thrust6detail11vector_baseIiNS_16device_allocatorIiEEEC1IiSaIiEEERKNS1_IT_T0_EE
       fun:_ZN6thrust13device_vectorIiNS_16device_allocatorIiEEEC1IiSaIiEEERKNS_11host_vectorIT_T0_EE
       fun:_ZN6vector11test_methodEv
       fun:_ZL14vector_invokerv
       fun:_ZN5boost6detail8function22void_function_invoker0IPFvvEvE6invokeERNS1_15function_bufferE
       obj:/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.74.0
       fun:_ZN5boost17execution_monitor13catch_signalsERKNS_8functionIFivEEE
       fun:_ZN5boost17execution_monitor7executeERKNS_8functionIFivEEE
       fun:_ZN5boost17execution_monitor8vexecuteERKNS_8functionIFvvEEE
       fun:_ZN5boost9unit_test19unit_test_monitor_t21execute_and_translateERKNS_8functionIFvvEEEm
       obj:/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.74.0
       obj:/usr/lib/x86_64-linux-gnu/libboost_unit_test_framework.so.1.74.0
       fun:_ZN5boost9unit_test9framework3runEmb
       fun:_ZN5boost9unit_test14unit_test_mainEPFbvEiPPc
       fun:main
    }