Search code examples
multithreadingcudacuda-gdb

New Thread spawned by cudaMalloc | Behaviour?


cudaMalloc seemed to have spawned a thread when it was called, even though it's asynchronous. This was observed during debugging using cuda-gdb. cudaMalloc called within alloc_mem_GPU spawning New Thread

It also took a while to return.

The same thread exited, although as a different LWP, at the end of the program.

Can someone explain this behaviour ?


Solution

  • The thread is not specifically spawned by cudaMalloc. The user side CUDA driver API library seems to spawn threads at some stage during lazy context setup which have the lifetime of the CUDA context. The exact processes are not publicly documented.

    You see this associated with cudaMallocbecause I would guess this is the first API to trigger whatever setup/callbacks need to be done to make the userspace driver support work. You should notice that only the first call spawns a thread. Subsequent calls do not. And the threads stay alive for the lifetime of the CUDA context, after which they are terminated. You can trigger explicit thread destruction by calling cudaDeviceReset at any point in program execution.

    Here is a trivial example which demonstrates cudaMemcpyToSymbol triggering the thread spawning from the driver API library, rather than cudaMalloc:

    __device__ float someconstant;
    
    int main()
    {
        cudaSetDevice(0);
        const float x = 3.14159f;
        cudaMemcpyToSymbol(someconstant, &x, sizeof(float));
        for(int i=0; i<10; i++) {
            int *x;
            cudaMalloc((void **)&x, size_t(1024));
            cudaMemset(x, 0, 1024);
            cudaFree(x);
        }
        return int(cudaDeviceReset());
    }
    

    In gdb I see this:

    (gdb) tbreak main
    Temporary breakpoint 1 at 0x40254f: file gdb_threads.cu, line 5.
    (gdb) run
    Starting program: /home/talonmies/SO/a.out 
    [Thread debugging using libthread_db enabled]
    Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
    
    Temporary breakpoint 1, main () at gdb_threads.cu:5
    5       cudaSetDevice(0);
    (gdb) next
    6       const float x = 3.14159f;
    (gdb) next
    7       cudaMemcpyToSymbol(someconstant, &x, sizeof(float));
    (gdb) next
    [New Thread 0x7ffff5eb5700 (LWP 14282)]
    [New Thread 0x7fffed3ff700 (LWP 14283)]
    8       for(int i=0; i<10; i++) {
    (gdb) info threads
      Id   Target Id         Frame 
      3    Thread 0x7fffed3ff700 (LWP 14283) "a.out" pthread_cond_timedwait@@GLIBC_2.3.2 ()
        at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
      2    Thread 0x7ffff5eb5700 (LWP 14282) "a.out" 0x00007ffff74d812d in poll () at ../sysdeps/unix/syscall-template.S:81
    * 1    Thread 0x7ffff7fd1740 (LWP 14259) "a.out" main () at gdb_threads.cu:8
    
    (gdb) thread apply all bt
    
    Thread 3 (Thread 0x7fffed3ff700 (LWP 14283)):
    #0  pthread_cond_timedwait@@GLIBC_2.3.2 () at ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238
    #1  0x00007ffff65cad97 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
    #2  0x00007ffff659582d in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
    #3  0x00007ffff65ca4d8 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
    #4  0x00007ffff79bc182 in start_thread (arg=0x7fffed3ff700) at pthread_create.c:312
    #5  0x00007ffff74e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 2 (Thread 0x7ffff5eb5700 (LWP 14282)):
    #0  0x00007ffff74d812d in poll () at ../sysdeps/unix/syscall-template.S:81
    #1  0x00007ffff65c9953 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
    #2  0x00007ffff66571ae in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
    #3  0x00007ffff65ca4d8 in ?? () from /usr/lib/x86_64-linux-gnu/libcuda.so.1
    #4  0x00007ffff79bc182 in start_thread (arg=0x7ffff5eb5700) at pthread_create.c:312
    #5  0x00007ffff74e547d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
    
    Thread 1 (Thread 0x7ffff7fd1740 (LWP 14259)):
    #0  main () at gdb_threads.cu:8