Asynchrony and memory ownership in CUBLAS


CUBLAS is an asynchronous library. What are the requirements on memory ownership for parameters passed to CUBLAS?

It seems clear that matrices being operated on by CUBLAS should not be freed until the asynchronous calls complete - but what about the scalar parameters?

For example, is the following code sound:

//...
float alpha = compute_alpha();
cublasSaxpy(handle, n,
            &alpha,   // Taking the address of an automatic variable
                      // and handing it to an asynchronous function!
            x, incx,
            y, incy);
return;

I'm worried that alpha might not exist by the time Saxpy actually gets launched: if we return from the function before Saxpy launches, and the stack space for alpha gets overwritten with other stuff, it's possible Saxpy could get the wrong answer (or even crash).

I don't want to have to copy my scalar parameters to some sort of heap memory and ensure they don't get destructed until after an asynchronous call to CUBLAS - tracking this would be complicated.

It'd be great if CUBLAS explicitly guaranteed that scalar parameters do not need to live after a call to CUBLAS, but the documentation doesn't seem super clear about this.


Solution

  • If the pointer mode is CUBLAS_POINTER_MODE_HOST (the default; the mode is selected with cublasSetPointerMode), alpha and beta can live on the stack or on the heap. Underneath, the kernel(s) are launched with the values of alpha and beta, which are read before the call returns. So a stack variable can go out of scope, and a heap allocation can be freed, right after the call returns, even though the kernel launch is asynchronous.

  • If the pointer mode is CUBLAS_POINTER_MODE_DEVICE, alpha and beta MUST be accessible from the device, and their values must not be modified until the kernel is done. Note that since cudaFree does an implicit cudaDeviceSynchronize(), cudaFree can still be called on alpha/beta right after the call, but that defeats the purpose of the DEVICE pointer mode.
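The difference between the two pointer modes can be illustrated with a small CPU-only model. This is a sketch, not cuBLAS code: FakeStream, axpy_host_mode, axpy_device_mode and the two demo functions are hypothetical names invented for illustration, and the queue only stands in for an asynchronous stream. The assumption being modeled is the one described above: in HOST mode the scalar is read during the call, while in DEVICE mode only the pointer is kept and the scalar is read when the queued work actually runs.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <vector>

// Hypothetical stand-in for a CUDA stream: work is queued now, executed later.
struct FakeStream {
    std::queue<std::function<void()>> pending;
    void synchronize() {
        while (!pending.empty()) {
            pending.front()();  // run the oldest queued piece of work
            pending.pop();
        }
    }
};

// Models HOST pointer mode: *alpha is read during the call, so the queued
// work carries its own copy of the value.
void axpy_host_mode(FakeStream& s, const float* alpha,
                    const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    const float a = *alpha;  // copied before this function returns
    s.pending.push([a, &x, &y] {
        for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
    });
}

// Models DEVICE pointer mode: only the pointer is stored, and it is
// dereferenced when the queued work runs.
void axpy_device_mode(FakeStream& s, const float* alpha,
                      const std::vector<float>& x, std::vector<float>& y) {
    assert(x.size() == y.size());
    s.pending.push([alpha, &x, &y] {
        const float a = *alpha;  // read late: alpha must still be valid here
        for (std::size_t i = 0; i < x.size(); ++i) y[i] += a * x[i];
    });
}

float demo_host_mode() {
    FakeStream s;
    std::vector<float> x{1.0f}, y{0.0f};
    {
        float alpha = 2.0f;  // automatic variable
        axpy_host_mode(s, &alpha, x, y);
    }  // alpha is gone before the work runs, which is fine in this mode
    s.synchronize();
    return y[0];
}

float demo_device_mode() {
    FakeStream s;
    std::vector<float> x{1.0f}, y{0.0f};
    float alpha = 2.0f;  // kept alive until after synchronize()
    axpy_device_mode(s, &alpha, x, y);
    alpha = 5.0f;        // modified before the work runs: the work sees 5
    s.synchronize();
    return y[0];
}
```

In this model demo_host_mode() returns 2 even though alpha went out of scope before the work ran, while demo_device_mode() returns 5 because the scalar was changed between the call and the execution. That is why, in DEVICE mode, the scalar has to stay valid and unmodified until the work has completed.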