parallel-processing, cuda, nvcc

cudaDeviceSynchronize() not found in nvcuda.dll


I'm writing CUDA code, compiling it with nvcc in VS2022 to generate a PTX file, and running the kernels from Embarcadero Delphi. To run the CUDA kernels from Delphi I have written a binding to nvcuda.dll, which has been working very well. For example, I use functions like cuInit, cuMemAlloc, cuLaunchKernel, cuMemcpyDtoH_v2, cuMemcpyHtoD_v2 without any problem, all according to the CUDA driver API.

However, I have not been able to find cudaDeviceSynchronize() in nvcuda.dll (or libcuda.so). Although cudaDeviceSynchronize() is present in most CUDA demo programs to be compiled by nvcc, it does not seem to exist in the DLL.

How can I make the CPU wait for a CUDA kernel using the driver API (i.e. through the DLL, not a C program compiled by nvcc)?


Solution

  • … use functions like cuInit, cuMemAlloc, cuLaunchKernel, cuMemcpyDtoH_v2, cuMemcpyHtoD_v2 without any problem, all according to the CUDA Runtime API

    Those are not runtime API functions, they are driver API functions. You find them in nvcuda.dll because that library is the driver API provider on Windows.

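    You can’t find cudaDeviceSynchronize there because it is a runtime API function, so it is provided by the CUDA runtime library rather than by the driver library. If you are actually using the driver API, the equivalent function is cuCtxSynchronize, which blocks the calling thread until all work previously submitted to the current context has completed.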
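As a rough illustration of the point above, here is a minimal sketch that loads the driver library dynamically, checks which of the two names it exports, and wraps cuCtxSynchronize. It is written in Python with ctypes only because that resembles a hand-written DLL binding such as the Delphi one; the helper names (load_driver, has_symbol, synchronize_context) are made up for this example, and the library name libcuda.so.1 on Linux is an assumption about a typical installation.

```python
import ctypes
import sys

CUDA_SUCCESS = 0  # CUresult value that the driver API uses for success


def load_driver():
    """Load the CUDA driver library, or return None if it is not installed."""
    if sys.platform == "win32":
        # The driver API is declared __stdcall on Windows, hence WinDLL.
        loader, name = ctypes.WinDLL, "nvcuda.dll"
    else:
        # Assumed library name for a typical Linux driver installation.
        loader, name = ctypes.CDLL, "libcuda.so.1"
    try:
        return loader(name)
    except OSError:
        return None


def has_symbol(lib, name):
    """Return True if the loaded library exports a function called `name`."""
    # ctypes raises AttributeError for unknown exports, so hasattr is enough.
    return hasattr(lib, name)


def synchronize_context(lib):
    """Block until all work in the calling thread's current context is done.

    Returns the raw CUresult code; CUDA_SUCCESS (0) means everything finished
    without error. The caller must already have called cuInit and made a
    context current on this thread.
    """
    fn = lib.cuCtxSynchronize
    fn.argtypes = []            # CUresult cuCtxSynchronize(void)
    fn.restype = ctypes.c_int
    return fn()


if __name__ == "__main__":
    lib = load_driver()
    if lib is None:
        print("CUDA driver library not found on this machine")
    else:
        print("cuCtxSynchronize exported:",
              has_symbol(lib, "cuCtxSynchronize"))
        print("cudaDeviceSynchronize exported:",
              has_symbol(lib, "cudaDeviceSynchronize"))
```

In a binding like the one described in the question, the call would go right after cuLaunchKernel, on the thread whose context is current. Checking the returned code is worthwhile, because errors raised while the kernel was executing can surface at the synchronization call rather than at launch.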