CUDA has the API call
cudaError_t cudaMemset (void *devPtr, int value, size_t count)
which fills a buffer with a single-byte value. I want to fill it with a multi-byte value. Suppose, for the sake of simplicity, that I want to fill devPtr
with a 32-bit (4-byte) value, and suppose we can ignore endianness. Now, the CUDA driver has the following API call:
CUresult cuMemsetD32(CUdeviceptr dstDevice, unsigned int ui, size_t N)
So is it enough for me to obtain the CUdeviceptr
from the device-memory-space pointer, then make the driver API call? Or is there something else I need to be doing?
As of about CUDA 3.0, runtime API device pointers (and everything else) are interoperable with the driver API. So yes, you can use cuMemsetD32
to fill a runtime API allocation with a 32-bit value; note that its N argument counts 32-bit elements, not bytes. The size of CUdeviceptr
will match the size of void *
on your platform, and it is safe to cast a pointer from the runtime API to CUdeviceptr
or vice versa.