What's the 'right' way to implement a 32-bit memset for CUDA?

CUDA has the API call

cudaError_t cudaMemset (void *devPtr, int value, size_t count)

which fills a buffer with a single-byte value. I want to fill it with a multi-byte value. Suppose, for the sake of simplicity, that I want to fill devPtr with a 32-bit (4-byte) value, and suppose we can ignore endianness. Now, the CUDA driver has the following API call:

CUresult cuMemsetD32(CUdeviceptr dstDevice, unsigned int ui, size_t N)

So is it enough to obtain a CUdeviceptr from the device pointer and then make the driver API call, or is there something else I need to do?

Solution

As of about CUDA 3.0, runtime API device pointers (and everything else) are interoperable with the driver API. So yes, you can use cuMemsetD32 to fill a runtime API allocation with a 32-bit value. The size of CUdeviceptr matches the size of void * on your platform, and it is safe to cast a pointer from the CUDA runtime API to CUdeviceptr or vice versa.