performance opencl memset

What is the fastest way to memset() a GPU buffer with OpenCL?


I'm using OpenCL, and I need to memset() some array in global device memory. CUDA has a memset()-like API function, but OpenCL does not. I read this, where I found two possible alternatives:

  1. Using memset() on the host with a scratch buffer, then clEnqueueWriteBuffer() to copy it to the buffer on the device.
  2. Enqueueing the following kernel:

    __kernel void memset_uint4(
        __global  uint4* mem,
        __private uint4  val) 
    {
        mem[get_global_id(0)] = val; 
    }
    

Which is better? Or rather, under which circumstances/for which platforms is one better than the other?

Note: If the special case of zeroing memory merits special treatment, that would be nice to know too.


Solution

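  • You can use clEnqueueFillBuffer(), available since OpenCL 1.2. It does exactly what you need, and it is flexible about the fill pattern: the pattern can be any power-of-two size from 1 to 128 bytes, and it is repeated across the region you specify.

    If you are on OpenCL 1.1 or below, you have to resort to other approaches, such as the two listed in the question.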
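
One practical detail when using clEnqueueFillBuffer(): the call returns CL_INVALID_VALUE unless the pattern size is a power of two between 1 and 128 bytes, the offset and size are both multiples of the pattern size, and the filled region lies inside the buffer. Below is a minimal sketch, in plain C, of checking those constraints before enqueueing. The helper name `fill_args_ok` and its parameters are hypothetical and not part of OpenCL.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not an OpenCL function): returns true if the
 * arguments satisfy the documented clEnqueueFillBuffer() constraints. */
static bool fill_args_ok(size_t buffer_size, size_t pattern_size,
                         size_t offset, size_t size)
{
    /* pattern_size must be a power of two between 1 and 128 bytes */
    if (pattern_size == 0 || pattern_size > 128 ||
        (pattern_size & (pattern_size - 1)) != 0)
        return false;

    /* offset and size must both be multiples of pattern_size */
    if (offset % pattern_size != 0 || size % pattern_size != 0)
        return false;

    /* the filled region [offset, offset + size) must fit in the buffer */
    if (offset > buffer_size || size > buffer_size - offset)
        return false;

    return true;
}
```

With arguments that pass this check, zeroing a whole buffer would look roughly like `cl_uint zero = 0; clEnqueueFillBuffer(queue, buf, &zero, sizeof zero, 0, nbytes, 0, NULL, NULL);`, where `queue`, `buf`, and `nbytes` are placeholders for your own command queue, buffer object, and buffer size in bytes.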