I am fairly new to OpenCL, and though I have understood everything up until now, I am having trouble understanding how buffer objects work.
I haven't understood where a buffer object is stored. In this StackOverflow question it is stated that:

> If you have one device only, probably (99.99%) is going to be in the device. (In rare cases it may be in the host if the device does not have enough memory for the time being)
To me, this means that buffer objects are stored in device memory. However, as stated in this StackOverflow question, if the flag `CL_MEM_ALLOC_HOST_PTR` is used in `clCreateBuffer`, the memory used will most likely be pinned memory. My understanding is that pinned memory is never swapped out, which means it MUST be located in RAM, not in device memory.
So what is actually happening?
What I would like to know is what the following flags imply about the location of the buffer:

- `CL_MEM_USE_HOST_PTR`
- `CL_MEM_COPY_HOST_PTR`
- `CL_MEM_ALLOC_HOST_PTR`
Thank you
The specification is (deliberately?) vague on the topic, leaving a lot of freedom to implementors. So unless an OpenCL implementation you are targeting makes explicit guarantees for the flags, you should treat them as advisory.
First off, `CL_MEM_COPY_HOST_PTR` actually has nothing to do with allocation; it just means that you would like `clCreateBuffer` to pre-fill the allocated memory with the contents of the memory at the `host_ptr` you passed to the call. This is as if you called `clCreateBuffer` with `host_ptr = NULL` and without this flag, and then made a blocking `clEnqueueWriteBuffer` call to write the entire buffer.
Regarding allocation modes:
- `CL_MEM_USE_HOST_PTR` - this means you've pre-allocated some memory, correctly aligned, and would like to use this as the backing memory for the buffer. The implementation can still allocate device memory and copy back and forth between your buffer and the allocated memory, if the device does not support directly accessing host memory, or if the driver decides that a shadow copy in VRAM will be more efficient than directly accessing system memory. On implementations that can read directly from system memory, though, this is one option for zero-copy buffers.
- `CL_MEM_ALLOC_HOST_PTR` - this is a hint telling the OpenCL implementation that you're planning to access the buffer from the host side by mapping it into host address space, but, unlike `CL_MEM_USE_HOST_PTR`, you are leaving the allocation itself to the OpenCL implementation. For implementations that support it, this is another option for zero-copy buffers: create the buffer, map it to the host, get a host algorithm or I/O to write to the mapped memory, then unmap it and use it in a GPU kernel. Unlike `CL_MEM_USE_HOST_PTR`, this leaves the door open for using VRAM that can be mapped directly into the CPU's address space (e.g. PCIe BARs).

Note that the implementation may also use any access flags provided (`CL_MEM_HOST_WRITE_ONLY`, `CL_MEM_HOST_READ_ONLY`, `CL_MEM_HOST_NO_ACCESS`, `CL_MEM_WRITE_ONLY`, `CL_MEM_READ_ONLY`, and `CL_MEM_READ_WRITE`) to influence the decision of where to allocate memory.
Finally, regarding "pinned" memory: many modern systems have an IOMMU, and when it is active, system memory accesses from devices can cause IOMMU page faults, so host memory technically does not even need to be resident. In any case, the OpenCL implementation is typically deeply integrated with a kernel-level device driver, which can pin system memory ranges (exclude them from paging) on demand. So if you use `CL_MEM_USE_HOST_PTR`, you just need to make sure you provide appropriately aligned memory, and the implementation will take care of pinning for you.