I'm writing an OpenGL/CUDA (6.5) interop application. I get a compile-time error when trying to write a floating-point value to an OpenGL texture through a surface object in my CUDA kernel.
Here is a high-level description of how I set up the interop. I am successfully reading from my texture in my CUDA kernel, so I believe the setup is correct. I have an OpenGL texture declared with
glTexImage2D(GL_TEXTURE_RECTANGLE_ARB, 0, GL_RGB32F_ARB, 512, 512, 0, GL_RGB, GL_FLOAT, NULL);
After creating the texture I call cudaGraphicsGLRegisterImage with cudaGraphicsRegisterFlagsSurfaceLoadStore set. Before running my CUDA kernel, I unbind the texture and call cudaGraphicsMapResources on the cudaGraphicsResource pointers obtained from cudaGraphicsGLRegisterImage. Then I get a cudaArray from cudaGraphicsSubResourceGetMappedArray, create an appropriate resource descriptor for the array, and call cudaCreateSurfaceObject to get a pointer to a cudaSurfaceObject_t. I then call cudaMemcpy with cudaMemcpyHostToDevice to copy the cudaSurfaceObject_t to a buffer on the device allocated by cudaMalloc.
In my CUDA kernel I can read from the surface object with something like this, and I have verified that it works as expected.
__global__ void cudaKernel(cudaSurfaceObject_t tex) {
    int x = blockIdx.x*blockDim.x + threadIdx.x;
    int y = blockIdx.y*blockDim.y + threadIdx.y;
    float4 sample = surf2Dread<float4>(tex, (int)sizeof(float4)*x, y, cudaBoundaryModeClamp);
In the kernel I want to modify sample and write it back to the texture. The GPU has compute capability 5.0, so this should be possible. This is what I am trying:
surf2Dwrite<float4>(sample, tex, (int)sizeof(float4)*x, y, cudaBoundaryModeClamp);
But I get the error:
error: no instance of overloaded function "surf2Dwrite" matches the argument list
argument types are: (float4, cudaSurfaceObject_t, int, int, cudaSurfaceBoundaryMode)
I can see in cuda-6.5/include/surface_functions.h that there are only prototypes for integral versions of surf2Dwrite that accept a void * for the second argument. I do see prototypes for surf2Dwrite which accept a float4 with a templated surface object. However, I'm not sure how I could declare a templated surface object with OpenGL interop. I haven't been able to find anything else on how to do this. Any help is appreciated. Thanks.
It turns out the answer was pretty simple, though I don't know why it works. Instead of calling
surf2Dwrite<float4>(sample, tex, (int)sizeof(float4)*x, y, cudaBoundaryModeClamp);
I needed to call
surf2Dwrite(sample, tex, (int)sizeof(float4)*x, y, cudaBoundaryModeClamp);
To be honest, I'm not sure I fully understand CUDA's use of templates in C++. Does anyone have an explanation?
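A possible explanation, offered as a sketch rather than a confirmed reading of the CUDA 6.5 headers: suppose the float4 version of surf2Dwrite that takes a cudaSurfaceObject_t is an ordinary non-template overload, while the only template versions take a surface reference. Then writing surf2Dwrite<float4>(...) restricts overload resolution to the templates, none of which accepts a cudaSurfaceObject_t, which would match the "no instance of overloaded function" error. Dropping <float4> lets the compiler consider the non-template overload again. The plain C++ below reproduces that language rule with hypothetical stand-ins; SurfaceRef, SurfaceObj, Float4, write2D, and demo are illustrative names, not CUDA APIs.

```cpp
#include <string>

// Hypothetical stand-ins, NOT the real CUDA types.
struct SurfaceRef {};                    // plays the role of a templated surface reference
using SurfaceObj = unsigned long long;   // plays the role of cudaSurfaceObject_t
struct Float4 { float x, y, z, w; };     // plays the role of float4

// Template overload: only usable with a surface reference.
template <class T>
std::string write2D(T /*value*/, SurfaceRef /*surf*/, int /*x*/, int /*y*/) {
    return "template, surface reference";
}

// Non-template overload: the only one that accepts a surface object.
std::string write2D(Float4 /*value*/, SurfaceObj /*surf*/, int /*x*/, int /*y*/) {
    return "non-template, surface object";
}

std::string demo() {
    Float4 v{1.0f, 2.0f, 3.0f, 4.0f};
    SurfaceObj obj = 0;

    // write2D<Float4>(v, obj, 0, 0);
    // The line above would not compile: explicit template arguments limit the
    // candidates to function templates, and SurfaceObj cannot convert to SurfaceRef.

    // Without explicit template arguments the non-template overload is a
    // candidate too, and it is the only viable one for a SurfaceObj argument.
    return write2D(v, obj, 0, 0);
}
```

Under the same assumption, surf2Dread<float4> would still compile because the surface-object read is a template whose return type cannot be deduced from the arguments and so has to be named explicitly.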