I'm trying out OpenCL-OpenGL interop for textures on my GeForce 330M with CUDA Toolkit 4.0.
I want to capture a frame and use that data as an input image (Image2D) to an OpenCL kernel. The kernel should manipulate the data and write it to an Image2DGL, which is an image object with an attached OpenGL texture. Basically it looks like this:
 _______________     RGB       _______________
|               |   uint8*    |               |  CL_RGBA / CL_UNORM_INT8
|    Grabber    | ----------> |    Image2D    | ------------------------.
|    avcodec    |             |    [input]    |                         |
|_______________|             |_______________|                         |
                                                                        |
                                                                        V
 _______________               _______________                   _______________
|               |             |               |                 |               |
|    Texture    | ----------> |   Image2DGL   | <-------------> |    Kernel     |
|_______________|             |   [output]    |                 |_______________|
                              |_______________|
  Internal Format: GL_RGBA
  Format:          GL_RGBA
  Type:            ?
I'm initializing the texture like this:
GLuint tex = 0;

void initTexture( int width, int height )
{
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_RECTANGLE, tex);
    // now here is where I need assistance: the type parameter of the texture (GL_FLOAT)
    glTexImage2D(GL_TEXTURE_RECTANGLE, 0, GL_RGBA, width, height, 0, GL_RGBA, GL_FLOAT, NULL);
}
I have tried GL_UNSIGNED_INT as well.
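For reference, one combination that is known to map onto CL_RGBA / CL_UNORM_INT8 on the CL side is an 8-bit normalized texture. A minimal sketch (same function as above, only the internal format and type changed):

void initTexture( int width, int height )
{
    glGenTextures(1, &tex);
    glBindTexture(GL_TEXTURE_RECTANGLE, tex);
    // GL_RGBA8 + GL_UNSIGNED_BYTE gives an 8-bit-per-channel normalized texture,
    // which corresponds to CL_RGBA / CL_UNORM_INT8 when shared with OpenCL
    glTexImage2D(GL_TEXTURE_RECTANGLE, 0, GL_RGBA8, width, height, 0,
                 GL_RGBA, GL_UNSIGNED_BYTE, NULL);
}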
Then I create the shared image (Image2DGL):
texMems.push_back(Image2DGL(clw->context, CL_MEM_READ_WRITE, GL_TEXTURE_RECTANGLE, 0, tex, &err));
Then I create the source image (input image):
ImageFormat format;
format.image_channel_data_type = CL_UNORM_INT8;
format.image_channel_order = CL_RGBA;
srcImgBuffer = Image2D(clw->context, CL_MEM_READ_WRITE, format, width, height, 0, NULL, &err);
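To rule out an unsupported format combination, the supported image formats of the context can be queried. A quick sketch using the C++ wrapper (the vector name is just for illustration; the printed values are the raw channel-order / channel-data-type enums):

std::vector<cl::ImageFormat> formats;
clw->context.getSupportedImageFormats(CL_MEM_READ_WRITE, CL_MEM_OBJECT_IMAGE2D, &formats);
for (size_t i = 0; i < formats.size(); ++i)
{
    // e.g. 0x10b5 is CL_RGBA, 0x10d2 is CL_UNORM_INT8
    std::cout << std::hex << formats[i].image_channel_order << " / "
              << formats[i].image_channel_data_type << std::endl;
}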
In every render loop I write the data into srcImgBuffer:
// write the frame to the image buffer
clw->queue.enqueueWriteImage(srcImgBuffer, CL_TRUE, origin, region, 0, 0, (void*)data, NULL, NULL);
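The origin and region used above aren't shown; with the cl.hpp wrapper shipped with CUDA Toolkit 4.0 they would be set up roughly like this (a sketch, assuming a full-frame write):

cl::size_t<3> origin;
origin[0] = 0; origin[1] = 0; origin[2] = 0;

cl::size_t<3> region;
region[0] = width;   // image width in pixels
region[1] = height;  // image height in pixels
region[2] = 1;       // depth must be 1 for a 2D image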
I also set the arguments for the kernel:
tex_kernel.setArg(0, texMems[0]);
tex_kernel.setArg(1, srcImgBuffer);
tex_kernel.setArg(2, width);
tex_kernel.setArg(3, height);
Before and after running the kernel I acquire and release the GL objects (see the sketch after the kernel below). The test kernel looks like this:
__kernel void init_texture_kernel(__write_only image2d_t out, __read_only image2d_t in, int w, int h)
{
    const sampler_t smp = CLK_NORMALIZED_COORDS_FALSE | CLK_ADDRESS_CLAMP | CLK_FILTER_NEAREST;
    int2 coords = { get_global_id(0), get_global_id(1) };
    float4 pixel = read_imagef(in, smp, coords);
    float4 test = { (float)coords.x/(float)w, 0, 0, 1 };
    write_imagef(out, coords, pixel);
}
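For completeness, the acquire/release around the kernel launch looks roughly like this with the C++ wrapper (a sketch; it assumes GL is done with the texture before acquiring, hence the glFinish):

std::vector<cl::Memory> glObjects(texMems.begin(), texMems.end());

glFinish();                      // make sure GL is done with the texture
clw->queue.enqueueAcquireGLObjects(&glObjects);

clw->queue.enqueueNDRangeKernel(tex_kernel, cl::NullRange,
                                cl::NDRange(width, height), cl::NullRange);

clw->queue.enqueueReleaseGLObjects(&glObjects);
clw->queue.finish();             // make sure CL is done before GL uses the texture again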
The image_channel_data_type CL_UNORM_INT8 can be read as float in the kernel and is interpreted as a normalized value. However, the output image doesn't look right: I get a sliced picture (line-wise), obviously because the data is interpreted incorrectly. As mentioned, I assume the error lies in the initialization of the texture's type. I tried GL_FLOAT (since I'm writing floats to the image in the kernel).
The left one is a PPM out of the decoder, the right one is what I'm getting back on my output texture.
If someone actually read this far: do you have suggestions regarding the texture's type to resolve the issue?
I eventually came up with the solution, having missed the obvious. I had an RGB video, so I captured the frames in RGB. That can be displayed on a texture just fine. HOWEVER, if you grab a float4 at once from a buffer (as in the kernel code) that only holds three channels per pixel, it will of course mess things up. Argh.
I could also have guessed my mistake from the line count: I got 384 of the 512 requested lines, a ratio of exactly 0.75, i.e. three channels read where four were expected.
I simply told my grabber (using libavcodec) to output RGBA, so it pads the data for me and I get a four-channel picture.
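For anyone hitting the same issue: the conversion can be done with libswscale. A rough sketch of what I mean (pCodecCtx, pFrame and pFrameRGBA are placeholder names, and the exact calls depend on the FFmpeg version; this is the old-style API):

// convert the decoded frame to 4-channel RGBA so each pixel is a full float4 on the CL side
struct SwsContext *sws = sws_getContext(pCodecCtx->width, pCodecCtx->height, pCodecCtx->pix_fmt,
                                        pCodecCtx->width, pCodecCtx->height, PIX_FMT_RGBA,
                                        SWS_BILINEAR, NULL, NULL, NULL);

AVFrame *pFrameRGBA = avcodec_alloc_frame();
avpicture_fill((AVPicture *)pFrameRGBA,
               (uint8_t *)av_malloc(avpicture_get_size(PIX_FMT_RGBA, pCodecCtx->width, pCodecCtx->height)),
               PIX_FMT_RGBA, pCodecCtx->width, pCodecCtx->height);

sws_scale(sws, pFrame->data, pFrame->linesize, 0, pCodecCtx->height,
          pFrameRGBA->data, pFrameRGBA->linesize);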
I will flag this question to be closed.