Search code examples
pythonstructopenclmemory-alignmentpyopencl

Passing struct with pointer members to OpenCL kernel using PyOpenCL


Let's suppose I have a kernel to compute the element-wise sum of two arrays. Rather than passing a, b, and c as three parameters, I make them structure members as follows:

typedef struct
{
    __global uint *a;
    __global uint *b;
    __global uint *c;
} SumParameters;

__kernel void compute_sum(__global SumParameters *params)
{
    uint id = get_global_id(0);
    params->c[id] = params->a[id] + params->b[id];
    return;
}

There is information on structures if you RTFM of PyOpenCL [1], and others have addressed this question too [2] [3] [4]. But none of the OpenCL struct examples I've been able to find have pointers as members.

Specifically, I'm worried about whether host/device address spaces match, and whether host/device pointer sizes match. Does anyone know the answer?

[1] http://documen.tician.de/pyopencl/howto.html#how-to-use-struct-types-with-pyopencl

[2] Struct Alignment with PyOpenCL

[3] http://enja.org/2011/03/30/adventures-in-opencl-part-3-constant-memory-structs/

[4] http://acooke.org/cute/Somesimple0.html


Solution

  • No, there is no guaranty that address spaces match. For the basic types (float, int,…) you have alignment requirement (section 6.1.5 of the standard) and you have to use the cl_type name of the OpenCL implementation (when programming in C, pyopencl does the job under the hood I’d say).

    For the pointers it’s even simpler due to this mismatch. The very beginning of section 6.9 of the standard v 1.2 (it’s section 6.8 for version 1.1) states:

    Arguments to kernel functions declared in a program that are pointers must be declared with the __global, __constant or __local qualifier.

    And in the point p.:

    Arguments to kernel functions that are declared to be a struct or union do not allow OpenCL objects to be passed as elements of the struct or union.

    Note also the point d.:

    Variable length arrays and structures with flexible (or unsized) arrays are not supported.

    So, no way to make you kernel runs as described in your question and that's why you haven’t been able to find some examples of OpenCl struct have pointers as members.
    I still can propose a workaround that takes advantage of the fact that the kernel is compiled in JIT. It still requires that you pack you data properly and that you pay attention to the alignment and finally that the size doesn’t change during the execution of the program. I honestly would go for a kernel taking 3 buffers as arguments, but anyhow, there it is.

    The idea is to use the preprocessor option –D as in the following example in python:

    Kernel:

    typedef struct {
        uint a[SIZE];
        uint b[SIZE];
        uint c[SIZE];
    } SumParameters;
    
    kernel void foo(global SumParameters *params){
        int idx = get_global_id(0);
        params->c[idx] = params->a[idx] + params->b[idx];
    }
    

    Host code:

    import numpy as np
    import pyopencl as cl
    
    def bar():
       mf = cl.mem_flags
       ctx = cl.create_some_context()
       queue = cl.CommandQueue(self.ctx)
       prog_f = open('kernels.cl', 'r')
       #a = (1, 2, 3), b = (4, 5, 6)          
       ary = np.array([(1, 2, 3), (4, 5, 6), (0, 0, 0)], dtype='uint32, uint32, uint32')
       cl_ary = cl.Buffer(ctx, mf.READ_WRITE | mf.COPY_HOST_PTR, hostbuf=ary)
       #Here should compute the size, but hardcoded for the example
       size = 3
       #The important part follows using -D option
       prog = cl.Program(ctx, prog_f.read()).build(options="-D SIZE={0}".format(size))    
       prog.foo(queue, (size,), None, cl_ary)
       result = np.zeros_like(ary)
       cl.enqueue_copy(queue, result, cl_ary).wait()
       print result
    

    And the result:

    [(1L, 2L, 3L) (4L, 5L, 6L) (5L, 7L, 9L)]