Tags: opencl, gpu, intel, gpgpu, self-reference

Self Referencing Pointer in OpenCL


I have OpenCL C++ code running on an Intel platform. I understand that pointers are not accepted inside a structure on the kernel side. However, I have a class that contains a self-referencing pointer. On the host side I can replicate the class as a structure, but I cannot do the same on the device side.

For example:

    class Classname {
        Classname *SameClass_Selfreferencingpointer;
    };

On the host side I have done the same with a structure:

    struct Structurename {
        Structurename *SameStructure_Selfreferencingpointer;
    };

Could someone suggest an alternative way to implement this on the device side?
Thank you for any help in advance.


Solution

  • Since there is no malloc on an OpenCL device, and structs are passed in buffers as arrays of structs, you could replace the pointer with an index, so each struct knows where the struct it refers to sits in the array. You can allocate one big buffer before launching the kernel, then use atomic functions to increment a "fake malloc" counter: it behaves as if it were allocating from the buffer, but simply returns an integer that is the index of the last "allocated" struct. The host side would then use the index instead of a pointer as well.
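
A minimal sketch of that idea, written as plain C++ so it can be run on the host. The names `Node`, `Pool`, `NIL`, `build_list` and `sum_list` are made up for illustration, and `std::atomic` only stands in for the device-side counter; in an OpenCL kernel the same role would typically be played by `atomic_inc` on an `int` in global memory.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical node: the self-referencing pointer becomes an index into the
// same array of structs. NIL plays the role of a null pointer.
constexpr std::int32_t NIL = -1;

struct Node {
    std::int32_t value;
    std::int32_t next;  // index of the next Node in the pool, or NIL
};

// Hypothetical pool: all storage is allocated once, up front, like a big
// buffer created before the kernel is launched.
struct Pool {
    std::vector<Node> nodes;
    std::atomic<std::int32_t> used{0};  // the "fake malloc" counter

    explicit Pool(std::size_t capacity) : nodes(capacity) {}

    // Atomically reserve one slot and return its index (NIL when full).
    std::int32_t alloc() {
        std::int32_t idx = used.fetch_add(1);
        if (idx >= static_cast<std::int32_t>(nodes.size())) return NIL;
        return idx;
    }
};

// Build a singly linked list with values 0..count-1, newest node first.
// Returns the index of the head node.
std::int32_t build_list(Pool& pool, std::int32_t count) {
    std::int32_t head = NIL;
    for (std::int32_t i = 0; i < count; ++i) {
        std::int32_t idx = pool.alloc();
        if (idx == NIL) break;  // pool exhausted
        pool.nodes[idx] = Node{i, head};
        head = idx;
    }
    return head;
}

// Traverse the list by following indices instead of pointers.
int sum_list(const Pool& pool, std::int32_t head) {
    int total = 0;
    for (std::int32_t i = head; i != NIL; i = pool.nodes[i].next) {
        total += pool.nodes[i].value;
    }
    return total;
}
```

Because the links are plain integers, the same array can be copied between host and device without any pointer fix-up, which is the point of using an index.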

    If struct alignment becomes an issue between host and device, you can index the fields too: the starting byte of field A, the starting byte of field B, and so on, all packed into a single 4-byte integer (one byte per offset) for a struct that has 4 used fields, not counting the indexes.
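
One way to picture the packed offsets, as a hedged C++ sketch. The struct `Particle` and the helpers `pack_offsets`, `unpack_offset`, `particle_layout` and `read_float` are hypothetical, and the sketch assumes every offset fits in one byte (a struct smaller than 256 bytes).

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical struct with four payload fields of mixed size, so padding
// may differ between compilers.
struct Particle {
    float x;
    std::uint8_t flag;
    float mass;
    std::int32_t next;
};

// Pack four byte offsets (each assumed < 256) into one 32-bit word,
// field 0 in the lowest byte.
constexpr std::uint32_t pack_offsets(std::size_t a, std::size_t b,
                                     std::size_t c, std::size_t d) {
    return static_cast<std::uint32_t>(a) |
           (static_cast<std::uint32_t>(b) << 8) |
           (static_cast<std::uint32_t>(c) << 16) |
           (static_cast<std::uint32_t>(d) << 24);
}

// Recover the byte offset of field number `field` (0..3).
constexpr std::size_t unpack_offset(std::uint32_t packed, int field) {
    return (packed >> (8 * field)) & 0xFFu;
}

// Layout word for Particle as seen by this compiler.
inline std::uint32_t particle_layout() {
    static_assert(sizeof(Particle) <= 256, "offsets must fit in one byte");
    return pack_offsets(offsetof(Particle, x), offsetof(Particle, flag),
                        offsetof(Particle, mass), offsetof(Particle, next));
}

// Read a float from raw struct bytes using the packed layout instead of
// relying on the reader's own struct definition.
inline float read_float(const unsigned char* bytes, std::uint32_t packed,
                        int field) {
    float v;
    std::memcpy(&v, bytes + unpack_offset(packed, field), sizeof v);
    return v;
}
```

The side that owns the struct definition would ship the layout word along with the buffer, and the other side would read fields through it rather than through its own idea of the padding.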

    Maybe you can add a preprocessing stage:

    • host writes an artificial number, such as 3.1415, to a field
    • device scans the struct for a float at every byte offset until it finds 3.1415
    • device puts the byte offset it found into an array and sends it to the host
    • host then writes float fields in the struct starting from that byte offset
    • host and device are now alignment compatible and use the same offset in all kernels that get a struct from the host

    Maybe the opposite direction is better:

    • device puts 3.14 in a field of the struct
    • device writes the struct to an array of structs
    • host gets the buffer
    • host searches for 3.14 and finds its byte offset
    • host writes an fp number starting from that offset in future work

    Either way, this would need both your class and its replicated struct on the host and device sides.
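
The host half of the second variant could look roughly like this in C++. `DeviceNode` is a hypothetical stand-in for the bytes read back from the device, and `find_float_offset` and `write_float_at` are illustrative helpers. The sketch assumes host and device share the same float encoding and byte order, and that the sentinel's bit pattern does not occur by accident elsewhere in the struct.

```cpp
#include <cstddef>
#include <cstring>

// Sentinel the "device" is assumed to have stored in the float field.
constexpr float kSentinel = 3.14f;

// Hypothetical stand-in for a struct as laid out by the device compiler.
struct DeviceNode {
    int next;
    char tag;
    float weight;
};

// Scan raw bytes for the sentinel's bit pattern at every byte offset.
// Returns the first matching offset, or -1 if it is not found.
long find_float_offset(const unsigned char* bytes, std::size_t size,
                       float sentinel) {
    for (std::size_t off = 0; off + sizeof(float) <= size; ++off) {
        if (std::memcmp(bytes + off, &sentinel, sizeof(float)) == 0) {
            return static_cast<long>(off);
        }
    }
    return -1;
}

// Write a float into raw struct bytes at a previously discovered offset.
void write_float_at(unsigned char* bytes, std::size_t offset, float value) {
    std::memcpy(bytes + offset, &value, sizeof value);
}
```

The probe only needs to run once per struct type; afterwards the discovered offset can be reused for every buffer exchanged with the kernels.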

    You should also look into the SYCL API.