
Hough transform and OpenCL


I'm trying to implement the Hough transform for circles in OpenCL, but I've run into a really weird problem. Every time I run the Hough kernel, I end up with a slightly different accumulator, even though the parameters are the same and the accumulator is always a freshly zeroed table (e.g. https://i.sstatic.net/utA15.jpg). My kernel code is below:

#define BLOCK_LEN 256

__kernel void HoughCirclesKernel(
    __global int* A,
    __global int* imgData,
    __global int* _width,
    __global int* _height,
    __global int* r
)
{
    __local int imgBuff[BLOCK_LEN];

    int localThreadIndex = get_local_id(0); //threadIdx.x
    int globalThreadIndex = get_local_id(0) + get_group_id(0) * BLOCK_LEN; //threadIdx.x + blockIdx.x * Block_Len
    int width = *_width; int height = *_height;
    int radius = *r;

    A[globalThreadIndex] = 0;
    barrier(CLK_GLOBAL_MEM_FENCE);

    if(globalThreadIndex < width*height)
    {
        imgBuff[localThreadIndex] = imgData[globalThreadIndex]; 
        barrier(CLK_LOCAL_MEM_FENCE);

        if(imgBuff[localThreadIndex] > 0) 
        {
            float s1, c1;
            for(int i = 0; i<180; i++)
            {
                s1 = sincos(i, &c1);
                int centerX = globalThreadIndex % width + radius * c1;
                int centerY = ((globalThreadIndex - centerX) / height) + radius * s1;

                if(centerX < width && centerY < height)
                    atomic_inc(A + centerX + centerY * width);
            }
        }
    }
    barrier(CLK_GLOBAL_MEM_FENCE);
}

Could this be the fault of how I am incrementing the accumulator?


Solution

  • I have managed to solve my problem by finding and correcting three issues.

    First, in the kernel code, the line:

    int centerY = ((globalThreadIndex - centerX) / height) + radius * s1;
    

    should be:

    int centerY = (globalThreadIndex / width) + radius * s1;
    

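    The main change is dividing by width instead of height: in a row-major image, the row of pixel globalThreadIndex is globalThreadIndex / width. The original expression also subtracted centerX, which already includes the radius offset. Both mistakes put votes into the wrong rows of the accumulator.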
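To see why the divisor matters, here is a small C sketch of the row-major index arithmetic. The helper names pixel_col, pixel_row and original_row are hypothetical, and the radius terms are set to zero so that only the base pixel position is compared:

```c
#include <assert.h>

/* Row-major layout: pixel (x, y) lives at index y * width + x. */

/* Column of a pixel: the remainder after removing whole rows. */
int pixel_col(int idx, int width) {
    assert(width > 0);
    return idx % width;
}

/* Row of a pixel: the number of whole rows before it (the corrected formula). */
int pixel_row(int idx, int width) {
    assert(width > 0);
    return idx / width;
}

/* The original expression with a zero radius: subtract the column, then divide
   by height. This only matches pixel_row when width == height. */
int original_row(int idx, int width, int height) {
    assert(height > 0);
    return (idx - pixel_col(idx, width)) / height;
}
```

For a 5x3 image, pixel 13 is at column 3, row 2. pixel_row(13, 5) gives 2, while original_row(13, 5, 3) gives (13 - 3) / 3 = 3, a row that does not exist. For square images the two agree, which makes the bug easy to miss.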

    Second, the bounds check

    if(centerX < width && centerY < height)

    was changed to also reject negative coordinates:

    if(centerX < width && centerX >= 0)
        if(centerY < height && centerY >= 0)
    

    As for the accumulator problem, this is the code I originally used to create the accumulator buffer (I am using the OpenCL.Net library for C#):

    int[] a = new int[width*height]; //image size
    ErrorCode error;
    Mem cl_accumulator = (Mem)Cl.CreateBuffer(cl_context, MemFlags.ReadWrite, (IntPtr)(a.Length * sizeof(int)), out error);
    CheckErr(error, "Cl.CreateBuffer");
    

    The fix was simple: fill the host array with zeros and create the buffer with MemFlags.CopyHostPtr, so that the zeros are copied into the device buffer:

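    // Requires: using System.Linq; using System.Runtime.InteropServices;
    int[] a = Enumerable.Repeat(0, width * height).ToArray(); // zero-filled host copy
    ErrorCode error;
    GCHandle accHandle = GCHandle.Alloc(a, GCHandleType.Pinned);
    IntPtr accPtr = accHandle.AddrOfPinnedObject();
    // CopyHostPtr initializes the device buffer from the host array;
    // without it, the contents of a newly created buffer are undefined.
    Mem cl_accumulator = (Mem)Cl.CreateBuffer(cl_context, MemFlags.ReadWrite | MemFlags.CopyHostPtr, (IntPtr)(a.Length * sizeof(int)), accPtr, out error);
    CheckErr(error, "Cl.CreateBuffer");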
    

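    I filled the accumulator table with zeros and copied it to the device buffer each time I executed the kernel. Without CopyHostPtr, the new buffer's contents are undefined, so the votes were being added on top of whatever values it happened to hold. Zeroing it inside the kernel (A[globalThreadIndex] = 0; followed by barrier(CLK_GLOBAL_MEM_FENCE)) is not a reliable substitute, because a barrier only synchronizes the work-items of a single work-group. One work-group can zero a cell after another work-group has already voted into it.

    Together, these errors made the accumulator look different, and somewhat malformed, every time I executed the kernel.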
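With the three fixes combined, a sequential CPU version of the voting makes a handy deterministic reference for checking the GPU accumulator. This is a sketch in plain C under assumptions not found in the original. The name hough_vote is hypothetical. The per-angle offsets are assumed to be precomputed on the host into hypothetical dx and dy arrays, for example by rounding radius * cos(theta) and radius * sin(theta) for each sampled angle. OpenCL's sincos takes its argument in radians, so an angle loop over degrees must convert i before the call. Each acc[...]++ stands in for the kernel's atomic_inc. On the GPU those increments happen in an arbitrary order, but addition is commutative, so the final counts are identical. Any run-to-run difference therefore points at the accumulator contents or the index math, not at atomic_inc.

```c
#include <assert.h>
#include <string.h>

/* Sequential reference for the corrected circle-voting scheme (fixed radius).
   img and acc are row-major width*height arrays; nonzero img pixels are edges.
   dx/dy hold n precomputed integer offsets (an assumption of this sketch). */
void hough_vote(const int *img, int width, int height,
                const int *dx, const int *dy, int n, int *acc)
{
    assert(width > 0 && height > 0 && n >= 0);

    /* Fresh accumulator on every call: stale values would change the result. */
    memset(acc, 0, (size_t)width * (size_t)height * sizeof *acc);

    for (int idx = 0; idx < width * height; idx++) {
        if (img[idx] <= 0)
            continue;
        int x = idx % width;   /* column */
        int y = idx / width;   /* row: divide by width, not height */
        for (int k = 0; k < n; k++) {
            int cx = x + dx[k];
            int cy = y + dy[k];
            /* Reject candidate centers outside the image on every side. */
            if (cx >= 0 && cx < width && cy >= 0 && cy < height)
                acc[cx + cy * width]++;   /* atomic_inc(...) in the kernel */
        }
    }
}
```

For example, take an 11x11 image with a single edge pixel at (5, 5) and the four offsets (3, 0), (0, 3), (-3, 0), (0, -3). The accumulator then gets exactly one vote at each of (8, 5), (5, 8), (2, 5) and (5, 2). Repeated calls give identical results because the accumulator is cleared first.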