Search code examples
c++arraysparallel-processingopenclmanhattan

OpenCL 1D range loop without knowledge of global size


I was wondering how can I iterate over a loop with a any number of work items (per group is irrelevant)

I have 3 arrays and one of them is 2-dimensional(a matrix). The first array contains a set of integers. The matrix is filled with another set of (repeated and random) integers. The third one is only to store the results. I need to search for the farest pair's numbers of occurrences of a number, from the first array, in the matrix. To summarize:

  • A: Matrix with random numbers
  • num: Array with numbers to search in A
  • d: Array with maximum distances of pairs of each number from num

The algorithm is simple(as I don't need to optimize it), I only compare calculated Manhattan distances and keep the maximum value.

To keep it simple, it does the following (C-like pseudo code):

for(number in num){
  maxDistance = 0
  for(row in A){
    for(column in A){
      //calculateDistance is a function to another nested loop like this
      //it returns the max found distance if it is, and 0 otherwise
      currentDistance = calculateDistance(row, column, max)
      if(currentDistance > maxDistance){
        maxDistance = currentDistance
      }
    }
  }
}

As you can see there is no dependent data between iterations. I tried to assign each work item a slice of the matrix A, but still doesn't convince me.

IMPORTANT: The kernel must be executed with only one dimension for the problem.

Any ideas? How can I use the global id to make multiple search at once?

Edit:

I added the code to clear away any doubt. Here is the kernel:

__kernel void maxDistances(int N, __constant int *A, int n, __constant int *numbers, __global int *distances)
{
    
    //N is matrix row and col size
    //A the matrix
    //n the total count of numbers to be searched
    //numbers is the array containing the numbers
    //distances is the array containing the computed distances
    

    size_t id = get_global_id(0);   
    

    int slice = (N*N)/get_global_size(0);

    for(int idx_num = 0; idx_num < n; idx_num++)
    {
        int number = numbers[idx_num];
        int currentDistance = 0;
        int maxDistance = 0;


        for(int c = id*slice; c < (id+1)*slice; c++)
        {
            int i = c/N;
            int j = c%N;
            if(*CELL(A,N,i,j) == number){
                coord_t coords;
                coords.i = i;
                coords.j = j;

                //bestDistance is a function with 2 nested loop iterating over
                //rows and column to retrieve the farest pair of the number
                currentDistance = bestDistance(N,A,coords,number, maxDistance);
                if(currentDistance > maxDistance)
                {
                    maxDistance = currentDistance;
                }
            }
        }

        distances[idx_num] = maxDistance;
    }
}

Solution

  • This answer may be seen as incomplete, nevertheless, I am going to post it in order to close the question.

    My problem was not the code, the kernel (or that algorithm), it was the machine. The above code is correct and works perfectly. After I tried my program in another machine it executed and computed the solution with no problem at all.

    So, in brief, the problem was the OpenCL device or most likely the host libraries.