
How to access different dimension's index in 3d OpenCl kernel?


I am totally new to OpenCL. I am trying to convert sequential code that fills a 3D matrix into an OpenCL version. I have the basic OpenCL setup working but am stuck on the kernel. I can compute the linear index into the 3D matrix, but I can't work out how to access the index of each individual dimension. Can anyone help me with that?

You are also welcome to suggest a better way to solve this problem. Thanks in advance.

Here is a small portion of my code.

sequential code:

const int depth = 3;
const int row = 4;
const int column = 4;
float A[depth][row][column];

for (int k = 0; k < depth; k++){
    for (int i = 0; i < row; i++){
        for (int j = 0; j < column; j++){
            if (k == 0){
                A[k][i][j] = (float)i / ((float)j + 1.00);
            }else if (k == 1){
                A[k][i][j] = 1.00;
            }else{
                A[k][i][j] = (float)j / ((float)i + 1.00);
            }
        }
    }
 }

OpenCL kernel code:

__kernel void ThreeDimArray(__global float *const output1) {
  const int x = get_global_id(0);
  const int y = get_global_id(1);
  const int z = get_global_id(2);
  const int max_x = get_global_size(0);
  const int max_y = get_global_size(1);
  const int max_z = get_global_size(2);
  
  const int idx = x * max_y * max_z + y * max_z + z;

  output1[idx] = 1.00;
};

sequential code output:

Baseline matrix k = 0
0.00    0.00    0.00    0.00 
1.00    0.50    0.33    0.25 
2.00    1.00    0.67    0.50
3.00    1.50    1.00    0.75

Baseline matrix k = 1
1.00    1.00    1.00    1.00
1.00    1.00    1.00    1.00
1.00    1.00    1.00    1.00
1.00    1.00    1.00    1.00

Baseline matrix k = 2
0.00    1.00    2.00    3.00
0.00    0.50    1.00    1.50
0.00    0.33    0.67    1.00
0.00    0.25    0.50    0.75

Edit: What if we want to update a specific index using the values at other indexes? For example:

for (int t = 0; t < 24; t++){
        for (int i = 1; i < row; i++){
            for (int j = 0; j < column; j++){
                A[1][i][j] = A[1][i][j] + (1 / (sqrt(A[0][i + 1][j] + A[2][i - 1][j])));
            }
        }
    }

I tried it like this (kernel code):

const int idk0 = 0 * row * column + i * column + j; 
const int idk1 = 1 * row * column + i * column + j; 
const int idk2 = 2 * row * column + i * column + j;

for (int t = 0; t < 24; t++) {
    A[idk1] = A[idk1] + (1 / (sqrt(A[idk0 + 1] + A[idk2 - 1])));
  }

Solution

  • You already have all you need. The finished kernel looks like:

    __kernel void ThreeDimArray(__global float* A) {
      const int k = get_global_id(0);
      const int i = get_global_id(1);
      const int j = get_global_id(2);
      //const int depth  = get_global_size(0); // unused here
      const int row    = get_global_size(1);
      const int column = get_global_size(2);
      
      const int idx = k * row * column + i * column + j; // linear index to access matrix A in 1D form
    
      if(k == 0) {
          A[idx] = (float)i / ((float)j + 1.00f);
      } else if(k == 1) {
          A[idx] = 1.00f; // float literal, consistent with the other branches
      } else {
          A[idx] = (float)j / ((float)i + 1.00f);
      }
    };
    

    Edit: For best performance and best flexibility regarding matrix size, I recommend using only 1D indexing for the Kernel Range. Also you have branching for writing different values to only one matrix address. You can use the ternary operator (?:) for this.

    __kernel void ThreeDimArray(__global float* A, const int depth, const int row, const int column) {
      const int idx = get_global_id(0); // 1D kernel range is depth*row*column
      const int t=idx%(column*row), j=t%column, i=t/column, k=idx/(column*row); // recover 3D indices from the linear index
      A[idx] = k==0 ? (float)i/((float)j+1.0f) : k==1 ? 1.0f : (float)j/((float)i+1.0f);
    };
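
    To sanity-check the index arithmetic without a GPU, the same decomposition can be emulated in plain C on the host. This is only a sketch: Index3, unflatten, flatten and init_value are hypothetical helper names, not part of OpenCL, and the matrix is assumed to be stored contiguously in the same order as float A[depth][row][column].

```c
#include <assert.h>

/* Hypothetical helper type: the three indices recovered from a linear index. */
typedef struct { int k, i, j; } Index3;

/* Split linear index n into (k, i, j), mirroring the 1D kernel above. */
static Index3 unflatten(int n, int row, int column) {
    Index3 p;
    assert(row > 0 && column > 0);
    const int t = n % (column * row); /* offset inside slice k */
    p.k = n / (column * row);         /* which depth slice */
    p.i = t / column;                 /* row inside the slice */
    p.j = t % column;                 /* column inside the row */
    return p;
}

/* Inverse mapping: the linear index used by the 3D kernel. */
static int flatten(int k, int i, int j, int row, int column) {
    return k * row * column + i * column + j;
}

/* Value that the work-item with linear index n would write. */
static float init_value(int n, int row, int column) {
    const Index3 p = unflatten(n, row, column);
    if (p.k == 0) return (float)p.i / ((float)p.j + 1.0f);
    if (p.k == 1) return 1.0f;
    return (float)p.j / ((float)p.i + 1.0f);
}
```

    Looping n from 0 to depth*row*column-1 on the host and calling init_value(n, row, column) should reproduce the sequential output listed in the question, which is a cheap way to compare against what the kernel writes.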
    

    Edit 2: To update only a handful of hand-picked values, you have 2 choices:

    1. Do it on the CPU and copy the matrix from/to GPU. This is fine for 1-time initialization.
    2. Write a separate kernel and put a statement like if(k!=0) return; right at the top of it. Then only threads with k==0 will continue and write something to the matrix (for the update in the question, which writes to A[1][i][j], the check would be k!=1). If these special values depend on neighboring matrix values (like A[0][i+1][j]), make sure these neighbors are already initialized (hence a second, separate kernel for the special values). Make sure not to write a new value to cell i,j,k in thread i,j,k while neighboring thread i,j+1,k reads that same cell (which is i,j-1,k seen from its own reference point). Otherwise it is not clear whether thread i,j+1,k reads the old or the updated value, because the execution order of threads is not defined (a so-called race condition). This approach is good if you repeatedly need to write these special matrix values and if A is a very large matrix. With this approach you don't need to copy A from CPU to GPU and back at all, which can be really slow if A is large.
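
    For the specific update in the question, the neighbor offsets in the flat array are worth spelling out: moving one row (i+1 or i-1) changes the linear index by column, not by 1. The plain C sketch below only computes the indices a work-item would use; UpdateIndices and update_indices are hypothetical names, and the bounds check (1 <= i <= row-2) is an assumption so that both i-1 and i+1 stay inside the matrix.

```c
#include <assert.h>

/* Hypothetical helper type: linear indices needed by work-item (i, j) for
   A[1][i][j] = A[1][i][j] + 1 / sqrt(A[0][i+1][j] + A[2][i-1][j]). */
typedef struct {
    int valid;  /* 0 if i-1 or i+1 would fall outside the matrix */
    int idk1;   /* A[1][i][j],   the cell being written */
    int below0; /* A[0][i+1][j], read only */
    int above2; /* A[2][i-1][j], read only */
} UpdateIndices;

static UpdateIndices update_indices(int i, int j, int row, int column) {
    UpdateIndices u = {0, 0, 0, 0};
    assert(row > 0 && column > 0);
    /* Assumed bounds: skip border rows and any out-of-range column. */
    if (i < 1 || i > row - 2 || j < 0 || j >= column) return u;

    const int slice = row * column;   /* elements per k-slice */
    const int base  = i * column + j; /* offset inside a slice */
    u.valid  = 1;
    u.idk1   = 1 * slice + base;
    u.below0 = 0 * slice + base + column; /* row i+1 is +column, not +1 */
    u.above2 = 2 * slice + base - column; /* row i-1 is -column, not -1 */
    return u;
}
```

    In a kernel, the loop body would then read A[below0] and A[above2] and write A[idk1]. Under the assumption that this update only ever writes slice 1 and only reads slices 0 and 2, no work-item reads a cell that another work-item is writing, so the race condition described above should not arise in this particular case.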