Tags: casting, kernel, opencl

Casting an integer to a float in OpenCL


This is my first post on Stack Overflow, so bear with me.

I am currently programming an OpenCL kernel and need the built-in sqrt function. For this to work, the argument of the function has to be a float. I currently have an integer value and need to convert it to float to call sqrt(), then convert the result back to an integer so that it can be stored in the "magOutput" array.

The code below should give a better idea of what I am trying to do:

magOutput[workItemNum] = sqrt(xConv[workItemNum]*xConv[workItemNum] + yConv[workItemNum]*yConv[workItemNum]);

In case it helps to understand the application, here is the full code:

__kernel void matrixMultiplication(__global int* input, __global int* xConv, __global int* yConv, __global int* size, __global int* magOutput){

int workItemNum = get_global_id(0); //Work item ID
int workGroupNum = get_group_id(0); //Work group ID
int localGroupID = get_local_id(0); //Work items ID within each work group

// size refers to the total size of a matrix. So for a 3x3 size = 9
float dim = *size;
int dim1 = *size; 

int row = sqrt(dim); // only square matrices are used and as such the sqrt of size produces the row length
int current_row = workItemNum/dim; // the current row is calculated by using the current workitem number divided by the total size of the matrix

int col = sqrt(dim); // only square matrices are used and as such the sqrt of size produces the column length
int current_col = workItemNum % dim1; // the current column is calculated by using the current workitem number modulus by the total size of the matrix

// printf("dimension: %i \n",localGroupID);

// This if statement excludes all boundary pixels from the calculation as you require the neighbouring pixel cells 
// for this calculation
if (current_col == 0 || current_col == col-1 || current_row == 0 || current_row == row - 1){
    /*===============================================================================================================
    * The xConv array performs the kernel convolution of the input grey scale values with the following matrix:
    *
    *                            [-1  0 +1]
    * X - Directional Kernel  =  [-2  0 +2]
    *                            [-1  0 +1]
    * 
    * This scans across the X direction of the image and enhances all edges in the X-direction 
    * ===============================================================================================================
    */
    xConv[workItemNum] =  input[(current_col - 1)*col + current_row - 1]*-1 
             + input[(current_col)*col + current_row - 1]*0 
             + input[(current_col + 1)*col + current_row - 1]*1 
             + input[(current_col - 1)*col + current_row]*-2 
             + input[(current_col)*col + current_row]*0 
             + input[(current_col + 1)*col + current_row]*2 
             + input[(current_col - 1)*col + current_row + 1]*-1 
             + input[(current_col)*col + current_row + 1]*0 
             + input[(current_col + 1)*col + current_row + 1]*1;

    /*===============================================================================================================
    * The yConv array performs the kernel convolution of the input grey scale values with the following matrix:
    *
    *                            [+1 +2 +1]
    * Y - Directional Kernel  =  [ 0  0  0]
    *                            [-1 -2 -1]
    * 
    * This scans across the Y direction of the image and enhances all edges in the Y-direction 
    * ===============================================================================================================
    */
    yConv[workItemNum] = input[(current_col - 1)*col + current_row - 1]*-1 
             + input[(current_col)*col + current_row - 1]*-2 
             + input[(current_col + 1)*col + current_row - 1]*-1 
             + input[(current_col - 1)*col + current_row]*0 
             + input[(current_col)*col + current_row]*0 
             + input[(current_col + 1)*col + current_row]*0 
             + input[(current_col - 1)*col + current_row + 1]*1 
             + input[(current_col)*col + current_row + 1]*2 
             + input[(current_col + 1)*col + current_row + 1]*1;
}

//===============================================================================================================
// Calculates the magnitude from the X and Y convolution arrays. Does so by squaring each item of the X and Y arrays,
// adding them and taking the square root. This is the basic magnitude formula. This is done by each work item
//===============================================================================================================
magOutput[workItemNum] = sqrt(xConv[workItemNum]*xConv[workItemNum] + yConv[workItemNum]*yConv[workItemNum]);
}

Any suggestions?


Solution

  • So you essentially have a 2D vector of int data type and want to calculate its length. Most of OpenCL C is just standard C99 code/syntax, so the most straightforward way would be to use standard C-style type casting:

    magOutput[workItemNum] = (int)(sqrt((float)xConv[workItemNum]*(float)xConv[workItemNum] + (float)yConv[workItemNum]*(float)yConv[workItemNum])+0.5f);
    

    The +0.5f is for correct rounding: casting a float to int truncates toward zero, so (int)3.9f gives 3. Because the result of sqrt is never negative, adding 0.5f immediately before the cast rounds it to the nearest integer. Note that I cast to float first and only then do the squaring; otherwise the multiplication could overflow the int range.
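The rounding and overflow points above can be checked on the CPU, since explicit scalar casts behave the same way in plain C. This is only a minimal sketch in C; the helper names to_int_truncated, to_int_rounded and sum_of_squares are made up for illustration and are not part of the kernel.

```c
/* Hypothetical helpers for illustration only; not part of the original kernel. */

/* A plain (int) cast discards the fractional part (rounds toward zero). */
int to_int_truncated(float value) {
    return (int)value;
}

/* Adding 0.5f before the cast rounds to the nearest integer.
   This only holds for non-negative input, which is the case for a sqrt result. */
int to_int_rounded(float value) {
    return (int)(value + 0.5f);
}

/* Cast to float first, then multiply: the products are computed in float,
   so they cannot overflow the int range the way x * x could. */
float sum_of_squares(int x, int y) {
    const float xf = (float)x;
    const float yf = (float)y;
    return xf * xf + yf * yf;
}
```

For example, to_int_truncated(3.9f) gives 3 while to_int_rounded(3.9f) gives 4, and sum_of_squares(50000, 0) stays well-defined even though 50000 * 50000 does not fit in a 32-bit int.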


    A possibly faster approach is to load xConv[workItemNum] and yConv[workItemNum] from global memory only once (global memory access is really slow), cast them to float and keep them in private memory (registers) as xConvf and yConvf. The length calculation and rounding then run on those registers, and the result is written back to magOutput[workItemNum] in slow global memory once:

    const float xConvf = (float)xConv[workItemNum], yConvf = (float)yConv[workItemNum];
    magOutput[workItemNum] = (int)(sqrt(xConvf*xConvf + yConvf*yConvf)+0.5f);
    

    If you want to get really fancy with the built-in math functionality in OpenCL C, you can also do this (should be exactly as fast as the 2nd approach):

    magOutput[workItemNum] = (int)(length((float2)((float)xConv[workItemNum], (float)yConv[workItemNum]))+0.5f);
    

    So there are two takeaways for you:

    • Type casting in OpenCL works just like in C: (float)x.
    • Reading from and writing to global memory (your kernel parameter arrays) is really slow. Load each value you need once into a variable declared inside the kernel (these live in the private memory space, i.e. registers), do the arithmetic there, and then write the result back to global memory once.

    While this leaves your OpenCL C code untouched, I recommend this lightweight OpenCL-Wrapper for development with C++. This reduces the OpenCL control logic in your CPU code to about 1/4 and makes development much easier.