
OpenCL kernel function yields different results when running on different devices (CPU vs GPU)


I'm trying to make sense of some strange behavior in my OpenCL kernel function. I'm basically trying to convert a string of hexadecimal digits into the decimal values of the bytes it encodes but, for reasons I cannot fathom, running the same kernel on the GPU and on the CPU yields different results.

The kernel looks like the following:

 // yes, there's no result defined for the moment
__kernel void testKernel(__global uint message_length, __global char *message) {

  size_t converted_message_length = message_length / 2;
  char converted_message[converted_message_length];

  // (1) hex to decimal conversion 
  for (size_t idx = 0, j = 0; idx < converted_message_length; idx++, j++) {
    converted_message[idx] = (message[j] & '@' ? message[j] + 9 : message[j]) << 4;
    j++;
    converted_message[idx] |= (message[j] & '@' ? message[j] + 9 : message[j]) & 0xF;
    printf("converted '%c%c' into '%i'\n", message[j - 1], message[j], converted_message[idx]);
  }

  // (2) this should be redundant, since I already print the content...
  // but actually behaves differently with different device (CPU/GPU)
  for (size_t idx = 0, j = 0; idx < converted_message_length; idx++, j++) {
    printf("converted_message[%i]: '%i'\n", idx, converted_message[idx]);
  }
}

Now, if I pass the length 4 and an input string containing the hexadecimal value 3e2b as arguments to testKernel, I would expect them to be converted into the decimals 62 and 43 (see this table for hex -> decimal conversion).
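As a sanity check on those expected values, the nibble trick used in the kernel can be reproduced on the host. The following is a minimal plain-C sketch, assuming ASCII input that contains only valid hex digits. The helper names hex_nibble and hex_pair_to_byte are made up for illustration and are not part of the kernel above. The sketch masks each nibble before shifting, which should give the same byte as the kernel's shift-then-truncate form.

```c
/* Hypothetical helper: maps one ASCII hex digit to its 4-bit value.
   Letters have bit 0x40 ('@') set; adding 9 makes their low nibble 10..15.
   Digits '0'..'9' already carry their value in the low nibble. */
int hex_nibble(char c) {
  return (c & '@' ? c + 9 : c) & 0xF;
}

/* Hypothetical helper: combines two hex digits into one byte,
   with the first digit as the high nibble. */
unsigned char hex_pair_to_byte(char high, char low) {
  return (unsigned char)((hex_nibble(high) << 4) | hex_nibble(low));
}
```

With these helpers, hex_pair_to_byte('3', 'e') evaluates to 62 and hex_pair_to_byte('2', 'b') evaluates to 43, which are the values expected above.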

And if I run the kernel on my CPU (Intel(R) Core(TM) i9-9880H), I do indeed see the following conversion:

converted '3e' into '62'
converted '2b' into '43'
converted_message[0]: '62'
converted_message[1]: '43'

However, if I run this very same kernel using my GPU (AMD Radeon Pro 5500M), I see the following results:

converted '3e' into '62'
converted '2b' into '43'
converted_message[0]: '0'  <-- why is it 0?
converted_message[1]: '0'  <-- why is it 0?

It seems like converted_message is successfully written inside loop (1), but then its values get lost when I enter loop (2). How is that even possible? Is OpenCL performing some weird optimization under the hood that manifests itself only when running on the GPU?


Solution

  • char converted_message[converted_message_length];
    

    This is a variable length array, which is not supported in standard OpenCL C. It might work on some OpenCL implementations as an extension, but it is not portable.

    Specify a fixed size for the array, or allocate a buffer on the host and pass it to the kernel as an argument.
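Both suggestions can be combined in one kernel. What follows is a minimal sketch, not a drop-in replacement: MAX_CONVERTED_LENGTH and the result parameter are assumptions that do not appear in the question, and the #ifndef shim at the top is only there so the same source can also be compiled as plain C for a quick host-side check. Since every work-item does the same work here, it is meant to be launched with a single work-item.

```c
/* Assumption: shim so this file also compiles as plain C.
   An OpenCL C compiler defines __OPENCL_VERSION__ and skips it. */
#ifndef __OPENCL_VERSION__
#include <stddef.h> /* size_t */
#define __kernel
#define __global
#endif

/* Assumed compile-time upper bound on the number of converted bytes. */
#define MAX_CONVERTED_LENGTH 64

__kernel void testKernel(unsigned int message_length,
                         __global const char *message,
                         __global char *result) {
  size_t converted_message_length = message_length / 2;
  if (converted_message_length > MAX_CONVERTED_LENGTH) {
    /* Never write past the fixed-size array. */
    converted_message_length = MAX_CONVERTED_LENGTH;
  }

  /* Fixed-size private array instead of a variable length array. */
  char converted_message[MAX_CONVERTED_LENGTH];

  /* Same hex conversion as in the question, two input characters per byte. */
  for (size_t idx = 0, j = 0; idx < converted_message_length; idx++, j += 2) {
    char high = message[j];
    char low = message[j + 1];
    converted_message[idx] = (char)((high & '@' ? high + 9 : high) << 4);
    converted_message[idx] |= (char)((low & '@' ? low + 9 : low) & 0xF);
  }

  /* Copy into the buffer allocated on the host so the result can be read back. */
  for (size_t idx = 0; idx < converted_message_length; idx++) {
    result[idx] = converted_message[idx];
  }
}
```

The host would need to create a buffer of at least message_length / 2 bytes for result and read it back after the kernel finishes. If the intermediate array is not needed, the kernel could also write straight into result and drop converted_message entirely.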