I use PyOpenCL 2013.1, and my code crashes on an NVIDIA GPU, an AMD CPU and an AMD GPU, but works on an Intel CPU.
On the NVIDIA GPU, the error is raised by `queue.finish()` after the kernel is enqueued:

```
LogicError: clFinish failed: invalid command queue
```
I traced the cause to the last statement of the kernel in the following snippet (marked "this fails on GPU but works on CPU").

```c
typedef struct {
    int global_index;
    int local_index;
    float speed_limit;
    float width;
} segment_t;

typedef struct {
    int item_count;
    segment_t first_item;
} segment_list_t;

void explode_segment_list_t(segment_list_t* list, segment_t** array)
{
    array[0] = &(list->first_item);
}

/*
 * ro_data is a read-only array of 3316 bytes (829 ints)
 * wo_data is a write-only array of 3316 bytes (829 ints)
 */
__kernel void test_kernel(global int* ro_data, global int* wo_data)
{
    unsigned int i = get_global_id(0);

    // copy uncast, primitive types
    for (int index = 0; index < ro_data[0]; index++)
        wo_data[index] = ro_data[index]; // this works

    // access a cast struct in private (function-scope) memory
    int temp[829] = {0};
    segment_list_t* casted_temp_list = (segment_list_t*)temp;
    casted_temp_list->item_count = 1337; // this works
    // do more tests
    segment_t* casted_temp_array;
    explode_segment_list_t(casted_temp_list, &casted_temp_array);
    casted_temp_array[1].global_index = 1;
    casted_temp_array[2].global_index = 2; // this works

    // copy private data to global memory
    for (int index = 0; index < ro_data[0]; index++)
        wo_data[index] = temp[index]; // this works

    // access cast global memory
    segment_list_t* casted_wo_data = (segment_list_t*)wo_data;
    casted_wo_data->item_count = 42; // this fails on GPU but works on CPU
}
```
An executable reproduction generated with `pyopencl.capture_call` is available here.
An ugly, memory-wasting workaround would be to allocate a local array, copy the data into it and then cast it. But I'm sure I did something wrong here... but what?
Thanks for your help!
EDIT: On AMD devices (CPU and GPU), it fails with a more informative message:

```
error: invalid type conversion
segment_list_t* casted_wo_data = (segment_list_t*)wo_data;
```
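The error message from the AMD SDK pointed to the cause.
As described in a forum post, the pointer type you cast to must carry the same address space qualifier as the source pointer. In OpenCL C 1.x, a pointer type without a qualifier refers to the private address space, so `(segment_list_t*)wo_data` tries to turn a `global int*` into a private pointer, which is an invalid conversion. The cast of `temp` works only because `temp` is itself a private array.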
In this example,

```c
segment_list_t* casted_wo_data = (segment_list_t*)wo_data;
```

has to be replaced with

```c
global segment_list_t* casted_wo_data = (global segment_list_t*)wo_data;
```
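For completeness, here is a minimal sketch of the corrected kernel, assuming OpenCL C 1.x, where a pointer type without an address space qualifier points to private memory. `explode_global_segment_list` is a hypothetical `global`-qualified copy of `explode_segment_list_t`: the original helper only accepts private pointers, so passing it the cast `wo_data` would hit the same invalid conversion. This is device code; it still has to be built and enqueued from the host, e.g. with PyOpenCL as in the question.

```c
typedef struct {
    int global_index;
    int local_index;
    float speed_limit;
    float width;
} segment_t;

typedef struct {
    int item_count;
    segment_t first_item;
} segment_list_t;

/* Hypothetical helper: same as explode_segment_list_t, but the list lives
   in global memory and the returned pointer also points to global memory.
   `array` itself is a pointer to a private variable holding that pointer. */
void explode_global_segment_list(global segment_list_t* list,
                                 global segment_t** array)
{
    *array = &(list->first_item);
}

__kernel void test_kernel(global int* ro_data, global int* wo_data)
{
    // Let a single work-item do the writes so they do not race.
    if (get_global_id(0) != 0)
        return;

    // The cast keeps the global qualifier of wo_data, so the address spaces match.
    global segment_list_t* casted_wo_data = (global segment_list_t*)wo_data;
    casted_wo_data->item_count = 42;

    // A private variable holding a pointer into global memory.
    global segment_t* casted_wo_array;
    explode_global_segment_list(casted_wo_data, &casted_wo_array);
    casted_wo_array[1].global_index = 1;
}
```

`__global` and `global` are interchangeable spellings of the same qualifier, so either form works in the casts above.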