When I load arrays from global memory into shared memory, the shared memory variables do not hold the values I expect.
template<class T>
__global__ void kernel(T *t1, T *t2)
{
    int tid = threadIdx.x;
    extern __shared__ T array1[];
    extern __shared__ T array2[];
    array1[tid] = t1[tid]; // copy (1)
    array2[tid] = t2[tid]; // copy (2)
    __syncthreads();
}
It turns out that array1[tid] = array2[tid] = t2[tid].
When I swap copy (1) and copy (2), the result is array1[tid] = array2[tid] = t1[tid].
Only when I remove extern is the result what I want (array1[tid] = t1[tid], array2[tid] = t2[tid]).
Can anybody explain why?
This:
extern __shared__ T array1[];
extern __shared__ T array2[];
won't work the way you think.
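Those pointers (array1 and array2) will point to the same location: every array declared as extern __shared__ starts at the same address, the beginning of the kernel's dynamically allocated shared memory. That is why whichever copy runs last overwrites the other.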
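The aliasing can be checked directly. The following is a minimal sketch, not taken from the question: the kernel name show_alias and the launch sizes are made up for illustration, and it simply prints whether the two extern __shared__ arrays compare equal as addresses.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel for illustration only: it declares two
// extern __shared__ arrays and reports whether they share an address.
__global__ void show_alias()
{
    extern __shared__ float a[];
    extern __shared__ float b[];

    if (threadIdx.x == 0) {
        // Nonzero means both names refer to the same shared memory location.
        printf("same address: %d\n", (void *)a == (void *)b);
    }
}

int main()
{
    // Third launch parameter: bytes of dynamic shared memory (arbitrary here).
    show_alias<<<1, 32, 64 * sizeof(float)>>>();
    cudaDeviceSynchronize();
    return 0;
}
```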
If you want to have multiple arrays using dynamically allocated shared memory, you must follow the instructions given in the programming guide. Something like this:
extern __shared__ T array1[];
T * array2 = array1 + size_of_array_1;
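should work. Note that size_of_array_1 is a number of elements of type T, not a number of bytes, because it is added to a T pointer.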
Also be sure to pass a dynamic shared memory size as the third kernel launch parameter, given in bytes and large enough to hold both array1 and array2.
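Putting the pieces together, here is a minimal end-to-end sketch of the approach described above. Several things in it are assumptions that do not appear in the question: the out and n parameters, the unsigned char buffer named smem (used so the extern declaration has the same type for every template instantiation), and the host-side test values. It assumes both arrays hold n elements of the same type T, so array2 needs no extra alignment handling.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

template <class T>
__global__ void kernel(const T *t1, const T *t2, T *out, int n)
{
    // One dynamically sized shared buffer, split by hand into two arrays.
    extern __shared__ unsigned char smem[];
    T *array1 = reinterpret_cast<T *>(smem);
    T *array2 = array1 + n;  // offset in elements, not bytes

    int tid = threadIdx.x;
    if (tid < n) {
        array1[tid] = t1[tid];  // copy (1)
        array2[tid] = t2[tid];  // copy (2)
    }
    __syncthreads();

    // If the two arrays aliased, this difference would always be zero.
    if (tid < n)
        out[tid] = array1[tid] - array2[tid];
}

int main()
{
    const int n = 64;
    float h1[n], h2[n], hout[n];
    for (int i = 0; i < n; ++i) { h1[i] = 2.0f * i; h2[i] = 1.0f * i; }

    float *d1, *d2, *dout;
    cudaMalloc(&d1, n * sizeof(float));
    cudaMalloc(&d2, n * sizeof(float));
    cudaMalloc(&dout, n * sizeof(float));
    cudaMemcpy(d1, h1, n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(d2, h2, n * sizeof(float), cudaMemcpyHostToDevice);

    // Shared memory size in bytes, covering both arrays.
    size_t shmem = 2 * n * sizeof(float);
    kernel<float><<<1, n, shmem>>>(d1, d2, dout, n);
    cudaMemcpy(hout, dout, n * sizeof(float), cudaMemcpyDeviceToHost);

    // Compare against the host-side difference of the two inputs.
    int ok = 1;
    for (int i = 0; i < n; ++i)
        if (hout[i] != h1[i] - h2[i]) ok = 0;
    printf(ok ? "arrays are distinct\n" : "arrays overlap\n");

    cudaFree(d1); cudaFree(d2); cudaFree(dout);
    return ok ? 0 : 1;
}
```

If the two arrays had different element types, the second pointer's offset would also have to respect that type's alignment, which this sketch does not handle.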