Tags: cuda, gpgpu, gpu, pycuda

Addition Assignment Operator in CUDA C


I'm experiencing a problem with the addition assignment operator in CUDA C. I'm getting the following error:

kernel.cu(5): error: expression must have integral or enum type

My code is:

import pycuda.driver as drv
import pycuda.autoinit
from pycuda.compiler import SourceModule
import numpy as np

mod=SourceModule("""
__global__ void addition(float* a,float* b,float*c){
int i=threadIdx.x + blockIdx.x * blockDim.x;
c[a[i]]+=b[i];
}
""")

addition=mod.get_function("addition")
a=np.array([1,2,3,1,2,3,2,1]).astype(np.float32)
b=np.array([0.1,0.2,0.1,0.5,0.1,0.2,0.1,0.5]).astype(np.float32)
c=np.zeros_like(a)
addition(drv.Out(c),drv.In(a),drv.In(b),block=(32,1,1))
print(c)

My desired output is c = [0,1.1,0.4,0.3,0,0,0,0]. Can anyone suggest the solution?


Solution

  • The problem is in your kernel, where you index into c using a.
    a is of type float, and an array subscript must have an integral or enum type, which is exactly what the compiler error says.

    Also notice that you are launching 32 threads but only have 8 elements, so the extra threads will index out of bounds.

    The last problem you will face is that several threads try to update the same position in c because a contains duplicate indices. One way to fix that is to use atomicAdd:

    __global__ void addition(float* a, float* b, float* c, int n)
    {
        int i = threadIdx.x + blockIdx.x * blockDim.x;
        if (i < n)                          // skip threads beyond the array size
            atomicAdd(&c[(int)a[i]], b[i]); // cast the float index to int and accumulate atomically
    }

    Launch the kernel the same way, but don't forget to pass n, which is the number of elements in a (and b).
    You could also eliminate n and instead make the thread-block dimensions match the array size when you launch the kernel. A host-side sketch follows below.
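    As a sketch of what the host side could look like with the extra parameter (assuming the same arrays as in the question), the kernel arguments are passed in the order of the kernel signature, and the scalar n goes in as a numpy int32:

    import pycuda.driver as drv
    import pycuda.autoinit
    from pycuda.compiler import SourceModule
    import numpy as np

    mod = SourceModule("""
    __global__ void addition(float* a, float* b, float* c, int n)
    {
        int i = threadIdx.x + blockIdx.x * blockDim.x;
        if (i < n)
            atomicAdd(&c[(int)a[i]], b[i]);
    }
    """)

    addition = mod.get_function("addition")
    a = np.array([1, 2, 3, 1, 2, 3, 2, 1]).astype(np.float32)
    b = np.array([0.1, 0.2, 0.1, 0.5, 0.1, 0.2, 0.1, 0.5]).astype(np.float32)
    c = np.zeros_like(a)

    # Arguments follow the kernel signature (a, b, c, n); plain scalars are passed as numpy scalars.
    addition(drv.In(a), drv.In(b), drv.Out(c), np.int32(a.size), block=(32, 1, 1))
    print(c)  # roughly [0.  1.1  0.4  0.3  0.  0.  0.  0.], up to float rounding

    Note that atomicAdd on float requires a GPU of compute capability 2.0 or newer, which covers essentially all current hardware.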