python, numpy, cuda, anaconda, numba-pro

CUDA code error within NumbaPro


import numpy
import numpy as np
from numbapro import cuda


@cuda.autojit
def foo(aryA, aryB,out):
    d_ary1 = cuda.to_device(aryA)
    d_ary2 = cuda.to_device(aryB)
    #dd = numpy.empty(10, dtype=np.int32)
    d_ary1.copy_to_host(out)


griddim = 1, 2
blockdim = 3, 4
aryA = numpy.arange(10, dtype=np.int32)
aryB = numpy.arange(10, dtype=np.int32)
out = numpy.empty(10, dtype=np.int32)

foo[griddim, blockdim](aryA, aryB,out)

Exception: Caused by input line 11: can only get attribute from globals, complex numbers or arrays

I am new to NumbaPro; any hints would be appreciated!


Solution

  • The @cuda.autojit decorator marks and compiles foo() as a CUDA kernel. Memory transfer operations must be placed outside of the kernel, so the code should look like the following:

    import numpy
    from numbapro import cuda
    
    @cuda.autojit
    def foo(aryA, aryB, out):
        # compute this thread's 1D global index
        i = cuda.threadIdx.x + cuda.blockIdx.x * cuda.blockDim.x
        if i < out.shape[0]:  # guard against threads beyond the end of the arrays
            out[i] = aryA[i] + aryB[i]
    
    # launch configuration: 4 blocks of 3 threads along x = 12 threads,
    # enough to cover all 10 elements
    griddim = 4, 1
    blockdim = 3, 1
    aryA = numpy.arange(10, dtype=numpy.int32)
    aryB = numpy.arange(10, dtype=numpy.int32)
    out = numpy.empty(10, dtype=numpy.int32)
    
    # transfer memory
    d_ary1 = cuda.to_device(aryA)
    d_ary2 = cuda.to_device(aryB)
    d_out = cuda.device_array_like(aryA) # like numpy.empty_like() but for GPU
    # launch kernel
    foo[griddim, blockdim](d_ary1, d_ary2, d_out)  # pass the device arrays, not the host arrays
    
    # transfer memory device to host
    d_out.copy_to_host(out)
    
    print(out)
    

    I recommend that new NumbaPro users look at the examples in https://github.com/ContinuumIO/numbapro-examples.
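
  • Note that NumbaPro was later discontinued and its CUDA features were folded into the open-source Numba package. As a rough sketch (assuming a recent numba release and a working CUDA toolkit, rather than the original numbapro), the same vector-add kernel could be written with numba.cuda like this:

    import numpy as np
    from numba import cuda

    @cuda.jit
    def foo(aryA, aryB, out):
        i = cuda.grid(1)            # absolute 1D thread index
        if i < out.shape[0]:        # skip threads past the end of the arrays
            out[i] = aryA[i] + aryB[i]

    aryA = np.arange(10, dtype=np.int32)
    aryB = np.arange(10, dtype=np.int32)
    out = np.empty(10, dtype=np.int32)

    # explicit host-to-device transfers, as in the answer above
    d_ary1 = cuda.to_device(aryA)
    d_ary2 = cuda.to_device(aryB)
    d_out = cuda.device_array_like(aryA)

    # pick a launch configuration that covers all elements
    threadsperblock = 32
    blockspergrid = (aryA.size + threadsperblock - 1) // threadsperblock
    foo[blockspergrid, threadsperblock](d_ary1, d_ary2, d_out)

    # copy the result back to the host and print it
    d_out.copy_to_host(out)
    print(out)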