I am a beginner in Numba. I am having difficulty rearranging the rows of an array on the GPU.
With Numba on the CPU, for example, this can be done as follows:
from numba import njit
import numpy as np

@njit
def numba_cpu(A, B, ind):
    # Copy row ind[i] of A into row i of B.
    for i, t in enumerate(ind):
        B[i, :] = A[t, :]

ind = np.array([3, 2, 0, 1, 4])
A = np.random.rand(5, 3)
B = np.zeros((5, 3))
numba_cpu(A, B, ind)
But the same approach does not work with cuda.jit:
from numba import cuda
import numpy as np

@cuda.jit
def numba_gpu(A, B, ind):
    # Same row-slice assignment as the CPU version.
    for i, t in enumerate(ind):
        B[i, :] = A[t, :]

d_ind = cuda.to_device(np.array([3, 2, 0, 1, 4]))
d_A = cuda.to_device(np.random.rand(5, 3))
d_B = cuda.to_device(np.zeros((5, 3)))
numba_gpu[16, 16](d_A, d_B, d_ind)
The program fails with a long list of exceptions, one of which says "NRT required but not enabled".
Of course, I could use a nested loop to copy entry by entry, but that looks bad because I know that each row is stored in consecutive memory. Even a C-style memcpy would be better, but it seems Numba does not support memcpy.
I think I have found a solution myself. What I need is to manipulate NumPy-style arrays on a CUDA device. For this purpose, CuPy is much better suited than Numba. CuPy supports many NumPy-like operations (including the one in my question) in an efficient and convenient way.
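As a rough illustration of the CuPy approach, the row rearrangement can be written as a single integer-array indexing expression, just as in NumPy. This is a minimal sketch, not a benchmark. The fallback to NumPy is an assumption added only so that the snippet still runs on a machine without CuPy or without a usable GPU.

```python
import numpy as np

try:
    import cupy as xp  # GPU arrays with a NumPy-like interface
    xp.zeros(1)        # touch the device; raises if no usable GPU is present
except Exception:
    xp = np            # assumption for this sketch: fall back to NumPy

A_host = np.random.rand(5, 3)
A = xp.asarray(A_host)             # copied to the device when xp is CuPy
ind = xp.asarray([3, 2, 0, 1, 4])

# Integer-array indexing gathers whole rows: B[i, :] == A[ind[i], :]
B = A[ind]

# Bring the result back to the host (CuPy arrays provide .get()).
B_host = B.get() if hasattr(B, "get") else B
```

Here B is a new array, so no output buffer has to be allocated beforehand.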