I'm trying to build an equality checker for two arrays that I can run on my GPU using PyCUDA.
Following the example given on the PyCUDA GPU Arrays documentation page, I attempted to write my own implementation. But whilst the code below works as expected for arithmetic, e.g. "z[i] = x[i] + y[i]", it returns erroneous output for the equality check "z[i] = x[i] == y[i]".
import pycuda.gpuarray as gpuarray
import pycuda.driver as cuda
import pycuda.autoinit
import numpy as np
from pycuda.elementwise import ElementwiseKernel
matrix_size = (5,)
a = np.random.randint(2, size=matrix_size)
b = np.random.randint(2, size=matrix_size)
print(a)
print(b)
a_gpu = gpuarray.to_gpu(a)
b_gpu = gpuarray.to_gpu(b)
eq_checker = ElementwiseKernel(
"int *x, int *y, int *z",
"z[i] = x[i] == y[i]",
"equality_checker")
c_gpu = gpuarray.empty_like(a_gpu)
eq_checker(a_gpu, b_gpu, c_gpu)
print(c_gpu)
Which prints out something like:
[0 1 0 0 0]
[0 1 1 1 0]
[4294967297 4294967297 0 1 1]
Does anyone understand why this error is occurring, or at least know of an alternative PyCUDA method to achieve the same thing?
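Solved! The problem was that numpy generated 64-bit integers by default here, whereas the "int *" arguments in the kernel signature are 32-bit integers, so the kernel was reading and writing the 64-bit buffers as if they held 32-bit values.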
This is therefore fixed by specifying the type of ints numpy generates, such as:
a = np.random.randint(2, size=matrix_size, dtype=np.int32)
b = np.random.randint(2, size=matrix_size, dtype=np.int32)
after which it works as expected.
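For anyone wondering where a value like 4294967297 comes from, here is a rough NumPy-only sketch (no GPU needed) of what I believe happens when 64-bit buffers are handed to a kernel that expects "int *". It reuses the inputs from the example output above and assumes a little-endian machine. Filling the untouched part of the output with zeros is also an assumption; on the GPU that memory is uninitialized, which would explain the stray 1s at the end of my output.

```python
import numpy as np

# Inputs from the example output above, stored as 64-bit integers.
a = np.array([0, 1, 0, 0, 0], dtype=np.int64)
b = np.array([0, 1, 1, 1, 0], dtype=np.int64)

# A kernel declared with "int *" sees the same bytes as 32-bit values,
# so each 64-bit element becomes two 32-bit slots (low half first on a
# little-endian machine).
a32 = a.view(np.int32)
b32 = b.view(np.int32)

# The kernel is launched for n = 5 elements, so it only compares and
# writes the first five 32-bit slots. The rest of the output buffer is
# never written (modelled here as zeros).
n = a.size
c = np.zeros(n, dtype=np.int64)
c32 = c.view(np.int32)          # shares memory with c
c32[:n] = (a32[:n] == b32[:n])  # emulates z[i] = x[i] == y[i]

# Reading the result back as 64-bit packs two 32-bit results per element.
print(c)
```

Under those assumptions the first two elements come out as 4294967297, which is 2**32 + 1: two 32-bit results of 1 packed into a single 64-bit integer. That matches the start of the erroneous output above.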