PyOpenCL kernel not being applied to entire array

I wanted to get a feel for the Elementwise demo that comes with PyOpenCL and decided to try this out:

from __future__ import absolute_import
from __future__ import print_function
import pyopencl as cl
import pyopencl.array as cl_array
import numpy
from pyopencl.elementwise import ElementwiseKernel

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

n = 6

a_gpu = cl.array.to_device(queue,
                           numpy.arange(1, n, dtype=int))

update_a = ElementwiseKernel(ctx,
                             "int *a",
                             "a[i] = 2*a[i]",
                             "update_a")

print(a_gpu.get())
update_a(a_gpu)
print(a_gpu.get())

I expected this to print

[1 2 3 4 5]
[2 4 6 8 10]

but I'm instead getting

[1 2 3 4 5]
[2 4 6 4 5]

Furthermore, when I try to store the "i" value into the array to see what's going on, I get some really weird values. They are all over the place and some are even negative.

I have been trying to make sense of this for a while now but can't. Can somebody please explain why this is happening? Thanks.

Related info: PyOpenCL Version: 2018.2.1, Python Version: 3.6.5, OS: macOS 10.14.1

Solution

The bug is a mismatch between the element type of the numpy array and the pointer type declared in the kernel, so the CPU side and the CL-device side step through the same buffer with different strides.

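Specifying dtype=int is platform-dependent: on 64-bit macOS and Linux it resolves to 8-byte numpy.int64 (C long) elements. The kernel, however, declares int *a, and int is always 4 bytes in OpenCL C. The kernel therefore doubles the first five 4-byte slots of a buffer holding five 8-byte values, which reaches only the first three elements. The matching type on the CL-device side for numpy.int64 is long *a. The same mismatch likely explains the odd values you saw when storing i: pairs of 4-byte writes are read back on the host as single 8-byte integers.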
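To see where [2 4 6 4 5] comes from, the mismatch can be reproduced on the host with numpy alone, without any OpenCL device. This is only a sketch: it assumes a little-endian byte layout and that the kernel runs one work item per array element (5 here), and the variable names are illustrative.

```python
import numpy

# Host array as in the question, pinned to 8-byte little-endian integers
# (what dtype=int resolves to on 64-bit macOS and Linux).
a = numpy.arange(1, 6, dtype="<i8")

# The kernel declared "int *a", so it addresses the same bytes as 4-byte ints.
as_int32 = a.view("<i4")
print(as_int32)                # [1 0 2 0 3 0 4 0 5 0]

# One work item per array element: only indices 0..4 of the int view are doubled.
simulated = as_int32.copy()
simulated[:a.size] *= 2

# Reading the buffer back as 8-byte integers gives the output from the question.
result = simulated.view("<i8")
print(result)                  # [2 4 6 4 5]
```

The five 4-byte writes cover only the first 20 of the buffer's 40 bytes, so the last two 8-byte elements are never touched.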

If you want to stick with 4-byte integers, specify dtype=numpy.int32 on the CPU side and keep int *a on the CL-device side.

Takeaway: always give your numpy arrays an explicit fixed-width dtype, e.g., dtype=numpy.int64, and check that the kernel's argument type on the CL-device side matches it exactly.
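One way to make that check hard to forget is to derive the kernel's argument declaration from the array's dtype instead of typing it by hand. The helper below is a hypothetical sketch, not part of PyOpenCL: the names OPENCL_CTYPES and pointer_arg are made up for illustration, and the table covers only a few fixed-width types.

```python
import numpy

# OpenCL C fixes these widths on every device: int is 32-bit, long is 64-bit.
# (Hypothetical lookup table; extend it for any other dtypes you need.)
OPENCL_CTYPES = {
    numpy.dtype(numpy.int32): "int",
    numpy.dtype(numpy.int64): "long",
    numpy.dtype(numpy.float32): "float",
}

def pointer_arg(array, name):
    """Return a kernel pointer declaration that matches the array's dtype."""
    try:
        ctype = OPENCL_CTYPES[array.dtype]
    except KeyError:
        raise TypeError("no OpenCL C type registered for dtype %s" % array.dtype)
    return "%s *%s" % (ctype, name)

decl32 = pointer_arg(numpy.arange(1, 6, dtype=numpy.int32), "a")
decl64 = pointer_arg(numpy.arange(1, 6, dtype=numpy.int64), "a")
print(decl32)   # int *a
print(decl64)   # long *a
```

The returned string could then be passed as the argument list to ElementwiseKernel in place of the hard-coded "int *a", so that a change of dtype on the host side either updates the kernel signature or fails loudly.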