I'm having a hard time doing a basic scalar operation with PyOpenCL. Basically, what I'm trying to do is, given an array of floats, multiply each element by a scalar float and put the result in a new buffer. This should be easy, but for some reason it's not working as it should.
This is the code I'm using (variables ending in _h are host variables; variables ending in _g are device variables):
import numpy as np
import pyopencl as cl
# Device Init
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
MF = cl.mem_flags
# Host Initial Variables
a_h = np.array([1.0, 2.0, 3.0, 4.0])
b_h = np.float32(2.0)
# DEVICE Variable Allocation
a_g = cl.Buffer(ctx, MF.READ_ONLY | MF.COPY_HOST_PTR, hostbuf=a_h)
c_g = cl.Buffer(ctx, MF.WRITE_ONLY, a_h.nbytes)
# DEVICE's Kernel - Multiply each element of the array a_g by the scalar b_g and put the result on the array c_g
source = """
__kernel void mult(float b_g, __global float *a_g, __global float *c_g){
const int gid = get_global_id(0);
c_g[gid] = b_g * a_g[gid];
}
"""
prg = cl.Program(ctx, source).build()
prg.mult(queue, a_h.shape, None, b_h, a_g, c_g)
# Export The Result On The DEVICE Back To The HOST
c_h = np.empty_like(a_h)
cl.enqueue_copy(queue, c_h, c_g)
# Output
print(c_h)
The expected output was
[2.0 4.0 6.0 8.0]
This was the output:
[ 2.56000000e+002 5.12000000e+002 -1.73777009e+308 -1.73777009e+308]
I don't understand why. I've tried reading the PyOpenCL project page but I didn't really understand much of it to be honest. I guess I'm not doing the kernel part correctly or the kernel call part.
I also tried declaring the kernel like this:
__kernel void mult(__global float *b_g, __global float *a_g, __global float *c_g)
But as expected it didn't work, because I didn't create a pointer for b_g, nor do I know how to create one. The error was:
:2:39: error: parameter may not be qualified with an address space
__kernel void mult(__global float b_g, __global float *a_g, __global float *c_g){
^
My main idea behind this is simple: since I'm going to use the value b_g as something common to all the workers, I want to put it in global memory once so that every worker can access it, instead of repeating the value for every worker.
I believe this should be really simple but I'm new to parallel computing and have no idea how to fix this.
Thank you.
In numpy, all objects contained in the same array share a common data type, which is specified in the array's dtype attribute (see the numpy.array doc and numpy datatypes).
The data in your array a_h has the data type float64 (64-bit float), which is numpy's default for Python floats. Your kernel expects 32-bit floats (float32), so it interprets the raw bytes of a_h as float32 values. In other words, it sees 8 32-bit floats instead of 4 64-bit floats.
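The reinterpretation described above can be reproduced on the host with numpy alone. This is only a sketch of what plausibly happens, not the actual device behaviour: no OpenCL is involved, the names raw_f32 and c_buf are illustrative stand-ins, and the byte order is fixed to little-endian on purpose. It also assumes the result is read back as float64, which is what np.empty_like(a_h) gives in the question.

```python
import numpy as np

# Same values as in the question, stored explicitly as little-endian float64.
a_h = np.array([1.0, 2.0, 3.0, 4.0], dtype="<f8")

# The same 32 bytes viewed as float32: 8 values instead of 4.
raw_f32 = a_h.view("<f4")

# Stand-in for the device buffer c_g (32 bytes). The real buffer is
# uninitialized; zeros are used here only to keep the sketch deterministic.
c_buf = np.zeros(8, dtype="<f4")

# The kernel is launched with 4 work-items, so only 4 float32 slots are
# read, multiplied by the scalar, and written.
c_buf[:4] = np.float32(2.0) * raw_f32[:4]

# The host then reads those bytes back as float64.
c_h = c_buf.view("<f8")
print(c_h[:2])  # the first two values match the 256 and 512 in the question
```

The last two entries of the real output are garbage because that part of c_g was never written by any work-item.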
To make it work, store the values in a_h as float32 by defining it like this:
a_h = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
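As a quick host-side check before creating the buffers, something like the following sketch may help. It only uses numpy; the names a64, a32 and expected are hypothetical and not part of the original code. It shows that the float32 array occupies 16 bytes rather than 32, that an existing float64 array can be converted with astype, and what the kernel result should look like.

```python
import numpy as np

# Array created directly as float32, as suggested above.
a_h = np.array([1.0, 2.0, 3.0, 4.0], dtype=np.float32)
print(a_h.dtype, a_h.nbytes)  # 4 elements * 4 bytes each

# An array that already exists as float64 can be converted instead;
# astype returns a new array and leaves the original untouched.
a64 = np.array([1.0, 2.0, 3.0, 4.0])
a32 = a64.astype(np.float32)

# Host-side reference for what the kernel is meant to compute.
b_h = np.float32(2.0)
expected = b_h * a32
print(expected)
```

Because c_h is created with np.empty_like(a_h), it picks up the float32 dtype automatically once a_h is float32, so the copy back from c_g is interpreted with the right element size.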