I want to subtract the same array of dimensions 1xN
to every row of a matrix of dimensions MxN
. Namely, perform the same operation numpy
does when doing a-b
, being the former the matrix, and the latter the array. For example:
import pyopencl as cl
import pyopencl.array as cl_array
import numpy as np
a = np.matrix('1 2 3; 4 5 6').astype(np.float32)
b = np.array([1,2,3]).astype(np.float32)
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)
a_dev = cl_array.to_device(queue, a)
b_dev = cl_array.to_device(queue, b)
dest_dev = cl_array.empty_like(a_dev)
prg = cl.Program(ctx, """
__kernel void fun(
const unsigned int size,
__global const float *a,
__global const float *b,
__global float *c)
{
int i = get_global_id(0);
int j = get_global_id(1);
c[i + j*size] = a[i + j*size] - b[j];
}
""").build()
prg.fun(queue, a.shape, None, np.int32(len(a)), a_dev.data, b_dev.data, dest_dev.data)
print(dest_dev)
I thought that this kernel would return the matrix
[0 0 0 ; 3 3 3]
but in fact it returns
[0 1 1 ; 2 2 3]
Where is the mistake?
You have to swap the rows und columns of the matrixcomputation in the kernel. You start the kernel with the dimensions that
a.shape # returns tuple(rows, columns)
returns. It returns (2, 3)
instead of (3, 2)
. So change the line in the kernel, where you do the computation of the matrix to :
c[j + i*size] = a[j + i*size] - b[j];
In this case i
are the number of rows, and j
are the columns.
There is also a mistake when you give the size
of the matrix to the kernel. Instead of
np.int32(len(a))
whitch returns 2 the number of rows (y-axis). Change the line with the kernel launch to :
prg.fun(queue, a.shape, None, np.int32(a.shape[1]), a_dev.data, b_dev.data, dest_dev.data)
a.shape[1]
gives you the size of the second dimension of the matrix, which are the number of columns, in this example 3.
There is also the possibility to query the number of columns in the kernel itself with a Work-Item Built-In function:
unsigned int size = get_global_size(1); // size of the 2. dim of the kernel
In this case you don't have to pass the number of columns as a kernel argument.