I would like to understand why the following PyCUDA code doesn't work.
The error that I get is:

TypeError: invalid type on parameter #3 (0-based)

and it occurs on the block line of my function call, i.e. the line block = (MATRIX_SIZE,MATRIX_SIZE,1), two lines before the end of the code.

Does anyone know what the mistake is here? I tried a lot of things but I can't figure it out.
The CUDA code works in C++; I am just trying to translate it to PyCUDA now, and that is where it fails.
```python
import numpy as np
from pycuda import driver, compiler, gpuarray, tools

# -- initialize the device
import pycuda.autoinit

kernel_code_template = """
__global__ void MatMult(float* C, float* A, float* B, int dimAx, int dimBx, int dimCx, int dimCy)
{
    int row = blockDim.y*blockIdx.y+threadIdx.y;
    int col = blockDim.x*blockIdx.x+threadIdx.x;

    double Result = 0;

    if (row<=dimCy-1 && col<=dimCx-1)
    {
        for (int k = 0; k < dimAx; k++)
        {
            Result += A[k + dimAx*row] * B[col + dimBx*k];
        }

        C[col + row*dimCx] = Result;
    }
}
"""

MATRIX_SIZE = 3

# I create my variables:
a_cpu = np.asarray([[0, 1, 2], [10, 11, 12], [20, 21, 22]])
b_cpu = np.asarray([[0, 0, 0], [1, 2, 3], [4, 8, 12]])

a_gpu = gpuarray.to_gpu(a_cpu)
b_gpu = gpuarray.to_gpu(b_cpu)

size_Ax = a_cpu.shape[1]
size_Bx = b_cpu.shape[1]
size_Ay = a_cpu.shape[0]
size_Cx = size_Bx  # Cx=Bx because of matrix product
size_Cy = size_Ay  # Cy=Ay

# create empty gpu array for the result (C = A * B)
c_gpu = gpuarray.empty((size_Cy, size_Cx), np.float32)

# get the kernel code from the template
kernel_code = kernel_code_template

# compile the kernel code
mod = compiler.SourceModule(kernel_code)

# get the kernel function from the compiled module
matrixmul = mod.get_function("MatMult")

# call the kernel on the card
matrixmul(
    # outputs
    c_gpu,
    # inputs
    a_gpu, b_gpu,
    size_Ax, size_Bx, size_Cx, size_Cy,
    # (only one) block of MATRIX_SIZE x MATRIX_SIZE threads
    block=(MATRIX_SIZE, MATRIX_SIZE, 1),
)
```
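Once the launch works, it helps to know what C should contain. The sketch below is a hypothetical CPU helper (matmult_reference is not part of PyCUDA or of the original code) that replays the kernel's index arithmetic on row-major flattened arrays, assuming float32 inputs. Its result can be compared against c_gpu.get().

```python
import numpy as np


def matmult_reference(A, B):
    """Hypothetical helper: replay MatMult's indexing on the CPU, one (row, col) per 'thread'."""
    dimCy, dimAx = A.shape
    dimBx = B.shape[1]
    dimCx = dimBx  # Cx = Bx because of matrix product
    a = A.ravel()  # row-major flat view, like the float* the kernel receives
    b = B.ravel()
    c = np.zeros(dimCy * dimCx, dtype=np.float32)
    for row in range(dimCy):
        for col in range(dimCx):
            result = np.float64(0.0)  # accumulate in double, like the kernel's Result
            for k in range(dimAx):
                result += a[k + dimAx * row] * b[col + dimBx * k]
            c[col + row * dimCx] = result  # stored back as float, like C[...]
    return c.reshape(dimCy, dimCx)


A = np.asarray([[0, 1, 2], [10, 11, 12], [20, 21, 22]], dtype=np.float32)
B = np.asarray([[0, 0, 0], [1, 2, 3], [4, 8, 12]], dtype=np.float32)
C = matmult_reference(A, B)
print(C)
print(np.allclose(C, A @ B))  # True
```

If the kernel and this reference agree with A @ B, the row/column indexing in MatMult is consistent with NumPy's row-major layout.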
Your interpretation of the source of the error is incorrect. The error message:

"TypeError: invalid type on parameter #3 (0-based)"

is telling you that the fourth parameter, size_Ax, has an incorrect type. The error is not with the block argument.
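Counting the positional arguments from zero makes this concrete. The list of names below is only an illustration (PyCUDA does not expose such a list). The sketch also checks what a_cpu.shape[1] actually returns; the np.number test is a simplified illustration of the distinction PyCUDA makes between NumPy scalars and plain Python ints, not PyCUDA's own code.

```python
import numpy as np

# Positional arguments of the kernel call, in order (names only, for illustration)
arg_names = ["c_gpu", "a_gpu", "b_gpu", "size_Ax", "size_Bx", "size_Cx", "size_Cy"]
print(arg_names[3])  # size_Ax: parameter #3 when counting from 0

a_cpu = np.asarray([[0, 1, 2], [10, 11, 12], [20, 21, 22]])
size_Ax = a_cpu.shape[1]  # shape entries are plain Python ints

print(type(size_Ax))                             # <class 'int'>
print(isinstance(size_Ax, np.number))            # False: no fixed C type attached
print(isinstance(np.int32(size_Ax), np.number))  # True: a 32-bit NumPy scalar
```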
The reason for this is that PyCUDA is strict about argument types when calling a kernel. Your kernel signature requires int values for dimAx, dimBx, dimCx, and dimCy, which are 32 bit. size_Ax and the other sizes are plain Python integers, which have no fixed C width, so PyCUDA does not know how to pass them by value and rejects them. You need to explicitly cast the arguments to the matching NumPy scalar type, something like:
```python
matrixmul(
    # outputs
    c_gpu,
    # inputs
    a_gpu, b_gpu,
    # scalar arguments cast to 32-bit ints to match the kernel's int parameters
    np.int32(size_Ax), np.int32(size_Bx), np.int32(size_Cx), np.int32(size_Cy),
    # (only one) block of MATRIX_SIZE x MATRIX_SIZE threads
    block=(MATRIX_SIZE, MATRIX_SIZE, 1),
)
```

should no longer raise the TypeError.
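The same kind of mismatch exists one step earlier: a_cpu and b_cpu are built from Python integers, so NumPy gives them an integer dtype, while MatMult takes float* pointers. Passing a GPUArray only passes its device pointer, so this does not raise an error; the kernel would simply reinterpret the integer bytes as floats and produce wrong values. A minimal sketch, assuming you want float32 inputs to match the kernel signature, shows the dtypes involved without needing a GPU:

```python
import numpy as np

# Built from Python ints, so NumPy picks an integer dtype, not float32
a_cpu = np.asarray([[0, 1, 2], [10, 11, 12], [20, 21, 22]])
print(a_cpu.dtype == np.float32)  # False

# Converting before gpuarray.to_gpu makes the device buffer match float*
a_f32 = a_cpu.astype(np.float32)
print(a_f32.dtype, a_f32.itemsize)  # float32 4

# np.int32 has the 4-byte size of the kernel's int parameters
print(np.int32(a_cpu.shape[1]).nbytes)  # 4
```

In the original script this amounts to creating the inputs with dtype=np.float32 (or calling .astype(np.float32)) before gpuarray.to_gpu, so that a_gpu, b_gpu, and c_gpu all hold float32 data.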