PyCUDA code not working: the "block" line in the function call doesn't work

I would like to understand why the following PyCUDA code doesn't work.

The error that I have is:

TypeError: invalid type on parameter #3 (0-based)

The error is reported on the block line of my function call, i.e. the line block = (MATRIX_SIZE,MATRIX_SIZE,1), two lines before the end of the code.

Does anyone know what the mistake is here? I tried a lot of things but I can't figure it out.

The CUDA code works in C++. I am just trying to translate it to PyCUDA now, and that is where it fails.

import numpy as np
from pycuda import driver, compiler, gpuarray, tools

# -- initialize the device
import pycuda.autoinit

kernel_code_template = """
__global__  void MatMult(float* C, float* A, float*B, int dimAx, int dimBx, int dimCx, int dimCy)
{
    int row = blockDim.y*blockIdx.y+threadIdx.y;
    int col = blockDim.x*blockIdx.x+threadIdx.x;

    double Result = 0;

    if (row<=dimCy-1 && col<=dimCx-1)
    {
        for (int k = 0; k < dimAx; k++)
        {
            Result += A[k + dimAx*row] * B[col + dimBx*k];
        }

        C[col + row*dimCx] = Result;
    }
}
"""

MATRIX_SIZE=3

# I create my variables :
a_cpu=np.asarray([[0,1,2],[10,11,12],[20,21,22]])
b_cpu=np.asarray([[0,0,0],[1,2,3],[4,8,12]])

a_gpu = gpuarray.to_gpu(a_cpu) 
b_gpu = gpuarray.to_gpu(b_cpu)

size_Ax=a_cpu.shape[1]
size_Bx=b_cpu.shape[1]

size_Ay=a_cpu.shape[0]

size_Cx=size_Bx # Cx=Bx because of matrix product
size_Cy=size_Ay # Cy=Ay
# create empty gpu array for the result (C = A * B)
c_gpu = gpuarray.empty((size_Cy, size_Cx), np.float32)

# get the kernel code from the template 
kernel_code=kernel_code_template
# compile the kernel code 
mod = compiler.SourceModule(kernel_code)

# get the kernel function from the compiled module
matrixmul = mod.get_function("MatMult")

# call the kernel on the card

matrixmul(
    # outputs
    c_gpu, 
    # inputs
    a_gpu, b_gpu,
    size_Ax,size_Bx,size_Cx,size_Cy,
    # (only one) block of MATRIX_SIZE x MATRIX_SIZE threads
    block = (MATRIX_SIZE,MATRIX_SIZE,1),
    )

Solution

Your interpretation of the source of the error is incorrect. The error message:

"TypeError: invalid type on parameter #3 (0-based)"

is telling you that the fourth parameter size_Ax has an incorrect type. The error is not with the block argument.
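To see how the index in the message maps onto the call, here is a rough sketch that counts positional arguments the same way. The helper first_unsized_scalar is hypothetical and is not part of PyCUDA, and the strings stand in for the GPU arrays so the snippet runs without a GPU. It only illustrates the 0-based counting, not PyCUDA's actual checking code.

```python
import numpy as np

def first_unsized_scalar(args):
    """Return the 0-based index of the first plain Python int/float, or None.

    Hypothetical helper for illustration only; not PyCUDA's implementation.
    """
    for index, arg in enumerate(args):
        # NumPy scalars such as np.int32 carry a fixed size, so skip them.
        if isinstance(arg, (int, float)) and not isinstance(arg, np.number):
            return index
    return None

# Positional arguments in the order used in the question's call.
# The strings are placeholders for the GPUArray objects.
call_args = ["c_gpu", "a_gpu", "b_gpu", 3, 3, 3, 3]
bad_index = first_unsized_scalar(call_args)

# The same call with the sizes wrapped in np.int32.
fixed_args = call_args[:3] + [np.int32(v) for v in call_args[3:]]
fixed_index = first_unsized_scalar(fixed_args)

print(bad_index, fixed_index)
```

Counting from zero, index 3 is size_Ax, which matches "parameter #3 (0-based)" in the message. The block keyword argument is not a positional parameter and is not counted.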

size_Ax is rejected because PyCUDA needs to know the exact size of every scalar it copies into the kernel's argument list. Your kernel signature requires int values for dimAx, dimBx, dimCx, and dimCy, which are 32 bit. A plain Python integer has no fixed width, so PyCUDA raises a TypeError instead of guessing. You need to explicitly cast the arguments to the matching NumPy type, np.int32 in this case, something like:

matrixmul(
    # outputs
    c_gpu, 
    # inputs
    a_gpu, b_gpu,
    np.int32(size_Ax), np.int32(size_Bx), np.int32(size_Cx), np.int32(size_Cy),
    # (only one) block of MATRIX_SIZE x MATRIX_SIZE threads
    block = (MATRIX_SIZE,MATRIX_SIZE,1),
    )

With this change the call should no longer raise the TypeError.
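The host-side preparation can be checked without a GPU. The sketch below is a minimal example under a few assumptions: it reuses the matrices from the question, wraps the sizes in np.int32, and additionally converts the matrices to float32. That last conversion is not part of the fix above. It is included because the kernel reads A and B through float* pointers, while np.asarray on integer lists produces an integer array. The names kernel_scalars and c_expected are introduced here for illustration.

```python
import numpy as np

# Matrices from the question, converted to float32 to match the kernel's float*.
a_cpu = np.asarray([[0, 1, 2], [10, 11, 12], [20, 21, 22]]).astype(np.float32)
b_cpu = np.asarray([[0, 0, 0], [1, 2, 3], [4, 8, 12]]).astype(np.float32)

# Kernel "int" parameters: wrap each Python int in np.int32 (4 bytes).
size_Ax = np.int32(a_cpu.shape[1])
size_Bx = np.int32(b_cpu.shape[1])
size_Cx = np.int32(b_cpu.shape[1])  # Cx = Bx for a matrix product
size_Cy = np.int32(a_cpu.shape[0])  # Cy = Ay
kernel_scalars = (size_Ax, size_Bx, size_Cx, size_Cy)

# Every scalar now has a fixed size that matches a C int.
for scalar in kernel_scalars:
    assert scalar.dtype == np.int32 and scalar.dtype.itemsize == 4

# CPU reference result to compare against c_gpu.get() after the kernel runs.
c_expected = a_cpu @ b_cpu
print(c_expected)
```

With a GPU available, a_cpu and b_cpu would go through gpuarray.to_gpu as in the question, and the four scalars would be passed after the arrays in the matrixmul call. Comparing c_gpu.get() with c_expected using np.allclose is one way to confirm that the kernel output is correct.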