I am currently writing a matrix multiplication on a GPU and would like to debug my code. Since I cannot use printf inside a device function, is there something else I can do to see what is going on inside that function? This is my current function:
__global__ void MatrixMulKernel(Matrix Ad, Matrix Bd, Matrix Xd){
    int tx = threadIdx.x;
    int ty = threadIdx.y;
    int bx = blockIdx.x;
    int by = blockIdx.y;
    float sum = 0;
    for( int k = 0; k < Ad.width ; ++k){
        float Melement = Ad.elements[ty * Ad.width + k];
        float Nelement = Bd.elements[k * Bd.width + tx];
        sum += Melement * Nelement;
    }
    Xd.elements[ty * Xd.width + tx] = sum;
}
I would love to know whether Ad and Bd are what I think they are, and to see whether that function is actually being called.
CUDA now supports printf calls directly in the kernel.
See the Formatted Output section of NVIDIA's online documentation (the CUDA C Programming Guide).
Formatted output is only supported by devices of compute capability 2.x and higher.
int printf(const char *format[, arg, ...]);
For past versions' docs, see this page.
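Applied to the kernel from the question, a minimal sketch might look like the following. The question never shows the Matrix type, so the struct here is an assumption based only on the .width and .elements fields the kernel uses, and runMatrixMulDemo is a hypothetical host helper with made-up 2x2 test data. Only one thread prints, so the output is not repeated for every thread.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Assumed layout: the question only shows the .width and .elements fields.
struct Matrix {
    int width;
    int height;
    float* elements;  // row-major; a device pointer when passed to the kernel
};

__global__ void MatrixMulKernel(Matrix Ad, Matrix Bd, Matrix Xd) {
    int tx = threadIdx.x;
    int ty = threadIdx.y;

    // Let a single thread report, so the message appears once per launch.
    if (tx == 0 && ty == 0 && blockIdx.x == 0 && blockIdx.y == 0) {
        printf("kernel called: Ad.width=%d Bd.width=%d Ad[0]=%f Bd[0]=%f\n",
               Ad.width, Bd.width, Ad.elements[0], Bd.elements[0]);
    }

    float sum = 0;
    for (int k = 0; k < Ad.width; ++k) {
        sum += Ad.elements[ty * Ad.width + k] * Bd.elements[k * Bd.width + tx];
    }
    Xd.elements[ty * Xd.width + tx] = sum;
}

// Hypothetical helper: multiplies two fixed 2x2 matrices on the GPU and
// copies the 4 result values into hX. Returns the first CUDA error seen
// from the launch, the synchronization, or the copy back.
cudaError_t runMatrixMulDemo(float* hX) {
    const int n = 2;
    const size_t bytes = n * n * sizeof(float);
    const float hA[] = {1, 2, 3, 4};
    const float hB[] = {5, 6, 7, 8};

    Matrix Ad = {n, n, nullptr};
    Matrix Bd = {n, n, nullptr};
    Matrix Xd = {n, n, nullptr};
    cudaMalloc(&Ad.elements, bytes);
    cudaMalloc(&Bd.elements, bytes);
    cudaMalloc(&Xd.elements, bytes);
    cudaMemcpy(Ad.elements, hA, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(Bd.elements, hB, bytes, cudaMemcpyHostToDevice);

    MatrixMulKernel<<<1, dim3(n, n)>>>(Ad, Bd, Xd);

    // A bad launch configuration is reported here, not by the launch itself.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) {
        // Device printf output is buffered; synchronizing flushes it
        // and also surfaces errors raised while the kernel ran.
        err = cudaDeviceSynchronize();
    }
    if (err == cudaSuccess) {
        err = cudaMemcpy(hX, Xd.elements, bytes, cudaMemcpyDeviceToHost);
    }

    cudaFree(Ad.elements);
    cudaFree(Bd.elements);
    cudaFree(Xd.elements);
    return err;
}
```

Calling runMatrixMulDemo from main with a four-element float array and passing any non-success return value to cudaGetErrorString shows whether the kernel launched at all. If no output appears, the usual suspects are a missing synchronization before the program exits or a launch that failed.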