I have the following CUDAKernel object, which I set up and invoke using:
kernel1 = parallel.gpu.CUDAKernel('kcc2.ptx', 'kcc2.cu');
kernel1.ThreadBlockSize = 256;
kernel1.GridSize = 4;
gpuTM = gpuArray(single(TM));
gpuLTM = gpuArray(single(LTM));
gpuLTMP = gpuArray(int32(LTMP));
rng('shuffle');
randz = abs(randi(2^53 -1, [1, r_max]));
GPUrands = gpuArray(double(randz));
[x,y] = gather(feval(kernel1, gpuLTM, gpuLTMP, F_M, Force, GPUrands, ...
(r_max), single(Lamda), single(Fixed_dt), single(r), single(q), ...
single(gama_B), single(gama_M), single(mu_B), single(mu_M), ...
single(KB_p_ref), single(KB_m_ref), single(f_ref), single(g_ref), ...
single(Kca_p_ref), single(Kca_m_ref)));
As you can see above, I request two left-hand-side outputs, yet MATLAB gives this error:
Error using gpuArray/gather: Too many output arguments.
I don't get it. All my parameters line up in the CUDA kernel and in MATLAB. Just so you can see, the kernel function has the following C++ prototype:
__global__ void myKern(const float *transMatrix, const int *pointerMatrix,
float *masterForces, float *Force, const double *rands, const int r_max,
const float lamda, const float dt, const float r, const float q,
const float gama_B, const float gama_M, const float mu_B, const float mu_M,
const float KB_p_ref, const float KB_m_ref, const float f_ref,
const float g_ref, const float Kca_p_ref, const float Kca_m_ref)
It should only return masterForces and Force ([x,y] in MATLAB), since they are the only non-constant pointers.
What could be the problem?
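You can't wrap gather around a call that returns multiple outputs. When one function call is nested inside another, MATLAB passes only the first output of the inner call to the outer one, so gather receives a single input here and cannot produce two outputs. Capture both outputs first, then gather each on its own line (this is basic MATLAB syntax):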
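[x,y] = feval(kernel1, ...);  % x and y are gpuArrays, still on the GPU
x = gather(x);                % copy the first result to host memory
y = gather(y);                % copy the second result to host memory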
The output of evaluating the CUDA kernel is two variables of type gpuArray (data stored on the GPU). You can then transfer the data to CPU memory using gather applied on each variable.
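To see why the nested call fails, here is a minimal sketch that needs no GPU. It uses max as a stand-in for feval(kernel1, ...) and abs as a stand-in for gather; both are illustrative substitutes chosen for this example, not part of the original code.

```matlab
% Hypothetical stand-in data; max plays the role of the two-output kernel call.
v = [3 9 2];

% Called directly, max can return two outputs.
[m, idx] = max(v);        % m = 9, idx = 2

% Nested inside another call, only the FIRST output of max is passed on.
a = abs(max(v));          % a = 9; the index output is discarded

% Asking the outer call for two outputs fails: abs received one input
% and can only return one output, regardless of what was nested inside.
try
    [a2, b2] = abs(max(v));
catch err
    disp(err.message);    % reports that too many outputs were requested
end
```

The fix follows the same pattern as in the answer: request both outputs from the inner call on one line, then apply the outer function to each variable separately.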