I'm using cublasDgemm to multiply two matrices.
I wrote a method that uses cublasDgemm and returns the pointer to the output.
It seems to work well in my unit tests but it fails in my application code (return code CUBLAS_STATUS_EXECUTION_FAILED).
I have gone over the code many times and everything seems OK. Is there any way to get a better error explanation?
Update: It seems that every second cublasDgemm call works. The first call returns this error and the second one succeeds. Any ideas?
Update2: This is my call
const double alpha = 1.0;
const double beta = 0;
cublasStatus_t ret = cublasDgemm(RmCudaMatrix::handle_, CUBLAS_OP_N, CUBLAS_OP_N,
                                 Rows(), b.Cols(), Cols(), &alpha,
                                 device_matrix_, Rows(), b.device_matrix_, b.Rows(), &beta,
                                 output->device_matrix_, output->Rows());
Thanks.
CUBLAS functions may run asynchronously, so when a CUBLAS call returns a cublasStatus_t other than CUBLAS_STATUS_SUCCESS, the error may have come from a previous call. To determine whether this is the case, check the CUDA error status after each CUBLAS call with cudaGetLastError().