According to the CUBLAS reference, the asum function (which computes the sum of the absolute values of the elements of a vector) is declared as:
cublasStatus_t cublasSasum(cublasHandle_t handle, int n, const float *x, int incx, float *result)
The reference explains each parameter. Roughly, x is a vector of n elements, with a stride of incx between consecutive elements.
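To make the parameter semantics concrete, here is a minimal host-side sketch of what the routine computes, assuming it follows the reference-BLAS definition of SASUM: the sum of |x[i*incx]| for i = 0 .. n-1, with incx counted in elements. The name ref_sasum is hypothetical (it is not a CUBLAS function) and it runs on the CPU only.

```c
#include <stddef.h>

/* Hypothetical CPU reference, NOT part of CUBLAS.
 * Assumption: mirrors the reference-BLAS SASUM definition.
 * Returns the sum of |x[i * incx]| for i = 0 .. n-1, where incx is a
 * stride measured in ELEMENTS, not bytes. Like reference BLAS, it
 * returns 0 when n <= 0 or incx <= 0. */
float ref_sasum(int n, const float *x, int incx)
{
    float sum = 0.0f;
    if (n <= 0 || incx <= 0)
        return sum;
    for (int i = 0; i < n; i++) {
        float v = x[(size_t)i * (size_t)incx]; /* element stride */
        sum += (v < 0.0f) ? -v : v;            /* absolute value */
    }
    return sum;
}
```

Called as ref_sasum(10, a, 1) on the host array a from the code below, it should return roughly 8.0 (ten values of 0.8f, up to float rounding).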
My code is below (heavily simplified, but I tested this version too and the error persists):
int arraySize = 10;
float* a = (float*) malloc (sizeof(float) * arraySize);
float* d_a;
cudaMalloc((void**) &d_a, sizeof(float) * arraySize);
for (int i=0; i<arraySize; i++)
a[i]=0.8f;
cudaMemcpy(d_a, a, sizeof(float) * arraySize, cudaMemcpyHostToDevice);
cublasStatus_t ret;
cublasHandle_t handle;
ret = cublasCreate(&handle);
float* cb_result = (float*) malloc (sizeof(float));
ret = cublasSasum(handle, arraySize, d_a, sizeof(float), cb_result);
printf("\n\nCUBLAS: %.3f", *cb_result);
cublasDestroy(handle);
I have removed error checking, as well as the free and cudaFree calls, to simplify the code (there are no errors: the CUBLAS functions return CUBLAS_STATUS_SUCCESS).
It compiles and runs without any error, but the printed result is 0, and in the debugger the value is actually a quiet NaN (1.QNAN).
What did I miss?
One of the arguments to cublasSasum is incorrect. The call should look like this:
ret = cublasSasum(handle, arraySize, d_a, 1, cb_result);
Note that the second-to-last argument, incx, is a stride in elements (here, floats), not in bytes.
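This mix-up explains the bogus result. With incx = sizeof(float), which is 4 on common platforms, a routine that strides in elements reads x[0], x[4], x[8], ..., x[36]. Only the first three of those ten indices lie inside the 10-element d_a, so the sum includes out-of-bounds device reads and the result is undefined. The hypothetical helper below (plain C, not a CUBLAS function) just counts such out-of-range accesses for a given n, incx and array length:

```c
/* Hypothetical helper, NOT part of CUBLAS.
 * Counts how many of the n strided accesses
 * x[0], x[incx], ..., x[(n-1)*incx] fall outside an array of len
 * elements. Assumes incx >= 0 and that incx is measured in elements. */
int out_of_bounds_reads(int n, int incx, int len)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        long idx = (long)i * incx; /* element index, not byte offset */
        if (idx >= len)
            count++;
    }
    return count;
}
```

For the question's values, out_of_bounds_reads(10, 4, 10) gives 7 (indices 12 through 36), while the corrected stride, out_of_bounds_reads(10, 1, 10), gives 0.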