I am wondering if there is a difference between:
// cumalloc.cu - Create a vector on the device
HOST float * cudamath_vector(const float * h_vector, const int m)
{
    float *d_vector = NULL;
    cudaError_t cudaStatus;
    cublasStatus_t cublasStatus;

    cudaStatus = cudaMalloc((void **)&d_vector, sizeof(float) * m);
    if (cudaStatus != cudaSuccess) {
        /* Report any allocation failure, not only cudaErrorMemoryAllocation */
        printf("ERROR: cumalloc.cu, cudamath_vector() : %s\n", cudaGetErrorString(cudaStatus));
        return NULL;
    }

    /* THIS: */ cublasSetVector(m, sizeof(*d_vector), h_vector, 1, d_vector, 1);
    /* OR THAT: */ cudaMemcpy(d_vector, h_vector, sizeof(float) * m, cudaMemcpyHostToDevice);

    return d_vector;
}
cublasSetVector() has two arguments, incx and incy, and the documentation says:
The storage spacing between consecutive elements is given by incx for the source vector x and for the destination vector y.
In the NVIDIA forum someone said:
iona_me: "incx and incy are strides measured in floats."
So does this mean that for incx = incy = 1 all elements of a float[] will be sizeof(float)-aligned, and for incx = incy = 2 there would be a sizeof(float)-padding between each element?
Apart from the cublasHandle, does cublasSetVector() do anything that cudaMalloc() doesn't do? Is it safe to pass a vector or matrix that was not created with the respective cublas*() function to other CUBLAS functions to manipulate them?

There is a comment in a thread of the NVIDIA Forum by Massimiliano Fatica confirming my statement in the above comment (or, to put it better, my comment came from recalling the post I linked to). In particular,
cublasSetVector, cublasGetVector, cublasSetMatrix, and cublasGetMatrix are thin wrappers around cudaMemcpy and cudaMemcpy2D. Therefore, no significant performance differences are expected between the two sets of copy functions.
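To illustrate what "thin wrapper" could mean here, the following is a host-only C sketch of how a strided vector copy can be expressed with a flat copy and a pitched 2D copy. It is a model under stated assumptions, not the cuBLAS source: model_memcpy2d and model_set_vector are hypothetical names, plain memcpy stands in for the device transfers, and the idea that a non-unit stride maps to a 2D copy with one element per row is an assumption about the wrapper's structure.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Host-side stand-in for a pitched 2D copy: copies `height` rows of
 * `width` bytes; consecutive rows start `spitch` / `dpitch` bytes apart. */
static void model_memcpy2d(void *dst, size_t dpitch,
                           const void *src, size_t spitch,
                           size_t width, size_t height)
{
    assert(dpitch >= width && spitch >= width);
    for (size_t row = 0; row < height; ++row) {
        memcpy((unsigned char *)dst + row * dpitch,
               (const unsigned char *)src + row * spitch,
               width);
    }
}

/* Hypothetical model of a strided vector copy (NOT the cuBLAS source):
 * unit strides collapse to one flat copy; any other stride becomes a
 * 2D copy in which every "row" holds exactly one element. */
static void model_set_vector(int n, size_t elem_size,
                             const void *x, int incx,
                             void *y, int incy)
{
    assert(n >= 0 && incx > 0 && incy > 0);
    if (incx == 1 && incy == 1) {
        memcpy(y, x, (size_t)n * elem_size);
    } else {
        model_memcpy2d(y, (size_t)incy * elem_size,
                       x, (size_t)incx * elem_size,
                       elem_size, (size_t)n);
    }
}
```

With incx = incy = 1 the model performs a single contiguous copy of n * elem_size bytes, which is the same transfer the cudaMemcpy line in the question requests.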
Accordingly, you can safely pass any array created by cudaMalloc as input to cublasSetVector.
Concerning the strides, perhaps there is a misprint in the guide (as of CUDA 6.0), which says that

The storage spacing between consecutive elements is given by incx for the source vector x and for the destination vector y.

but perhaps should be read as

The storage spacing between consecutive elements is given by incx for the source vector x and incy for the destination vector y.
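As a rough illustration of the stride convention under the corrected reading, here is a host-only C sketch in which strides are counted in elements (floats), as the forum quote in the question states. The helper names strided_extent and strided_copy_f32 are hypothetical and are not part of cuBLAS; the sketch only models the indexing, not a device transfer.

```c
#include <assert.h>
#include <stddef.h>

/* Number of floats a buffer must hold to store n elements at stride inc. */
static size_t strided_extent(size_t n, size_t inc)
{
    return n == 0 ? 0 : 1 + (n - 1) * inc;
}

/* Host-side model of the stride convention: logical element i is read
 * from x[i * incx] and written to y[i * incy]. Slots in between are
 * left untouched. */
static void strided_copy_f32(size_t n, const float *x, size_t incx,
                             float *y, size_t incy)
{
    assert(incx > 0 && incy > 0);
    for (size_t i = 0; i < n; ++i) {
        y[i * incy] = x[i * incx];
    }
}
```

Under this model, a stride of 1 describes an ordinary contiguous float array, while a stride of 2 leaves one float-sized slot between consecutive elements, so a destination for 3 elements with incy = 2 needs room for 5 floats. The stride changes the spacing of elements, not the alignment of the buffer.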