When I compile this piece of code, the compiler says I'm calling a host function from a __global__ function. I want to assign something a bit more complicated than zeros to A[i] and B[i], but first I just wanted to test the functionality. I need to modify values in both vectors. Later I'd like to reduce the vector A.
#include <thrust/device_vector.h>

__global__ void vectorCount(thrust::device_vector<double> A, thrust::device_vector<double> B, int numElements){
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    A[i] = 0;
    B[i] = 0;
}

int main(void){
    const int numElements = 100000;
    thrust::device_vector<double> A(numElements);
    thrust::device_vector<double> B(numElements);
    int threadsPerBlock = 256;
    int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
    vectorCount<<<blocksPerGrid, threadsPerBlock>>>(A, B, numElements);
}
I've also tried rewriting it with a functor:
struct saxpy_functor
{
const int numElements;
saxpy_functor(int _a) : numElements(_a) {}
__host__ __device__
double operator()(double& x) const {
x = 0;
return 0;
}
};
//in main
thrust::transform(A.begin(), A.end(), B.begin(), saxpy_functor(numElements));
But I can't find how to get the index i, as in the previous example. How can I get it, since I want to perform calculations that depend on the position in the vector?
In spite of its name, a thrust::device_vector is not directly usable in CUDA device code. The device_vector is an object/container, and it is intended to be used in host code only: its constructors, destructor and operator[] are host functions, so passing one by value to a kernel and indexing it there is what produces the message about "calling a host function...".
For the example you have shown here, to access the data directly you would (in host code) extract device pointers to the underlying data in each container (A and B) and pass those pointers to your CUDA kernel.
Something like this:
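#include <thrust/device_vector.h>

// The kernel takes raw device pointers, not device_vector objects.
// It is defined before main so the launch below can see it.
__global__ void vectorCount(double *A, double *B, int numElements){
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < numElements){   // the last block may contain spare threads
        A[i] = 0;
        B[i] = 0;
    }
}

int main(void){
    const int numElements = 100000;
    thrust::device_vector<double> A(numElements);
    thrust::device_vector<double> B(numElements);
    // Extract raw device pointers to each container's storage (host code).
    double *d_A = thrust::raw_pointer_cast(A.data());
    double *d_B = thrust::raw_pointer_cast(B.data());
    int threadsPerBlock = 256;
    int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
    vectorCount<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, numElements);
    cudaDeviceSynchronize();  // wait for the kernel to finish before main returns
}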
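Because the kernel writes through raw pointers into the same storage that A owns, the device_vector itself can still be handed to Thrust algorithms in host code afterwards, which covers the reduction of A mentioned in the question. A minimal sketch, assuming a hypothetical kernel fillByIndex that writes placeholder position-dependent values (A[i] = 0.5 * i) instead of zeros; the names fillByIndex and sumOfA are illustrative only.

```cuda
#include <thrust/device_vector.h>
#include <thrust/reduce.h>
#include <cuda_runtime.h>

// Hypothetical variant of vectorCount: writes position-dependent values
// instead of zeros (the formulas are placeholders).
__global__ void fillByIndex(double *A, double *B, int numElements)
{
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < numElements) {
        A[i] = 0.5 * i;
        B[i] = 2.0 * i;
    }
}

// Launches the kernel on raw pointers, then reduces A with Thrust from host code.
double sumOfA(int numElements)
{
    thrust::device_vector<double> A(numElements);
    thrust::device_vector<double> B(numElements);

    // The raw pointers are only views; A and B still own and free the memory.
    double *d_A = thrust::raw_pointer_cast(A.data());
    double *d_B = thrust::raw_pointer_cast(B.data());

    int threadsPerBlock = 256;
    int blocksPerGrid = (numElements + threadsPerBlock - 1) / threadsPerBlock;
    fillByIndex<<<blocksPerGrid, threadsPerBlock>>>(d_A, d_B, numElements);
    cudaDeviceSynchronize();  // make sure the kernel has finished writing A

    // The container itself is used again, in host code, as the input range.
    return thrust::reduce(A.begin(), A.end(), 0.0);
}
```

Called from main, sumOfA(1000) should return 249750.0, i.e. 0.5 * (999 * 1000 / 2). The point of the design is that raw pointers are used only where device code needs them, while allocation, freeing and the reduction stay with the container.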
Ordinary usage of the thrust::transform/functor method won't give the functor random access to the underlying data. However, you can use an "un-thrust-like" method that combines the functor approach with random access, if you wish.
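A minimal sketch of one such approach (an assumption, not necessarily the method the answer had in mind): drive the functor with a thrust::counting_iterator so each call receives its index i, and give the functor raw device pointers so it can read or write any element of A and B. The functor indexed_update, the helper runIndexedUpdate, and the values written are hypothetical.

```cuda
#include <thrust/device_vector.h>
#include <thrust/host_vector.h>
#include <thrust/execution_policy.h>
#include <thrust/for_each.h>
#include <thrust/iterator/counting_iterator.h>

// Hypothetical functor: holds raw device pointers and receives an index,
// so it can compute position-dependent values and write anywhere in A or B.
struct indexed_update
{
    double *A;
    double *B;
    int numElements;

    indexed_update(double *a, double *b, int n) : A(a), B(b), numElements(n) {}

    __host__ __device__
    void operator()(int i) const
    {
        A[i] = 2.0 * i;                                   // depends on position i
        B[numElements - 1 - i] = static_cast<double>(i);  // non-local write: B reversed
    }
};

// Applies the functor to indices 0 .. numElements-1 and returns B on the host.
thrust::host_vector<double> runIndexedUpdate(int numElements)
{
    thrust::device_vector<double> A(numElements);
    thrust::device_vector<double> B(numElements);

    indexed_update f(thrust::raw_pointer_cast(A.data()),
                     thrust::raw_pointer_cast(B.data()),
                     numElements);

    // counting_iterator yields 0, 1, 2, ... without storing them in memory.
    thrust::for_each(thrust::device,
                     thrust::counting_iterator<int>(0),
                     thrust::counting_iterator<int>(numElements),
                     f);

    return thrust::host_vector<double>(B.begin(), B.end());  // device -> host copy
}
```

Each index writes exactly one element of A and one element of B, so no two calls touch the same location. If the functor instead read elements that other calls are writing (for example a neighbour in A), the result would depend on execution order, which is the usual hazard of this kind of random access.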