This might be a stupid question, but I can't seem to find any resources specifically related to it, so your opinion is appreciated.
Let's say I have some functor
struct AddOne {
    thrust::device_ptr<int> numbers;
    __device__
    void operator()(int i) {
        numbers[i] = numbers[i] + 1;
    }
};
that I can call from
AddOne addOneFunctor;
thrust::device_vector<int> idx(100), numbers(100);
addOneFunctor.numbers = numbers.data();
thrust::sequence(idx.begin(), idx.end(), 0);
thrust::for_each(thrust::device, idx.begin(), idx.end(), addOneFunctor);
Is it possible to write the above so that the execution policy can be changed at either compile time or, ideally, run time?
E.g. change the struct to
struct AddOne {
    thrust::pointer<int> numbers;
    __host__ __device__
    void operator()(int i) {
        numbers[i] = numbers[i] + 1;
    }
};
so it can be run something like
AddOne addOneFunctor;
std::vector<int> idx(100), numbers(100);
addOneFunctor.numbers = numbers.data();
thrust::sequence(idx.begin(), idx.end(), 0);
thrust::for_each(thrust::cpp::par, idx.begin(), idx.end(), addOneFunctor);
The bottom line is: I would like to have a single code base where I can decide to use either thrust::device_vectors or some sort of host vector (such as std::vectors) and run these on the GPU (using the thrust::device execution policy) or on the CPU (using thrust::cpp::par or a similar policy), respectively.
PS: I would like to avoid PGI for now.
Yes, it's possible, pretty much exactly as you describe.
Here's a fully-worked example:
$ cat t1205.cu
#include <thrust/execution_policy.h>
#include <thrust/for_each.h>
#include <thrust/device_vector.h>
#include <thrust/sequence.h>
#include <iostream>
#include <vector>
struct AddOne {
    int *numbers;
    template <typename T>
    __host__ __device__
    void operator()(T &i) {
        numbers[i] = numbers[i] + 1;
    }
};
int main(){
    AddOne addOneFunctor;
    std::vector<int> idx(100), numbers(100);
    addOneFunctor.numbers = thrust::raw_pointer_cast(numbers.data());
    thrust::sequence(idx.begin(), idx.end(), 0);
    thrust::for_each(thrust::cpp::par, idx.begin(), idx.end(), addOneFunctor);
    for (int i = 0; i < 5; i++)
        std::cout << numbers[i] << ",";
    std::cout << std::endl;
    thrust::device_vector<int> didx(100), dnumbers(100);
    addOneFunctor.numbers = thrust::raw_pointer_cast(dnumbers.data());
    thrust::sequence(didx.begin(), didx.end(), 0);
    thrust::for_each(thrust::device, didx.begin(), didx.end(), addOneFunctor);
    for (int i = 0; i < 5; i++)
        std::cout << dnumbers[i] << ",";
    std::cout << std::endl;
}
$ nvcc -o t1205 t1205.cu
$ ./t1205
1,1,1,1,1,
1,1,1,1,1,
$
Note that the algorithm is thrust::sequence, not thrust::seq.
Using CUDA 8RC
As @m.s. points out, the explicit use of execution policies on the algorithms in the code above is not necessary: you can remove them and it will work the same way. However, the formal usage of an execution policy allows the above example to be extended to the case where you are not using containers but ordinary host and device data, so it may still have some value, depending on your overall goals.
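To illustrate that last point, here is a minimal sketch (not tested against a specific CUDA version) of the same idea with plain pointers instead of containers, where the backend is picked at run time. The helper name add_one and its use_gpu flag are hypothetical, introduced only for this example. It also uses thrust::host rather than thrust::cpp::par, and a thrust::counting_iterator in place of the index vector; both are my substitutions, not part of the code above. Because a raw int* says nothing about where the memory lives, the explicit policy is what tells Thrust where to run.

```cuda
#include <thrust/execution_policy.h>
#include <thrust/for_each.h>
#include <thrust/iterator/counting_iterator.h>
#include <cuda_runtime.h>
#include <vector>

// One functor definition, usable from both host and device code.
struct AddOne {
    int *numbers;  // raw pointer: may refer to host or device memory
    __host__ __device__
    void operator()(int i) const {
        numbers[i] = numbers[i] + 1;
    }
};

// Hypothetical helper (not part of Thrust): increments n zero-initialized
// ints on the GPU or the CPU, chosen at run time, and returns the result
// in a host vector. CUDA error checking is omitted for brevity.
std::vector<int> add_one(int n, bool use_gpu) {
    std::vector<int> host(n, 0);
    // Generates the indices 0..n-1 on the fly, so no index vector is needed.
    thrust::counting_iterator<int> first(0), last(n);
    AddOne f;
    if (use_gpu) {
        int *d = nullptr;
        cudaMalloc(&d, n * sizeof(int));
        cudaMemcpy(d, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);
        f.numbers = d;
        // Explicit device policy: the functor runs on the GPU.
        thrust::for_each(thrust::device, first, last, f);
        cudaMemcpy(host.data(), d, n * sizeof(int), cudaMemcpyDeviceToHost);
        cudaFree(d);
    } else {
        f.numbers = host.data();
        // Explicit host policy: the functor runs on the CPU.
        thrust::for_each(thrust::host, first, last, f);
    }
    return host;
}
```

Calling add_one(100, true) would run the increment on the GPU and add_one(100, false) on the CPU, with the same functor in both cases. The branch has to be an ordinary if because thrust::host and thrust::device are different types, so a run-time choice means both code paths get compiled.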