I know that calls to a functor using thrust::for_each with data in thrust::host_vector's have a parallel execution policy, but do they actually execute in parallel?
If not, what would be the correct way to invoke these knowing that the system I'm running this on is virtualized so that all cores appear to be on the same machine?
[EDIT]
I realize that there is such a thing as thrust::omp::par; however, I can't seem to find a full Thrust example using OpenMP.
In general, Thrust operations dispatched on the "host" are not run in parallel. By default they use a single host thread.
If you want to run Thrust operations in parallel on the CPU (using multiple CPU threads), the recommended practice is to use the Thrust OpenMP backend.
A fully worked example is here.
Another worked example is here.